[Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration

Stefan Hajnoczi posted 2 patches 6 years ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20180427162312.18583-1-stefanha@redhat.com
Posted by Stefan Hajnoczi 6 years ago
v2:
 * Add comment on !__linux__ situation [Fam]
 * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]

file-posix.c only supports shared storage live migration with -drive
cache.direct=on due to cache consistency issues.  There are two main shared
storage configurations: files on NFS and host block devices on SAN LUNs.

The problem is that QEMU starts on the destination host before the source host
has written everything out to the disk.  The page cache on the destination host
may contain stale data read when QEMU opened the image file (before migration
handover).  Using O_DIRECT avoids this problem but prevents users from taking
advantage of the host page cache.

Although cache=none is the recommended setting for virtualization use cases,
there are scenarios where cache=writeback makes sense.  If the guest has much
less RAM than the host or many guests share the same backing file, then the
host page cache can significantly improve disk I/O performance.

This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
on Linux so that shared storage live migration works.  I have sent it as an RFC
because cache consistency is not binary, there are corner cases which I've
described in the actual patch, and this may require more discussion.

Regarding NFS, QEMU relies on O_DIRECT rather than the close-to-open
consistency model (see nfs(5)), which is the basic guarantee provided by NFS.
After this patch cache consistency is no longer provided by O_DIRECT.

This patch series relies on fdatasync(2) (source) +
posix_fadvise(POSIX_FADV_DONTNEED) (destination) instead.  I believe it is safe
for both NFS and SAN LUNs.  Maybe we should use fsync(2) instead of
fdatasync(2) so that NFS has up-to-date inode metadata?
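To make the two halves of that sequence concrete, here is a minimal
standalone sketch.  It is not the code from the patch: the helper names
source_flush() and destination_drop_cache() are hypothetical, and it assumes
a Linux host where POSIX_FADV_DONTNEED actually discards clean pages (POSIX
only treats it as a hint).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Source host (hypothetical helper): write dirty pages for fd back to
 * shared storage before handover.  Returns 0 or a negative errno value.
 */
static int source_flush(int fd)
{
    if (fdatasync(fd) < 0) {
        return -errno;
    }
    return 0;
}

/*
 * Destination host (hypothetical helper): ask the kernel to drop cached
 * pages for the whole file (offset 0, length 0 means "to end of file").
 * Returns 0 or a negative errno value.
 */
static int destination_drop_cache(int fd)
{
    /* posix_fadvise() returns the error number directly, not via errno */
    int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    return ret ? -ret : 0;
}
```

In this sketch the source calls source_flush() before handover and the
destination calls destination_drop_cache() before it starts issuing reads.
DONTNEED is advisory and only discards clean pages, so a zero return value
does not prove that the cache is empty.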

Stefan Hajnoczi (2):
  block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  block/file-posix: add x-check-page-cache=on|off option
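One way to check whether pages really were dropped is mincore(2) on a
mapping of the file.  The sketch below illustrates that idea only and should
not be read as the implementation in patch 2: count_resident_pages() is a
hypothetical helper, it maps the whole file at once, and it relies on
fstat(2) for the size, so it would not work as-is for host block devices.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for mincore() on glibc */
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count pages of fd that are still resident in the
 * host page cache.  Returns the count or a negative errno value.
 */
static long count_resident_pages(int fd)
{
    struct stat st;
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t length, npages, i;
    unsigned char *vec;
    void *map;
    long resident = 0;

    if (fstat(fd, &st) < 0) {
        return -errno;
    }
    if (st.st_size == 0) {
        return 0; /* nothing to map (also the case for block devices) */
    }

    length = (size_t)st.st_size;
    npages = (length + page_size - 1) / page_size;
    vec = malloc(npages);
    if (!vec) {
        return -ENOMEM;
    }

    /* PROT_NONE: we only query residency and never touch the data */
    map = mmap(NULL, length, PROT_NONE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        int err = errno;

        free(vec);
        return -err;
    }

    if (mincore(map, length, vec) < 0) {
        resident = -errno;
    } else {
        for (i = 0; i < npages; i++) {
            resident += vec[i] & 1; /* bit 0 set means page is in core */
        }
    }

    munmap(map, length);
    free(vec);
    return resident;
}
```

A non-zero count after posix_fadvise(POSIX_FADV_DONTNEED) would suggest that
some pages survived and reads may still be served from stale cache.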

 qapi/block-core.json |   7 ++-
 block/file-posix.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 150 insertions(+), 3 deletions(-)

-- 
2.14.3


Re: [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration
Posted by Fam Zheng 5 years, 12 months ago
On Fri, 04/27 17:23, Stefan Hajnoczi wrote:
> v2:
>  * Add comment on !__linux__ situation [Fam]
>  * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]

Reviewed-by: Fam Zheng <famz@redhat.com>


Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration
Posted by Stefan Hajnoczi 5 years, 11 months ago
On Fri, Apr 27, 2018 at 05:23:10PM +0100, Stefan Hajnoczi wrote:
> v2:
>  * Add comment on !__linux__ situation [Fam]
>  * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]
> 
> file-posix.c only supports shared storage live migration with -drive
> cache.direct=on due to cache consistency issues.  There are two main shared
> storage configurations: files on NFS and host block devices on SAN LUNs.
> 
> The problem is that QEMU starts on the destination host before the source host
> has written everything out to the disk.  The page cache on the destination host
> may contain stale data read when QEMU opened the image file (before migration
> handover).  Using O_DIRECT avoids this problem but prevents users from taking
> advantage of the host page cache.
> 
> Although cache=none is the recommended setting for virtualization use cases,
> there are scenarios where cache=writeback makes sense.  If the guest has much
> less RAM than the host or many guests share the same backing file, then the
> host page cache can significantly improve disk I/O performance.
> 
> This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
> on Linux so that shared storage live migration works.  I have sent it as an RFC
> because cache consistency is not binary, there are corner cases which I've
> described in the actual patch, and this may require more discussion.
> 
> Regarding NFS, QEMU relies on O_DIRECT rather than the close-to-open
> consistency model (see nfs(5)), which is the basic guarantee provided by NFS.
> After this patch cache consistency is no longer provided by O_DIRECT.
> 
> This patch series relies on fdatasync(2) (source) +
> posix_fadvise(POSIX_FADV_DONTNEED) (destination) instead.  I believe it is safe
> for both NFS and SAN LUNs.  Maybe we should use fsync(2) instead of
> fdatasync(2) so that NFS has up-to-date inode metadata?
> 
> Stefan Hajnoczi (2):
>   block/file-posix: implement bdrv_co_invalidate_cache() on Linux
>   block/file-posix: add x-check-page-cache=on|off option
> 
>  qapi/block-core.json |   7 ++-
>  block/file-posix.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 150 insertions(+), 3 deletions(-)

Kevin: Are you happy with this series?

Stefan


Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration
Posted by Kevin Wolf 5 years, 11 months ago
Am 08.05.2018 um 12:32 hat Stefan Hajnoczi geschrieben:
> On Fri, Apr 27, 2018 at 05:23:10PM +0100, Stefan Hajnoczi wrote:
> > v2:
> >  * Add comment on !__linux__ situation [Fam]
> >  * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]
> > 
> > file-posix.c only supports shared storage live migration with -drive
> > cache.direct=on due to cache consistency issues.  There are two main shared
> > storage configurations: files on NFS and host block devices on SAN LUNs.
> > 
> > The problem is that QEMU starts on the destination host before the source host
> > has written everything out to the disk.  The page cache on the destination host
> > may contain stale data read when QEMU opened the image file (before migration
> > handover).  Using O_DIRECT avoids this problem but prevents users from taking
> > advantage of the host page cache.
> > 
> > Although cache=none is the recommended setting for virtualization use cases,
> > there are scenarios where cache=writeback makes sense.  If the guest has much
> > less RAM than the host or many guests share the same backing file, then the
> > host page cache can significantly improve disk I/O performance.
> > 
> > This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
> > on Linux so that shared storage live migration works.  I have sent it as an RFC
> > because cache consistency is not binary, there are corner cases which I've
> > described in the actual patch, and this may require more discussion.
> > 
> > Regarding NFS, QEMU relies on O_DIRECT rather than the close-to-open
> > consistency model (see nfs(5)), which is the basic guarantee provided by NFS.
> > After this patch cache consistency is no longer provided by O_DIRECT.
> > 
> > This patch series relies on fdatasync(2) (source) +
> > posix_fadvise(POSIX_FADV_DONTNEED) (destination) instead.  I believe it is safe
> > for both NFS and SAN LUNs.  Maybe we should use fsync(2) instead of
> > fdatasync(2) so that NFS has up-to-date inode metadata?
> > 
> > Stefan Hajnoczi (2):
> >   block/file-posix: implement bdrv_co_invalidate_cache() on Linux
> >   block/file-posix: add x-check-page-cache=on|off option
> > 
> >  qapi/block-core.json |   7 ++-
> >  block/file-posix.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 150 insertions(+), 3 deletions(-)
> 
> Kevin: Are you happy with this series?

Yes, I think I am.

I'm still kind of concerned about misleading people into believing that
cache=writeback + live migration is generally safe when it's only in
special cases, but that's not really a concern about the code, but about
how we communicate the feature.

Kevin


Re: [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration
Posted by Stefan Hajnoczi 5 years, 11 months ago
On Fri, Apr 27, 2018 at 05:23:10PM +0100, Stefan Hajnoczi wrote:
> v2:
>  * Add comment on !__linux__ situation [Fam]
>  * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]
> 
> file-posix.c only supports shared storage live migration with -drive
> cache.direct=on due to cache consistency issues.  There are two main shared
> storage configurations: files on NFS and host block devices on SAN LUNs.
> 
> The problem is that QEMU starts on the destination host before the source host
> has written everything out to the disk.  The page cache on the destination host
> may contain stale data read when QEMU opened the image file (before migration
> handover).  Using O_DIRECT avoids this problem but prevents users from taking
> advantage of the host page cache.
> 
> Although cache=none is the recommended setting for virtualization use cases,
> there are scenarios where cache=writeback makes sense.  If the guest has much
> less RAM than the host or many guests share the same backing file, then the
> host page cache can significantly improve disk I/O performance.
> 
> This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
> on Linux so that shared storage live migration works.  I have sent it as an RFC
> because cache consistency is not binary, there are corner cases which I've
> described in the actual patch, and this may require more discussion.
> 
> Regarding NFS, QEMU relies on O_DIRECT rather than the close-to-open
> consistency model (see nfs(5)), which is the basic guarantee provided by NFS.
> After this patch cache consistency is no longer provided by O_DIRECT.
> 
> This patch series relies on fdatasync(2) (source) +
> posix_fadvise(POSIX_FADV_DONTNEED) (destination) instead.  I believe it is safe
> for both NFS and SAN LUNs.  Maybe we should use fsync(2) instead of
> fdatasync(2) so that NFS has up-to-date inode metadata?
> 
> Stefan Hajnoczi (2):
>   block/file-posix: implement bdrv_co_invalidate_cache() on Linux
>   block/file-posix: add x-check-page-cache=on|off option
> 
>  qapi/block-core.json |   7 ++-
>  block/file-posix.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 150 insertions(+), 3 deletions(-)
> 
> -- 
> 2.14.3
> 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan