[libvirt] [PATCH RFC] iohelper: Introduces a small sleep do avoid hunging other tasks

Leonardo Bras posted 1 patch 4 years, 11 months ago
Test syntax-check passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20190509205935.26192-1-leonardo@linux.ibm.com
src/util/iohelper.c | 8 ++++++++
1 file changed, 8 insertions(+)
[libvirt] [PATCH RFC] iohelper: Introduces a small sleep do avoid hunging other tasks
Posted by Leonardo Bras 4 years, 11 months ago
While dumping very large VMs (over 128GB), iohelper seems to cause
very intense IO usage on the disk, which causes some processes
(like journald) to hang and, depending on kernel configuration,
to panic.

This change creates a time window after every 10GB written so that
these processes can write to the disk and avoid hanging.
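
As a rough estimate (assuming the dump sustains on the order of
1GB/s), a 128GB dump crosses the 10GB threshold about twelve times,
so the added 100ms pauses cost only around 1.2s in total.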

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
 src/util/iohelper.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/util/iohelper.c b/src/util/iohelper.c
index ddc338b7c7..164c1e2085 100644
--- a/src/util/iohelper.c
+++ b/src/util/iohelper.c
@@ -52,6 +52,8 @@ runIO(const char *path, int fd, int oflags)
     unsigned long long total = 0;
     bool direct = O_DIRECT && ((oflags & O_DIRECT) != 0);
     off_t end = 0;
+    const unsigned long long sleep_step = 10ULL * 1024 * 1024 * 1024;
+    unsigned long long next_sleep = sleep_step;
 
 #if HAVE_POSIX_MEMALIGN
     if (posix_memalign(&base, alignMask + 1, buflen)) {
@@ -128,6 +130,12 @@ runIO(const char *path, int fd, int oflags)
 
         total += got;
 
+        /* sleep for a while to avoid hanging other tasks */
+        if (total > next_sleep) {
+            next_sleep += sleep_step;
+            usleep(100*1000);
+        }
+
         /* handle last write size align in direct case */
         if (got < buflen && direct && fdout == fd) {
             ssize_t aligned_got = (got + alignMask) & ~alignMask;
-- 
2.20.1

Re: [libvirt] [PATCH RFC] iohelper: Introduces a small sleep do avoid hunging other tasks
Posted by Leonardo Bras 4 years, 11 months ago
On Thu, 2019-05-09 at 17:59 -0300, Leonardo Bras wrote:
> While dumping very large VMs (over 128GB), iohelper seems to cause
> very intense IO usage on the disk, which causes some processes
> (like journald) to hang and, depending on kernel configuration,
> to panic.
>
> This change creates a time window after every 10GB written so that
> these processes can write to the disk and avoid hanging.
> 
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> ---
>  src/util/iohelper.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/src/util/iohelper.c b/src/util/iohelper.c
> index ddc338b7c7..164c1e2085 100644
> --- a/src/util/iohelper.c
> +++ b/src/util/iohelper.c
> @@ -52,6 +52,8 @@ runIO(const char *path, int fd, int oflags)
>      unsigned long long total = 0;
>      bool direct = O_DIRECT && ((oflags & O_DIRECT) != 0);
>      off_t end = 0;
> +    const unsigned long long sleep_step = 10ULL * 1024 * 1024 * 1024;
> +    unsigned long long next_sleep = sleep_step;
>  
>  #if HAVE_POSIX_MEMALIGN
>      if (posix_memalign(&base, alignMask + 1, buflen)) {
> @@ -128,6 +130,12 @@ runIO(const char *path, int fd, int oflags)
>  
>          total += got;
>  
> +        /* sleep for a while to avoid hanging other tasks */
> +        if (total > next_sleep) {
> +            next_sleep += sleep_step;
> +            usleep(100*1000);
> +        }
> +
>          /* handle last write size align in direct case */
>          if (got < buflen && direct && fdout == fd) {
>              ssize_t aligned_got = (got + alignMask) & ~alignMask;

please note there is a typo in the subject:
current:
    iohelper: Introduces a small sleep do avoid hunging other tasks
fixed:
    iohelper: Introduces a small sleep to avoid hunging other tasks

Re: [libvirt] [PATCH RFC] iohelper: Introduces a small sleep do avoid hunging other tasks
Posted by Daniel P. Berrangé 4 years, 11 months ago
On Thu, May 09, 2019 at 05:59:36PM -0300, Leonardo Bras wrote:
> While dumping very large VMs (over 128GB), iohelper seems to cause
> very intense IO usage on the disk, which causes some processes
> (like journald) to hang and, depending on kernel configuration,
> to panic.

What causes a panic here?

This just sounds like an I/O scheduler issue in the end. The
storage can't cope with the huge data sizes and the kernel is
not fairly giving time to other processes. Surely the answer
here is to tune the kernel params to improve I/O fairness.
Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt] [PATCH RFC] iohelper: Introduces a small sleep do avoid hunging other tasks
Posted by Peter Krempa 4 years, 11 months ago
On Thu, May 09, 2019 at 17:59:36 -0300, Leonardo Bras wrote:
> While dumping very large VMs (over 128GB), iohelper seems to cause
> very intense IO usage on the disk, which causes some processes
> (like journald) to hang and, depending on kernel configuration,
> to panic.
>
> This change creates a time window after every 10GB written so that
> these processes can write to the disk and avoid hanging.
> 
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>

I don't think this approach makes sense. It looks like something on
your host is seriously wrong, as anything that writes a large file
would do the same to your host.

Unfortunately libvirt's API does not provide a way to pass in a
bandwidth limit for the dump operation, which would allow the
bandwidth to be limited when calling the API.

You can e.g. use cgroups to limit the bandwidth of libvirtd's
children to work around this problem, or perhaps switch to a better
I/O scheduler.
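
As a rough illustration of the cgroup route (only a sketch, assuming
cgroup v2 with the io controller enabled, libvirtd and its helper
processes living under system.slice/libvirtd.service, and the dump
target on block device 8:0; the path, device numbers and limit are
all host-specific examples):

    /* Illustrative sketch, not libvirt code: cap write bandwidth for
     * the cgroup the dump helper runs in.  Assumes cgroup v2 mounted
     * at /sys/fs/cgroup, the io controller enabled for this cgroup,
     * and the dump target on block device 8:0; adjust for the host. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *iomax =
            "/sys/fs/cgroup/system.slice/libvirtd.service/io.max";
        const char *limit = "8:0 wbps=104857600\n";  /* ~100 MB/s writes */
        int fd = open(iomax, O_WRONLY);

        if (fd < 0) {
            perror("open io.max");
            return 1;
        }
        if (write(fd, limit, strlen(limit)) != (ssize_t)strlen(limit)) {
            perror("write io.max");
            close(fd);
            return 1;
        }
        close(fd);
        return 0;
    }

On a systemd host the same limit can also be set declaratively with
IOWriteBandwidthMax= in the libvirtd unit's resource-control settings,
which keeps the throttling policy outside of iohelper entirely.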