From nobody Sun Apr 28 08:40:59 2024
From: Michal Privoznik
To: libvir-list@redhat.com
Cc: mfleming@suse.de
Date: Mon, 13 Mar 2017 13:29:52 +0100
Message-Id: <184c10b397a32617ddfc121555b41e79be75a323.1489405375.git.mprivozn@redhat.com>
Subject: [libvirt] [PATCH 1/2] virTimeBackOffWait: Avoid long periods of sleep

While connecting to the qemu monitor, the first thing we do is wait
for it to show up. However, we do so with a timeout to avoid waiting
indefinitely (e.g. when qemu doesn't create the monitor socket at
all). Since beaa447a29 we use an exponential back-off timeout: after
the first connection attempt we wait 1ms, then 2ms, then 4ms and so
on. This allows us to bring down the wait time for small domains
where qemu initializes quickly. However, on the other end of the
scale are domains with huge amounts of guest memory. Now imagine that
we've gotten up to a wait time of 15 seconds. The next one is going
to be 30 seconds, and the one after that a whole minute. Well, okay -
with the current code we are not going to wait longer than 30 seconds
in total, but this is going to change in the next commit.

The exponential back-off is useful only for the first few iterations.
After that the wait time needs to be capped (one second was chosen as
the limit) and switched to a constant wait time.

Signed-off-by: Michal Privoznik
---
 src/util/virtime.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/util/virtime.c b/src/util/virtime.c
index aac96918a..650b1d0f9 100644
--- a/src/util/virtime.c
+++ b/src/util/virtime.c
@@ -390,6 +390,9 @@ virTimeBackOffStart(virTimeBackOffVar *var,
     return 0;
 }
 
+
+#define VIR_TIME_BACK_OFF_CAP 1000
+
 /**
  * virTimeBackOffWait
  * @var: Timeout variable (with type virTimeBackOffVar *).
@@ -410,7 +413,9 @@ virTimeBackOffStart(virTimeBackOffVar *var,
  * The while loop that runs the body of the code repeatedly, with an
  * exponential backoff. It first waits for first milliseconds, then
  * runs the body, then waits for 2*first ms, then runs the body again.
- * Then 4*first ms, and so on.
+ * Then 4*first ms, and so on, up until wait time would reach
+ * VIR_TIME_BACK_OFF_CAP (whole second). Then it switches to constant
+ * waiting time of VIR_TIME_BACK_OFF_CAP.
  *
  * When timeout milliseconds is reached, the while loop ends.
  *
@@ -429,8 +434,13 @@ virTimeBackOffWait(virTimeBackOffVar *var)
     if (t > var->limit_t)
         return 0; /* ends the while loop */
 
+    /* Compute next wait time. Should it go above VIR_TIME_BACK_OFF_CAP
+     * mark, cap it there to avoid long useless sleeps. */
     next = var->next;
-    var->next *= 2;
+    if (var->next < VIR_TIME_BACK_OFF_CAP)
+        var->next *= 2;
+    else if (var->next > VIR_TIME_BACK_OFF_CAP)
+        var->next = VIR_TIME_BACK_OFF_CAP;
 
     /* If sleeping would take us beyond the limit, then shorten the
      * sleep. This is so we always run the body just before the final
-- 
2.11.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

From nobody Sun Apr 28 08:40:59 2024
From: Michal Privoznik
To: libvir-list@redhat.com
Cc: mfleming@suse.de
Date: Mon, 13 Mar 2017 13:29:53 +0100
Message-Id: <70108c0f23c3e7a35f16bca1a50d6b1933382930.1489405375.git.mprivozn@redhat.com>
Subject: [libvirt] [PATCH 2/2] qemu: Adaptive timeout for connecting to monitor

There were a couple of reports on the list (e.g. [1]) that guests
with huge amounts of RAM are unable to start because libvirt kills
qemu in the initialization phase. The problem is that if a guest is
configured to use hugepages, the kernel has to zero them all out
before handing them over to the qemu process. For instance, 402GiB
worth of 1GiB pages took around 105 seconds (~3.8GiB/s). Since we do
not want to make the timeout for connecting to the monitor
configurable [2], we have to teach libvirt to take this into account.
This commit implements the "1s per each 1GiB of RAM" approach
suggested in [3].

1: https://www.redhat.com/archives/libvir-list/2017-March/msg00373.html
2: The reason is that ideally someday it will be libvirt that creates
   the monitor socket and qemu will just use it.
3: https://www.redhat.com/archives/libvir-list/2017-March/msg00405.html

Signed-off-by: Michal Privoznik
---
 src/qemu/qemu_capabilities.c |  2 +-
 src/qemu/qemu_monitor.c      | 36 +++++++++++++++++++++++++++++++-----
 src/qemu/qemu_monitor.h      |  1 +
 src/qemu/qemu_process.c      |  8 ++++++++
 tests/qemumonitortestutils.c |  1 +
 5 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 5a3b4ac50..54dfd22d8 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -4761,7 +4761,7 @@ virQEMUCapsInitQMPCommandRun(virQEMUCapsInitQMPCommandPtr cmd,
     cmd->vm->pid = cmd->pid;
 
     if (!(cmd->mon = qemuMonitorOpen(cmd->vm, &cmd->config, true,
-                                     &callbacks, NULL)))
+                                     0, &callbacks, NULL)))
         goto ignore;
 
     virObjectLock(cmd->mon);
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index d71f84c80..272350bf5 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -327,11 +327,13 @@ qemuMonitorDispose(void *obj)
 
 
 static int
-qemuMonitorOpenUnix(const char *monitor, pid_t cpid)
+qemuMonitorOpenUnix(const char *monitor,
+                    pid_t cpid,
+                    unsigned long long timeout)
 {
     struct sockaddr_un addr;
     int monfd;
-    virTimeBackOffVar timeout;
+    virTimeBackOffVar timebackoff;
     int ret = -1;
 
     if ((monfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
@@ -348,9 +350,9 @@ qemuMonitorOpenUnix(const char *monitor, pid_t cpid)
         goto error;
     }
 
-    if (virTimeBackOffStart(&timeout, 1, 30*1000 /* ms */) < 0)
+    if (virTimeBackOffStart(&timebackoff, 1, timeout * 1000) < 0)
         goto error;
-    while (virTimeBackOffWait(&timeout)) {
+    while (virTimeBackOffWait(&timebackoff)) {
         ret = connect(monfd, (struct sockaddr *) &addr, sizeof(addr));
 
         if (ret == 0)
@@ -871,10 +873,30 @@ qemuMonitorOpenInternal(virDomainObjPtr vm,
 }
 
 
+#define QEMU_DEFAULT_MONITOR_WAIT 30
+
+/**
+ * qemuMonitorOpen:
+ * @vm: domain object
+ * @config: monitor configuration
+ * @json: enable JSON on the monitor
+ * @timeout: number of seconds to add to the default timeout
+ * @cb: monitor event handlers
+ * @opaque: opaque data for @cb
+ *
+ * Opens the monitor for running qemu. It may happen that it
+ * takes some time for qemu to create the monitor socket (e.g.
+ * because kernel is zeroing configured hugepages), therefore we
+ * wait up to default + timeout seconds for the monitor to show
+ * up after which a failure is claimed.
+ *
+ * Returns monitor object, NULL on error.
+ */
 qemuMonitorPtr
 qemuMonitorOpen(virDomainObjPtr vm,
                 virDomainChrSourceDefPtr config,
                 bool json,
+                unsigned long long timeout,
                 qemuMonitorCallbacksPtr cb,
                 void *opaque)
 {
@@ -882,10 +904,14 @@ qemuMonitorOpen(virDomainObjPtr vm,
     bool hasSendFD = false;
     qemuMonitorPtr ret;
 
+    timeout += QEMU_DEFAULT_MONITOR_WAIT;
+
     switch (config->type) {
     case VIR_DOMAIN_CHR_TYPE_UNIX:
         hasSendFD = true;
-        if ((fd = qemuMonitorOpenUnix(config->data.nix.path, vm ? vm->pid : 0)) < 0)
+        if ((fd = qemuMonitorOpenUnix(config->data.nix.path,
+                                      vm ? vm->pid : 0,
+                                      timeout)) < 0)
             return NULL;
         break;
 
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 847e9458a..3c37a6ffe 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -246,6 +246,7 @@ char *qemuMonitorUnescapeArg(const char *in);
 qemuMonitorPtr qemuMonitorOpen(virDomainObjPtr vm,
                                virDomainChrSourceDefPtr config,
                                bool json,
+                               unsigned long long timeout,
                                qemuMonitorCallbacksPtr cb,
                                void *opaque)
     ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(4);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index b9c1847bb..6a9c53aea 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1658,6 +1658,7 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int asyncJob,
     qemuDomainObjPrivatePtr priv = vm->privateData;
     int ret = -1;
     qemuMonitorPtr mon = NULL;
+    unsigned long long timeout = 0;
 
     if (qemuSecuritySetDaemonSocketLabel(driver->securityManager, vm->def) < 0) {
         VIR_ERROR(_("Failed to set security context for monitor for %s"),
@@ -1665,6 +1666,12 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int asyncJob,
         return -1;
     }
 
+    /* When using hugepages, kernel zeroes them out before
+     * handing them over to qemu. This can be very time
+     * consuming. Therefore, add one second to timeout for each
+     * 1GiB of guest RAM. */
+    timeout = vm->def->mem.total_memory / (1024 * 1024);
+
     /* Hold an extra reference because we can't allow 'vm' to be
      * deleted until the monitor gets its own reference. */
     virObjectRef(vm);
@@ -1675,6 +1682,7 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int asyncJob,
     mon = qemuMonitorOpen(vm,
                           priv->monConfig,
                           priv->monJSON,
+                          timeout,
                           &monitorCallbacks,
                           driver);
 
diff --git a/tests/qemumonitortestutils.c b/tests/qemumonitortestutils.c
index cfd0a38cb..89857a662 100644
--- a/tests/qemumonitortestutils.c
+++ b/tests/qemumonitortestutils.c
@@ -1175,6 +1175,7 @@ qemuMonitorTestNew(bool json,
     if (!(test->mon = qemuMonitorOpen(test->vm,
                                       &src,
                                       json,
+                                      0,
                                       &qemuMonitorTestCallbacks,
                                       driver)))
         goto error;
-- 
2.11.0

From nobody Sun Apr 28 08:40:59 2024
From: Michal Privoznik
To: libvir-list@redhat.com
Date: Wed, 15 Mar 2017 13:06:12 +0100
Subject: [libvirt] [PATCH 3/2] docs: Document adaptive timeout for qemu monitor

Signed-off-by: Michal Privoznik
---
 docs/news.xml | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/news.xml b/docs/news.xml
index 04783aa5e..6ce6ab362 100644
--- a/docs/news.xml
+++ b/docs/news.xml
@@ -43,7 +43,20 @@
-
+
+
+ qemu: Adaptive timeout for connecting to monitor
+
+
+ When starting qemu, libvirt waits for qemu to create the monitor
+ socket which libvirt connects to. Historically, there was a hard
+ 30 second timeout after which the qemu process was killed. This
+ approach is suboptimal as in some scenarios with huge amounts of
+ guest RAM it can take a minute or more for the kernel to allocate
+ and zero out pages for qemu. The timeout is now flexible and
+ computed by libvirt at domain startup.
+
+
-- 
2.11.0
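
For readers who want to experiment with the numbers in this series, here is a minimal standalone C sketch of the two ideas: a doubling wait that is capped at one second, and a monitor timeout of 30 seconds plus one second per GiB of guest RAM. The helper names (next_wait_ms, monitor_timeout_s, count_attempts) are hypothetical and not libvirt API; the constants only mirror VIR_TIME_BACK_OFF_CAP and QEMU_DEFAULT_MONITOR_WAIT from the patches. The sketch clamps the doubled value directly, which is an assumption made here for simplicity and differs slightly from the two-branch check in patch 1, where, as far as one can tell from the hunk, a single 1024 ms wait can slip through before the cap applies when the first wait is 1 ms. Nothing here sleeps or connects; it only simulates the schedule.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors VIR_TIME_BACK_OFF_CAP from patch 1: cap on a single wait, in ms. */
#define BACK_OFF_CAP_MS 1000ULL
/* Mirrors QEMU_DEFAULT_MONITOR_WAIT from patch 2: base timeout, in seconds. */
#define DEFAULT_MONITOR_WAIT_S 30ULL

/* Hypothetical helper: the wait that follows a wait of cur_ms.
 * Doubles the value and clamps it, so the result never exceeds the cap. */
static unsigned long long
next_wait_ms(unsigned long long cur_ms)
{
    unsigned long long next = cur_ms * 2;

    return next > BACK_OFF_CAP_MS ? BACK_OFF_CAP_MS : next;
}

/* Hypothetical helper: total monitor timeout in seconds for a guest with
 * total_memory_kib KiB of RAM, i.e. 30s plus 1s per full GiB. */
static unsigned long long
monitor_timeout_s(unsigned long long total_memory_kib)
{
    return DEFAULT_MONITOR_WAIT_S + total_memory_kib / (1024ULL * 1024ULL);
}

/* Hypothetical helper: simulate the wait schedule (no real sleeping) and
 * count how many connection attempts fit into timeout_ms. The final wait
 * is shortened so that the total never goes past the timeout. */
static size_t
count_attempts(unsigned long long first_ms, unsigned long long timeout_ms)
{
    size_t attempts = 0;
    unsigned long long elapsed = 0;
    unsigned long long wait = first_ms;

    assert(first_ms > 0);

    while (elapsed < timeout_ms) {
        unsigned long long sleep_ms = wait;

        if (sleep_ms > timeout_ms - elapsed)
            sleep_ms = timeout_ms - elapsed;

        elapsed += sleep_ms;
        attempts++;
        wait = next_wait_ms(wait);
    }

    return attempts;
}
```

Under these assumptions, count_attempts(1, 30 * 1000) yields 39 simulated attempts, and monitor_timeout_s() for the 402GiB guest mentioned in patch 2 yields 432 seconds, well above the reported 105 seconds.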