From: Boris Fiuczynski <fiuczy@linux.ibm.com>
To: devel@lists.libvirt.org
CC: jdenemar@redhat.com, mhartmay@linux.ibm.com
Subject: [PATCH v3] qemu: add a monitor to /proc/$pid when killing times out
Date: Fri, 19 Jul 2024 17:44:18 +0200
Message-ID: <20240719154418.111314-1-fiuczy@linux.ibm.com>
X-Mailer: git-send-email 2.45.1

When a QEMU process takes longer to die than the time within which SIGTERM
and SIGKILL are issued to kill it, do not simply fail and leave the VM in
state VIR_DOMAIN_SHUTDOWN until the daemon stops. Instead, set up an fd on
/proc/$pid and get notified once the QEMU process has finally terminated,
so that the VM state can be cleaned up.
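For illustration only (not part of the patch itself), the notification
mechanism works roughly like the following standalone sketch. It assumes a
Linux kernel new enough to provide pidfd_open(2) and, for simplicity, blocks
in poll(2) rather than using libvirt's event loop; the PID argument and the
error handling are simplified:

    #include <errno.h>
    #include <poll.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
        pid_t pid;
        int pidfd;
        struct pollfd pfd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s PID\n", argv[0]);
            return 1;
        }
        pid = (pid_t)atol(argv[1]);

        /* No glibc wrapper is assumed; invoke pidfd_open(2) via syscall(2),
         * the same way the patch uses SYS_pidfd_open. */
        pidfd = syscall(SYS_pidfd_open, pid, 0);
        if (pidfd < 0) {
            if (errno == ESRCH)
                fprintf(stderr, "process %lld has already terminated\n",
                        (long long)pid);
            else
                perror("pidfd_open");
            return 1;
        }

        /* A pidfd polls readable once the process terminates; the patch
         * registers the very same fd with virEventAddHandle() instead of
         * blocking here. */
        pfd.fd = pidfd;
        pfd.events = POLLIN;
        if (poll(&pfd, 1, -1) < 0) {
            perror("poll");
            close(pidfd);
            return 1;
        }

        printf("process %lld terminated\n", (long long)pid);
        close(pidfd);
        return 0;
    }

The patch never blocks: it registers the pidfd with virEventAddHandle(), so
the daemon's event loop invokes qemuProcessInShutdownPidfdCb() once the QEMU
process is gone, and that handler queues QEMU_PROCESS_EVENT_SHUTDOWN_COMPLETED
to clean up the VM state.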
Resolves: https://issues.redhat.com/browse/RHEL-28819
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Michal Privoznik
---
 src/qemu/qemu_domain.c  |   8 +++
 src/qemu/qemu_domain.h  |   2 +
 src/qemu/qemu_driver.c  |  18 ++++++
 src/qemu/qemu_process.c | 124 ++++++++++++++++++++++++++++++++++++++--
 src/qemu/qemu_process.h |   1 +
 5 files changed, 148 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 2134b11038..8147ff02fd 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1889,6 +1889,11 @@ qemuDomainObjPrivateFree(void *data)
 
     virChrdevFree(priv->devs);
 
+    if (priv->pidMonitored >= 0) {
+        virEventRemoveHandle(priv->pidMonitored);
+        priv->pidMonitored = -1;
+    }
+
     /* This should never be non-NULL if we get here, but just in case... */
     if (priv->mon) {
         VIR_ERROR(_("Unexpected QEMU monitor still active during domain deletion"));
@@ -1934,6 +1939,8 @@ qemuDomainObjPrivateAlloc(void *opaque)
     priv->blockjobs = virHashNew(virObjectUnref);
     priv->fds = virHashNew(g_object_unref);
 
+    priv->pidMonitored = -1;
+
     /* agent commands block by default, user can choose different behavior */
     priv->agentTimeout = VIR_DOMAIN_AGENT_RESPONSE_TIMEOUT_BLOCK;
     priv->migMaxBandwidth = QEMU_DOMAIN_MIG_BANDWIDTH_MAX;
@@ -11680,6 +11687,7 @@ qemuProcessEventFree(struct qemuProcessEvent *event)
     case QEMU_PROCESS_EVENT_RESET:
     case QEMU_PROCESS_EVENT_NBDKIT_EXITED:
     case QEMU_PROCESS_EVENT_MONITOR_EOF:
+    case QEMU_PROCESS_EVENT_SHUTDOWN_COMPLETED:
     case QEMU_PROCESS_EVENT_LAST:
         break;
     }
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index d777559119..a5092dd7f0 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -119,6 +119,7 @@ struct _qemuDomainObjPrivate {
 
     bool beingDestroyed;
     char *pidfile;
+    int pidMonitored;
 
     virDomainPCIAddressSet *pciaddrs;
     virDomainUSBAddressSet *usbaddrs;
@@ -469,6 +470,7 @@ typedef enum {
    QEMU_PROCESS_EVENT_UNATTENDED_MIGRATION,
     QEMU_PROCESS_EVENT_RESET,
     QEMU_PROCESS_EVENT_NBDKIT_EXITED,
+    QEMU_PROCESS_EVENT_SHUTDOWN_COMPLETED,
 
     QEMU_PROCESS_EVENT_LAST
 } qemuProcessEventType;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 9f3013e231..6b1e4084f6 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4041,6 +4041,21 @@ processNbdkitExitedEvent(virDomainObj *vm,
 }
 
 
+static void
+processShutdownCompletedEvent(virQEMUDriver *driver,
+                              virDomainObj *vm)
+{
+    if (virDomainObjBeginJob(vm, VIR_JOB_MODIFY) < 0)
+        return;
+
+    if (virDomainObjIsActive(vm))
+        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_UNKNOWN,
+                        VIR_ASYNC_JOB_NONE, 0);
+
+    virDomainObjEndJob(vm);
+}
+
+
 static void qemuProcessEventHandler(void *data, void *opaque)
 {
     struct qemuProcessEvent *processEvent = data;
@@ -4101,6 +4116,9 @@ static void qemuProcessEventHandler(void *data, void *opaque)
     case QEMU_PROCESS_EVENT_NBDKIT_EXITED:
         processNbdkitExitedEvent(vm, processEvent->data);
         break;
+    case QEMU_PROCESS_EVENT_SHUTDOWN_COMPLETED:
+        processShutdownCompletedEvent(driver, vm);
+        break;
     case QEMU_PROCESS_EVENT_LAST:
         break;
     }
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 25dfd04272..c6f7d34168 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -25,6 +25,9 @@
 #include
 #include
 #include
+#if WITH_SYS_SYSCALL_H
+# include <sys/syscall.h>
+#endif
 #if defined(__linux__)
 # include
 #elif defined(__FreeBSD__)
@@ -8387,9 +8390,114 @@ qemuProcessCreatePretendCmdBuild(virDomainObj *vm,
 }
 
 
+#if WITH_SYS_SYSCALL_H && defined(SYS_pidfd_open)
+typedef struct {
+    virDomainObj *vm;
+    int pidfd;
+} qemuProcessInShutdownEventData;
+
+
+static qemuProcessInShutdownEventData*
+qemuProcessInShutdownEventDataNew(virDomainObj *vm, int pidfd)
+{
+    qemuProcessInShutdownEventData *d = g_new(qemuProcessInShutdownEventData, 1);
+    d->vm = virObjectRef(vm);
+    d->pidfd = pidfd;
+    return d;
+}
+
+
+static void
+qemuProcessInShutdownEventDataFree(qemuProcessInShutdownEventData *d)
+{
+    virObjectUnref(d->vm);
+    VIR_FORCE_CLOSE(d->pidfd);
+    g_free(d);
+}
+
+
+static void
+qemuProcessInShutdownPidfdCb(int watch,
+                             int fd,
+                             int events G_GNUC_UNUSED,
+                             void *opaque)
+{
+    qemuProcessInShutdownEventData *data = opaque;
+    virDomainObj *vm = data->vm;
+
+    VIR_DEBUG("vm=%p name=%s pid=%lld fd=%d",
+              vm, vm->def->name, (long long)vm->pid, fd);
+
+    virEventRemoveHandle(watch);
+
+    virObjectLock(vm);
+
+    VIR_INFO("QEMU process %lld finally completed termination",
+             (long long)vm->pid);
+
+    QEMU_DOMAIN_PRIVATE(vm)->pidMonitored = -1;
+    qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_SHUTDOWN_COMPLETED,
+                           0, 0, NULL);
+
+    virObjectUnlock(vm);
+}
+#endif /* WITH_SYS_SYSCALL_H && defined(SYS_pidfd_open) */
+
+
+static int
+qemuProcessInShutdownStartMonitor(virDomainObj *vm)
+{
+#if WITH_SYS_SYSCALL_H && defined(SYS_pidfd_open)
+    qemuDomainObjPrivate *priv = vm->privateData;
+    qemuProcessInShutdownEventData *data;
+    int pidfd;
+    int ret = -1;
+
+    VIR_DEBUG("vm=%p name=%s pid=%lld pidMonitored=%d",
+              vm, vm->def->name, (long long)vm->pid,
+              priv->pidMonitored);
+
+    if (priv->pidMonitored >= 0) {
+        VIR_DEBUG("Monitoring qemu in-shutdown process %i already set up", vm->pid);
+        goto cleanup;
+    }
+
+    pidfd = syscall(SYS_pidfd_open, vm->pid, 0);
+    if (pidfd < 0) {
+        if (errno == ESRCH) /* process has already terminated */
+            ret = 1;
+        goto cleanup;
+    }
+
+    data = qemuProcessInShutdownEventDataNew(vm, pidfd);
+    if ((priv->pidMonitored = virEventAddHandle(pidfd,
+                                                VIR_EVENT_HANDLE_READABLE,
+                                                qemuProcessInShutdownPidfdCb,
+                                                data,
+                                                (virFreeCallback)qemuProcessInShutdownEventDataFree)) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("failed to monitor qemu in-shutdown process %1$i"),
+                       vm->pid);
+        qemuProcessInShutdownEventDataFree(data);
+        goto cleanup;
+    }
+    VIR_DEBUG("Monitoring qemu in-shutdown process %i for termination", vm->pid);
+    ret = 0;
+
+ cleanup:
+    return ret;
+#else /* !WITH_SYS_SYSCALL_H || !defined(SYS_pidfd_open) */
+    VIR_DEBUG("Monitoring qemu process %i not implemented", vm->pid);
+    return -1;
+#endif /* !WITH_SYS_SYSCALL_H || !defined(SYS_pidfd_open) */
+}
+
+
 int
 qemuProcessKill(virDomainObj *vm, unsigned int flags)
 {
+    int ret = -1;
+
     VIR_DEBUG("vm=%p name=%s pid=%lld flags=0x%x",
               vm, vm->def->name, (long long)vm->pid, flags);
 
@@ -8410,10 +8518,16 @@ qemuProcessKill(virDomainObj *vm, unsigned int flags)
 
     /* Request an extra delay of two seconds per current nhostdevs
      * to be safe against stalls by the kernel freeing up the resources */
-    return virProcessKillPainfullyDelay(vm->pid,
-                                        !!(flags & VIR_QEMU_PROCESS_KILL_FORCE),
-                                        vm->def->nhostdevs * 2,
-                                        false);
+    ret = virProcessKillPainfullyDelay(vm->pid,
+                                       !!(flags & VIR_QEMU_PROCESS_KILL_FORCE),
+                                       vm->def->nhostdevs * 2,
+                                       false);
+
+    if (ret < 0 && (flags & VIR_QEMU_PROCESS_KILL_MONITOR_ON_ERROR))
+        if (qemuProcessInShutdownStartMonitor(vm) == 1)
+            ret = 0; /* process termination detected */
+
+    return ret;
 }
 
 
@@ -8438,7 +8552,7 @@ qemuProcessBeginStopJob(virDomainObj *vm,
      * cleared inside qemuProcessStop */
     priv->beingDestroyed = true;
 
-    if (qemuProcessKill(vm, killFlags) < 0)
+    if (qemuProcessKill(vm, killFlags|VIR_QEMU_PROCESS_KILL_MONITOR_ON_ERROR) < 0)
         goto error;
 
     /* Wake up anything waiting on domain condition */
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index cb67bfcd2d..2324aeb7bd 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -180,6 +180,7 @@ typedef enum {
     VIR_QEMU_PROCESS_KILL_FORCE = 1 << 0,
     VIR_QEMU_PROCESS_KILL_NOWAIT = 1 << 1,
     VIR_QEMU_PROCESS_KILL_NOCHECK = 1 << 2, /* bypass the running vm check */
+    VIR_QEMU_PROCESS_KILL_MONITOR_ON_ERROR = 1 << 3, /* on error enable process monitor */
 } virQemuProcessKillMode;
 
 int qemuProcessKill(virDomainObj *vm, unsigned int flags);
-- 
2.45.0