From: Boris Fiuczynski <fiuczy@linux.ibm.com>
To: devel@lists.libvirt.org
Subject: [PATCH] qemu: add a monitor to /proc/$pid when killing times out
Date: Wed, 17 Jul 2024 14:01:27 +0200
Message-ID: <20240717120127.103871-1-fiuczy@linux.ibm.com>
X-Mailer: git-send-email 2.45.1
Cc: jdenemar@redhat.com, mhartmay@linux.ibm.com

When a QEMU process takes longer to terminate than the time allowed
after SIGTERM and SIGKILL are issued, do not simply fail and leave the
VM in state VIR_DOMAIN_SHUTDOWN until the daemon stops. Instead, set up
an fd on /proc/$pid and get notified when the QEMU process has finally
terminated, so that the VM state can be cleaned up.

Resolves: https://issues.redhat.com/browse/RHEL-28819
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
---
 src/qemu/qemu_domain.c  |   8 +++
 src/qemu/qemu_domain.h  |   2 +
 src/qemu/qemu_driver.c  |  18 ++++++
 src/qemu/qemu_process.c | 127 ++++++++++++++++++++++++++++++++++++++--
 src/qemu/qemu_process.h |   1 +
 5 files changed, 151 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 2134b11038..96f4e41a11 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1889,6 +1889,11 @@ qemuDomainObjPrivateFree(void *data)
 
     virChrdevFree(priv->devs);
 
+    if (priv->watchPid >= 0) {
+        virEventRemoveHandle(priv->watchPid);
+        priv->watchPid = -1;
+    }
+
     /* This should never be non-NULL if we get here, but just in case... */
     if (priv->mon) {
         VIR_ERROR(_("Unexpected QEMU monitor still active during domain deletion"));
@@ -1934,6 +1939,8 @@ qemuDomainObjPrivateAlloc(void *opaque)
     priv->blockjobs = virHashNew(virObjectUnref);
     priv->fds = virHashNew(g_object_unref);
 
+    priv->watchPid = -1;
+
     /* agent commands block by default, user can choose different behavior */
     priv->agentTimeout = VIR_DOMAIN_AGENT_RESPONSE_TIMEOUT_BLOCK;
     priv->migMaxBandwidth = QEMU_DOMAIN_MIG_BANDWIDTH_MAX;
@@ -11680,6 +11687,7 @@ qemuProcessEventFree(struct qemuProcessEvent *event)
     case QEMU_PROCESS_EVENT_RESET:
     case QEMU_PROCESS_EVENT_NBDKIT_EXITED:
     case QEMU_PROCESS_EVENT_MONITOR_EOF:
+    case QEMU_PROCESS_EVENT_SHUTDOWN_COMPLETED:
     case QEMU_PROCESS_EVENT_LAST:
         break;
     }
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index d777559119..e5366c6e8c 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -119,6 +119,7 @@ struct _qemuDomainObjPrivate {
 
     bool beingDestroyed;
     char *pidfile;
+    int watchPid;
 
     virDomainPCIAddressSet *pciaddrs;
     virDomainUSBAddressSet *usbaddrs;
@@ -469,6 +470,7 @@ typedef enum {
     QEMU_PROCESS_EVENT_UNATTENDED_MIGRATION,
     QEMU_PROCESS_EVENT_RESET,
     QEMU_PROCESS_EVENT_NBDKIT_EXITED,
+    QEMU_PROCESS_EVENT_SHUTDOWN_COMPLETED,
 
     QEMU_PROCESS_EVENT_LAST
 } qemuProcessEventType;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 9f3013e231..6b1e4084f6 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4041,6 +4041,21 @@ processNbdkitExitedEvent(virDomainObj *vm,
 }
 
 
+static void
+processShutdownCompletedEvent(virQEMUDriver *driver,
+                              virDomainObj *vm)
+{
+    if (virDomainObjBeginJob(vm, VIR_JOB_MODIFY) < 0)
+        return;
+
+    if (virDomainObjIsActive(vm))
+        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_UNKNOWN,
+                        VIR_ASYNC_JOB_NONE, 0);
+
+    virDomainObjEndJob(vm);
+}
+
+
 static void qemuProcessEventHandler(void *data, void *opaque)
 {
     struct qemuProcessEvent *processEvent = data;
@@ -4101,6 +4116,9 @@ static void qemuProcessEventHandler(void *data, void *opaque)
     case QEMU_PROCESS_EVENT_NBDKIT_EXITED:
         processNbdkitExitedEvent(vm, processEvent->data);
         break;
+    case QEMU_PROCESS_EVENT_SHUTDOWN_COMPLETED:
+        processShutdownCompletedEvent(driver, vm);
+        break;
     case QEMU_PROCESS_EVENT_LAST:
         break;
     }
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 25dfd04272..d6dbd7ba53 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include <sys/syscall.h>
 #if defined(__linux__)
 # include
 #elif defined(__FreeBSD__)
@@ -8387,9 +8388,119 @@ qemuProcessCreatePretendCmdBuild(virDomainObj *vm,
 }
 
 
+typedef struct {
+    virDomainObj *vm;
+    int pidfd;
+} qemuProcessInShutdownEventData;
+
+
+static qemuProcessInShutdownEventData*
+qemuProcessInShutdownEventDataNew(virDomainObj *vm, int pidfd)
+{
+    qemuProcessInShutdownEventData *d = g_new(qemuProcessInShutdownEventData, 1);
+    d->vm = virObjectRef(vm);
+    d->pidfd = pidfd;
+    return d;
+}
+
+
+static void
+qemuProcessInShutdownEventDataFree(qemuProcessInShutdownEventData *d)
+{
+    virObjectUnref(d->vm);
+    VIR_FORCE_CLOSE(d->pidfd);
+    g_free(d);
+}
+
+
+static void
+qemuProcessInShutdownStopMonitor(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+
+    VIR_DEBUG("vm=%p name=%s pid=%lld watchPid=%d",
+              vm, vm->def->name, (long long)vm->pid,
+              priv->watchPid);
+
+    virObjectLock(vm);
+    if (priv->watchPid >= 0) {
+        virEventRemoveHandle(priv->watchPid);
+        priv->watchPid = -1;
+    }
+    virObjectUnlock(vm);
+}
+
+
+static void
+qemuProcessInShutdownPidfdCb(int watch G_GNUC_UNUSED,
+                             int fd,
+                             int events G_GNUC_UNUSED,
+                             void *opaque)
+{
+    qemuProcessInShutdownEventData *data = opaque;
+    virDomainObj *vm = data->vm;
+
+    VIR_DEBUG("vm=%p name=%s pid=%lld fd=%d",
+              vm, vm->def->name, (long long)vm->pid, fd);
+
+    VIR_DEBUG("QEMU process %lld finally completed termination",
+              (long long)vm->pid);
+    qemuProcessInShutdownStopMonitor(vm);
+
+    qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_SHUTDOWN_COMPLETED,
+                           0, 0, NULL);
+}
+
+
+static int
+qemuProcessInShutdownStartMonitor(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    qemuProcessInShutdownEventData *data;
+    int pidfd;
+    int ret = -1;
+
+    VIR_DEBUG("vm=%p name=%s pid=%lld watchPid=%d",
+              vm, vm->def->name, (long long)vm->pid,
+              priv->watchPid);
+
+    if (priv->watchPid >= 0) {
+        VIR_DEBUG("Monitoring qemu in-shutdown process %i already set up", vm->pid);
+        goto cleanup;
+    }
+
+    pidfd = syscall(SYS_pidfd_open, vm->pid, 0);
+    if (pidfd < 0) {
+        if (errno == ESRCH) /* process has already terminated */
+            ret = 1;
+        goto cleanup;
+    }
+
+    data = qemuProcessInShutdownEventDataNew(vm, pidfd);
+    if ((priv->watchPid = virEventAddHandle(pidfd,
+                                            VIR_EVENT_HANDLE_READABLE,
+                                            qemuProcessInShutdownPidfdCb,
+                                            data,
+                                            (virFreeCallback)qemuProcessInShutdownEventDataFree)) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("failed to monitor qemu in-shutdown process %1$i"),
+                       vm->pid);
+        qemuProcessInShutdownEventDataFree(data);
+        goto cleanup;
+    }
+    VIR_DEBUG("Monitoring qemu in-shutdown process %i for termination", vm->pid);
+    ret = 0;
+
+ cleanup:
+    return ret;
+}
+
+
 int
 qemuProcessKill(virDomainObj *vm, unsigned int flags)
 {
+    int ret = -1;
+
     VIR_DEBUG("vm=%p name=%s pid=%lld flags=0x%x",
               vm, vm->def->name,
               (long long)vm->pid, flags);
@@ -8410,10 +8521,16 @@ qemuProcessKill(virDomainObj *vm, unsigned int flags)
 
     /* Request an extra delay of two seconds per current nhostdevs
      * to be safe against stalls by the kernel freeing up the resources */
-    return virProcessKillPainfullyDelay(vm->pid,
-                                        !!(flags & VIR_QEMU_PROCESS_KILL_FORCE),
-                                        vm->def->nhostdevs * 2,
-                                        false);
+    ret = virProcessKillPainfullyDelay(vm->pid,
+                                       !!(flags & VIR_QEMU_PROCESS_KILL_FORCE),
+                                       vm->def->nhostdevs * 2,
+                                       false);
+
+    if (ret < 0 && (flags & VIR_QEMU_PROCESS_KILL_MONITOR_ON_ERROR))
+        if (qemuProcessInShutdownStartMonitor(vm) == 1)
+            ret = 0; /* process termination detected */
+
+    return ret;
 }
 
 
@@ -8438,7 +8555,7 @@ qemuProcessBeginStopJob(virDomainObj *vm,
      * cleared inside qemuProcessStop */
     priv->beingDestroyed = true;
 
-    if (qemuProcessKill(vm, killFlags) < 0)
+    if (qemuProcessKill(vm, killFlags|VIR_QEMU_PROCESS_KILL_MONITOR_ON_ERROR) < 0)
         goto error;
 
     /* Wake up anything waiting on domain condition */
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index cb67bfcd2d..2324aeb7bd 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -180,6 +180,7 @@ typedef enum {
     VIR_QEMU_PROCESS_KILL_FORCE   = 1 << 0,
     VIR_QEMU_PROCESS_KILL_NOWAIT  = 1 << 1,
     VIR_QEMU_PROCESS_KILL_NOCHECK = 1 << 2, /* bypass the running vm check */
+    VIR_QEMU_PROCESS_KILL_MONITOR_ON_ERROR = 1 << 3, /* on error enable process monitor */
 } virQemuProcessKillMode;
 
 int qemuProcessKill(virDomainObj *vm, unsigned int flags);
-- 
2.45.0
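As background for reviewers, the kernel mechanism the patch builds on can be sketched outside of libvirt: pidfd_open(2) (Linux >= 5.3) returns a file descriptor that polls readable once the target process has terminated, so it can be handed to an ordinary event loop instead of retrying kill(). This is an illustrative standalone sketch, not libvirt code; `demo_pidfd_wait` is a made-up name, and glibc's syscall(2) wrapper is assumed since there is no dedicated libc wrapper on older glibc.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/wait.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

/* Fork a child, SIGTERM it, and wait for its pidfd to become readable.
 * Returns 0 when termination was observed via the pidfd, -1 on error. */
static int demo_pidfd_wait(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {        /* child: sleep until killed */
        pause();
        _exit(0);
    }

    /* Obtain a pidfd for the child; fails with ESRCH if it is gone. */
    int pidfd = (int)syscall(SYS_pidfd_open, child, 0);
    if (pidfd < 0)
        return -1;

    kill(child, SIGTERM);

    /* The pidfd polls readable only once the child has terminated
     * (even before it is reaped), which is what makes it suitable
     * for event-loop based termination detection. */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int ok = poll(&pfd, 1, 5000) == 1 && (pfd.revents & POLLIN);

    waitpid(child, NULL, 0); /* reap; the pidfd stays valid until close() */
    close(pidfd);
    return ok ? 0 : -1;
}
```

The ESRCH special case mirrors the patch's `ret = 1` path in qemuProcessInShutdownStartMonitor(): if the process is already gone by the time the pidfd is requested, no monitor is needed and the kill can be reported as successful.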