From nobody Thu Apr 3 10:05:07 2025
From: Kevin Wolf
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, stefanha@redhat.com, pbonzini@redhat.com, afaria@redhat.com, hreitz@redhat.com, qemu-devel@nongnu.org
Subject: [PATCH 1/5] file-posix: Support FUA writes
Date: Fri, 7 Mar 2025 23:16:30 +0100
Message-ID: <20250307221634.71951-2-kwolf@redhat.com>
In-Reply-To: <20250307221634.71951-1-kwolf@redhat.com>
References: <20250307221634.71951-1-kwolf@redhat.com>

Until now, FUA was always emulated with a separate flush after the write
for file-posix. The overhead of processing a second request can reduce
performance significantly for a guest disk that has disabled the write
cache, especially if the host disk is already write through, too, and
the flush isn't actually doing anything.

Advertise support for REQ_FUA in write requests and implement it for
Linux AIO and io_uring using the RWF_DSYNC flag for write requests.
The thread pool still performs a separate fdatasync() call. This can be
improved later by using the pwritev2() syscall if available.

As an example, this is how fio numbers can be improved in some scenarios
with this patch (all using virtio-blk with cache=directsync on an nvme
block device for the VM, fio with ioengine=libaio,direct=1,sync=1):

                              | old           | with FUA support
------------------------------+---------------+-------------------
bs=4k, iodepth=1, numjobs=1   |  45.6k iops   |  56.1k iops
bs=4k, iodepth=1, numjobs=16  | 183.3k iops   | 236.0k iops
bs=4k, iodepth=16, numjobs=1  | 258.4k iops   | 311.1k iops

However, not all scenarios are clear wins. On another slower disk I saw
little to no improvement. In fact, in two corner case scenarios, I even
observed a regression, which I however consider acceptable:

1. On slow host disks in a write through cache mode, when the guest is
   using virtio-blk in a separate iothread so that polling can be
   enabled, and each completion is quickly followed up with a new
   request (so that polling gets it), it can happen that enabling FUA
   makes things slower - the additional very fast no-op flush we used to
   have gave the adaptive polling algorithm a success so that it kept
   polling. Without it, we only have the slow write request, which
   disables polling. This is a problem in the polling algorithm that
   will be fixed later in this series.

2. With a high queue depth, it can be beneficial to have flush requests
   for another reason: The optimisation in bdrv_co_flush() that flushes
   only once per write generation acts as a synchronisation mechanism
   that lets all requests complete at the same time. This can result in
   better batching and if the disk is very fast (I only saw this with a
   null_blk backend), this can make up for the overhead of the flush and
   improve throughput.

   In theory, we could optionally introduce a similar artificial latency
   in the normal completion path to achieve the same kind of completion
   batching. This is not implemented in this series.

Compatibility is not a concern for io_uring, it has supported RWF_DSYNC
from the start. Linux AIO started supporting it in Linux 4.13 and libaio
0.3.111. The kernel is not a problem for any supported build platform,
so it's not necessary to add runtime checks. However, openSUSE is still
stuck with an older libaio version that would break the build. We must
detect this at build time to avoid build failures.

Signed-off-by: Kevin Wolf
Reviewed-by: Stefan Hajnoczi
---
 include/block/raw-aio.h |  8 ++++++--
 block/file-posix.c      | 26 ++++++++++++++++++--------
 block/io_uring.c        | 13 ++++++++-----
 block/linux-aio.c       | 24 +++++++++++++++++++++---
 meson.build             |  4 ++++
 5 files changed, 57 insertions(+), 18 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 626706827f..247bdbff13 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -17,6 +17,7 @@
 #define QEMU_RAW_AIO_H
 
 #include "block/aio.h"
+#include "block/block-common.h"
 #include "qemu/iov.h"
 
 /* AIO request types */
@@ -58,9 +59,11 @@ void laio_cleanup(LinuxAioState *s);
 
 /* laio_co_submit: submit I/O requests in the thread's current AioContext. */
 int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
-                                int type, uint64_t dev_max_batch);
+                                int type, BdrvRequestFlags flags,
+                                uint64_t dev_max_batch);
 
 bool laio_has_fdsync(int);
+bool laio_has_fua(void);
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
 #endif
@@ -71,7 +74,8 @@ void luring_cleanup(LuringState *s);
 
 /* luring_co_submit: submit I/O requests in the thread's current AioContext. */
 int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
-                                  QEMUIOVector *qiov, int type);
+                                  QEMUIOVector *qiov, int type,
+                                  BdrvRequestFlags flags);
 void luring_detach_aio_context(LuringState *s, AioContext *old_context);
 void luring_attach_aio_context(LuringState *s, AioContext *new_context);
 #endif
diff --git a/block/file-posix.c b/block/file-posix.c
index 44e16dda87..0f1c722804 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -194,6 +194,7 @@ static int fd_open(BlockDriverState *bs)
 }
 
 static int64_t raw_getlength(BlockDriverState *bs);
+static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs);
 
 typedef struct RawPosixAIOData {
     BlockDriverState *bs;
@@ -804,6 +805,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #endif
     s->needs_alignment = raw_needs_alignment(bs);
 
+    if (!s->use_linux_aio || laio_has_fua()) {
+        bs->supported_write_flags = BDRV_REQ_FUA;
+    }
+
     bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
     if (S_ISREG(st.st_mode)) {
         /* When extending regular files, we get zeros from the OS */
@@ -2477,7 +2482,8 @@ static inline bool raw_check_linux_aio(BDRVRawState *s)
 #endif
 
 static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
-                                   uint64_t bytes, QEMUIOVector *qiov, int type)
+                                   uint64_t bytes, QEMUIOVector *qiov, int type,
+                                   int flags)
 {
     BDRVRawState *s = bs->opaque;
     RawPosixAIOData acb;
@@ -2508,13 +2514,13 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
 #ifdef CONFIG_LINUX_IO_URING
     } else if (raw_check_linux_io_uring(s)) {
         assert(qiov->size == bytes);
-        ret = luring_co_submit(bs, s->fd, offset, qiov, type);
+        ret = luring_co_submit(bs, s->fd, offset, qiov, type, flags);
         goto out;
 #endif
 #ifdef CONFIG_LINUX_AIO
     } else if (raw_check_linux_aio(s)) {
         assert(qiov->size == bytes);
-        ret = laio_co_submit(s->fd, offset, qiov, type,
+        ret = laio_co_submit(s->fd, offset, qiov, type, flags,
                              s->aio_max_batch);
         goto out;
 #endif
@@ -2534,6 +2540,10 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
 
     assert(qiov->size == bytes);
     ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
+    if (ret == 0 && (flags & BDRV_REQ_FUA)) {
+        /* TODO Use pwritev2() instead if it's available */
+        ret = raw_co_flush_to_disk(bs);
+    }
     goto out; /* Avoid the compiler err of unused label */
 
 out:
@@ -2571,14 +2581,14 @@ static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
                                       int64_t bytes, QEMUIOVector *qiov,
                                       BdrvRequestFlags flags)
 {
-    return raw_co_prw(bs, &offset, bytes, qiov, QEMU_AIO_READ);
+    return raw_co_prw(bs, &offset, bytes, qiov, QEMU_AIO_READ, flags);
 }
 
 static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
                                        int64_t bytes, QEMUIOVector *qiov,
                                        BdrvRequestFlags flags)
 {
-    return raw_co_prw(bs, &offset, bytes, qiov, QEMU_AIO_WRITE);
+    return raw_co_prw(bs, &offset, bytes, qiov, QEMU_AIO_WRITE, flags);
 }
 
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
@@ -2600,12 +2610,12 @@ static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
 
 #ifdef CONFIG_LINUX_IO_URING
     if (raw_check_linux_io_uring(s)) {
-        return luring_co_submit(bs, s->fd, 0, NULL, QEMU_AIO_FLUSH);
+        return luring_co_submit(bs, s->fd, 0, NULL, QEMU_AIO_FLUSH, 0);
     }
 #endif
 #ifdef CONFIG_LINUX_AIO
     if (s->has_laio_fdsync && raw_check_linux_aio(s)) {
-        return laio_co_submit(s->fd, 0, NULL, QEMU_AIO_FLUSH, 0);
+        return laio_co_submit(s->fd, 0, NULL, QEMU_AIO_FLUSH, 0, 0);
     }
 #endif
     return raw_thread_pool_submit(handle_aiocb_flush, &acb);
@@ -3540,7 +3550,7 @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
     }
 
     trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
-    return raw_co_prw(bs, offset, len, qiov, QEMU_AIO_ZONE_APPEND);
+    return raw_co_prw(bs, offset, len, qiov, QEMU_AIO_ZONE_APPEND, 0);
 }
 #endif
 
diff --git a/block/io_uring.c b/block/io_uring.c
index f52b66b340..dc967dbf91 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -335,15 +335,17 @@ static void luring_deferred_fn(void *opaque)
  *
  */
 static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
-                            uint64_t offset, int type)
+                            uint64_t offset, int type, BdrvRequestFlags flags)
 {
     int ret;
     struct io_uring_sqe *sqes = &luringcb->sqeq;
+    int luring_flags;
 
     switch (type) {
     case QEMU_AIO_WRITE:
-        io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
-                             luringcb->qiov->niov, offset);
+        luring_flags = (flags & BDRV_REQ_FUA) ? RWF_DSYNC : 0;
+        io_uring_prep_writev2(sqes, fd, luringcb->qiov->iov,
+                              luringcb->qiov->niov, offset, luring_flags);
         break;
     case QEMU_AIO_ZONE_APPEND:
         io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
@@ -380,7 +382,8 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 }
 
 int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
-                                  QEMUIOVector *qiov, int type)
+                                  QEMUIOVector *qiov, int type,
+                                  BdrvRequestFlags flags)
 {
     int ret;
     AioContext *ctx = qemu_get_current_aio_context();
@@ -393,7 +396,7 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
     };
     trace_luring_co_submit(bs, s, &luringcb, fd, offset, qiov ? qiov->size : 0,
                            type);
-    ret = luring_do_submit(fd, &luringcb, s, offset, type);
+    ret = luring_do_submit(fd, &luringcb, s, offset, type, flags);
 
     if (ret < 0) {
         return ret;
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 194c8f434f..1108ae361c 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -368,15 +368,23 @@ static void laio_deferred_fn(void *opaque)
 }
 
 static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
-                          int type, uint64_t dev_max_batch)
+                          int type, BdrvRequestFlags flags,
+                          uint64_t dev_max_batch)
 {
     LinuxAioState *s = laiocb->ctx;
     struct iocb *iocbs = &laiocb->iocb;
     QEMUIOVector *qiov = laiocb->qiov;
+    int laio_flags;
 
     switch (type) {
     case QEMU_AIO_WRITE:
+#ifdef HAVE_IO_PREP_PWRITEV2
+        laio_flags = (flags & BDRV_REQ_FUA) ? RWF_DSYNC : 0;
+        io_prep_pwritev2(iocbs, fd, qiov->iov, qiov->niov, offset, laio_flags);
+#else
+        assert(flags == 0);
         io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
+#endif
         break;
     case QEMU_AIO_ZONE_APPEND:
         io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
@@ -409,7 +417,8 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
 }
 
 int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
-                                int type, uint64_t dev_max_batch)
+                                int type, BdrvRequestFlags flags,
+                                uint64_t dev_max_batch)
 {
     int ret;
     AioContext *ctx = qemu_get_current_aio_context();
@@ -422,7 +431,7 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
         .qiov = qiov,
     };
 
-    ret = laio_do_submit(fd, &laiocb, offset, type, dev_max_batch);
+    ret = laio_do_submit(fd, &laiocb, offset, type, flags, dev_max_batch);
     if (ret < 0) {
         return ret;
     }
@@ -505,3 +514,12 @@ bool laio_has_fdsync(int fd)
     io_destroy(ctx);
     return (ret == -EINVAL) ? false : true;
 }
+
+bool laio_has_fua(void)
+{
+#ifdef HAVE_IO_PREP_PWRITEV2
+    return true;
+#else
+    return false;
+#endif
+}
diff --git a/meson.build b/meson.build
index 0ee79c664d..b2b2c9bb46 100644
--- a/meson.build
+++ b/meson.build
@@ -2724,6 +2724,10 @@ config_host_data.set('HAVE_OPTRESET',
                      cc.has_header_symbol('getopt.h', 'optreset'))
 config_host_data.set('HAVE_IPPROTO_MPTCP',
                      cc.has_header_symbol('netinet/in.h', 'IPPROTO_MPTCP'))
+if libaio.found()
+  config_host_data.set('HAVE_IO_PREP_PWRITEV2',
+                       cc.has_header_symbol('libaio.h', 'io_prep_pwritev2'))
+endif
 
 # has_member
 config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
-- 
2.48.1

From nobody Thu Apr 3 10:05:07 2025
From: Kevin Wolf
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, stefanha@redhat.com, pbonzini@redhat.com, afaria@redhat.com, hreitz@redhat.com, qemu-devel@nongnu.org
Subject: [PATCH 2/5] block/io: Ignore FUA with cache.no-flush=on
Date: Fri, 7 Mar 2025 23:16:31 +0100
Message-ID: <20250307221634.71951-3-kwolf@redhat.com>
In-Reply-To: <20250307221634.71951-1-kwolf@redhat.com>
References: <20250307221634.71951-1-kwolf@redhat.com>

For block drivers that don't advertise FUA support, we already call
bdrv_co_flush(), which considers BDRV_O_NO_FLUSH. However, drivers that
do support FUA still see the FUA flag with BDRV_O_NO_FLUSH and get the
associated performance penalty that cache.no-flush=on was supposed to
avoid.

Clear FUA for write requests if BDRV_O_NO_FLUSH is set.

Signed-off-by: Kevin Wolf
Reviewed-by: Stefan Hajnoczi
---
 block/io.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/io.c b/block/io.c
index d369b994df..1ba8d1aeea 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1058,6 +1058,10 @@ bdrv_driver_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
         return -ENOMEDIUM;
     }
 
+    if (bs->open_flags & BDRV_O_NO_FLUSH) {
+        flags &= ~BDRV_REQ_FUA;
+    }
+
     if ((flags & BDRV_REQ_FUA) &&
         (~bs->supported_write_flags & BDRV_REQ_FUA)) {
         flags &= ~BDRV_REQ_FUA;
-- 
2.48.1

From nobody Thu Apr 3 10:05:07 2025
h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=M8Xl/SVYSEAKwlrx0RSXnQBf/T5BGKVCST6sd5FrBLU=; b=b1htIC6jRMZzeZyhfnhyy8CuwdscAZoO7n9Pn19BvhK/+SRcKJ3LO2Umh1OnI19rzsEQKpjwQiv9I0z1WLvKkCr2oCZdIfCJ6d33kCPP51sHa/CJtwFKWl3x1uxkGkGKfwACnLYip0c547Y9NBi+Ton66/vL9Vb3BHZY8pXB9jI= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 174138592710387.96536341987314; Fri, 7 Mar 2025 14:18:47 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tqg0c-0002Um-Li; Fri, 07 Mar 2025 17:17:34 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tqg07-0001wp-FR for qemu-devel@nongnu.org; Fri, 07 Mar 2025 17:17:04 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tqg05-0007pj-Tr for qemu-devel@nongnu.org; Fri, 07 Mar 2025 17:17:03 -0500 Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-591-RzOrYB8KN0ajvNFv7lfhoA-1; Fri, 07 Mar 2025 17:16:58 -0500 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher 
TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3E0AD1956083; Fri, 7 Mar 2025 22:16:57 +0000 (UTC) Received: from merkur.redhat.com (unknown [10.45.226.27]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id C27C018009AE; Fri, 7 Mar 2025 22:16:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1741385821; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=M8Xl/SVYSEAKwlrx0RSXnQBf/T5BGKVCST6sd5FrBLU=; b=Kd9cjLT8Ea2HcIqm4Q88xA4beBcfv3JuQHVuwk/kSC2h7+s38l/zIPojMqnX8DX38mlOYj LA26RcXOU/SrdSYtoKID5UfX/tbGGjZ//iqDiomDajTwYt6SJVEFSFOxekneol5+luwApu CtGSIsFi2Nvz1D15dGrHepo5PdE8wNg= X-MC-Unique: RzOrYB8KN0ajvNFv7lfhoA-1 X-Mimecast-MFC-AGG-ID: RzOrYB8KN0ajvNFv7lfhoA_1741385817 From: Kevin Wolf To: qemu-block@nongnu.org Cc: kwolf@redhat.com, stefanha@redhat.com, pbonzini@redhat.com, afaria@redhat.com, hreitz@redhat.com, qemu-devel@nongnu.org Subject: [PATCH 3/5] aio: Create AioPolledEvent Date: Fri, 7 Mar 2025 23:16:32 +0100 Message-ID: <20250307221634.71951-4-kwolf@redhat.com> In-Reply-To: <20250307221634.71951-1-kwolf@redhat.com> References: <20250307221634.71951-1-kwolf@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=kwolf@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 
X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1741385928528019100 Content-Type: text/plain; charset="utf-8" As a preparation for having multiple adaptive polling states per AioContext, move the 'ns' field into a separate struct. Signed-off-by: Kevin Wolf Reviewed-by: Stefan Hajnoczi --- include/block/aio.h | 6 +++++- util/aio-posix.c | 31 ++++++++++++++++--------------- util/async.c | 3 ++- 3 files changed, 23 insertions(+), 17 deletions(-) diff --git a/include/block/aio.h b/include/block/aio.h index 43883a8a33..49f46e01cb 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -123,6 +123,10 @@ struct BHListSlice { =20 typedef QSLIST_HEAD(, AioHandler) AioHandlerSList; =20 +typedef struct AioPolledEvent { + int64_t ns; /* current polling time in nanoseconds */ +} AioPolledEvent; + struct AioContext { GSource source; =20 @@ -229,7 +233,7 @@ struct AioContext { int poll_disable_cnt; =20 /* Polling mode parameters */ - int64_t poll_ns; /* current polling time in nanoseconds */ + AioPolledEvent poll; int64_t poll_max_ns; /* maximum polling time in nanoseconds */ int64_t poll_grow; /* polling time growth factor */ int64_t poll_shrink; /* polling time shrink factor */ diff --git a/util/aio-posix.c b/util/aio-posix.c index 06bf9f456c..95bddb9e4b 100644 --- 
a/util/aio-posix.c +++ b/util/aio-posix.c @@ -585,7 +585,7 @@ static bool try_poll_mode(AioContext *ctx, AioHandlerLi= st *ready_list, return false; } =20 - max_ns =3D qemu_soonest_timeout(*timeout, ctx->poll_ns); + max_ns =3D qemu_soonest_timeout(*timeout, ctx->poll.ns); if (max_ns && !ctx->fdmon_ops->need_wait(ctx)) { /* * Enable poll mode. It pairs with the poll_set_started() in @@ -683,40 +683,40 @@ bool aio_poll(AioContext *ctx, bool blocking) if (ctx->poll_max_ns) { int64_t block_ns =3D qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - star= t; =20 - if (block_ns <=3D ctx->poll_ns) { + if (block_ns <=3D ctx->poll.ns) { /* This is the sweet spot, no adjustment needed */ } else if (block_ns > ctx->poll_max_ns) { /* We'd have to poll for too long, poll less */ - int64_t old =3D ctx->poll_ns; + int64_t old =3D ctx->poll.ns; =20 if (ctx->poll_shrink) { - ctx->poll_ns /=3D ctx->poll_shrink; + ctx->poll.ns /=3D ctx->poll_shrink; } else { - ctx->poll_ns =3D 0; + ctx->poll.ns =3D 0; } =20 - trace_poll_shrink(ctx, old, ctx->poll_ns); - } else if (ctx->poll_ns < ctx->poll_max_ns && + trace_poll_shrink(ctx, old, ctx->poll.ns); + } else if (ctx->poll.ns < ctx->poll_max_ns && block_ns < ctx->poll_max_ns) { /* There is room to grow, poll longer */ - int64_t old =3D ctx->poll_ns; + int64_t old =3D ctx->poll.ns; int64_t grow =3D ctx->poll_grow; =20 if (grow =3D=3D 0) { grow =3D 2; } =20 - if (ctx->poll_ns) { - ctx->poll_ns *=3D grow; + if (ctx->poll.ns) { + ctx->poll.ns *=3D grow; } else { - ctx->poll_ns =3D 4000; /* start polling at 4 microseconds = */ + ctx->poll.ns =3D 4000; /* start polling at 4 microseconds = */ } =20 - if (ctx->poll_ns > ctx->poll_max_ns) { - ctx->poll_ns =3D ctx->poll_max_ns; + if (ctx->poll.ns > ctx->poll_max_ns) { + ctx->poll.ns =3D ctx->poll_max_ns; } =20 - trace_poll_grow(ctx, old, ctx->poll_ns); + trace_poll_grow(ctx, old, ctx->poll.ns); } } =20 @@ -770,8 +770,9 @@ void aio_context_set_poll_params(AioContext *ctx, int64= _t max_ns, /* No thread 
synchronization here, it doesn't matter if an incorrect value
      * is used once.
      */
+    ctx->poll.ns = 0;
+
     ctx->poll_max_ns = max_ns;
-    ctx->poll_ns = 0;
     ctx->poll_grow = grow;
     ctx->poll_shrink = shrink;
 
diff --git a/util/async.c b/util/async.c
index 0fe2943609..38667ea091 100644
--- a/util/async.c
+++ b/util/async.c
@@ -609,7 +609,8 @@ AioContext *aio_context_new(Error **errp)
     qemu_rec_mutex_init(&ctx->lock);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
-    ctx->poll_ns = 0;
+    ctx->poll.ns = 0;
+
     ctx->poll_max_ns = 0;
     ctx->poll_grow = 0;
     ctx->poll_shrink = 0;
-- 
2.48.1

From nobody Thu Apr 3 10:05:07 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1741385902; cv=none; d=zohomail.com; s=zohoarc; b=hvnI1kzYpjwbdTBeekiAI/IoObXSqkB27ruqtC1evpwHSRsBKhwqMMuFZ7PBvGkUYsio/LQrOn1Nw9YKLzyNdGVKQeZ2eYVAi0uWMuunbAJI6gMtdN4C8hWkGE4c0gq5IwDH3+LSxyvlEAW8fuo4tvkqz2In9c8a8uH8Az6RNYY= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1741385902; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=wJL3WPQI5s6cGIM3kVhEFn9DBhYrLWjKFlvy/ltL1tc=; b=aBGSS751uutyXgsLVlKODnIV1+89Q3ltVju/5t4zz8yt3xH6OOppzcMTwDcvoH6QOxTK68UlzdUOrSBZnlrk9Vk0vhi65L9CNHs8bu62fTgbf7f4G/0080RDN97n3C82Uyd8fB3VvEfXreKL7QrLSE4EPzTa2r4oWyMgOGBieHA= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none
dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 174138590205894.09953708677187; Fri, 7 Mar 2025 14:18:22 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tqg0c-0002Xm-QR; Fri, 07 Mar 2025 17:17:35 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tqg0E-0001xd-3Y for qemu-devel@nongnu.org; Fri, 07 Mar 2025 17:17:15 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tqg08-0007qH-6T for qemu-devel@nongnu.org; Fri, 07 Mar 2025 17:17:08 -0500 Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-494-zmfypdpMMjmcEs2UiIXejA-1; Fri, 07 Mar 2025 17:17:01 -0500 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1FB8E18004A9; Fri, 7 Mar 2025 22:17:00 +0000 (UTC) Received: from merkur.redhat.com (unknown [10.45.226.27]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id B2CA018009BC; Fri, 7 Mar 2025 22:16:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1741385823; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wJL3WPQI5s6cGIM3kVhEFn9DBhYrLWjKFlvy/ltL1tc=; b=cU76rhm/KPTMR5WYnVNkp/gV3a5QqA2D4IBkD9880MNI7dgbFrUWsgpvVPe6rX8OzItzt8 qKfCe42yGUn8I4g7M242PzOv6bK69kZVTGouPsXSR7DWqcchSrCNrMvt+0146lVGG8Gsqg cRgHzAFOMTvRew5Ho64eiZ5DysSofW8= X-MC-Unique: zmfypdpMMjmcEs2UiIXejA-1 X-Mimecast-MFC-AGG-ID: zmfypdpMMjmcEs2UiIXejA_1741385820 From: Kevin Wolf To: qemu-block@nongnu.org Cc: kwolf@redhat.com, stefanha@redhat.com, pbonzini@redhat.com, afaria@redhat.com, hreitz@redhat.com, qemu-devel@nongnu.org Subject: [PATCH 4/5] aio-posix: Factor out adjust_polling_time() Date: Fri, 7 Mar 2025 23:16:33 +0100 Message-ID: <20250307221634.71951-5-kwolf@redhat.com> In-Reply-To: <20250307221634.71951-1-kwolf@redhat.com> References: <20250307221634.71951-1-kwolf@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=kwolf@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass 
(identity @redhat.com) X-ZM-MESSAGEID: 1741385904086019100 Content-Type: text/plain; charset="utf-8"

Signed-off-by: Kevin Wolf
Reviewed-by: Stefan Hajnoczi
---
 util/aio-posix.c | 77 ++++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 36 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 95bddb9e4b..259827c7ad 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -600,6 +600,46 @@ static bool try_poll_mode(AioContext *ctx, AioHandlerList *ready_list,
     return false;
 }
 
+static void adjust_polling_time(AioContext *ctx, AioPolledEvent *poll,
+                                int64_t block_ns)
+{
+    if (block_ns <= poll->ns) {
+        /* This is the sweet spot, no adjustment needed */
+    } else if (block_ns > ctx->poll_max_ns) {
+        /* We'd have to poll for too long, poll less */
+        int64_t old = poll->ns;
+
+        if (ctx->poll_shrink) {
+            poll->ns /= ctx->poll_shrink;
+        } else {
+            poll->ns = 0;
+        }
+
+        trace_poll_shrink(ctx, old, poll->ns);
+    } else if (poll->ns < ctx->poll_max_ns &&
+               block_ns < ctx->poll_max_ns) {
+        /* There is room to grow, poll longer */
+        int64_t old = poll->ns;
+        int64_t grow = ctx->poll_grow;
+
+        if (grow == 0) {
+            grow = 2;
+        }
+
+        if (poll->ns) {
+            poll->ns *= grow;
+        } else {
+            poll->ns = 4000; /* start polling at 4 microseconds */
+        }
+
+        if (poll->ns > ctx->poll_max_ns) {
+            poll->ns = ctx->poll_max_ns;
+        }
+
+        trace_poll_grow(ctx, old, poll->ns);
+    }
+}
+
 bool aio_poll(AioContext *ctx, bool blocking)
 {
     AioHandlerList ready_list = QLIST_HEAD_INITIALIZER(ready_list);
@@ -682,42 +722,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     /* Adjust polling time */
     if (ctx->poll_max_ns) {
         int64_t block_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start;
-
-        if (block_ns <= ctx->poll.ns) {
-            /* This is the sweet spot, no adjustment needed */
-        } else if (block_ns > ctx->poll_max_ns) {
-            /* We'd have to poll for too long, poll less */
-            int64_t old = ctx->poll.ns;
-
-            if (ctx->poll_shrink) {
-                ctx->poll.ns
/= ctx->poll_shrink;
-            } else {
-                ctx->poll.ns = 0;
-            }
-
-            trace_poll_shrink(ctx, old, ctx->poll.ns);
-        } else if (ctx->poll.ns < ctx->poll_max_ns &&
-                   block_ns < ctx->poll_max_ns) {
-            /* There is room to grow, poll longer */
-            int64_t old = ctx->poll.ns;
-            int64_t grow = ctx->poll_grow;
-
-            if (grow == 0) {
-                grow = 2;
-            }
-
-            if (ctx->poll.ns) {
-                ctx->poll.ns *= grow;
-            } else {
-                ctx->poll.ns = 4000; /* start polling at 4 microseconds */
-            }
-
-            if (ctx->poll.ns > ctx->poll_max_ns) {
-                ctx->poll.ns = ctx->poll_max_ns;
-            }
-
-            trace_poll_grow(ctx, old, ctx->poll.ns);
-        }
+        adjust_polling_time(ctx, &ctx->poll, block_ns);
     }
 
     progress |= aio_bh_poll(ctx);
-- 
2.48.1

From nobody Thu Apr 3 10:05:07 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1741385949; cv=none; d=zohomail.com; s=zohoarc; b=bW+7WuqANeeJeo4+qR1jOs37PSq2U+MdKPSXCiIcDYYSH9wIEBw736Owzorw2mk2vSMkqD3IkwFZTcu4LqhGU7eZWAgOl1NbOXPRpfLheu4QujyDmHU0L6t0s3qWwn6Xssq5wYh/IYqfy3O8bI5+6BDgC4Kzqd+HwIukoYJp6fU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1741385949; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=RaQQrEx2Ahh4c2BeaSu3GJgqvoChkJywKOm5eAsE82U=; b=gCpxtHjGHDXazPmpyjO/8z812I+eesNGU0AEVT0BUAMZhwsioFdFBkhrtfozUjjQZHdcLjMVLLcDlpfBYJ4EceoMmFb/J6Au95S01LNJfi09KILnw2kIwbSv6+qHdRpJdpLThQB/mf1giE+5ucKYRFgRpukTnU/vBstulSuVt00= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender)
smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1741385949048487.1088806766121; Fri, 7 Mar 2025 14:19:09 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tqg0V-0002Hc-6P; Fri, 07 Mar 2025 17:17:27 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tqg0E-0001xe-3q for qemu-devel@nongnu.org; Fri, 07 Mar 2025 17:17:15 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tqg0B-0007qn-5u for qemu-devel@nongnu.org; Fri, 07 Mar 2025 17:17:09 -0500 Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-371-RAb5pKIAOBq8VZlPr3lolg-1; Fri, 07 Mar 2025 17:17:03 -0500 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 02DC11800257; Fri, 7 Mar 2025 22:17:03 +0000 (UTC) Received: from merkur.redhat.com (unknown [10.45.226.27]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 95E31180174F; Fri, 7 Mar 2025 22:17:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1741385826; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RaQQrEx2Ahh4c2BeaSu3GJgqvoChkJywKOm5eAsE82U=; b=SXNnZSALnqjIqvmNesxGVdWz2jIT613KFB0ThQxZh7Thb7XGe3An8R19aLOS4ldclHH/E6 80KUpzC79Tb9X20GDVfWuToA0sD/FPdpRkMPnIniaqHiuZkE9fsKgPHFZDO2alHo3Mkcn3 v1UA7XxF7tCpzDxBaM3OF0xdALHGskU= X-MC-Unique: RAb5pKIAOBq8VZlPr3lolg-1 X-Mimecast-MFC-AGG-ID: RAb5pKIAOBq8VZlPr3lolg_1741385823 From: Kevin Wolf To: qemu-block@nongnu.org Cc: kwolf@redhat.com, stefanha@redhat.com, pbonzini@redhat.com, afaria@redhat.com, hreitz@redhat.com, qemu-devel@nongnu.org Subject: [PATCH 5/5] aio-posix: Separate AioPolledEvent per AioHandler Date: Fri, 7 Mar 2025 23:16:34 +0100 Message-ID: <20250307221634.71951-6-kwolf@redhat.com> In-Reply-To: <20250307221634.71951-1-kwolf@redhat.com> References: <20250307221634.71951-1-kwolf@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=kwolf@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1741385950213019100 Content-Type: text/plain; charset="utf-8"

Adaptive polling has a big problem: it doesn't consider that an event
loop can wait for many different events that may have very different
typical latencies.

For example, think of a guest that tends to send a new I/O request soon
after the previous I/O request completes, but the storage on the host is
rather slow. In this case, getting the new request from the guest
quickly means that polling is enabled, but the next thing is performing
the I/O request on the backend, which is slow and disables polling again
for the next guest request. This means that in such a scenario, polling
could help for every other event, but is only ever enabled when it can't
succeed.

In order to fix this, keep a separate AioPolledEvent for each
AioHandler. We will then know that the backend file descriptor always
has a high latency and isn't worth polling for, but we also know that
the guest is always fast and we should poll for it. This solves at least
half of the problem: we can now keep polling for those cases where it
makes sense and get the improved performance from it.

Since the event loop doesn't know which event will be next, we still do
some unnecessary polling while we're waiting for the slow disk. I made
some attempts to be more clever than just randomly growing and shrinking
the polling time, and even to let callers be explicit about when they
expect a new event, but so far this hasn't resulted in improved
performance, and in some cases it even caused performance regressions.
For now, let's just fix the part that is easy enough to fix; we can
revisit the rest later.
Signed-off-by: Kevin Wolf
---
 include/block/aio.h |  1 -
 util/aio-posix.h    |  1 +
 util/aio-posix.c    | 24 +++++++++++++++++++++---
 util/async.c        |  2 --
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 49f46e01cb..0ef7ce48e3 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -233,7 +233,6 @@ struct AioContext {
     int poll_disable_cnt;
 
     /* Polling mode parameters */
-    AioPolledEvent poll;
     int64_t poll_max_ns;    /* maximum polling time in nanoseconds */
     int64_t poll_grow;      /* polling time growth factor */
     int64_t poll_shrink;    /* polling time shrink factor */
diff --git a/util/aio-posix.h b/util/aio-posix.h
index 4264c518be..82a0201ea4 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -38,6 +38,7 @@ struct AioHandler {
 #endif
     int64_t poll_idle_timeout; /* when to stop userspace polling */
     bool poll_ready; /* has polling detected an event? */
+    AioPolledEvent poll;
 };
 
 /* Add a handler to a ready list */
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 259827c7ad..2251871c61 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -579,13 +579,19 @@ static bool run_poll_handlers(AioContext *ctx, AioHandlerList *ready_list,
 static bool try_poll_mode(AioContext *ctx, AioHandlerList *ready_list,
                           int64_t *timeout)
 {
+    AioHandler *node;
     int64_t max_ns;
 
     if (QLIST_EMPTY_RCU(&ctx->poll_aio_handlers)) {
         return false;
     }
 
-    max_ns = qemu_soonest_timeout(*timeout, ctx->poll.ns);
+    max_ns = 0;
+    QLIST_FOREACH(node, &ctx->poll_aio_handlers, node_poll) {
+        max_ns = MAX(max_ns, node->poll.ns);
+    }
+    max_ns = qemu_soonest_timeout(*timeout, max_ns);
+
     if (max_ns && !ctx->fdmon_ops->need_wait(ctx)) {
         /*
          * Enable poll mode.
It pairs with the poll_set_started() in
@@ -721,8 +727,14 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     /* Adjust polling time */
     if (ctx->poll_max_ns) {
+        AioHandler *node;
         int64_t block_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start;
-        adjust_polling_time(ctx, &ctx->poll, block_ns);
+
+        QLIST_FOREACH(node, &ctx->poll_aio_handlers, node_poll) {
+            if (QLIST_IS_INSERTED(node, node_ready)) {
+                adjust_polling_time(ctx, &node->poll, block_ns);
+            }
+        }
     }
 
     progress |= aio_bh_poll(ctx);
@@ -772,10 +784,16 @@ void aio_context_use_g_source(AioContext *ctx)
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
                                  int64_t grow, int64_t shrink, Error **errp)
 {
+    AioHandler *node;
+
     /* No thread synchronization here, it doesn't matter if an incorrect value
      * is used once.
      */
-    ctx->poll.ns = 0;
+    qemu_lockcnt_inc(&ctx->list_lock);
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+        node->poll.ns = 0;
+    }
+    qemu_lockcnt_dec(&ctx->list_lock);
 
     ctx->poll_max_ns = max_ns;
     ctx->poll_grow = grow;
diff --git a/util/async.c b/util/async.c
index 38667ea091..4124a948fd 100644
--- a/util/async.c
+++ b/util/async.c
@@ -609,8 +609,6 @@ AioContext *aio_context_new(Error **errp)
     qemu_rec_mutex_init(&ctx->lock);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
-    ctx->poll.ns = 0;
-
     ctx->poll_max_ns = 0;
     ctx->poll_grow = 0;
     ctx->poll_shrink = 0;
-- 
2.48.1
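
The scenario in the commit message of patch 5/5 (a fast guest alternating
with a slow backend) can be replayed outside QEMU with a small standalone
model. The sketch below is not QEMU code. PollParams, PolledEvent,
model_adjust(), hits_shared() and hits_per_handler() are hypothetical names
made up for illustration, and the model assumes that a poll "catches" an
event whenever the event's latency fits into the current polling window.
Only the grow/shrink branches are taken from adjust_polling_time(); tracing,
timers and the real event loop are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the polling tunables kept in AioContext. */
typedef struct PollParams {
    int64_t max_ns;  /* models ctx->poll_max_ns */
    int64_t grow;    /* models ctx->poll_grow, 0 means "double" */
    int64_t shrink;  /* models ctx->poll_shrink, 0 means "reset to 0" */
} PollParams;

/* Same shape as AioPolledEvent from patch 1/5. */
typedef struct PolledEvent {
    int64_t ns;      /* current polling time in nanoseconds */
} PolledEvent;

/* Follows the branches of adjust_polling_time(), without tracing. */
static void model_adjust(const PollParams *p, PolledEvent *ev,
                         int64_t block_ns)
{
    /* aio_poll() only adjusts when poll_max_ns is non-zero */
    assert(p->max_ns > 0);

    if (block_ns <= ev->ns) {
        /* Sweet spot, no adjustment needed */
    } else if (block_ns > p->max_ns) {
        /* We'd have to poll for too long, poll less */
        ev->ns = p->shrink ? ev->ns / p->shrink : 0;
    } else if (ev->ns < p->max_ns && block_ns < p->max_ns) {
        /* There is room to grow, poll longer */
        int64_t grow = p->grow ? p->grow : 2;

        ev->ns = ev->ns ? ev->ns * grow : 4000;
        if (ev->ns > p->max_ns) {
            ev->ns = p->max_ns;
        }
    }
}

/* One polling state for the whole context (behaviour before patch 5/5). */
static int hits_shared(const PollParams *p, int rounds,
                       int64_t guest_ns, int64_t backend_ns)
{
    PolledEvent shared = { 0 };
    int hits = 0;

    for (int i = 0; i < rounds; i++) {
        hits += guest_ns <= shared.ns;        /* would polling catch it? */
        model_adjust(p, &shared, guest_ns);   /* guest request arrived */
        model_adjust(p, &shared, backend_ns); /* backend I/O completed */
    }
    return hits;
}

/* One polling state per event source (behaviour after patch 5/5). */
static int hits_per_handler(const PollParams *p, int rounds,
                            int64_t guest_ns, int64_t backend_ns)
{
    PolledEvent guest = { 0 }, backend = { 0 };
    int hits = 0;

    for (int i = 0; i < rounds; i++) {
        /* try_poll_mode() polls for the largest per-handler time */
        int64_t window = guest.ns > backend.ns ? guest.ns : backend.ns;

        hits += guest_ns <= window;
        model_adjust(p, &guest, guest_ns);     /* only the ready handler */
        model_adjust(p, &backend, backend_ns); /* is adjusted each time */
    }
    return hits;
}
```

With illustrative values of max_ns = 32768, a guest latency of 10000 ns and
a backend latency of 200000 ns, the shared state flips between 0 and 4000 ns
and never catches a guest request. The per-handler state grows to 16000 ns
for the guest and stays at 0 for the backend, so in this model every guest
request from the fourth round on is caught by polling. The polling window
also stays open while the slow backend is pending, which is the unnecessary
polling the commit message says is left for later.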