From nobody Tue Feb 10 00:53:24 2026
From: Jinhao Fan <fanjinhao21s@ict.ac.cn>
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, Klaus Jensen, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH v3 1/4] hw/nvme: support irq(de)assertion with eventfd
Date: Sat, 27 Aug 2022 17:12:55 +0800
Message-Id: <20220827091258.3589230-2-fanjinhao21s@ict.ac.cn>
In-Reply-To: <20220827091258.3589230-1-fanjinhao21s@ict.ac.cn>
References: <20220827091258.3589230-1-fanjinhao21s@ict.ac.cn>

When the new option 'irq-eventfd' is turned on, the
IO emulation code signals an eventfd when it wants to (de)assert an irq.
The main loop eventfd handler does the actual irq (de)assertion. This
paves the way for iothread support, since QEMU's interrupt emulation is
not thread safe.

Asserting and deasserting an irq with eventfd has some performance
implications. For small queue depths it increases request latency, but
for large queue depths it effectively coalesces irqs.

Comparison (KIOPS):

 QD            1   4  16  64
 QEMU         38 123 210 329
 irq-eventfd  32 106 240 364

Signed-off-by: Jinhao Fan
Signed-off-by: Klaus Jensen
---
 hw/nvme/ctrl.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------
 hw/nvme/nvme.h |   3 ++
 2 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..51792f3955 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -526,34 +526,57 @@ static void nvme_irq_check(NvmeCtrl *n)
     }
 }
 
+static void nvme_irq_do_assert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    if (msix_enabled(&(n->parent_obj))) {
+        trace_pci_nvme_irq_msix(cq->vector);
+        msix_notify(&(n->parent_obj), cq->vector);
+    } else {
+        trace_pci_nvme_irq_pin();
+        assert(cq->vector < 32);
+        n->irq_status |= 1 << cq->vector;
+        nvme_irq_check(n);
+    }
+}
+
 static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 {
     if (cq->irq_enabled) {
-        if (msix_enabled(&(n->parent_obj))) {
-            trace_pci_nvme_irq_msix(cq->vector);
-            msix_notify(&(n->parent_obj), cq->vector);
+        if (cq->assert_notifier.initialized) {
+            event_notifier_set(&cq->assert_notifier);
         } else {
-            trace_pci_nvme_irq_pin();
-            assert(cq->vector < 32);
-            n->irq_status |= 1 << cq->vector;
-            nvme_irq_check(n);
+            nvme_irq_do_assert(n, cq);
         }
     } else {
         trace_pci_nvme_irq_masked();
     }
 }
 
+static void nvme_irq_do_deassert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    if (msix_enabled(&(n->parent_obj))) {
+        return;
+    } else {
+        assert(cq->vector < 32);
+        if (!n->cq_pending) {
+            n->irq_status &= ~(1 << cq->vector);
+        }
+        nvme_irq_check(n);
+    }
+}
+
 static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
 {
     if (cq->irq_enabled) {
-        if (msix_enabled(&(n->parent_obj))) {
-            return;
+        if (cq->deassert_notifier.initialized) {
+            /*
+             * The deassert notifier will only be initialized when MSI-X is NOT
+             * in use. Therefore no need to worry about extra eventfd syscall
+             * for pin-based interrupts.
+             */
+            event_notifier_set(&cq->deassert_notifier);
         } else {
-            assert(cq->vector < 32);
-            if (!n->cq_pending) {
-                n->irq_status &= ~(1 << cq->vector);
-            }
-            nvme_irq_check(n);
+            nvme_irq_do_deassert(n, cq);
         }
     }
 }
 
@@ -1338,6 +1361,50 @@ static void nvme_update_cq_head(NvmeCQueue *cq)
     trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
 }
 
+static void nvme_assert_notifier_read(EventNotifier *e)
+{
+    NvmeCQueue *cq = container_of(e, NvmeCQueue, assert_notifier);
+    if (event_notifier_test_and_clear(e)) {
+        nvme_irq_do_assert(cq->ctrl, cq);
+    }
+}
+
+static void nvme_deassert_notifier_read(EventNotifier *e)
+{
+    NvmeCQueue *cq = container_of(e, NvmeCQueue, deassert_notifier);
+    if (event_notifier_test_and_clear(e)) {
+        nvme_irq_do_deassert(cq->ctrl, cq);
+    }
+}
+
+static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    int ret;
+
+    ret = event_notifier_init(&cq->assert_notifier, 0);
+    if (ret < 0) {
+        return;
+    }
+
+    event_notifier_set_handler(&cq->assert_notifier,
+                               nvme_assert_notifier_read);
+
+    if (!msix_enabled(&n->parent_obj)) {
+        ret = event_notifier_init(&cq->deassert_notifier, 0);
+        if (ret < 0) {
+            event_notifier_set_handler(&cq->assert_notifier, NULL);
+            event_notifier_cleanup(&cq->assert_notifier);
+
+            return;
+        }
+
+        event_notifier_set_handler(&cq->deassert_notifier,
+                                   nvme_deassert_notifier_read);
+    }
+
+    return;
+}
+
 static void nvme_post_cqes(void *opaque)
 {
     NvmeCQueue *cq = opaque;
@@ -1377,8 +1444,10 @@ static void nvme_post_cqes(void *opaque)
             QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
         }
         if (cq->tail != cq->head) {
-            if (cq->irq_enabled && !pending) {
-                n->cq_pending++;
+            if (cq->irq_enabled) {
+                if (!pending) {
+                    n->cq_pending++;
+                }
             }
 
             nvme_irq_assert(n, cq);
@@ -4705,6 +4774,14 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
         event_notifier_set_handler(&cq->notifier, NULL);
         event_notifier_cleanup(&cq->notifier);
     }
+    if (cq->assert_notifier.initialized) {
+        event_notifier_set_handler(&cq->assert_notifier, NULL);
+        event_notifier_cleanup(&cq->assert_notifier);
+    }
+    if (cq->deassert_notifier.initialized) {
+        event_notifier_set_handler(&cq->deassert_notifier, NULL);
+        event_notifier_cleanup(&cq->deassert_notifier);
+    }
     if (msix_enabled(&n->parent_obj)) {
         msix_vector_unuse(&n->parent_obj, cq->vector);
     }
@@ -4734,7 +4811,7 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
         n->cq_pending--;
     }
 
-    nvme_irq_deassert(n, cq);
+    nvme_irq_do_deassert(n, cq);
     trace_pci_nvme_del_cq(qid);
     nvme_free_cq(cq, n);
     return NVME_SUCCESS;
@@ -4772,6 +4849,14 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
     }
     n->cq[cqid] = cq;
     cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+
+    /*
+     * Only enable irq eventfd for IO queues since we always emulate admin
+     * queue in main loop thread.
+     */
+    if (cqid && n->params.irq_eventfd) {
+        nvme_init_irq_notifier(n, cq);
+    }
 }
 
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
@@ -7671,6 +7756,7 @@ static Property nvme_props[] = {
     DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
     DEFINE_PROP_BOOL("legacy-cmb", NvmeCtrl, params.legacy_cmb, false),
     DEFINE_PROP_BOOL("ioeventfd", NvmeCtrl, params.ioeventfd, false),
+    DEFINE_PROP_BOOL("irq-eventfd", NvmeCtrl, params.irq_eventfd, false),
     DEFINE_PROP_UINT8("zoned.zasl", NvmeCtrl, params.zasl, 0),
     DEFINE_PROP_BOOL("zoned.auto_transition", NvmeCtrl,
                      params.auto_transition_zones, true),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 79f5c281c2..4850d3e965 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -398,6 +398,8 @@ typedef struct NvmeCQueue {
     uint64_t ei_addr;
     QEMUTimer *timer;
     EventNotifier notifier;
+    EventNotifier assert_notifier;
+    EventNotifier deassert_notifier;
     bool ioeventfd_enabled;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
     QTAILQ_HEAD(, NvmeRequest) req_list;
@@ -422,6 +424,7 @@ typedef struct NvmeParams {
     bool auto_transition_zones;
     bool legacy_cmb;
     bool ioeventfd;
+    bool irq_eventfd;
     uint8_t sriov_max_vfs;
     uint16_t sriov_vq_flexible;
     uint16_t sriov_vi_flexible;
-- 
2.25.1

From nobody Tue Feb 10 00:53:24 2026
From: Jinhao Fan <fanjinhao21s@ict.ac.cn>
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, Klaus Jensen, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH v3 2/4] hw/nvme: use KVM irqfd when available
Date: Sat, 27 Aug 2022 17:12:56 +0800
Message-Id: <20220827091258.3589230-3-fanjinhao21s@ict.ac.cn>
In-Reply-To: <20220827091258.3589230-1-fanjinhao21s@ict.ac.cn>
References: <20220827091258.3589230-1-fanjinhao21s@ict.ac.cn>

Use KVM's irqfd to send interrupts when possible. This approach is
thread safe. Moreover, it does not have the inter-thread communication
overhead of plain event notifiers, since the handler callback is called
in the same system call as the irqfd write.
Signed-off-by: Jinhao Fan
Signed-off-by: Klaus Jensen
---
 hw/nvme/ctrl.c       | 145 ++++++++++++++++++++++++++++++++++++++++++-
 hw/nvme/nvme.h       |   3 +
 hw/nvme/trace-events |   3 +
 3 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 51792f3955..e11328967f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -192,6 +192,7 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
@@ -1377,8 +1378,115 @@ static void nvme_deassert_notifier_read(EventNotifier *e)
     }
 }
 
+static int nvme_kvm_vector_use(NvmeCtrl *n, NvmeCQueue *cq, uint32_t vector)
+{
+    KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
+    int ret;
+
+    ret = kvm_irqchip_add_msi_route(&c, vector, &n->parent_obj);
+    if (ret < 0) {
+        return ret;
+    }
+
+    kvm_irqchip_commit_route_changes(&c);
+
+    cq->virq = ret;
+
+    return 0;
+}
+
+static int nvme_kvm_vector_unmask(PCIDevice *pci_dev, unsigned vector,
+                                  MSIMessage msg)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+    int ret;
+
+    trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
+
+    for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (cq->vector == vector) {
+            if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
+                ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
+                                                   pci_dev);
+                if (ret < 0) {
+                    return ret;
+                }
+
+                kvm_irqchip_commit_routes(kvm_state);
+
+                cq->msg = msg;
+            }
+
+            ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+                                                     &cq->assert_notifier,
+                                                     NULL, cq->virq);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+    }
+
+    return 0;
+}
+
+static void nvme_kvm_vector_mask(PCIDevice *pci_dev, unsigned vector)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_mask(vector);
+
+    for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (cq->vector == vector) {
+            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                  &cq->assert_notifier,
+                                                  cq->virq);
+        }
+    }
+}
+
+static void nvme_kvm_vector_poll(PCIDevice *pci_dev, unsigned int vector_start,
+                                 unsigned int vector_end)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_poll(vector_start, vector_end);
+
+    for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (!msix_is_masked(pci_dev, cq->vector)) {
+            continue;
+        }
+
+        if (cq->vector >= vector_start && cq->vector <= vector_end) {
+            if (event_notifier_test_and_clear(&cq->assert_notifier)) {
+                msix_set_pending(pci_dev, i);
+            }
+        }
+    }
+}
+
+
 static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
 {
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     int ret;
 
     ret = event_notifier_init(&cq->assert_notifier, 0);
@@ -1386,12 +1494,27 @@ static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
         return;
     }
 
-    event_notifier_set_handler(&cq->assert_notifier,
-                               nvme_assert_notifier_read);
+    if (with_irqfd) {
+        ret = nvme_kvm_vector_use(n, cq, cq->vector);
+        if (ret < 0) {
+            event_notifier_cleanup(&cq->assert_notifier);
+
+            return;
+        }
+    } else {
+        event_notifier_set_handler(&cq->assert_notifier,
+                                   nvme_assert_notifier_read);
+    }
 
     if (!msix_enabled(&n->parent_obj)) {
         ret = event_notifier_init(&cq->deassert_notifier, 0);
         if (ret < 0) {
+            if (with_irqfd) {
+                kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                      &cq->assert_notifier,
+                                                      cq->virq);
+            }
+
             event_notifier_set_handler(&cq->assert_notifier, NULL);
             event_notifier_cleanup(&cq->assert_notifier);
 
@@ -4764,6 +4887,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
 
 static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 {
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     uint16_t offset = (cq->cqid << 3) + (1 << 2);
 
     n->cq[cq->cqid] = NULL;
@@ -4775,6 +4900,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
         event_notifier_cleanup(&cq->notifier);
     }
     if (cq->assert_notifier.initialized) {
+        if (with_irqfd) {
+            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                  &cq->assert_notifier,
+                                                  cq->virq);
+            kvm_irqchip_release_virq(kvm_state, cq->virq);
+        }
         event_notifier_set_handler(&cq->assert_notifier, NULL);
         event_notifier_cleanup(&cq->assert_notifier);
     }
@@ -6528,6 +6659,9 @@ static int nvme_start_ctrl(NvmeCtrl *n)
     uint32_t page_size = 1 << page_bits;
     NvmeSecCtrlEntry *sctrl = nvme_sctrl(n);
 
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
+
     if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
         trace_pci_nvme_err_startfail_virt_state(le16_to_cpu(sctrl->nvi),
                                                 le16_to_cpu(sctrl->nvq),
@@ -6617,6 +6751,12 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 
     nvme_select_iocs(n);
 
+    if (n->params.irq_eventfd && with_irqfd) {
+        return msix_set_vector_notifiers(PCI_DEVICE(n), nvme_kvm_vector_unmask,
+                                         nvme_kvm_vector_mask,
+                                         nvme_kvm_vector_poll);
+    }
+
     return 0;
 }
 
@@ -7734,6 +7874,7 @@ static void nvme_exit(PCIDevice *pci_dev)
         pcie_sriov_pf_exit(pci_dev);
     }
 
+    msix_unset_vector_notifiers(pci_dev);
     msix_uninit(pci_dev, &n->bar0, &n->bar0);
     memory_region_del_subregion(&n->bar0, &n->iomem);
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 4850d3e965..b0b986b024 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -20,6 +20,7 @@
 
 #include "qemu/uuid.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
 #include "hw/block/block.h"
 
 #include "block/nvme.h"
@@ -396,10 +397,12 @@ typedef struct NvmeCQueue {
     uint64_t dma_addr;
     uint64_t db_addr;
     uint64_t ei_addr;
+    int virq;
     QEMUTimer *timer;
     EventNotifier notifier;
     EventNotifier assert_notifier;
     EventNotifier deassert_notifier;
+    MSIMessage msg;
     bool ioeventfd_enabled;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
     QTAILQ_HEAD(, NvmeRequest) req_list;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..b11fcf4a65 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -2,6 +2,9 @@
 pci_nvme_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
 pci_nvme_irq_pin(void) "pulsing IRQ pin"
 pci_nvme_irq_masked(void) "IRQ is masked"
+pci_nvme_irq_mask(uint32_t vector) "IRQ %u gets masked"
+pci_nvme_irq_unmask(uint32_t vector, uint64_t addr, uint32_t data) "IRQ %u gets unmasked, addr=0x%"PRIx64" data=0x%"PRIu32""
+pci_nvme_irq_poll(uint32_t vector_start, uint32_t vector_end) "IRQ poll, start=0x%"PRIu32" end=0x%"PRIu32""
 pci_nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
 pci_nvme_dbbuf_config(uint64_t dbs_addr, uint64_t eis_addr) "dbs_addr=0x%"PRIx64" eis_addr=0x%"PRIx64""
 pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
-- 
2.25.1

From nobody Tue Feb 10 00:53:24 2026
From: Jinhao Fan <fanjinhao21s@ict.ac.cn>
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH v3 3/4] hw/nvme: add iothread support
Date: Sat, 27 Aug 2022 17:12:57 +0800
Message-Id: <20220827091258.3589230-4-fanjinhao21s@ict.ac.cn>
In-Reply-To: <20220827091258.3589230-1-fanjinhao21s@ict.ac.cn>
References: <20220827091258.3589230-1-fanjinhao21s@ict.ac.cn>

Add an option "iothread=x" to do emulation in a separate iothread. This
improves performance because QEMU's main loop is responsible for a lot
of other work while the iothread is dedicated to NVMe emulation.
Moreover, emulating in an iothread brings the potential of polling on
SQ/CQ doorbells, which I will bring up in a following patch.
The iothread can be enabled by:

 -object iothread,id=nvme0 \
 -device nvme,iothread=nvme0 \

Performance comparisons (KIOPS):

 QD         1   4  16  64
 QEMU      41 136 242 338
 iothread  53 155 245 309

Signed-off-by: Jinhao Fan
---
 hw/nvme/ctrl.c | 67 ++++++++++++++++++++++++++++++++++++++++++++------
 hw/nvme/ns.c   | 21 +++++++++++++---
 hw/nvme/nvme.h |  6 ++++-
 3 files changed, 82 insertions(+), 12 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e11328967f..869565d77b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4458,7 +4458,13 @@ static int nvme_init_cq_ioeventfd(NvmeCQueue *cq)
         return ret;
     }
 
-    event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+    if (cq->cqid) {
+        aio_set_event_notifier(n->ctx, &cq->notifier, true, nvme_cq_notifier,
+                               NULL, NULL);
+    } else {
+        event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+    }
+
     memory_region_add_eventfd(&n->iomem, 0x1000 + offset, 4, false, 0,
                               &cq->notifier);
 
@@ -4487,7 +4493,13 @@ static int nvme_init_sq_ioeventfd(NvmeSQueue *sq)
         return ret;
     }
 
-    event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+    if (sq->sqid) {
+        aio_set_event_notifier(n->ctx, &sq->notifier, true, nvme_sq_notifier,
+                               NULL, NULL);
+    } else {
+        event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+    }
+
     memory_region_add_eventfd(&n->iomem, 0x1000 + offset, 4, false, 0,
                               &sq->notifier);
 
@@ -4503,7 +4515,12 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
     if (sq->ioeventfd_enabled) {
         memory_region_del_eventfd(&n->iomem,
                                   0x1000 + offset, 4, false, 0, &sq->notifier);
-        event_notifier_set_handler(&sq->notifier, NULL);
+        if (sq->sqid) {
+            aio_set_event_notifier(n->ctx, &sq->notifier, true, NULL, NULL,
+                                   NULL);
+        } else {
+            event_notifier_set_handler(&sq->notifier, NULL);
+        }
         event_notifier_cleanup(&sq->notifier);
     }
     g_free(sq->io_req);
@@ -4573,7 +4590,13 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
         sq->io_req[i].sq = sq;
         QTAILQ_INSERT_TAIL(&(sq->req_list), &sq->io_req[i], entry);
     }
-    sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+
+    if (sq->sqid) {
+        sq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+                                  nvme_process_sq, sq);
+    } else {
+        sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+    }
 
     if (n->dbbuf_enabled) {
         sq->db_addr = n->dbbuf_dbs + (sqid << 3);
@@ -4896,7 +4919,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
     if (cq->ioeventfd_enabled) {
         memory_region_del_eventfd(&n->iomem,
                                   0x1000 + offset, 4, false, 0, &cq->notifier);
-        event_notifier_set_handler(&cq->notifier, NULL);
+        if (cq->cqid) {
+            aio_set_event_notifier(n->ctx, &cq->notifier, true, NULL, NULL,
+                                   NULL);
+        } else {
+            event_notifier_set_handler(&cq->notifier, NULL);
+        }
         event_notifier_cleanup(&cq->notifier);
     }
     if (cq->assert_notifier.initialized) {
@@ -4979,7 +5007,13 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
         }
     }
     n->cq[cqid] = cq;
-    cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+
+    if (cq->cqid) {
+        cq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+                                  nvme_post_cqes, cq);
+    } else {
+        cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+    }
 
     /*
      * Only enable irq eventfd for IO queues since we always emulate admin
@@ -7759,6 +7793,14 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
         stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
     }
+
+    if (n->params.iothread) {
+        n->iothread = n->params.iothread;
+        object_ref(OBJECT(n->iothread));
+        n->ctx = iothread_get_aio_context(n->iothread);
+    } else {
+        n->ctx = qemu_get_aio_context();
+    }
 }
 
 static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
@@ -7831,7 +7873,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         ns = &n->namespace;
         ns->params.nsid = 1;
 
-        if (nvme_ns_setup(ns, errp)) {
+        if (nvme_ns_setup(ns, n->ctx, errp)) {
             return;
         }
 
@@ -7862,6 +7904,15 @@ static void nvme_exit(PCIDevice *pci_dev)
     g_free(n->sq);
     g_free(n->aer_reqs);
 
+    aio_context_acquire(n->ctx);
+    blk_set_aio_context(n->namespace.blkconf.blk, qemu_get_aio_context(), NULL);
+    aio_context_release(n->ctx);
+
+    if (n->iothread) {
+        object_unref(OBJECT(n->iothread));
+        n->iothread = NULL;
+    }
+
     if (n->params.cmb_size_mb) {
         g_free(n->cmb.buf);
     }
@@ -7885,6 +7936,8 @@ static Property nvme_props[] = {
                      HostMemoryBackend *),
     DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
                      NvmeSubsystem *),
+    DEFINE_PROP_LINK("iothread", NvmeCtrl, params.iothread, TYPE_IOTHREAD,
+                     IOThread *),
     DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
     DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
     DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 62a1f97be0..eb9141a67b 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -146,9 +146,11 @@ lbaf_found:
     return 0;
 }
 
-static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
+static int nvme_ns_init_blk(NvmeNamespace *ns, AioContext *ctx, Error **errp)
 {
     bool read_only;
+    AioContext *old_context;
+    int ret;
 
     if (!blkconf_blocksizes(&ns->blkconf, errp)) {
         return -1;
@@ -170,6 +172,17 @@ static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
         return -1;
     }
 
+    old_context = blk_get_aio_context(ns->blkconf.blk);
+    aio_context_acquire(old_context);
+    ret = blk_set_aio_context(ns->blkconf.blk, ctx, errp);
+    aio_context_release(old_context);
+
+    if (ret) {
+        error_setg(errp, "Set AioContext on BlockBackend failed");
+        return ret;
+    }
+
     return 0;
 }
 
@@ -482,13 +495,13 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
     return 0;
 }
 
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
+int nvme_ns_setup(NvmeNamespace *ns, AioContext *ctx, Error **errp)
 {
     if (nvme_ns_check_constraints(ns, errp)) {
         return -1;
     }
 
-    if (nvme_ns_init_blk(ns, errp)) {
+    if (nvme_ns_init_blk(ns, ctx, errp)) {
         return -1;
     }
 
@@ -563,7 +576,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    if (nvme_ns_setup(ns, errp)) {
+    if (nvme_ns_setup(ns, n->ctx, errp)) {
         return;
     }
 
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index b0b986b024..224b73e6c4 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -22,6 +22,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
 #include "hw/block/block.h"
+#include "sysemu/iothread.h"
 
 #include "block/nvme.h"
 
@@ -276,7 +277,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 }
 
 void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeNamespace *ns, AioContext *ctx, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
@@ -433,6 +434,7 @@ typedef struct NvmeParams {
     uint16_t sriov_vi_flexible;
     uint8_t sriov_max_vq_per_vf;
     uint8_t sriov_max_vi_per_vf;
+    IOThread *iothread;
 } NvmeParams;
 
 typedef struct NvmeCtrl {
@@ -464,6 +466,8 @@ typedef struct NvmeCtrl {
     uint64_t dbbuf_dbs;
     uint64_t dbbuf_eis;
     bool dbbuf_enabled;
+    IOThread *iothread;
+    AioContext *ctx;
 
     struct {
         MemoryRegion mem;
-- 
2.25.1

From nobody Tue Feb 10 00:53:24 2026
From: Jinhao Fan <fanjinhao21s@ict.ac.cn>
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH v3 4/4] hw/nvme: add polling support
Date: Sat, 27 Aug 2022 17:12:58 +0800
Message-Id: <20220827091258.3589230-5-fanjinhao21s@ict.ac.cn>
In-Reply-To: <20220827091258.3589230-1-fanjinhao21s@ict.ac.cn>
References: <20220827091258.3589230-1-fanjinhao21s@ict.ac.cn>

Add AioContext polling handlers for NVMe SQ and CQ. By employing
polling, the latency of NVMe IO emulation is greatly reduced. The SQ
polling handler checks for updates on the SQ tail shadow doorbell
buffer. The CQ polling handler is an empty function because we
proactively poll the CQ head shadow doorbell buffer when we want to
post a cqe. Updates on the SQ eventidx buffer are stopped during
polling to avoid the host doing unnecessary doorbell buffer writes.
Comparison (KIOPS):

  QD        1    4   16   64
  QEMU     53  155  245  309
  polling 123  165  189  191

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
---
 hw/nvme/ctrl.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++----
 hw/nvme/nvme.h |  1 +
 2 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 869565d77b..a7f8a4220e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -298,6 +298,8 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 
 static void nvme_process_sq(void *opaque);
 static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst);
+static void nvme_update_sq_eventidx(const NvmeSQueue *sq);
+static void nvme_update_sq_tail(NvmeSQueue *sq);
 
 static uint16_t nvme_sqid(NvmeRequest *req)
 {
@@ -4447,6 +4449,21 @@ static void nvme_cq_notifier(EventNotifier *e)
     nvme_post_cqes(cq);
 }
 
+static bool nvme_cq_notifier_aio_poll(void *opaque)
+{
+    /*
+     * We already "poll" the CQ tail shadow doorbell value in nvme_post_cqes(),
+     * so we do not need to check the value here. However, QEMU's AioContext
+     * polling requires us to provide io_poll and io_poll_ready handlers, so
+     * use dummy functions for CQ.
+     */
+    return false;
+}
+
+static void nvme_cq_notifier_aio_poll_ready(EventNotifier *n)
+{
+}
+
 static int nvme_init_cq_ioeventfd(NvmeCQueue *cq)
 {
     NvmeCtrl *n = cq->ctrl;
@@ -4459,8 +4476,10 @@ static int nvme_init_cq_ioeventfd(NvmeCQueue *cq)
     }
 
     if (cq->cqid) {
-        aio_set_event_notifier(n->ctx, &cq->notifier, true, nvme_cq_notifier,
-                               NULL, NULL);
+        aio_set_event_notifier(n->ctx, &cq->notifier, true,
+                               nvme_cq_notifier,
+                               nvme_cq_notifier_aio_poll,
+                               nvme_cq_notifier_aio_poll_ready);
     } else {
         event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
     }
@@ -4482,6 +4501,44 @@ static void nvme_sq_notifier(EventNotifier *e)
     nvme_process_sq(sq);
 }
 
+static void nvme_sq_notifier_aio_poll_begin(EventNotifier *n)
+{
+    NvmeSQueue *sq = container_of(n, NvmeSQueue, notifier);
+
+    nvme_update_sq_eventidx(sq);
+
+    /* Stop host doorbell writes by pausing eventidx updates */
+    sq->suppress_db = true;
+}
+
+static bool nvme_sq_notifier_aio_poll(void *opaque)
+{
+    EventNotifier *n = opaque;
+    NvmeSQueue *sq = container_of(n, NvmeSQueue, notifier);
+    uint32_t old_tail = sq->tail;
+
+    nvme_update_sq_tail(sq);
+
+    return sq->tail != old_tail;
+}
+
+static void nvme_sq_notifier_aio_poll_ready(EventNotifier *n)
+{
+    NvmeSQueue *sq = container_of(n, NvmeSQueue, notifier);
+
+    nvme_process_sq(sq);
+}
+
+static void nvme_sq_notifier_aio_poll_end(EventNotifier *n)
+{
+    NvmeSQueue *sq = container_of(n, NvmeSQueue, notifier);
+
+    nvme_update_sq_eventidx(sq);
+
+    /* Resume host doorbell writes */
+    sq->suppress_db = false;
+}
+
 static int nvme_init_sq_ioeventfd(NvmeSQueue *sq)
 {
     NvmeCtrl *n = sq->ctrl;
@@ -4494,8 +4551,13 @@ static int nvme_init_sq_ioeventfd(NvmeSQueue *sq)
     }
 
     if (sq->sqid) {
-        aio_set_event_notifier(n->ctx, &sq->notifier, true, nvme_sq_notifier,
-                               NULL, NULL);
+        aio_set_event_notifier(n->ctx, &sq->notifier, true,
+                               nvme_sq_notifier,
+                               nvme_sq_notifier_aio_poll,
+                               nvme_sq_notifier_aio_poll_ready);
+        aio_set_event_notifier_poll(n->ctx, &sq->notifier,
+                                    nvme_sq_notifier_aio_poll_begin,
+                                    nvme_sq_notifier_aio_poll_end);
     } else {
         event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
     }
@@ -6530,7 +6592,9 @@ static void nvme_process_sq(void *opaque)
     }
 
     if (n->dbbuf_enabled) {
-        nvme_update_sq_eventidx(sq);
+        if (!sq->suppress_db) {
+            nvme_update_sq_eventidx(sq);
+        }
         nvme_update_sq_tail(sq);
     }
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 224b73e6c4..bd486a8e15 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -380,6 +380,7 @@ typedef struct NvmeSQueue {
     QEMUTimer *timer;
     EventNotifier notifier;
     bool ioeventfd_enabled;
+    bool suppress_db;
     NvmeRequest *io_req;
     QTAILQ_HEAD(, NvmeRequest) req_list;
    QTAILQ_HEAD(, NvmeRequest) out_req_list;
-- 
2.25.1