From: Jinhao Fan
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, Klaus Jensen, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH 1/3] hw/nvme: support irq(de)assertion with eventfd
Date: Fri, 26 Aug 2022 19:18:32 +0800
Message-Id: <20220826111834.3014912-2-fanjinhao21s@ict.ac.cn>
In-Reply-To: <20220826111834.3014912-1-fanjinhao21s@ict.ac.cn>

When the new option 'irq-eventfd' is turned
on, the IO emulation code signals an eventfd whenever it wants to (de)assert
an irq. The main loop's eventfd handler then performs the actual irq
(de)assertion. This paves the way for iothread support, since QEMU's
interrupt emulation is not thread safe.

Asserting and deasserting irqs via an eventfd has some performance
implications: for small queue depths it increases request latency, but for
large queue depths it effectively coalesces irqs.

Comparison (KIOPS):

QD             1     4    16    64
QEMU          38   123   210   329
irq-eventfd   32   106   240   364

Signed-off-by: Jinhao Fan
Signed-off-by: Klaus Jensen
---
 hw/nvme/ctrl.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------
 hw/nvme/nvme.h |   3 ++
 2 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..51792f3955 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -526,34 +526,57 @@ static void nvme_irq_check(NvmeCtrl *n)
     }
 }
 
+static void nvme_irq_do_assert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    if (msix_enabled(&(n->parent_obj))) {
+        trace_pci_nvme_irq_msix(cq->vector);
+        msix_notify(&(n->parent_obj), cq->vector);
+    } else {
+        trace_pci_nvme_irq_pin();
+        assert(cq->vector < 32);
+        n->irq_status |= 1 << cq->vector;
+        nvme_irq_check(n);
+    }
+}
+
 static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 {
     if (cq->irq_enabled) {
-        if (msix_enabled(&(n->parent_obj))) {
-            trace_pci_nvme_irq_msix(cq->vector);
-            msix_notify(&(n->parent_obj), cq->vector);
+        if (cq->assert_notifier.initialized) {
+            event_notifier_set(&cq->assert_notifier);
         } else {
-            trace_pci_nvme_irq_pin();
-            assert(cq->vector < 32);
-            n->irq_status |= 1 << cq->vector;
-            nvme_irq_check(n);
+            nvme_irq_do_assert(n, cq);
         }
     } else {
         trace_pci_nvme_irq_masked();
     }
 }
 
+static void nvme_irq_do_deassert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    if (msix_enabled(&(n->parent_obj))) {
+        return;
+    } else {
+        assert(cq->vector < 32);
+        if (!n->cq_pending) {
+            n->irq_status &= ~(1 << cq->vector);
+        }
+        nvme_irq_check(n);
+    }
+}
+
 static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
 {
     if (cq->irq_enabled) {
-        if (msix_enabled(&(n->parent_obj))) {
-            return;
+        if (cq->deassert_notifier.initialized) {
+            /*
+             * The deassert notifier will only be initialized when MSI-X is NOT
+             * in use. Therefore no need to worry about extra eventfd syscall
+             * for pin-based interrupts.
+             */
+            event_notifier_set(&cq->deassert_notifier);
         } else {
-            assert(cq->vector < 32);
-            if (!n->cq_pending) {
-                n->irq_status &= ~(1 << cq->vector);
-            }
-            nvme_irq_check(n);
+            nvme_irq_do_deassert(n, cq);
         }
     }
 }
@@ -1338,6 +1361,50 @@ static void nvme_update_cq_head(NvmeCQueue *cq)
     trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
 }
 
+static void nvme_assert_notifier_read(EventNotifier *e)
+{
+    NvmeCQueue *cq = container_of(e, NvmeCQueue, assert_notifier);
+    if (event_notifier_test_and_clear(e)) {
+        nvme_irq_do_assert(cq->ctrl, cq);
+    }
+}
+
+static void nvme_deassert_notifier_read(EventNotifier *e)
+{
+    NvmeCQueue *cq = container_of(e, NvmeCQueue, deassert_notifier);
+    if (event_notifier_test_and_clear(e)) {
+        nvme_irq_do_deassert(cq->ctrl, cq);
+    }
+}
+
+static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    int ret;
+
+    ret = event_notifier_init(&cq->assert_notifier, 0);
+    if (ret < 0) {
+        return;
+    }
+
+    event_notifier_set_handler(&cq->assert_notifier,
+                                nvme_assert_notifier_read);
+
+    if (!msix_enabled(&n->parent_obj)) {
+        ret = event_notifier_init(&cq->deassert_notifier, 0);
+        if (ret < 0) {
+            event_notifier_set_handler(&cq->assert_notifier, NULL);
+            event_notifier_cleanup(&cq->assert_notifier);
+
+            return;
+        }
+
+        event_notifier_set_handler(&cq->deassert_notifier,
+                                   nvme_deassert_notifier_read);
+    }
+
+    return;
+}
+
 static void nvme_post_cqes(void *opaque)
 {
     NvmeCQueue *cq = opaque;
@@ -1377,8 +1444,10 @@ static void nvme_post_cqes(void *opaque)
         QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
     }
     if (cq->tail != cq->head) {
-        if (cq->irq_enabled && !pending) {
-            n->cq_pending++;
+        if (cq->irq_enabled) {
+            if (!pending) {
+                n->cq_pending++;
+            }
         }
 
         nvme_irq_assert(n, cq);
@@ -4705,6 +4774,14 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
         event_notifier_set_handler(&cq->notifier, NULL);
         event_notifier_cleanup(&cq->notifier);
     }
+    if (cq->assert_notifier.initialized) {
+        event_notifier_set_handler(&cq->assert_notifier, NULL);
+        event_notifier_cleanup(&cq->assert_notifier);
+    }
+    if (cq->deassert_notifier.initialized) {
+        event_notifier_set_handler(&cq->deassert_notifier, NULL);
+        event_notifier_cleanup(&cq->deassert_notifier);
+    }
     if (msix_enabled(&n->parent_obj)) {
         msix_vector_unuse(&n->parent_obj, cq->vector);
     }
@@ -4734,7 +4811,7 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
         n->cq_pending--;
     }
 
-    nvme_irq_deassert(n, cq);
+    nvme_irq_do_deassert(n, cq);
     trace_pci_nvme_del_cq(qid);
     nvme_free_cq(cq, n);
     return NVME_SUCCESS;
@@ -4772,6 +4849,14 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
     }
     n->cq[cqid] = cq;
     cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+
+    /*
+     * Only enable irq eventfd for IO queues since we always emulate admin
+     * queue in main loop thread.
+     */
+    if (cqid && n->params.irq_eventfd) {
+        nvme_init_irq_notifier(n, cq);
+    }
 }
 
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
@@ -7671,6 +7756,7 @@ static Property nvme_props[] = {
     DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
     DEFINE_PROP_BOOL("legacy-cmb", NvmeCtrl, params.legacy_cmb, false),
     DEFINE_PROP_BOOL("ioeventfd", NvmeCtrl, params.ioeventfd, false),
+    DEFINE_PROP_BOOL("irq-eventfd", NvmeCtrl, params.irq_eventfd, false),
     DEFINE_PROP_UINT8("zoned.zasl", NvmeCtrl, params.zasl, 0),
     DEFINE_PROP_BOOL("zoned.auto_transition", NvmeCtrl,
                      params.auto_transition_zones, true),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 79f5c281c2..4850d3e965 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -398,6 +398,8 @@ typedef struct NvmeCQueue {
     uint64_t    ei_addr;
     QEMUTimer   *timer;
     EventNotifier notifier;
+    EventNotifier assert_notifier;
+    EventNotifier deassert_notifier;
     bool        ioeventfd_enabled;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
     QTAILQ_HEAD(, NvmeRequest) req_list;
@@ -422,6 +424,7 @@ typedef struct NvmeParams {
     bool     auto_transition_zones;
     bool     legacy_cmb;
     bool     ioeventfd;
+    bool     irq_eventfd;
     uint8_t  sriov_max_vfs;
     uint16_t sriov_vq_flexible;
     uint16_t sriov_vi_flexible;
-- 
2.25.1
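The coalescing effect described in the first commit message follows from eventfd counter semantics: each write adds to a 64-bit counter, and one read returns the accumulated value and resets it to zero. Several event_notifier_set() calls made before the main loop runs its handler therefore collapse into a single nvme_irq_do_assert(). The sketch below shows this with raw Linux eventfd(2) calls rather than QEMU's EventNotifier wrapper; it assumes a Linux host, and signal_irq(), drain_irq() and simulate_batch() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Producer side (hypothetical helper): one write per completion, the
 * raw-eventfd analogue of event_notifier_set(). */
static int signal_irq(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Consumer side (hypothetical helper): one read() returns the sum of all
 * writes since the previous read and resets the counter to zero, which is
 * roughly what event_notifier_test_and_clear() relies on for an eventfd. */
static uint64_t drain_irq(int efd)
{
    uint64_t count = 0;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;   /* EAGAIN on a non-blocking eventfd: nothing pending */
    }
    return count;
}

/* Post `completions` signals from the "I/O side", then run the "main loop"
 * handler once. Returns how many interrupts that handler would inject and
 * stores how many signals were folded into that single wakeup. */
static int simulate_batch(int completions, uint64_t *folded)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    int injected = 0;

    *folded = 0;
    if (efd < 0) {
        return -1;
    }
    for (int i = 0; i < completions; i++) {
        signal_irq(efd);
    }

    *folded = drain_irq(efd);
    if (*folded > 0) {
        injected = 1;   /* one assertion, however many writes were folded */
    }

    assert(drain_irq(efd) == 0);   /* the first read reset the counter */
    close(efd);
    return injected;
}
```

With 16 completions posted before the handler runs, simulate_batch() reports one injected interrupt that folds all 16 signals. At queue depth 1 nothing gets folded, so each completion plausibly just pays for an extra write, read and main-loop wakeup, which is consistent with the latency cost the commit message reports.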

From: Jinhao Fan
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, Klaus Jensen, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH 2/3] hw/nvme: use KVM irqfd when available
Date: Fri, 26 Aug 2022 19:18:33 +0800
Message-Id: <20220826111834.3014912-3-fanjinhao21s@ict.ac.cn>
In-Reply-To: <20220826111834.3014912-1-fanjinhao21s@ict.ac.cn>

Use KVM's irqfd to send interrupts when possible. This approach is thread
safe. Moreover, it avoids the inter-thread communication overhead of plain
event notifiers, since the irqfd handler callbacks are invoked within the
same system call as the irqfd write.

Signed-off-by: Jinhao Fan
Signed-off-by: Klaus Jensen
---
 hw/nvme/ctrl.c       | 145 ++++++++++++++++++++++++++++++++++++++++++-
 hw/nvme/nvme.h       |   3 +
 hw/nvme/trace-events |   3 +
 3 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 51792f3955..396f3f0cdd 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -192,6 +192,7 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
@@ -1377,8 +1378,115 @@ static void nvme_deassert_notifier_read(EventNotifier *e)
     }
 }
 
+static int nvme_kvm_vector_use(NvmeCtrl *n, NvmeCQueue *cq, uint32_t vector)
+{
+    KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
+    int ret;
+
+    ret = kvm_irqchip_add_msi_route(&c, vector, &n->parent_obj);
+    if (ret < 0) {
+        return ret;
+    }
+
+    kvm_irqchip_commit_route_changes(&c);
+
+    cq->virq = ret;
+
+    return 0;
+}
+
+static int nvme_kvm_vector_unmask(PCIDevice *pci_dev, unsigned vector,
+                                  MSIMessage msg)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+    int ret;
+
+    trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
+
+    for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (cq->vector == vector) {
+            if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
+                ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
+                                                   pci_dev);
+                if (ret < 0) {
+                    return ret;
+                }
+
+                kvm_irqchip_commit_routes(kvm_state);
+
+                cq->msg = msg;
+            }
+
+            ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+                                                     &cq->assert_notifier,
+                                                     NULL, cq->virq);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+    }
+
+    return 0;
+}
+
+static void nvme_kvm_vector_mask(PCIDevice *pci_dev, unsigned vector)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_mask(vector);
+
+    for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (cq->vector == vector) {
+            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                  &cq->assert_notifier,
+                                                  cq->virq);
+        }
+    }
+}
+
+static void nvme_kvm_vector_poll(PCIDevice *pci_dev, unsigned int vector_start,
+                                 unsigned int vector_end)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_poll(vector_start, vector_end);
+
+    for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (!msix_is_masked(pci_dev, cq->vector)) {
+            continue;
+        }
+
+        if (cq->vector >= vector_start && cq->vector <= vector_end) {
+            if (event_notifier_test_and_clear(&cq->assert_notifier)) {
+                msix_set_pending(pci_dev, cq->vector);
+            }
+        }
+    }
+}
+
+
 static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
 {
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     int ret;
 
     ret = event_notifier_init(&cq->assert_notifier, 0);
@@ -1386,12 +1494,27 @@ static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
         return;
     }
 
-    event_notifier_set_handler(&cq->assert_notifier,
-                                nvme_assert_notifier_read);
+    if (with_irqfd) {
+        ret = nvme_kvm_vector_use(n, cq, cq->vector);
+        if (ret < 0) {
+            event_notifier_cleanup(&cq->assert_notifier);
+
+            return;
+        }
+    } else {
+        event_notifier_set_handler(&cq->assert_notifier,
+                                   nvme_assert_notifier_read);
+    }
 
     if (!msix_enabled(&n->parent_obj)) {
         ret = event_notifier_init(&cq->deassert_notifier, 0);
         if (ret < 0) {
+            if (with_irqfd) {
+                kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                      &cq->assert_notifier,
+                                                      cq->virq);
+            }
+
             event_notifier_set_handler(&cq->assert_notifier, NULL);
             event_notifier_cleanup(&cq->assert_notifier);
 
@@ -4764,6 +4887,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
 
 static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 {
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     uint16_t offset = (cq->cqid << 3) + (1 << 2);
 
     n->cq[cq->cqid] = NULL;
@@ -4775,6 +4900,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
         event_notifier_cleanup(&cq->notifier);
     }
     if (cq->assert_notifier.initialized) {
+        if (with_irqfd) {
+            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                  &cq->assert_notifier,
+                                                  cq->virq);
+            kvm_irqchip_release_virq(kvm_state, cq->virq);
+        }
         event_notifier_set_handler(&cq->assert_notifier, NULL);
         event_notifier_cleanup(&cq->assert_notifier);
     }
@@ -6528,6 +6659,9 @@ static int nvme_start_ctrl(NvmeCtrl *n)
     uint32_t page_size = 1 << page_bits;
     NvmeSecCtrlEntry *sctrl = nvme_sctrl(n);
 
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
+
     if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
         trace_pci_nvme_err_startfail_virt_state(le16_to_cpu(sctrl->nvi),
                                                 le16_to_cpu(sctrl->nvq),
@@ -6617,6 +6751,12 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 
     nvme_select_iocs(n);
 
+    if (with_irqfd) {
+        return msix_set_vector_notifiers(PCI_DEVICE(n), nvme_kvm_vector_unmask,
+                                         nvme_kvm_vector_mask,
+                                         nvme_kvm_vector_poll);
+    }
+
     return 0;
 }
 
@@ -7734,6 +7874,7 @@ static void nvme_exit(PCIDevice *pci_dev)
         pcie_sriov_pf_exit(pci_dev);
     }
 
+    msix_unset_vector_notifiers(pci_dev);
     msix_uninit(pci_dev, &n->bar0, &n->bar0);
     memory_region_del_subregion(&n->bar0, &n->iomem);
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 4850d3e965..b0b986b024 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -20,6 +20,7 @@
 
 #include "qemu/uuid.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
 #include "hw/block/block.h"
 
 #include "block/nvme.h"
@@ -396,10 +397,12 @@ typedef struct NvmeCQueue {
     uint64_t    dma_addr;
     uint64_t    db_addr;
     uint64_t    ei_addr;
+    int         virq;
     QEMUTimer   *timer;
     EventNotifier notifier;
     EventNotifier assert_notifier;
     EventNotifier deassert_notifier;
+    MSIMessage  msg;
     bool        ioeventfd_enabled;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
     QTAILQ_HEAD(, NvmeRequest) req_list;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..b11fcf4a65 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -2,6 +2,9 @@
 pci_nvme_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
 pci_nvme_irq_pin(void) "pulsing IRQ pin"
 pci_nvme_irq_masked(void) "IRQ is masked"
+pci_nvme_irq_mask(uint32_t vector) "IRQ %u gets masked"
+pci_nvme_irq_unmask(uint32_t vector, uint64_t addr, uint32_t data) "IRQ %u gets unmasked, addr=0x%"PRIx64" data=0x%"PRIx32""
+pci_nvme_irq_poll(uint32_t vector_start, uint32_t vector_end) "IRQ poll, start=0x%"PRIx32" end=0x%"PRIx32""
 pci_nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
 pci_nvme_dbbuf_config(uint64_t dbs_addr, uint64_t eis_addr) "dbs_addr=0x%"PRIx64" eis_addr=0x%"PRIx64""
 pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
-- 
2.25.1
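With irqfd, an MSI-X vector the guest has masked must not reach it. nvme_kvm_vector_mask() therefore detaches the irqfd from the GSI, so later writes to assert_notifier only latch in the eventfd counter; nvme_kvm_vector_unmask() reattaches it; and nvme_kvm_vector_poll() is invoked by the MSI-X core when it needs the pending state of masked vectors (for example when the guest reads the Pending Bit Array), turning a latched notification into the pending bit of the vector the CQ is bound to. The plain-C model below captures only that bookkeeping and involves no KVM: sim_cq, sim_ctrl, sim_signal() and sim_vector_poll() are hypothetical stand-ins, and the `latched` counter plays the role of the eventfd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_MAX_IOQ  8    /* hypothetical stand-in for params.max_ioqpairs */
#define SIM_VECTORS  32

/* Hypothetical, simplified stand-in for NvmeCQueue. */
struct sim_cq {
    bool present;
    unsigned vector;      /* MSI-X vector this CQ raises */
    uint64_t latched;     /* stands in for the assert_notifier eventfd */
};

/* Hypothetical stand-in for the controller plus its MSI-X state. */
struct sim_ctrl {
    struct sim_cq cq[SIM_MAX_IOQ + 1];   /* index 0 (admin CQ) is skipped */
    bool masked[SIM_VECTORS];            /* per-vector mask bit */
    bool pending[SIM_VECTORS];           /* per-vector pending bit (PBA) */
};

static bool sim_test_and_clear(uint64_t *latched)
{
    bool was_set = *latched != 0;

    *latched = 0;
    return was_set;
}

/* Completion path: with the irqfd attached (vector unmasked) the interrupt
 * is delivered at once; with it detached (masked) the write only latches. */
static void sim_signal(struct sim_ctrl *n, unsigned qid, unsigned *delivered)
{
    struct sim_cq *cq = &n->cq[qid];

    if (n->masked[cq->vector]) {
        cq->latched++;
    } else {
        (*delivered)++;
    }
}

/* Same shape as nvme_kvm_vector_poll(): for masked vectors in range, move a
 * latched notification into the pending bit of the CQ's own vector. */
static void sim_vector_poll(struct sim_ctrl *n, unsigned start, unsigned end)
{
    for (unsigned i = 1; i <= SIM_MAX_IOQ; i++) {
        struct sim_cq *cq = &n->cq[i];

        if (!cq->present || !n->masked[cq->vector]) {
            continue;
        }
        if (cq->vector >= start && cq->vector <= end &&
            sim_test_and_clear(&cq->latched)) {
            n->pending[cq->vector] = true;   /* vector, not queue id i */
        }
    }
}
```

Because a queue id and its vector need not coincide, the model deliberately gives queue 3 vector 5 in its usage; the pending bit lands on vector 5, which is the entry the guest will look for in the PBA.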

From: Jinhao Fan
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH 3/3] hw/nvme: add iothread support
Date: Fri, 26 Aug 2022 19:18:34 +0800
Message-Id: <20220826111834.3014912-4-fanjinhao21s@ict.ac.cn>
In-Reply-To: <20220826111834.3014912-1-fanjinhao21s@ict.ac.cn>

Add an option "iothread=x" to do emulation in a separate iothread. This
improves performance because QEMU's main loop is responsible for a lot of
other work, while the iothread is dedicated to NVMe emulation. Moreover,
emulating in an iothread opens up the possibility of polling on SQ/CQ
doorbells, which I will bring up in a following patch.

The iothread can be enabled with:

    -object iothread,id=nvme0 \
    -device nvme,iothread=nvme0 \

Performance comparisons (KIOPS):

QD             1     4    16    64
QEMU          41   136   242   338
iothread      53   155   245   309

Signed-off-by: Jinhao Fan
---
 hw/nvme/ctrl.c | 74 +++++++++++++++++++++++++++++++++++++++++++++-----
 hw/nvme/ns.c   | 21 +++++++++++++++++---
 hw/nvme/nvme.h |  6 +++-
 3 files changed, 89 insertions(+), 12 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 396f3f0cdd..24a367329d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4458,7 +4458,13 @@ static int nvme_init_cq_ioeventfd(NvmeCQueue *cq)
         return ret;
     }
 
-    event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+    if (cq->cqid) {
+        aio_set_event_notifier(n->ctx, &cq->notifier, true, nvme_cq_notifier,
+                               NULL, NULL);
+    } else {
+        event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+    }
+
     memory_region_add_eventfd(&n->iomem,
                               0x1000 + offset, 4, false, 0, &cq->notifier);
 
@@ -4487,7 +4493,13 @@ static int nvme_init_sq_ioeventfd(NvmeSQueue *sq)
         return ret;
     }
 
-    event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+    if (sq->sqid) {
+        aio_set_event_notifier(n->ctx, &sq->notifier, true, nvme_sq_notifier,
+                               NULL, NULL);
+    } else {
+        event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+    }
+
     memory_region_add_eventfd(&n->iomem,
                               0x1000 + offset, 4, false, 0, &sq->notifier);
 
@@ -4503,7 +4515,12 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
     if (sq->ioeventfd_enabled) {
         memory_region_del_eventfd(&n->iomem,
                                   0x1000 + offset, 4, false, 0, &sq->notifier);
-        event_notifier_set_handler(&sq->notifier, NULL);
+        if (sq->sqid) {
+            aio_set_event_notifier(n->ctx, &sq->notifier, true, NULL, NULL,
+                                   NULL);
+        } else {
+            event_notifier_set_handler(&sq->notifier, NULL);
+        }
         event_notifier_cleanup(&sq->notifier);
     }
     g_free(sq->io_req);
@@ -4573,7 +4590,13 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
         sq->io_req[i].sq = sq;
         QTAILQ_INSERT_TAIL(&(sq->req_list), &sq->io_req[i], entry);
     }
-    sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+
+    if (sq->sqid) {
+        sq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+                                  nvme_process_sq, sq);
+    } else {
+        sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+    }
 
     if (n->dbbuf_enabled) {
         sq->db_addr = n->dbbuf_dbs + (sqid << 3);
@@ -4896,7 +4919,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
     if (cq->ioeventfd_enabled) {
         memory_region_del_eventfd(&n->iomem,
                                   0x1000 + offset, 4, false, 0, &cq->notifier);
-        event_notifier_set_handler(&cq->notifier, NULL);
+        if (cq->cqid) {
+            aio_set_event_notifier(n->ctx, &cq->notifier, true, NULL, NULL,
+                                   NULL);
+        } else {
+            event_notifier_set_handler(&cq->notifier, NULL);
+        }
         event_notifier_cleanup(&cq->notifier);
     }
     if (cq->assert_notifier.initialized) {
@@ -4979,7 +5007,13 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
         }
     }
     n->cq[cqid] = cq;
-    cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+
+    if (cq->cqid) {
+        cq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+                                  nvme_post_cqes, cq);
+    } else {
+        cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+    }
 
     /*
      * Only enable irq eventfd for IO queues since we always emulate admin
@@ -4988,6 +5022,13 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
     if (cqid && n->params.irq_eventfd) {
         nvme_init_irq_notifier(n, cq);
     }
+
+    if (cq->cqid) {
+        cq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+                                  nvme_post_cqes, cq);
+    } else {
+        cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+    }
 }
 
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
@@ -7759,6 +7800,14 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
         stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
     }
+
+    if (n->params.iothread) {
+        n->iothread = n->params.iothread;
+        object_ref(OBJECT(n->iothread));
+        n->ctx = iothread_get_aio_context(n->iothread);
+    } else {
+        n->ctx = qemu_get_aio_context();
+    }
 }
 
 static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
@@ -7831,7 +7880,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         ns = &n->namespace;
         ns->params.nsid = 1;
 
-        if (nvme_ns_setup(ns, errp)) {
+        if (nvme_ns_setup(ns, n->ctx, errp)) {
             return;
         }
 
@@ -7862,6 +7911,15 @@ static void nvme_exit(PCIDevice *pci_dev)
     g_free(n->sq);
     g_free(n->aer_reqs);
 
+    aio_context_acquire(n->ctx);
+    blk_set_aio_context(n->namespace.blkconf.blk, qemu_get_aio_context(), NULL);
+    aio_context_release(n->ctx);
+
+    if (n->iothread) {
+        object_unref(OBJECT(n->iothread));
+        n->iothread = NULL;
+    }
+
     if (n->params.cmb_size_mb) {
         g_free(n->cmb.buf);
     }
@@ -7885,6 +7943,8 @@ static Property nvme_props[] = {
                      HostMemoryBackend *),
     DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
                      NvmeSubsystem *),
+    DEFINE_PROP_LINK("iothread", NvmeCtrl, params.iothread, TYPE_IOTHREAD,
+                     IOThread *),
     DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
     DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
     DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 62a1f97be0..eb9141a67b 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -146,9 +146,11 @@ lbaf_found:
     return 0;
 }
 
-static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
+static int nvme_ns_init_blk(NvmeNamespace *ns, AioContext *ctx, Error **errp)
 {
     bool read_only;
+    AioContext *old_context;
+    int ret;
 
     if (!blkconf_blocksizes(&ns->blkconf, errp)) {
         return -1;
@@ -170,6 +172,17 @@ static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
         return -1;
     }
 
+    old_context = blk_get_aio_context(ns->blkconf.blk);
+    aio_context_acquire(old_context);
+    ret = blk_set_aio_context(ns->blkconf.blk, ctx, errp);
+    aio_context_release(old_context);
+
+    if (ret) {
+        error_setg(errp, "Set AioContext on BlockBackend failed");
+        return ret;
+    }
+
+
     return 0;
 }
 
@@ -482,13 +495,13 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
     return 0;
 }
 
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
+int nvme_ns_setup(NvmeNamespace *ns, AioContext *ctx, Error **errp)
 {
     if (nvme_ns_check_constraints(ns, errp)) {
         return -1;
     }
 
-    if (nvme_ns_init_blk(ns, errp)) {
+    if (nvme_ns_init_blk(ns, ctx, errp)) {
         return -1;
     }
 
@@ -563,7 +576,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    if (nvme_ns_setup(ns, errp)) {
+    if (nvme_ns_setup(ns, n->ctx, errp)) {
         return;
     }
 
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index b0b986b024..224b73e6c4 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -22,6 +22,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
 #include "hw/block/block.h"
+#include "sysemu/iothread.h"
 
 #include "block/nvme.h"
 
@@ -276,7 +277,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 }
 
 void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeNamespace *ns, AioContext *ctx, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
@@ -433,6 +434,7 @@ typedef struct NvmeParams {
     uint16_t sriov_vi_flexible;
     uint8_t  sriov_max_vq_per_vf;
     uint8_t  sriov_max_vi_per_vf;
+    IOThread *iothread;
 } NvmeParams;
 
 typedef struct NvmeCtrl {
@@ -464,6 +466,8 @@ typedef struct NvmeCtrl {
     uint64_t    dbbuf_dbs;
     uint64_t    dbbuf_eis;
     bool        dbbuf_enabled;
+    IOThread    *iothread;
+    AioContext  *ctx;
 
     struct {
         MemoryRegion mem;
-- 
2.25.1
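Patch 3 moves the I/O queues' doorbell notifiers and timers onto n->ctx via aio_set_event_notifier() and aio_timer_new(), so a dedicated IOThread's event loop services them while the admin queue stays in the main loop. The sketch below models only that threading shape with POSIX primitives: one thread owns a poll() loop over a doorbell eventfd, and the submitting thread merely writes to it. It assumes Linux and pthreads; sim_iothread, sim_iothread_run() and sim_run() are hypothetical names, and this is not QEMU's AioContext API.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for an iothread and the fds its event loop owns. */
struct sim_iothread {
    int doorbell_fd;      /* doorbell writes land here (like an ioeventfd) */
    int stop_fd;          /* asks the loop to exit */
    uint64_t processed;   /* written only by the iothread */
};

static uint64_t sim_drain(int fd)
{
    uint64_t n = 0;

    if (read(fd, &n, sizeof(n)) != (ssize_t)sizeof(n)) {
        return 0;   /* EAGAIN: nothing pending */
    }
    return n;
}

static int sim_kick(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* The iothread: all queue processing happens here, never in the caller. */
static void *sim_iothread_run(void *opaque)
{
    struct sim_iothread *t = opaque;
    struct pollfd fds[2] = {
        { .fd = t->doorbell_fd, .events = POLLIN },
        { .fd = t->stop_fd, .events = POLLIN },
    };

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            break;
        }
        if (fds[0].revents & POLLIN) {
            t->processed += sim_drain(t->doorbell_fd);   /* "process the SQ" */
        }
        if (fds[1].revents & POLLIN) {
            /* final drain: no doorbell written before stop is lost */
            t->processed += sim_drain(t->doorbell_fd);
            break;
        }
    }
    return NULL;
}

/* Ring the doorbell `doorbells` times from this thread, then stop. */
static uint64_t sim_run(int doorbells)
{
    struct sim_iothread t = {
        .doorbell_fd = eventfd(0, EFD_NONBLOCK),
        .stop_fd = eventfd(0, EFD_NONBLOCK),
        .processed = 0,
    };
    pthread_t tid;

    assert(t.doorbell_fd >= 0 && t.stop_fd >= 0);
    if (pthread_create(&tid, NULL, sim_iothread_run, &t) != 0) {
        return UINT64_MAX;
    }
    for (int i = 0; i < doorbells; i++) {
        sim_kick(t.doorbell_fd);   /* signal only; no emulation here */
    }
    sim_kick(t.stop_fd);
    pthread_join(tid, NULL);   /* join makes t.processed safe to read */

    close(t.doorbell_fd);
    close(t.stop_fd);
    return t.processed;
}
```

The final drain on the stop path matters because poll() may report the stop descriptor ready in a pass where the doorbell was sampled before the last writes landed; draining once more ensures every doorbell written before the stop request is consumed.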