[SeaBIOS] [PATCH] nvme: don't keep probing namespaces after an active one has been found

Thanos Makatos posted 1 patch 2 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/seabios tags/patchew/20210701142047.47892-1-thanos.makatos@nutanix.com
src/hw/nvme.c | 51 +++++++++++++++++++++++++++++++++++++++------------
1 file changed, 39 insertions(+), 12 deletions(-)

[SeaBIOS] [PATCH] nvme: don't keep probing namespaces after an active one has been found

Posted by Thanos Makatos 2 months, 3 weeks ago
From: Thanos Makatos <thanos@nutanix.com>

The current implementation keeps probing NVMe namespaces even if an
active one has been found. This is unnecessary and most importantly
results in memory allocation failure if the are more than a few dozens
of nmespaces. Also, not probing the remaining namespaces after having
found an active one reduces boot time.

I've tested with 1024 namespaces where 512 where active and it worked
fined.

Signed-off-by: Thanos Makatos <thanos@nutanix.com>
---
 src/hw/nvme.c | 51 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/src/hw/nvme.c b/src/hw/nvme.c
index 746e468..73c1a52 100644
--- a/src/hw/nvme.c
+++ b/src/hw/nvme.c
@@ -233,9 +233,22 @@ nvme_admin_identify_ns(struct nvme_ctrl *ctrl, u32 ns_id)
                                 ns_id)->ns;
 }
 
-static void
+static char*
+nvme_ns_desc(const struct nvme_namespace *ns, u32 ns_id)
+{
+    return znprintf(MAXDESCSIZE, "NVMe NS %u: %llu MiB (%llu %u-byte "
+                                 "blocks + %u-byte metadata)\n",
+                                  ns_id, (ns->lba_count * ns->block_size) >> 20,
+                                  ns->lba_count, ns->block_size,
+                                  ns->metadata_size);
+}
+
+/* Returns a pointer to the namespace if it's usable, NULL otherwise. */
+struct nvme_namespace*
 nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts)
 {
+    struct nvme_namespace *ns = NULL;
+
     u32 ns_id = ns_idx + 1;
 
     struct nvme_identify_ns *id = nvme_admin_identify_ns(ctrl, ns_id);
@@ -257,7 +270,7 @@ nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts)
         goto free_buffer;
     }
 
-    struct nvme_namespace *ns = malloc_fseg(sizeof(*ns));
+    ns = malloc_fseg(sizeof(*ns));
     if (!ns) {
         warn_noalloc();
         goto free_buffer;
@@ -277,6 +290,7 @@ nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts)
            buffer size. */
         warn_internalerror();
         free(ns);
+        ns = NULL;
         goto free_buffer;
     }
 
@@ -296,16 +310,13 @@ nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts)
 
     ns->dma_buffer = zalloc_page_aligned(&ZoneHigh, NVME_PAGE_SIZE);
 
-    char *desc = znprintf(MAXDESCSIZE, "NVMe NS %u: %llu MiB (%llu %u-byte "
-                          "blocks + %u-byte metadata)",
-                          ns_id, (ns->lba_count * ns->block_size) >> 20,
-                          ns->lba_count, ns->block_size, ns->metadata_size);
-
-    dprintf(3, "%s\n", desc);
-    boot_add_hd(&ns->drive, desc, bootprio_find_pci_device(ctrl->pci));
+    char *desc = nvme_ns_desc(ns, ns_id);
+    dprintf(3, "%s", desc);
+    free(desc);
 
 free_buffer:
     free (id);
+    return ns;
 }
 
 
@@ -560,6 +571,13 @@ nvme_wait_csts_rdy(struct nvme_ctrl *ctrl, unsigned rdy)
     return 0;
 }
 
+static void
+nvme_destroy_io_queues(struct nvme_ctrl *ctrl)
+{
+    nvme_destroy_sq(&ctrl->io_sq);
+    nvme_destroy_cq(&ctrl->io_cq);
+}
+
 /* Returns 0 on success. */
 static int
 nvme_controller_enable(struct nvme_ctrl *ctrl)
@@ -628,15 +646,24 @@ nvme_controller_enable(struct nvme_ctrl *ctrl)
         goto err_destroy_admin_sq;
     }
 
-    /* Populate namespace IDs */
+    /* Find first active namespace. */
     int ns_idx;
-    for (ns_idx = 0; ns_idx < ctrl->ns_count; ns_idx++) {
-        nvme_probe_ns(ctrl, ns_idx, identify->mdts);
+    struct nvme_namespace *ns = NULL;
+    for (ns_idx = 0; ns_idx < ctrl->ns_count && !ns; ns_idx++) {
+        ns = nvme_probe_ns(ctrl, ns_idx, identify->mdts);
+    }
+    if (!ns) {
+        dprintf(3, "no active NVMe namespace\n");
+        goto err_destroy_ioq;
     }
+    boot_add_hd(&ns->drive, nvme_ns_desc(ns, ns_idx + 1),
+                bootprio_find_pci_device(ctrl->pci));
 
     dprintf(3, "NVMe initialization complete!\n");
     return 0;
 
+ err_destroy_ioq:
+    nvme_destroy_io_queues(ctrl);
  err_destroy_admin_sq:
     nvme_destroy_sq(&ctrl->admin_sq);
  err_destroy_admin_cq:
-- 
2.22.3

_______________________________________________
SeaBIOS mailing list -- seabios@seabios.org
To unsubscribe send an email to seabios-leave@seabios.org

[SeaBIOS] Re: [PATCH] nvme: don't keep probing namespaces after an active one has been found

Posted by Gerd Hoffmann 2 months ago
On Thu, Jul 01, 2021 at 02:20:47PM +0000, Thanos Makatos wrote:
> From: Thanos Makatos <thanos@nutanix.com>
> 
> The current implementation keeps probing NVMe namespaces even if an
> active one has been found. This is unnecessary and most importantly
> results in memory allocation failure if the are more than a few dozens
> of nmespaces. Also, not probing the remaining namespaces after having
> found an active one reduces boot time.

[ just back from vacation ]

Well, assuming the first namespace is the one you want actually boot
from.  Which I expect is the common case, but who knows how people
use their machines ...

IIRC there is some work in progress to add bootorder support for
namespaces, where do we stand with that?  Once we have this we should be
able to extend the skip_nonbootable logic to handle not only controllers
but also namespaces.  With that we should be able to initialize the
correct boot namespaces instead of picking the first namespace.

take care,
  Gerd

_______________________________________________
SeaBIOS mailing list -- seabios@seabios.org
To unsubscribe send an email to seabios-leave@seabios.org

[SeaBIOS] Re: [PATCH] nvme: don't keep probing namespaces after an active one has been found

Posted by Thanos Makatos 2 months ago
> -----Original Message-----
> From: Gerd Hoffmann <kraxel@redhat.com>
> Sent: 21 July 2021 15:57
> To: Thanos Makatos <thanos.makatos@nutanix.com>
> Cc: seabios@seabios.org; Thanos Makatos <thanos.makatos@nutanix.com>
> Subject: Re: [PATCH] nvme: don't keep probing namespaces after an active
> one has been found
> 
> On Thu, Jul 01, 2021 at 02:20:47PM +0000, Thanos Makatos wrote:
> > From: Thanos Makatos <thanos@nutanix.com>
> >
> > The current implementation keeps probing NVMe namespaces even if an
> > active one has been found. This is unnecessary and most importantly
> > results in memory allocation failure if the are more than a few dozens
> > of nmespaces. Also, not probing the remaining namespaces after having
> > found an active one reduces boot time.
> 
> [ just back from vacation ]
> 
> Well, assuming the first namespace is the one you want actually boot
> from.  Which I expect is the common case, but who knows how people
> use their machines ...
> 
> IIRC there is some work in progress to add bootorder support for
> namespaces, where do we stand with that?

IIUC I got the go-ahead to implement this in QEMU, so let's see how that goes. Then we can look at the SeaBIOS patch I suppose.

> Once we have this we should be
> able to extend the skip_nonbootable logic to handle not only controllers
> but also namespaces.  With that we should be able to initialize the
> correct boot namespaces instead of picking the first namespace.

I would expect that selecting a particular namespace (which will be enabled by the QEMU patch) will be optional. Therefore, we might still have to figure out from which namespace to boot from, so we'll be in the same situation as we are now (an NVMe controller with lots of namespaces). So, I think the patch I propose is still useful?


_______________________________________________
SeaBIOS mailing list -- seabios@seabios.org
To unsubscribe send an email to seabios-leave@seabios.org

[SeaBIOS] Re: [PATCH] nvme: don't keep probing namespaces after an active one has been found

Posted by Gerd Hoffmann 2 months ago
> > [ just back from vacation ]
> > 
> > Well, assuming the first namespace is the one you want actually boot
> > from.  Which I expect is the common case, but who knows how people
> > use their machines ...
> > 
> > IIRC there is some work in progress to add bootorder support for
> > namespaces, where do we stand with that?
> 
> IIUC I got the go-ahead to implement this in QEMU, so let's see how that goes. Then we can look at the SeaBIOS patch I suppose.
> 
> > Once we have this we should be
> > able to extend the skip_nonbootable logic to handle not only controllers
> > but also namespaces.  With that we should be able to initialize the
> > correct boot namespaces instead of picking the first namespace.
> 
> I would expect that selecting a particular namespace (which will be enabled by the QEMU patch) will be optional. Therefore, we might still have to figure out from which namespace to boot from, so we'll be in the same situation as we are now (an NVMe controller with lots of namespaces). So, I think the patch I propose is still useful?

Yes, when bootorder doesn't specify one (or more) specific namespace(s)
picking the first is fine.

The patch seems to do some unneeded / unrelated changes though.  Why do
you move the boot_add() call?  Why the desc changes?  Why return the
namespace?  Just returning a int or bool telling whenever the namespace
is active or not should be enough, no?

take care,
  Gerd

_______________________________________________
SeaBIOS mailing list -- seabios@seabios.org
To unsubscribe send an email to seabios-leave@seabios.org

[SeaBIOS] Re: [PATCH] nvme: don't keep probing namespaces after an active one has been found

Posted by Thanos Makatos 2 months ago
> The patch seems to do some unneeded / unrelated changes though.  Why
> do
> you move the boot_add() call?  Why the desc changes?  Why return the
> namespace?  Just returning a int or bool telling whenever the namespace
> is active or not should be enough, no?

You're right, these changes are unnecessary, let me fix that.
_______________________________________________
SeaBIOS mailing list -- seabios@seabios.org
To unsubscribe send an email to seabios-leave@seabios.org