[SeaBIOS] [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Alexander Graf 3 years, 11 months ago
Some NVMe controllers only support small maximum request sizes, such as
the AWS EBS NVMe implementation which only supports NVMe requests of up
to 32 pages (256kb) at once.

BIOS callers can exceed those request sizes by defining sector counts
above this threshold. Currently we fall back to the bounce buffer
implementation for those. This is slow.

This patch introduces splitting logic to the NVMe I/O request code so
that every NVMe I/O request gets handled in a chunk size that is
consumable by the NVMe adapter, while maintaining the fast path PRPL
logic we just introduced.

Signed-off-by: Alexander Graf <graf@amazon.com>
---
 src/hw/nvme.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/hw/nvme.c b/src/hw/nvme.c
index b92ca52..cc37bca 100644
--- a/src/hw/nvme.c
+++ b/src/hw/nvme.c
@@ -727,6 +727,22 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
     u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
     u16 i;
 
+    /* Split up requests that are larger than the device can handle */
+    if (op->count > ns->max_req_size) {
+        u16 count = op->count;
+
+        /* Handle the first max_req_size elements */
+        op->count = ns->max_req_size;
+        if (nvme_cmd_readwrite(ns, op, write))
+            return res;
+
+        /* Handle the remainder of the request */
+        op->count = count - ns->max_req_size;
+        op->lba += ns->max_req_size;
+        op->buf_fl += (ns->max_req_size * ns->block_size);
+        return nvme_cmd_readwrite(ns, op, write);
+    }
+
     if (!nvme_build_prpl(ns, op)) {
         /* Request goes via PRP List logic */
         return nvme_io_readwrite(ns, op->lba, ns->prp1, op->count, write);
-- 
2.16.4




[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Kevin O'Connor 3 years, 11 months ago
On Wed, Sep 30, 2020 at 11:10:56PM +0200, Alexander Graf wrote:
> Some NVMe controllers only support small maximum request sizes, such as
> the AWS EBS NVMe implementation which only supports NVMe requests of up
> to 32 pages (256kb) at once.
> 
> BIOS callers can exceed those request sizes by defining sector counts
> above this threshold. Currently we fall back to the bounce buffer
> implementation for those. This is slow.
> 
> This patch introduces splitting logic to the NVMe I/O request code so
> that every NVMe I/O request gets handled in a chunk size that is
> consumable by the NVMe adapter, while maintaining the fast path PRPL
> logic we just introduced.
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> ---
>  src/hw/nvme.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/src/hw/nvme.c b/src/hw/nvme.c
> index b92ca52..cc37bca 100644
> --- a/src/hw/nvme.c
> +++ b/src/hw/nvme.c
> @@ -727,6 +727,22 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
>      u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
>      u16 i;
>  
> +    /* Split up requests that are larger than the device can handle */
> +    if (op->count > ns->max_req_size) {
> +        u16 count = op->count;
> +
> +        /* Handle the first max_req_size elements */
> +        op->count = ns->max_req_size;
> +        if (nvme_cmd_readwrite(ns, op, write))
> +            return res;
> +
> +        /* Handle the remainder of the request */
> +        op->count = count - ns->max_req_size;
> +        op->lba += ns->max_req_size;
> +        op->buf_fl += (ns->max_req_size * ns->block_size);
> +        return nvme_cmd_readwrite(ns, op, write);
> +    }

Depending on the disk access, this code could run with a small stack.
I would avoid recursion.

Otherwise, the patch series looks okay to me.  (I don't have enough
knowledge of the nvme code to give a full review though.)

-Kevin
[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Graf (AWS), Alexander 3 years, 11 months ago
Hey Kevin,

> On 03.10.2020 at 03:47, Kevin O'Connor <kevin@koconnor.net> wrote:
> 
> 
>> On Wed, Sep 30, 2020 at 11:10:56PM +0200, Alexander Graf wrote:
>> Some NVMe controllers only support small maximum request sizes, such as
>> the AWS EBS NVMe implementation which only supports NVMe requests of up
>> to 32 pages (256kb) at once.
>> 
>> BIOS callers can exceed those request sizes by defining sector counts
>> above this threshold. Currently we fall back to the bounce buffer
>> implementation for those. This is slow.
>> 
>> This patch introduces splitting logic to the NVMe I/O request code so
>> that every NVMe I/O request gets handled in a chunk size that is
>> consumable by the NVMe adapter, while maintaining the fast path PRPL
>> logic we just introduced.
>> 
>> Signed-off-by: Alexander Graf <graf@amazon.com>
>> ---
>> src/hw/nvme.c | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>> 
>> diff --git a/src/hw/nvme.c b/src/hw/nvme.c
>> index b92ca52..cc37bca 100644
>> --- a/src/hw/nvme.c
>> +++ b/src/hw/nvme.c
>> @@ -727,6 +727,22 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
>>     u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
>>     u16 i;
>> 
>> +    /* Split up requests that are larger than the device can handle */
>> +    if (op->count > ns->max_req_size) {
>> +        u16 count = op->count;
>> +
>> +        /* Handle the first max_req_size elements */
>> +        op->count = ns->max_req_size;
>> +        if (nvme_cmd_readwrite(ns, op, write))
>> +            return res;
>> +
>> +        /* Handle the remainder of the request */
>> +        op->count = count - ns->max_req_size;
>> +        op->lba += ns->max_req_size;
>> +        op->buf_fl += (ns->max_req_size * ns->block_size);
>> +        return nvme_cmd_readwrite(ns, op, write);
>> +    }
> 
> Depending on the disk access, this code could run with a small stack.
> I would avoid recursion.

This is tail recursion, which any reasonable compiler converts into a jmp :). Take a look at the .o file - for me it did become a plain jump.
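
To illustrate (purely hypothetical names below, nothing taken from nvme.c):
the split has the shape of the first function, and GCC's sibling-call
optimization (-foptimize-sibling-calls, enabled at -O2/-Os) effectively turns
it into the second one, so the stack does not grow:

#include <stdio.h>

#define MAX_REQ 32   /* pretend the device takes at most 32 blocks at once */

static int issue_io(unsigned lba, unsigned count)
{
    printf("io: lba=%u count=%u\n", lba, count);
    return 0;   /* "success" */
}

/* Tail-recursive form, mirroring the patch. */
static int rw_recursive(unsigned lba, unsigned count)
{
    if (count > MAX_REQ) {
        int res = issue_io(lba, MAX_REQ);
        if (res)
            return res;
        return rw_recursive(lba + MAX_REQ, count - MAX_REQ); /* tail call */
    }
    return issue_io(lba, count);
}

/* The loop the optimizer effectively produces. */
static int rw_iterative(unsigned lba, unsigned count)
{
    while (count > MAX_REQ) {
        int res = issue_io(lba, MAX_REQ);
        if (res)
            return res;
        lba += MAX_REQ;
        count -= MAX_REQ;
    }
    return issue_io(lba, count);
}

int main(void)
{
    return rw_recursive(0, 105) | rw_iterative(0, 105);
}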

> 
> Otherwise, the patch series looks okay to me.  (I don't have enough
> knowledge of the nvme code to give a full review though).

Thanks :)

Alex




[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Kevin O'Connor 3 years, 11 months ago
On Sat, Oct 03, 2020 at 05:40:58AM +0000, Graf (AWS), Alexander wrote:
> Hey Kevin,
> 
> > On 03.10.2020 at 03:47, Kevin O'Connor <kevin@koconnor.net> wrote:
> > 
> > 
> >> On Wed, Sep 30, 2020 at 11:10:56PM +0200, Alexander Graf wrote:
> >> Some NVMe controllers only support small maximum request sizes, such as
> >> the AWS EBS NVMe implementation which only supports NVMe requests of up
> >> to 32 pages (256kb) at once.
> >> 
> >> BIOS callers can exceed those request sizes by defining sector counts
> >> above this threshold. Currently we fall back to the bounce buffer
> >> implementation for those. This is slow.
> >> 
> >> This patch introduces splitting logic to the NVMe I/O request code so
> >> that every NVMe I/O request gets handled in a chunk size that is
> >> consumable by the NVMe adapter, while maintaining the fast path PRPL
> >> logic we just introduced.
> >> 
> >> Signed-off-by: Alexander Graf <graf@amazon.com>
> >> ---
> >> src/hw/nvme.c | 16 ++++++++++++++++
> >> 1 file changed, 16 insertions(+)
> >> 
> >> diff --git a/src/hw/nvme.c b/src/hw/nvme.c
> >> index b92ca52..cc37bca 100644
> >> --- a/src/hw/nvme.c
> >> +++ b/src/hw/nvme.c
> >> @@ -727,6 +727,22 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
> >>     u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
> >>     u16 i;
> >> 
> >> +    /* Split up requests that are larger than the device can handle */
> >> +    if (op->count > ns->max_req_size) {
> >> +        u16 count = op->count;
> >> +
> >> +        /* Handle the first max_req_size elements */
> >> +        op->count = ns->max_req_size;
> >> +        if (nvme_cmd_readwrite(ns, op, write))
> >> +            return res;
> >> +
> >> +        /* Handle the remainder of the request */
> >> +        op->count = count - ns->max_req_size;
> >> +        op->lba += ns->max_req_size;
> >> +        op->buf_fl += (ns->max_req_size * ns->block_size);
> >> +        return nvme_cmd_readwrite(ns, op, write);
> >> +    }
> > 
> > Depending on the disk access, this code could run with a small stack.
> > I would avoid recursion.
> 
> This is tail recursion, which any reasonable compiler converts into a jmp :). Take a look at the .o file - for me it did become a plain jump.
> 

Okay, that's fine then.

> > 
> > Otherwise, the patch series looks okay to me.  (I don't have enough
> > knowledge of the nvme code to give a full review though).
> 
> Thanks :)

I'll leave a few more days to allow others to comment, but otherwise
I'm fine with committing.

Cheers,
-Kevin
[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by David Woodhouse 3 years, 10 months ago
On Wed, 2020-10-07 at 12:13 -0400, Kevin O'Connor wrote:
> > > > --- a/src/hw/nvme.c
> > > > +++ b/src/hw/nvme.c
> > > > @@ -727,6 +727,22 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
> > > >      u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
> > > >      u16 i;
> > > > 
> > > > +    /* Split up requests that are larger than the device can handle */
> > > > +    if (op->count > ns->max_req_size) {
> > > > +        u16 count = op->count;
> > > > +
> > > > +        /* Handle the first max_req_size elements */
> > > > +        op->count = ns->max_req_size;
> > > > +        if (nvme_cmd_readwrite(ns, op, write))
> > > > +            return res;
> > > > +
> > > > +        /* Handle the remainder of the request */
> > > > +        op->count = count - ns->max_req_size;
> > > > +        op->lba += ns->max_req_size;
> > > > +        op->buf_fl += (ns->max_req_size * ns->block_size);
> > > > +        return nvme_cmd_readwrite(ns, op, write);
> > > > +    }
> > > 
> > > Depending on the disk access, this code could run with a small stack.
> > > I would avoid recursion.
> > 
> > This is tail recursion, which any reasonable compiler converts into
> > a jmp :). Take a look at the .o file - for me it did become a plain
> > jump.
> > 
> 
> Okay, that's fine then.

It's still kind of icky. This used to be a purely iterative function.
Now for a large unaligned request (which nvme_build_prpl refuses to
handle) you get a mixture of (mostly tail) recursion and iteration.

It'll recurse for each max_req_size, then iterate over each page within
that.

I'd much rather see just a simple iterative loop. Something like this
(incremental, completely untested):

--- a/src/hw/nvme.c
+++ b/src/hw/nvme.c
@@ -472,19 +472,20 @@ static int nvme_add_prpl(struct nvme_namespace *ns, u64 base)
 
 int nvme_build_prpl(struct nvme_namespace *ns, struct disk_op_s *op)
 {
-    int first_page = 1;
+    int first_page = 1, count = op->count;
     u32 base = (long)op->buf_fl;
-    s32 size = op->count * ns->block_size;
+    s32 size;
 
-    if (op->count > ns->max_req_size)
-        return -1;
+    if (count > ns->max_req_size)
+        count = ns->max_req_size;
 
     nvme_reset_prpl(ns);
 
+    size = count * ns->block_size;
     /* Special case for transfers that fit into PRP1, but are unaligned */
     if (((size + (base & ~NVME_PAGE_MASK)) <= NVME_PAGE_SIZE)) {
         ns->prp1 = op->buf_fl;
-        return 0;
+        return count;
     }
 
     /* Every request has to be page aligned */
@@ -506,7 +507,7 @@ int nvme_build_prpl(struct nvme_namespace *ns, struct disk_op_s *op)
             return -1;
     }
 
-    return 0;
+    return count;
 }
 
 static int
@@ -725,27 +726,16 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
 {
     int res = DISK_RET_SUCCESS;
     u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
-    u16 i;
-
-    /* Split up requests that are larger than the device can handle */
-    if (op->count > ns->max_req_size) {
-        u16 count = op->count;
+    u16 i, count;
 
-        /* Handle the first max_req_size elements */
-        op->count = ns->max_req_size;
-        if (nvme_cmd_readwrite(ns, op, write))
-            return res;
-
-        /* Handle the remainder of the request */
-        op->count = count - ns->max_req_size;
-        op->lba += ns->max_req_size;
-        op->buf_fl += (ns->max_req_size * ns->block_size);
-        return nvme_cmd_readwrite(ns, op, write);
-    }
+    while (op->count && ((count = nvme_build_prpl(ns, op)) > 0)) {
+        res = nvme_io_readwrite(ns, op->lba, ns->prp1, count, write);
+        if (res != DISK_RET_SUCCESS)
+            break;
 
-    if (!nvme_build_prpl(ns, op)) {
-        /* Request goes via PRP List logic */
-        return nvme_io_readwrite(ns, op->lba, ns->prp1, op->count, write);
+        op->count -= count;
+        op->lba += count;
+        op->buf_fl += (count * ns->block_size);
     }
 
     for (i = 0; i < op->count && res == DISK_RET_SUCCESS;) {

[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Kevin O'Connor 3 years, 10 months ago
On Thu, Oct 29, 2020 at 05:21:04PM +0000, David Woodhouse wrote:
> On Wed, 2020-10-07 at 12:13 -0400, Kevin O'Connor wrote:
> > > > > --- a/src/hw/nvme.c
> > > > > +++ b/src/hw/nvme.c
> > > > > @@ -727,6 +727,22 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
> > > > >      u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
> > > > >      u16 i;
> > > > > 
> > > > > +    /* Split up requests that are larger than the device can handle */
> > > > > +    if (op->count > ns->max_req_size) {
> > > > > +        u16 count = op->count;
> > > > > +
> > > > > +        /* Handle the first max_req_size elements */
> > > > > +        op->count = ns->max_req_size;
> > > > > +        if (nvme_cmd_readwrite(ns, op, write))
> > > > > +            return res;
> > > > > +
> > > > > +        /* Handle the remainder of the request */
> > > > > +        op->count = count - ns->max_req_size;
> > > > > +        op->lba += ns->max_req_size;
> > > > > +        op->buf_fl += (ns->max_req_size * ns->block_size);
> > > > > +        return nvme_cmd_readwrite(ns, op, write);
> > > > > +    }
> > > > 
> > > > Depending on the disk access, this code could run with a small stack.
> > > > I would avoid recursion.
> > > 
> > > This is tail recursion, which any reasonable compiler converts into
> > > a jmp :). Take a look at the .o file - for me it did become a plain
> > > jump.
> > > 
> > 
> > Okay, that's fine then.
> 
> It's still kind of icky. This used to be a purely iterative function.
> Now for a large unaligned request (which nvme_build_prpl refuses to
> handle) you get a mixture of (mostly tail) recursion and iteration.
> 
> It'll recurse for each max_req_size, then iterate over each page within
> that.
> 
> I'd much rather see just a simple iterative loop. Something like this
> (incremental, completely untested):

FWIW, I agree that a version without recursion (even tail recursion)
would be nice.  If someone wants to test and confirm that, then that
would be great.

-Kevin
[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Matt DeVillier 2 years, 7 months ago
Greetings! Was this patch set tested on bare metal hardware? I'm
seeing "GRUB loading: Read Error" when attempting to boot from NVMe on
various Purism Librem devices (Intel Skylake thru Cometlake hardware).
Reverting this series resolves the issue. I'm not seeing anything in
coreboot's cbmem console log (even with the debug level raised), which
I suppose isn't unexpected given it's a grub error and not in SeaBIOS
itself. NVMe is a run of the mill Samsung EVO (960 or 970) and SeaBIOS
reports the max request size is 4096 sectors.
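
For what it's worth, that 4096-sector limit would be consistent with an MDTS
of 9, if SeaBIOS derives max_req_size as (1 << MDTS) pages divided by the
block size (that's my assumption; quick sanity-check sketch only):

#include <stdio.h>

int main(void)
{
    unsigned page_size  = 4096;   /* NVME_PAGE_SIZE */
    unsigned block_size = 512;
    unsigned mdts       = 9;      /* assumed controller value */

    /* (1 << MDTS) pages, expressed in 512-byte sectors */
    unsigned max_req_sectors = ((1u << mdts) * page_size) / block_size;

    printf("max request size: %u sectors (%u KiB)\n",
           max_req_sectors, max_req_sectors * block_size / 1024);
    return 0;
}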

cheers,
Matt




On Fri, Oct 30, 2020 at 6:09 PM Kevin O'Connor <kevin@koconnor.net> wrote:
>
> On Thu, Oct 29, 2020 at 05:21:04PM +0000, David Woodhouse wrote:
> > On Wed, 2020-10-07 at 12:13 -0400, Kevin O'Connor wrote:
> > > > > > --- a/src/hw/nvme.c
> > > > > > +++ b/src/hw/nvme.c
> > > > > > @@ -727,6 +727,22 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
> > > > > >      u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
> > > > > >      u16 i;
> > > > > >
> > > > > > +    /* Split up requests that are larger than the device can handle */
> > > > > > +    if (op->count > ns->max_req_size) {
> > > > > > +        u16 count = op->count;
> > > > > > +
> > > > > > +        /* Handle the first max_req_size elements */
> > > > > > +        op->count = ns->max_req_size;
> > > > > > +        if (nvme_cmd_readwrite(ns, op, write))
> > > > > > +            return res;
> > > > > > +
> > > > > > +        /* Handle the remainder of the request */
> > > > > > +        op->count = count - ns->max_req_size;
> > > > > > +        op->lba += ns->max_req_size;
> > > > > > +        op->buf_fl += (ns->max_req_size * ns->block_size);
> > > > > > +        return nvme_cmd_readwrite(ns, op, write);
> > > > > > +    }
> > > > >
> > > > > Depending on the disk access, this code could run with a small stack.
> > > > > I would avoid recursion.
> > > >
> > > > This is tail recursion, which any reasonable compiler converts into
> > > > a jmp :). Take a look at the .o file - for me it did become a plain
> > > > jump.
> > > >
> > >
> > > Okay, that's fine then.
> >
> > It's still kind of icky. This used to be a purely iterative function.
> > Now for a large unaligned request (which nvme_build_prpl refuses to
> > handle) you get a mixture of (mostly tail) recursion and iteration.
> >
> > It'll recurse for each max_req_size, then iterate over each page within
> > that.
> >
> > I'd much rather see just a simple iterative loop. Something like this
> > (incremental, completely untested):
>
> FWIW, I agree that a version without recursion (even tail recursion)
> would be nice.  If someone wants to test and confirm that, then that
> would be great.
>
> -Kevin
[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Matt DeVillier 2 years, 7 months ago
here's the NVMe init log; I was able to get it off a Chromebox with CCD :)

/7aa26000\ Start thread
|7aa26000| Found NVMe controller with version 1.3.0.
|7aa26000|   Capabilities 000000303c033fff
|7aa26000|  q 0x7aa77bce q_idx 1 dbl 0x91201004
|7aa26000|  q 0x7aa77bbc q_idx 0 dbl 0x91201000
|7aa26000| sq 0x7aa77bbc q_idx 0 sqe 0x7aa53000
|7aa26000|   admin submission queue: 0x7aa53000
|7aa26000|   admin completion queue: 0x7aa54000
|7aa26000| sq 0x7aa77bbc next_sqe 0
|7aa26000| sq 0x7aa77bbc commit_sqe 0
|7aa26000| cq 0x7aa77bce head 0 -> 1
|7aa26000| sq 0x7aa77bbc advanced to 1
|7aa26000| NVMe has 1 namespace.
|7aa26000|  q 0x7aa77bf1 q_idx 3 dbl 0x9120100c
|7aa26000| sq 0x7aa77bbc next_sqe 1
|7aa26000| sq 0x7aa77bbc commit_sqe 1
|7aa26000| cq 0x7aa77bce head 1 -> 2
|7aa26000| sq 0x7aa77bbc advanced to 2
|7aa26000|  q 0x7aa77bdf q_idx 2 dbl 0x91201008
|7aa26000| sq 0x7aa77bdf q_idx 2 sqe 0x7aa4e000
|7aa26000| sq 0x7aa77bbc next_sqe 2
|7aa26000| sq 0x7aa77bdf create dword10 00ff0001 dword11 00010001
|7aa26000| sq 0x7aa77bbc commit_sqe 2
|7aa26000| cq 0x7aa77bce head 2 -> 3
|7aa26000| sq 0x7aa77bbc advanced to 3
|7aa26000| sq 0x7aa77bbc next_sqe 3
|7aa26000| sq 0x7aa77bbc commit_sqe 3
|7aa26000| cq 0x7aa77bce head 3 -> 4
|7aa26000| sq 0x7aa77bbc advanced to 4
|7aa26000| NVME NS 1 max request size: 4096 sectors
|7aa26000| NVMe NS 1: 476940 MiB (976773168 512-byte blocks + 0-byte metadata)
|7aa26000| Searching bootorder for: /pci@i0cf8/pci-bridge@1c,4/*@0
|7aa26000| Registering bootable: NVMe NS 1: 476940 MiB (976773168 512-byte blocks + 0-byte metadata) (type:2 prio:103 data:f59e0)
|7aa26000| NVMe initialization complete!
\7aa26000/ End thread

On Mon, Jan 17, 2022 at 7:24 PM Matt DeVillier <matt.devillier@gmail.com> wrote:
>
> Greetings! Was this patch set tested on bare metal hardware? I'm
> seeing "GRUB loading: Read Error" when attempting to boot from NVMe on
> various Purism Librem devices (Intel Skylake thru Cometlake hardware).
> Reverting this series resolves the issue. I'm not seeing anything in
> coreboot's cbmem console log (even with the debug level raised), which
> I suppose isn't unexpected given it's a grub error and not in SeaBIOS
> itself. NVMe is a run of the mill Samsung EVO (960 or 970) and SeaBIOS
> reports the max request size is 4096 sectors.
>
> cheers,
> Matt
>
>
>
>
> On Fri, Oct 30, 2020 at 6:09 PM Kevin O'Connor <kevin@koconnor.net> wrote:
> >
> > On Thu, Oct 29, 2020 at 05:21:04PM +0000, David Woodhouse wrote:
> > > On Wed, 2020-10-07 at 12:13 -0400, Kevin O'Connor wrote:
> > > > > > > --- a/src/hw/nvme.c
> > > > > > > +++ b/src/hw/nvme.c
> > > > > > > @@ -727,6 +727,22 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
> > > > > > >      u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
> > > > > > >      u16 i;
> > > > > > >
> > > > > > > +    /* Split up requests that are larger than the device can handle */
> > > > > > > +    if (op->count > ns->max_req_size) {
> > > > > > > +        u16 count = op->count;
> > > > > > > +
> > > > > > > +        /* Handle the first max_req_size elements */
> > > > > > > +        op->count = ns->max_req_size;
> > > > > > > +        if (nvme_cmd_readwrite(ns, op, write))
> > > > > > > +            return res;
> > > > > > > +
> > > > > > > +        /* Handle the remainder of the request */
> > > > > > > +        op->count = count - ns->max_req_size;
> > > > > > > +        op->lba += ns->max_req_size;
> > > > > > > +        op->buf_fl += (ns->max_req_size * ns->block_size);
> > > > > > > +        return nvme_cmd_readwrite(ns, op, write);
> > > > > > > +    }
> > > > > >
> > > > > > Depending on the disk access, this code could run with a small stack.
> > > > > > I would avoid recursion.
> > > > >
> > > > > This is tail recursion, which any reasonable compiler converts into
> > > > > a jmp :). Take a look at the .o file - for me it did become a plain
> > > > > jump.
> > > > >
> > > >
> > > > Okay, that's fine then.
> > >
> > > It's still kind of icky. This used to be a purely iterative function.
> > > Now for a large unaligned request (which nvme_build_prpl refuses to
> > > handle) you get a mixture of (mostly tail) recursion and iteration.
> > >
> > > It'll recurse for each max_req_size, then iterate over each page within
> > > that.
> > >
> > > I'd much rather see just a simple iterative loop. Something like this
> > > (incremental, completely untested):
> >
> > FWIW, I agree that a version without recursion (even tail recursion)
> > would be nice.  If someone wants to test and confirm that, then that
> > would be great.
> >
> > -Kevin
[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Matt DeVillier 2 years, 7 months ago
and attempting to boot from it:

Booting from Hard Disk...
sq 0x7aa77bdf next_sqe 0
sq 0x7aa77bdf commit_sqe 0
cq 0x7aa77bf1 head 0 -> 1
sq 0x7aa77bdf advanced to 1
ns 1 read lba 0+1: 0
Booting from 0000:7c00
sq 0x7aa77bdf next_sqe 1
sq 0x7aa77bdf commit_sqe 1
cq 0x7aa77bf1 head 1 -> 2
sq 0x7aa77bdf advanced to 2
ns 1 read lba 1+1: 0
sq 0x7aa77bdf next_sqe 2
sq 0x7aa77bdf commit_sqe 2
cq 0x7aa77bf1 head 2 -> 3
sq 0x7aa77bdf advanced to 3
read io: 00000000 00000000 00010003 40270002
ns 1 read lba 2+105: 12

On Tue, Jan 18, 2022 at 11:45 AM Matt DeVillier
<matt.devillier@gmail.com> wrote:
>
> here's the NVMe init log, was able to get it off a Chromebox with CCD :)
>
> /7aa26000\ Start thread
> |7aa26000| Found NVMe controller with version 1.3.0.
> |7aa26000|   Capabilities 000000303c033fff
> |7aa26000|  q 0x7aa77bce q_idx 1 dbl 0x91201004
> |7aa26000|  q 0x7aa77bbc q_idx 0 dbl 0x91201000
> |7aa26000| sq 0x7aa77bbc q_idx 0 sqe 0x7aa53000
> |7aa26000|   admin submission queue: 0x7aa53000
> |7aa26000|   admin completion queue: 0x7aa54000
> |7aa26000| sq 0x7aa77bbc next_sqe 0
> |7aa26000| sq 0x7aa77bbc commit_sqe 0
> |7aa26000| cq 0x7aa77bce head 0 -> 1
> |7aa26000| sq 0x7aa77bbc advanced to 1
> |7aa26000| NVMe has 1 namespace.
> |7aa26000|  q 0x7aa77bf1 q_idx 3 dbl 0x9120100c
> |7aa26000| sq 0x7aa77bbc next_sqe 1
> |7aa26000| sq 0x7aa77bbc commit_sqe 1
> |7aa26000| cq 0x7aa77bce head 1 -> 2
> |7aa26000| sq 0x7aa77bbc advanced to 2
> |7aa26000|  q 0x7aa77bdf q_idx 2 dbl 0x91201008
> |7aa26000| sq 0x7aa77bdf q_idx 2 sqe 0x7aa4e000
> |7aa26000| sq 0x7aa77bbc next_sqe 2
> |7aa26000| sq 0x7aa77bdf create dword10 00ff0001 dword11 00010001
> |7aa26000| sq 0x7aa77bbc commit_sqe 2
> |7aa26000| cq 0x7aa77bce head 2 -> 3
> |7aa26000| sq 0x7aa77bbc advanced to 3
> |7aa26000| sq 0x7aa77bbc next_sqe 3
> |7aa26000| sq 0x7aa77bbc commit_sqe 3
> |7aa26000| cq 0x7aa77bce head 3 -> 4
> |7aa26000| sq 0x7aa77bbc advanced to 4
> |7aa26000| NVME NS 1 max request size: 4096 sectors
> |7aa26000| NVMe NS 1: 476940 MiB (976773168 512-byte blocks + 0-byte metadata)
> |7aa26000| Searching bootorder for: /pci@i0cf8/pci-bridge@1c,4/*@0
> |7aa26000| Registering bootable: NVMe NS 1: 476940 MiB (976773168 512-byte blocks + 0-byte metadata) (type:2 prio:103 data:f59e0)
> |7aa26000| NVMe initialization complete!
> \7aa26000/ End thread
>
> On Mon, Jan 17, 2022 at 7:24 PM Matt DeVillier <matt.devillier@gmail.com> wrote:
> >
> > Greetings! Was this patch set tested on bare metal hardware? I'm
> > seeing "GRUB loading: Read Error" when attempting to boot from NVMe on
> > various Purism Librem devices (Intel Skylake thru Cometlake hardware).
> > Reverting this series resolves the issue. I'm not seeing anything in
> > coreboot's cbmem console log (even with the debug level raised), which
> > I suppose isn't unexpected given it's a grub error and not in SeaBIOS
> > itself. NVMe is a run of the mill Samsung EVO (960 or 970) and SeaBIOS
> > reports the max request size is 4096 sectors.
> >
> > cheers,
> > Matt
> >
> >
> >
> >
> > On Fri, Oct 30, 2020 at 6:09 PM Kevin O'Connor <kevin@koconnor.net> wrote:
> > >
> > > On Thu, Oct 29, 2020 at 05:21:04PM +0000, David Woodhouse wrote:
> > > > On Wed, 2020-10-07 at 12:13 -0400, Kevin O'Connor wrote:
> > > > > > > > --- a/src/hw/nvme.c
> > > > > > > > +++ b/src/hw/nvme.c
> > > > > > > > @@ -727,6 +727,22 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
> > > > > > > >      u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
> > > > > > > >      u16 i;
> > > > > > > >
> > > > > > > > +    /* Split up requests that are larger than the device can handle */
> > > > > > > > +    if (op->count > ns->max_req_size) {
> > > > > > > > +        u16 count = op->count;
> > > > > > > > +
> > > > > > > > +        /* Handle the first max_req_size elements */
> > > > > > > > +        op->count = ns->max_req_size;
> > > > > > > > +        if (nvme_cmd_readwrite(ns, op, write))
> > > > > > > > +            return res;
> > > > > > > > +
> > > > > > > > +        /* Handle the remainder of the request */
> > > > > > > > +        op->count = count - ns->max_req_size;
> > > > > > > > +        op->lba += ns->max_req_size;
> > > > > > > > +        op->buf_fl += (ns->max_req_size * ns->block_size);
> > > > > > > > +        return nvme_cmd_readwrite(ns, op, write);
> > > > > > > > +    }
> > > > > > >
> > > > > > > Depending on the disk access, this code could run with a small stack.
> > > > > > > I would avoid recursion.
> > > > > >
> > > > > > This is tail recursion, which any reasonable compiler converts into
> > > > > > a jmp :). Take a look at the .o file - for me it did become a plain
> > > > > > jump.
> > > > > >
> > > > >
> > > > > Okay, that's fine then.
> > > >
> > > > It's still kind of icky. This used to be a purely iterative function.
> > > > Now for a large unaligned request (which nvme_build_prpl refuses to
> > > > handle) you get a mixture of (mostly tail) recursion and iteration.
> > > >
> > > > It'll recurse for each max_req_size, then iterate over each page within
> > > > that.
> > > >
> > > > I'd much rather see just a simple iterative loop. Something like this
> > > > (incremental, completely untested):
> > >
> > > FWIW, I agree that a version without recursion (even tail recursion)
> > > would be nice.  If someone wants to test and confirm that, then that
> > > would be great.
> > >
> > > -Kevin
[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Alexander Graf via SeaBIOS 2 years, 7 months ago
Hi Matt,

On 18.01.22 18:46, Matt DeVillier wrote:
> and attempting to boot from it:
>
> Booting from Hard Disk...
> sq 0x7aa77bdf next_sqe 0
> sq 0x7aa77bdf commit_sqe 0
> cq 0x7aa77bf1 head 0 -> 1
> sq 0x7aa77bdf advanced to 1
> ns 1 read lba 0+1: 0
> Booting from 0000:7c00
> sq 0x7aa77bdf next_sqe 1
> sq 0x7aa77bdf commit_sqe 1
> cq 0x7aa77bf1 head 1 -> 2
> sq 0x7aa77bdf advanced to 2
> ns 1 read lba 1+1: 0
> sq 0x7aa77bdf next_sqe 2
> sq 0x7aa77bdf commit_sqe 2
> cq 0x7aa77bf1 head 2 -> 3
> sq 0x7aa77bdf advanced to 3
> read io: 00000000 00000000 00010003 40270002


This error means

"PRP Offset Invalid: The Offset field for a PRP entry is invalid. This 
may occur when there is a PRP entry with a non-zero offset after the 
first entry or when the Offset field in any PRP entry is not dword 
aligned (i.e., bits 1:0 are not cleared to 00b)."


which points towards a misaligned PRP entry. The size of this request is
105 sectors (52.5 KiB), so it must be going through the PRP list. I don't see any
obvious code paths that would allow us to ever get into a misaligned 
request here.
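
For reference, here is how I read that status dword (sketch only, assuming the
standard completion-entry dword 3 layout; the values come from the "read io:"
line above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t dw3 = 0x40270002;              /* last dword from the log */

    uint16_t cid = dw3 & 0xffff;            /* command id       = 0x0002 */
    uint16_t sf  = (dw3 >> 17) & 0x7fff;    /* 15-bit status field       */
    uint8_t  sc  = sf & 0xff;               /* status code      = 0x13   */
    uint8_t  sct = (sf >> 8) & 0x7;         /* status code type = 0      */

    printf("cid=0x%04x sct=%u sc=0x%02x\n", cid, sct, sc);

    /* SCT 0 / SC 0x13 is the generic "PRP Offset Invalid" status.  Size-wise,
     * 105 sectors * 512 B = 53760 B (52.5 KiB), i.e. about 14 pages of 4 KiB,
     * so the transfer has to go through a PRP list, and every entry after
     * PRP1 must sit on a page boundary. */
    unsigned bytes = 105 * 512;
    printf("bytes=%u pages=%u\n", bytes, (bytes + 4095) / 4096);
    return 0;
}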

That said, while trying to understand what is happening here I did 
stumble over a different weird effect: ns->prp1 gets overwritten with 0 
by some interrupt handler. Could you please try the patch / hack below 
and see if it fixes the problem for you? That way I at least know we're 
hunting the same ghost.

If it doesn't help, I'll send you a debug patch that will give us some 
more information about the PRP list SeaBIOS assembles.


Thanks,

Alex

---


diff --git a/src/hw/nvme.c b/src/hw/nvme.c
index f035fa2..99ad7a8 100644
--- a/src/hw/nvme.c
+++ b/src/hw/nvme.c
@@ -257,7 +257,7 @@ nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts)
          goto free_buffer;
      }

-    struct nvme_namespace *ns = malloc_fseg(sizeof(*ns));
+    struct nvme_namespace *ns = malloc_low(sizeof(*ns));
      if (!ns) {
          warn_noalloc();
          goto free_buffer;



[SeaBIOS] Re: [PATCH v3 4/4] nvme: Split requests by maximum allowed size
Posted by Matt DeVillier 2 years, 7 months ago
On Tue, Jan 18, 2022 at 4:06 PM Alexander Graf <graf@amazon.de> wrote:
>
> Hi Matt,

hi Alex,

> On 18.01.22 18:46, Matt DeVillier wrote:
> > and attempting to boot from it:
> >
> > Booting from Hard Disk...
> > sq 0x7aa77bdf next_sqe 0
> > sq 0x7aa77bdf commit_sqe 0
> > cq 0x7aa77bf1 head 0 -> 1
> > sq 0x7aa77bdf advanced to 1
> > ns 1 read lba 0+1: 0
> > Booting from 0000:7c00
> > sq 0x7aa77bdf next_sqe 1
> > sq 0x7aa77bdf commit_sqe 1
> > cq 0x7aa77bf1 head 1 -> 2
> > sq 0x7aa77bdf advanced to 2
> > ns 1 read lba 1+1: 0
> > sq 0x7aa77bdf next_sqe 2
> > sq 0x7aa77bdf commit_sqe 2
> > cq 0x7aa77bf1 head 2 -> 3
> > sq 0x7aa77bdf advanced to 3
> > read io: 00000000 00000000 00010003 40270002
>
>
> This error means
>
> "PRP Offset Invalid: The Offset field for a PRP entry is invalid. This
> may occur when there is a PRP entry with a non-zero offset after the
> first entry or when the Offset field in any PRP entry is not dword
> aligned (i.e., bits 1:0 are not cleared to 00b)."
>
>
> which points towards a misaligned PRP entry. The block size of this
> request is 105 sectors (52.5kb), so it must be a PRPL. I don't see any
> obvious code paths that would allow us to ever get into a misaligned
> request here.
>
> That said, while trying to understand what is happening here I did
> stumble over a different weird effect: ns->prp1 gets overwritten with 0
> by some interrupt handler. Could you please try the patch / hack below
> and see if it fixes the problem for you? That way I at least know we're
> hunting the same ghost.

The hack below does indeed work. Not sure if that's good or bad :)

>
> If it doesn't help, I'll send you a debug patch that will give us some
> more information about the PRP list SeaBIOS assembles.
>
>
> Thanks,
>
> Alex
>
> ---
>
>
> diff --git a/src/hw/nvme.c b/src/hw/nvme.c
> index f035fa2..99ad7a8 100644
> --- a/src/hw/nvme.c
> +++ b/src/hw/nvme.c
> @@ -257,7 +257,7 @@ nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts)
>           goto free_buffer;
>       }
>
> -    struct nvme_namespace *ns = malloc_fseg(sizeof(*ns));
> +    struct nvme_namespace *ns = malloc_low(sizeof(*ns));
>       if (!ns) {
>           warn_noalloc();
>           goto free_buffer;
>
>
>
[SeaBIOS] [PATCH] nvme: Clean up nvme_cmd_readwrite()
Posted by David Woodhouse 3 years, 10 months ago
From: David Woodhouse <dwmw@amazon.co.uk>

This ended up with an odd mix of recursion (albeit *mostly* tail-recursion)
and iteration that could have been prettier. Make it prettier by making
nvme_build_prpl() return the number of blocks it consumes, and just
looping.

If nvme_build_prpl() doesn't want to play, just fall through to the
original loop.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
On Fri, 2020-10-30 at 19:09 -0400, Kevin O'Connor wrote:
> FWIW, I agree that a version without recursion (even tail recursion)
> would be nice.  If someone wants to test and confirm that, then that
> would be great.

Seems to work here... I just tidied it up a little more and added a
dprintf() to match the original loop. Didn't see it actually limiting
the requests in my testing so I made nvme_build_prpl() limit to 32
blocks and then it did... and still seems to put things in the right
place.

 src/hw/nvme.c | 47 +++++++++++++++++++++--------------------------
 1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/src/hw/nvme.c b/src/hw/nvme.c
index cc37bca..5363d3f 100644
--- a/src/hw/nvme.c
+++ b/src/hw/nvme.c
@@ -472,19 +472,20 @@ static int nvme_add_prpl(struct nvme_namespace *ns, u64 base)
 
 int nvme_build_prpl(struct nvme_namespace *ns, struct disk_op_s *op)
 {
-    int first_page = 1;
+    int first_page = 1, count = op->count;
     u32 base = (long)op->buf_fl;
-    s32 size = op->count * ns->block_size;
+    s32 size;
 
-    if (op->count > ns->max_req_size)
-        return -1;
+    if (count > ns->max_req_size)
+        count = ns->max_req_size;
 
     nvme_reset_prpl(ns);
 
+    size = count * ns->block_size;
     /* Special case for transfers that fit into PRP1, but are unaligned */
     if (((size + (base & ~NVME_PAGE_MASK)) <= NVME_PAGE_SIZE)) {
         ns->prp1 = op->buf_fl;
-        return 0;
+        return count;
     }
 
     /* Every request has to be page aligned */
@@ -506,7 +507,7 @@ int nvme_build_prpl(struct nvme_namespace *ns, struct disk_op_s *op)
             return -1;
     }
 
-    return 0;
+    return count;
 }
 
 static int
@@ -725,34 +726,28 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
 {
     int res = DISK_RET_SUCCESS;
     u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
-    u16 i;
+    u16 i, blocks;
 
-    /* Split up requests that are larger than the device can handle */
-    if (op->count > ns->max_req_size) {
-        u16 count = op->count;
+    while (op->count && ((blocks = nvme_build_prpl(ns, op)) > 0)) {
+        res = nvme_io_readwrite(ns, op->lba, ns->prp1, blocks, write);
+        dprintf(5, "ns %u %s lba %llu+%u: %d\n", ns->ns_id, write ? "write"
+                                                                  : "read",
+                op->lba, blocks, res);
 
-        /* Handle the first max_req_size elements */
-        op->count = ns->max_req_size;
-        if (nvme_cmd_readwrite(ns, op, write))
+        if (res != DISK_RET_SUCCESS)
             return res;
 
-        /* Handle the remainder of the request */
-        op->count = count - ns->max_req_size;
-        op->lba += ns->max_req_size;
-        op->buf_fl += (ns->max_req_size * ns->block_size);
-        return nvme_cmd_readwrite(ns, op, write);
-    }
-
-    if (!nvme_build_prpl(ns, op)) {
-        /* Request goes via PRP List logic */
-        return nvme_io_readwrite(ns, op->lba, ns->prp1, op->count, write);
+        op->count -= blocks;
+        op->lba += blocks;
+        op->buf_fl += (blocks * ns->block_size);
     }
 
     for (i = 0; i < op->count && res == DISK_RET_SUCCESS;) {
-        u16 blocks_remaining = op->count - i;
-        u16 blocks = blocks_remaining < max_blocks ? blocks_remaining
-                                                   : max_blocks;
         char *op_buf = op->buf_fl + i * ns->block_size;
+        u16 blocks_remaining = op->count - i;
+
+        blocks = blocks_remaining < max_blocks ? blocks_remaining
+                                               : max_blocks;
 
         if (write) {
             memcpy(ns->dma_buffer, op_buf, blocks * ns->block_size);
-- 
2.17.1


