[PATCH] hw/nvme: add new command abort case

Dmitry Tikhov posted 1 patch 2 years ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20220420082044.n6orslk2aukj2jai@localhost.localdomain
Maintainers: Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>
hw/nvme/dif.c | 5 +++++
1 file changed, 5 insertions(+)
[PATCH] hw/nvme: add new command abort case
Posted by Dmitry Tikhov 2 years ago
NVMe command set specification for end-to-end data protection formatted
namespace states:

    o If the Reference Tag Check bit of the PRCHK field is set to ‘1’ and
      the namespace is formatted for Type 3 protection, then the
      controller:
          ▪ should not compare the protection Information Reference Tag
            field to the computed reference tag; and
          ▪ may ignore the ILBRT and EILBRT fields. If a command is
            aborted as a result of the Reference Tag Check bit of the
            PRCHK field being set to ‘1’, then that command should be
            aborted with a status code of Invalid Protection Information,
            but may be aborted with a status code of Invalid Field in
            Command.

Currently qemu compares reftag in the nvme_dif_prchk function whenever
Reference Tag Check bit is set in the command. For type 3 namespaces
however, caller of nvme_dif_prchk - nvme_dif_check does not increment
reftag for each subsequent logical block. That way commands incorporating
more than one logical block for type 3 formatted namespaces with reftag
check bit set, always fail with End-to-end Reference Tag Check Error.
Comply with spec by handling case of set Reference Tag Check
bit in the type 3 formatted namespace.

Signed-off-by: Dmitry Tikhov <d.tihov@yadro.com>
---
 hw/nvme/dif.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
index 62d885f83e..63c44c86ab 100644
--- a/hw/nvme/dif.c
+++ b/hw/nvme/dif.c
@@ -26,6 +26,11 @@ uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint8_t prinfo, uint64_t slba,
         return NVME_INVALID_PROT_INFO | NVME_DNR;
     }
 
+    if ((NVME_ID_NS_DPS_TYPE(ns->id_ns.dps) == NVME_ID_NS_DPS_TYPE_3) &&
+        (prinfo & NVME_PRINFO_PRCHK_REF)) {
+        return NVME_INVALID_PROT_INFO;
+    }
+
     return NVME_SUCCESS;
 }
 
-- 
2.35.1


Re: [PATCH] hw/nvme: add new command abort case
Posted by Klaus Jensen 2 years ago
On Apr 20 11:20, Dmitry Tikhov wrote:
> NVMe command set specification for end-to-end data protection formatted
> namespace states:
> 
>     o If the Reference Tag Check bit of the PRCHK field is set to ‘1’ and
>       the namespace is formatted for Type 3 protection, then the
>       controller:
>           ▪ should not compare the protection Information Reference Tag
>             field to the computed reference tag; and
>           ▪ may ignore the ILBRT and EILBRT fields. If a command is
>             aborted as a result of the Reference Tag Check bit of the
>             PRCHK field being set to ‘1’, then that command should be
>             aborted with a status code of Invalid Protection Information,
>             but may be aborted with a status code of Invalid Field in
>             Command.
> 
> Currently qemu compares reftag in the nvme_dif_prchk function whenever
> Reference Tag Check bit is set in the command. For type 3 namespaces
> however, caller of nvme_dif_prchk - nvme_dif_check does not increment
> reftag for each subsequent logical block. That way commands incorporating
> more than one logical block for type 3 formatted namespaces with reftag
> check bit set, always fail with End-to-end Reference Tag Check Error.
> Comply with spec by handling case of set Reference Tag Check
> bit in the type 3 formatted namespace.
> 

Note the "should" and "may" in your quote. What QEMU does right now is
compliant with v1.4. That is, the reftag must NOT be incremented
- it is the same for the first and all subsequent logical blocks.

I'm a bit hesitant to follow v2.0 here, since we do not report v2.0
compliance yet. I'm honestly also a bit perplexed as to how the NVMe TWG
ended up considering this backwards compatible. As far as I can tell
this breaks current hosts that do set the reference tag check bit,
provides a valid ILBRT/EILBRT and expects it to succeed.
Re: [PATCH] hw/nvme: add new command abort case
Posted by Klaus Jensen 2 years ago
On Apr 20 12:13, Klaus Jensen wrote:
> On Apr 20 11:20, Dmitry Tikhov wrote:
> > NVMe command set specification for end-to-end data protection formatted
> > namespace states:
> > 
> >     o If the Reference Tag Check bit of the PRCHK field is set to ‘1’ and
> >       the namespace is formatted for Type 3 protection, then the
> >       controller:
> >           ▪ should not compare the protection Information Reference Tag
> >             field to the computed reference tag; and
> >           ▪ may ignore the ILBRT and EILBRT fields. If a command is
> >             aborted as a result of the Reference Tag Check bit of the
> >             PRCHK field being set to ‘1’, then that command should be
> >             aborted with a status code of Invalid Protection Information,
> >             but may be aborted with a status code of Invalid Field in
> >             Command.
> > 
> > Currently qemu compares reftag in the nvme_dif_prchk function whenever
> > Reference Tag Check bit is set in the command. For type 3 namespaces
> > however, caller of nvme_dif_prchk - nvme_dif_check does not increment
> > reftag for each subsequent logical block. That way commands incorporating
> > more than one logical block for type 3 formatted namespaces with reftag
> > check bit set, always fail with End-to-end Reference Tag Check Error.
> > Comply with spec by handling case of set Reference Tag Check
> > bit in the type 3 formatted namespace.
> > 
> 
> Note the "should" and "may" in your quote. What QEMU does right now is
> compliant with v1.4. That is, the reftag must NOT be incremented
> - it is the same for the first and all subsequent logical blocks.
> 
> I'm a bit hesitant to follow v2.0 here, since we do not report v2.0
> compliance yet. I'm honestly also a bit perplexed as to how the NVMe TWG
> ended up considering this backwards compatible. As far as I can tell
> this breaks current hosts that do set the reference tag check bit,
> provides a valid ILBRT/EILBRT and expects it to succeed.

Ok, so reading the spec more closely...

The Invalid Protection Information should not be set just because the
reference tag check bit is set. It should be set *if* the controller
chooses to check it and it fails. However, in v2.0, the controller is
allowed to ignore it and not perform the check.

So, just because the host sets the bit, that does not mean we fail the
command. However, a difference is that a v2.0 compliant controller
should return Invalid Protection Information or Invalid Field in Command
instead of End-to-end Reference Tag Check Error if the check fails.
Re: [PATCH] hw/nvme: add new command abort case
Posted by Dmitry Tikhov 2 years ago
On Wed, Apr 20, 2022 at 12:36:54, Klaus Jensen wrote:
> On Apr 20 12:13, Klaus Jensen wrote:
> > On Apr 20 11:20, Dmitry Tikhov wrote:
> > > NVMe command set specification for end-to-end data protection formatted
> > > namespace states:
> > > 
> > >     o If the Reference Tag Check bit of the PRCHK field is set to ‘1’ and
> > >       the namespace is formatted for Type 3 protection, then the
> > >       controller:
> > >           ▪ should not compare the protection Information Reference Tag
> > >             field to the computed reference tag; and
> > >           ▪ may ignore the ILBRT and EILBRT fields. If a command is
> > >             aborted as a result of the Reference Tag Check bit of the
> > >             PRCHK field being set to ‘1’, then that command should be
> > >             aborted with a status code of Invalid Protection Information,
> > >             but may be aborted with a status code of Invalid Field in
> > >             Command.
> > > 
> > > Currently qemu compares reftag in the nvme_dif_prchk function whenever
> > > Reference Tag Check bit is set in the command. For type 3 namespaces
> > > however, caller of nvme_dif_prchk - nvme_dif_check does not increment
> > > reftag for each subsequent logical block. That way commands incorporating
> > > more than one logical block for type 3 formatted namespaces with reftag
> > > check bit set, always fail with End-to-end Reference Tag Check Error.
> > > Comply with spec by handling case of set Reference Tag Check
> > > bit in the type 3 formatted namespace.
> > > 
> > 
> > Note the "should" and "may" in your quote. What QEMU does right now is
> > compliant with v1.4. That is, the reftag must NOT be incremented
> > - it is the same for the first and all subsequent logical blocks.
> > 
> > I'm a bit hesitant to follow v2.0 here, since we do not report v2.0
> > compliance yet. I'm honestly also a bit perplexed as to how the NVMe TWG
> > ended up considering this backwards compatible. As far as I can tell
> > this breaks current hosts that do set the reference tag check bit,
> > provides a valid ILBRT/EILBRT and expects it to succeed.
> 
> Ok, so reading the spec more closely...
> 
> The Invalid Protection Information should not be set just because the
> reference tag check bit is set. It should be set *if* the controller
> chooses to check it and it fails. However, in v2.0, the controller is
> allowed to ignore it and not perform the check.
> 
> So, just because the host sets the bit, that does not mean we fail the
> command. However, a difference is that a v2.0 compliant controller
> should return Invalid Protection Information or Invalid Field in Command
> instead of End-to-end Reference Tag Check Error if the check fails.

Can you please link the spec you are referring to?

Re: [PATCH] hw/nvme: add new command abort case
Posted by Klaus Jensen 2 years ago
On Apr 20 13:41, Dmitry Tikhov wrote:
> On Wed, Apr 20, 2022 at 12:36:54, Klaus Jensen wrote:
> > On Apr 20 12:13, Klaus Jensen wrote:
> > > On Apr 20 11:20, Dmitry Tikhov wrote:
> > > > NVMe command set specification for end-to-end data protection formatted
> > > > namespace states:
> > > > 
> > > >     o If the Reference Tag Check bit of the PRCHK field is set to ‘1’ and
> > > >       the namespace is formatted for Type 3 protection, then the
> > > >       controller:
> > > >           ▪ should not compare the protection Information Reference Tag
> > > >             field to the computed reference tag; and
> > > >           ▪ may ignore the ILBRT and EILBRT fields. If a command is
> > > >             aborted as a result of the Reference Tag Check bit of the
> > > >             PRCHK field being set to ‘1’, then that command should be
> > > >             aborted with a status code of Invalid Protection Information,
> > > >             but may be aborted with a status code of Invalid Field in
> > > >             Command.
> > > > 
> > > > Currently qemu compares reftag in the nvme_dif_prchk function whenever
> > > > Reference Tag Check bit is set in the command. For type 3 namespaces
> > > > however, caller of nvme_dif_prchk - nvme_dif_check does not increment
> > > > reftag for each subsequent logical block. That way commands incorporating
> > > > more than one logical block for type 3 formatted namespaces with reftag
> > > > check bit set, always fail with End-to-end Reference Tag Check Error.
> > > > Comply with spec by handling case of set Reference Tag Check
> > > > bit in the type 3 formatted namespace.
> > > > 
> > > 
> > > Note the "should" and "may" in your quote. What QEMU does right now is
> > > compliant with v1.4. That is, the reftag must NOT be incremented
> > > - it is the same for the first and all subsequent logical blocks.
> > > 
> > > I'm a bit hesitant to follow v2.0 here, since we do not report v2.0
> > > compliance yet. I'm honestly also a bit perplexed as to how the NVMe TWG
> > > ended up considering this backwards compatible. As far as I can tell
> > > this breaks current hosts that do set the reference tag check bit,
> > > provides a valid ILBRT/EILBRT and expects it to succeed.
> > 
> > Ok, so reading the spec more closely...
> > 
> > The Invalid Protection Information should not be set just because the
> > reference tag check bit is set. It should be set *if* the controller
> > chooses to check it and it fails. However, in v2.0, the controller is
> > allowed to ignore it and not perform the check.
> > 
> > So, just because the host sets the bit, that does not mean we fail the
> > command. However, a difference is that a v2.0 compliant controller
> > should return Invalid Protection Information or Invalid Field in Command
> > instead of End-to-end Reference Tag Check Error if the check fails.
> 
> Can you please link the spec you are referring to?

NVM Command Set Specification v1.0b, Section 5.2.3. It is exactly what
you quoted above.

I think you are interpreting

  "If a command is aborted as a result of the Reference Tag Check bit of
  the PRCHK field being set to '1', ..."

as

   "If a command is aborted *because* the Reference Tag Check bit of the
   PRCHK field being set to '1', ...".

But that is not what it is saying. IMO, the only meaningful
interpretation is that "If the command is aborted *as a result of* the
check being done *because* the bit is set, *then* return an error".

Your interpretation would break existing hosts that set the bit.
Re: [PATCH] hw/nvme: add new command abort case
Posted by Dmitry Tikhov 2 years ago
On Wed, Apr 20, 2022 at 12:54:44, Klaus Jensen wrote:
> 
> NVM Command Set Specification v1.0b, Section 5.2.3. It is exactly what
> you quoted above.
> 
> I think you are interpreting
> 
>   "If a command is aborted as a result of the Reference Tag Check bit of
>   the PRCHK field being set to '1', ..."
> 
> as
> 
>    "If a command is aborted *because* the Reference Tag Check bit of the
>    PRCHK field being set to '1', ...".
Yeah, i was interpreting it exactly this way.

> 
> But that is not what it is saying. IMO, the only meaningful
> interpretation is that "If the command is aborted *as a result of* the
> check being done *because* the bit is set, *then* return an error".
Ok, but return error in this context still means to return either
Invalid Protection Information or Invalid Field in Command, isn't it?
Or why would it specify
    ...then that command should be aborted with a status code of Invalid
	Protection Information, but may be aborted with a status code of
	Invalid Field in Command
exactly this 2 status codes?

> 
> Your interpretation would break existing hosts that set the bit.

I also opened NVM Express 1.4 "8.3.1.5 Control of Protection Information
Checking - PRCHK" and it says
    For Type 3 protection, if bit 0 of the PRCHK field is set to ‘1’, then
	the command should be aborted with status Invalid Protection
	Information, but may be aborted with status Invalid Field in Command.
	The controller may ignore the ILBRT and EILBRT fields when Type 3
	protection is used because the computed reference tag remains
	unchanged.
I think it marks clear intent to abort cmd with "Invalid Protection
Information" or "Invalid Field in Command" status codes exactly in case
reftag check bit is set. Also isn't "may ignore the ILBRT and EILBRT 
fields" means not to compare reftag with ILBRT/EILBRT? If it is not 
compared then reftag check error can't be returned.

But anyways, spec says that "should" and "may" indicates flexibility of
choice and not mandatory behavior. So if you think that current behavior
is right i don't insist.

Re: [PATCH] hw/nvme: add new command abort case
Posted by Klaus Jensen 2 years ago
On Apr 20 15:31, Dmitry Tikhov wrote:
> On Wed, Apr 20, 2022 at 12:54:44, Klaus Jensen wrote:
> > 
> > NVM Command Set Specification v1.0b, Section 5.2.3. It is exactly what
> > you quoted above.
> > 
> > I think you are interpreting
> > 
> >   "If a command is aborted as a result of the Reference Tag Check bit of
> >   the PRCHK field being set to '1', ..."
> > 
> > as
> > 
> >    "If a command is aborted *because* the Reference Tag Check bit of the
> >    PRCHK field being set to '1', ...".
> Yeah, i was interpreting it exactly this way.
> 
> > 
> > But that is not what it is saying. IMO, the only meaningful
> > interpretation is that "If the command is aborted *as a result of* the
> > check being done *because* the bit is set, *then* return an error".
> Ok, but return error in this context still means to return either
> Invalid Protection Information or Invalid Field in Command, isn't it?
> Or why would it specify
>     ...then that command should be aborted with a status code of Invalid
> 	Protection Information, but may be aborted with a status code of
> 	Invalid Field in Command
> exactly this 2 status codes?
> 
> > 
> > Your interpretation would break existing hosts that set the bit.
> 
> I also opened NVM Express 1.4 "8.3.1.5 Control of Protection Information
> Checking - PRCHK" and it says
>     For Type 3 protection, if bit 0 of the PRCHK field is set to ‘1’, then
> 	the command should be aborted with status Invalid Protection
> 	Information, but may be aborted with status Invalid Field in Command.
> 	The controller may ignore the ILBRT and EILBRT fields when Type 3
> 	protection is used because the computed reference tag remains
> 	unchanged.
> I think it marks clear intent to abort cmd with "Invalid Protection
> Information" or "Invalid Field in Command" status codes exactly in case
> reftag check bit is set. Also isn't "may ignore the ILBRT and EILBRT 
> fields" means not to compare reftag with ILBRT/EILBRT? If it is not 
> compared then reftag check error can't be returned.

What the heck. This is a pretty major difference between v1.4 and v1.4b.
v1.4b does not include that wording (but it *is* present in v1.3d). You
are absolutely right that this conveys the intent to abort the command.
Looks like this was lost in the changes in that section between v1.4 and
v1.4b. This explains the wording in v2.0 - the spec people realized they
screwed up and now they have to accept both behaviors.

> 
> But anyways, spec says that "should" and "may" indicates flexibility of
> choice and not mandatory behavior. So if you think that current behavior
> is right i don't insist.

I'm not so sure now. Another question for the spec people... I'll get
back to you.
Re: [PATCH] hw/nvme: add new command abort case
Posted by Klaus Jensen 1 year, 10 months ago
On Apr 20 14:48, Klaus Jensen wrote:
> On Apr 20 15:31, Dmitry Tikhov wrote:
> > On Wed, Apr 20, 2022 at 12:54:44, Klaus Jensen wrote:
> > > 
> > > NVM Command Set Specification v1.0b, Section 5.2.3. It is exactly what
> > > you quoted above.
> > > 
> > > I think you are interpreting
> > > 
> > >   "If a command is aborted as a result of the Reference Tag Check bit of
> > >   the PRCHK field being set to '1', ..."
> > > 
> > > as
> > > 
> > >    "If a command is aborted *because* the Reference Tag Check bit of the
> > >    PRCHK field being set to '1', ...".
> > Yeah, i was interpreting it exactly this way.
> > 
> > > 
> > > But that is not what it is saying. IMO, the only meaningful
> > > interpretation is that "If the command is aborted *as a result of* the
> > > check being done *because* the bit is set, *then* return an error".
> > Ok, but return error in this context still means to return either
> > Invalid Protection Information or Invalid Field in Command, isn't it?
> > Or why would it specify
> >     ...then that command should be aborted with a status code of Invalid
> > 	Protection Information, but may be aborted with a status code of
> > 	Invalid Field in Command
> > exactly this 2 status codes?
> > 
> > > 
> > > Your interpretation would break existing hosts that set the bit.
> > 
> > I also opened NVM Express 1.4 "8.3.1.5 Control of Protection Information
> > Checking - PRCHK" and it says
> >     For Type 3 protection, if bit 0 of the PRCHK field is set to ‘1’, then
> > 	the command should be aborted with status Invalid Protection
> > 	Information, but may be aborted with status Invalid Field in Command.
> > 	The controller may ignore the ILBRT and EILBRT fields when Type 3
> > 	protection is used because the computed reference tag remains
> > 	unchanged.
> > I think it marks clear intent to abort cmd with "Invalid Protection
> > Information" or "Invalid Field in Command" status codes exactly in case
> > reftag check bit is set. Also isn't "may ignore the ILBRT and EILBRT 
> > fields" means not to compare reftag with ILBRT/EILBRT? If it is not 
> > compared then reftag check error can't be returned.
> 
> What the heck. This is a pretty major difference between v1.4 and v1.4b.
> v1.4b does not include that wording (but it *is* present in v1.3d). You
> are absolutely right that this conveys the intent to abort the command.
> Looks like this was lost in the changes in that section between v1.4 and
> v1.4b. This explains the wording in v2.0 - the spec people realized they
> screwed up and now they have to accept both behaviors.
> 
> > 
> > But anyways, spec says that "should" and "may" indicates flexibility of
> > choice and not mandatory behavior. So if you think that current behavior
> > is right i don't insist.
> 
> I'm not so sure now. Another question for the spec people... I'll get
> back to you.

I got a long an exhaustive description of this issue from the spec
people, and it all boils down to, well, a mistake basically.

The bottom line is that both behaviors *are* acceptable as of now, but
this may change. Not sure how ;) However, I think this might be brough
up with the NVMe TWG, and I'll make sure to follow that discussion.

For now, I think we leave the behavior of *this* device as-is. It's not
that I think anyone really relies on this behavior, but better not to
break it as long as we report v1.4.
Re: [PATCH] hw/nvme: add new command abort case
Posted by Klaus Jensen 1 year, 10 months ago
On May 31 13:13, Klaus Jensen wrote:
> On Apr 20 14:48, Klaus Jensen wrote:
> > On Apr 20 15:31, Dmitry Tikhov wrote:
> > > On Wed, Apr 20, 2022 at 12:54:44, Klaus Jensen wrote:
> > > > 
> > > > NVM Command Set Specification v1.0b, Section 5.2.3. It is exactly what
> > > > you quoted above.
> > > > 
> > > > I think you are interpreting
> > > > 
> > > >   "If a command is aborted as a result of the Reference Tag Check bit of
> > > >   the PRCHK field being set to '1', ..."
> > > > 
> > > > as
> > > > 
> > > >    "If a command is aborted *because* the Reference Tag Check bit of the
> > > >    PRCHK field being set to '1', ...".
> > > Yeah, i was interpreting it exactly this way.
> > > 
> > > > 
> > > > But that is not what it is saying. IMO, the only meaningful
> > > > interpretation is that "If the command is aborted *as a result of* the
> > > > check being done *because* the bit is set, *then* return an error".
> > > Ok, but return error in this context still means to return either
> > > Invalid Protection Information or Invalid Field in Command, isn't it?
> > > Or why would it specify
> > >     ...then that command should be aborted with a status code of Invalid
> > > 	Protection Information, but may be aborted with a status code of
> > > 	Invalid Field in Command
> > > exactly this 2 status codes?
> > > 
> > > > 
> > > > Your interpretation would break existing hosts that set the bit.
> > > 
> > > I also opened NVM Express 1.4 "8.3.1.5 Control of Protection Information
> > > Checking - PRCHK" and it says
> > >     For Type 3 protection, if bit 0 of the PRCHK field is set to ‘1’, then
> > > 	the command should be aborted with status Invalid Protection
> > > 	Information, but may be aborted with status Invalid Field in Command.
> > > 	The controller may ignore the ILBRT and EILBRT fields when Type 3
> > > 	protection is used because the computed reference tag remains
> > > 	unchanged.
> > > I think it marks clear intent to abort cmd with "Invalid Protection
> > > Information" or "Invalid Field in Command" status codes exactly in case
> > > reftag check bit is set. Also isn't "may ignore the ILBRT and EILBRT 
> > > fields" means not to compare reftag with ILBRT/EILBRT? If it is not 
> > > compared then reftag check error can't be returned.
> > 
> > What the heck. This is a pretty major difference between v1.4 and v1.4b.
> > v1.4b does not include that wording (but it *is* present in v1.3d). You
> > are absolutely right that this conveys the intent to abort the command.
> > Looks like this was lost in the changes in that section between v1.4 and
> > v1.4b. This explains the wording in v2.0 - the spec people realized they
> > screwed up and now they have to accept both behaviors.
> > 
> > > 
> > > But anyways, spec says that "should" and "may" indicates flexibility of
> > > choice and not mandatory behavior. So if you think that current behavior
> > > is right i don't insist.
> > 
> > I'm not so sure now. Another question for the spec people... I'll get
> > back to you.
> 
> I got a long an exhaustive description of this issue from the spec
> people, and it all boils down to, well, a mistake basically.
> 
> The bottom line is that both behaviors *are* acceptable as of now, but
> this may change. Not sure how ;) However, I think this might be brough
> up with the NVMe TWG, and I'll make sure to follow that discussion.
> 
> For now, I think we leave the behavior of *this* device as-is. It's not
> that I think anyone really relies on this behavior, but better not to
> break it as long as we report v1.4.

Sigh. I just got a correction on that email and the intention *is* to
remove this. So, I'll queue this up so QEMU can be a front-runner for
compliance ;)

Sorry for the noise.

Thanks, applied to nvme-next!