[PATCH v2] hw/ide/ahci: fix legacy software reset

Niklas Cassel posted 1 patch 7 months, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20231005100407.1136484-1-nks@flawful.org
Maintainers: John Snow <jsnow@redhat.com>
There is a newer version of this series
hw/ide/ahci.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
[PATCH v2] hw/ide/ahci: fix legacy software reset
Posted by Niklas Cassel 7 months, 1 week ago
From: Niklas Cassel <niklas.cassel@wdc.com>

Legacy software contains a standard mechanism for generating a reset to a
Serial ATA device - setting the SRST (software reset) bit in the Device
Control register.

Serial ATA has a more robust mechanism called COMRESET, also referred to
as port reset. A port reset is the preferred mechanism for error
recovery and should be used in place of software reset.

Commit e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
improved the handling of PxCI, such that PxCI gets cleared after handling
a non-NCQ, or NCQ command (instead of incorrectly clearing PxCI after
receiving an arbitrary FIS).

However, simply clearing PxCI after a non-NCQ, or NCQ command, is not
enough, we also need to clear PxCI when receiving a SRST in the Device
Control register.

This fixes an issue for FreeBSD where the device would fail to reset.
The problem was not noticed in Linux, because Linux uses a COMRESET
instead of a legacy software reset by default.

Fixes: e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
Changes since v1: write the D2H FIS before clearing PxCI.

 hw/ide/ahci.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index babdd7b458..7269dabbdb 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1254,10 +1254,26 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
         case STATE_RUN:
             if (cmd_fis[15] & ATA_SRST) {
                 s->dev[port].port_state = STATE_RESET;
+                /*
+                 * When setting SRST in the first H2D FIS in the reset sequence,
+                 * the device does not send a D2H FIS. Host software thus has to
+                 * set the "Clear Busy upon R_OK" bit such that PxCI (and BUSY)
+                 * gets cleared. See AHCI 1.3.1, section 10.4.1 Software Reset.
+                 */
+                if (opts & AHCI_CMD_CLR_BUSY) {
+                    ahci_clear_cmd_issue(ad, slot);
+                }
             }
             break;
         case STATE_RESET:
             if (!(cmd_fis[15] & ATA_SRST)) {
+                /*
+                 * When clearing SRST in the second H2D FIS in the reset
+                 * sequence, the device will send a D2H FIS. See SATA 3.5a Gold,
+                 * section 11.4 Software reset protocol.
+                 */
+                ahci_clear_cmd_issue(ad, slot);
+                ahci_write_fis_d2h(ad, false);
                 ahci_reset_port(s, port);
             }
             break;
-- 
2.41.0
Re: [PATCH v2] hw/ide/ahci: fix legacy software reset
Posted by Kevin Wolf 6 months ago
Am 05.10.2023 um 12:04 hat Niklas Cassel geschrieben:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Legacy software contains a standard mechanism for generating a reset to a
> Serial ATA device - setting the SRST (software reset) bit in the Device
> Control register.
> 
> Serial ATA has a more robust mechanism called COMRESET, also referred to
> as port reset. A port reset is the preferred mechanism for error
> recovery and should be used in place of software reset.
> 
> Commit e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> improved the handling of PxCI, such that PxCI gets cleared after handling
> a non-NCQ, or NCQ command (instead of incorrectly clearing PxCI after
> receiving an arbitrary FIS).
> 
> However, simply clearing PxCI after a non-NCQ, or NCQ command, is not
> enough, we also need to clear PxCI when receiving a SRST in the Device
> Control register.
> 
> This fixes an issue for FreeBSD where the device would fail to reset.
> The problem was not noticed in Linux, because Linux uses a COMRESET
> instead of a legacy software reset by default.
> 
> Fixes: e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
> Changes since v1: write the D2H FIS before clearing PxCI.
> 
>  hw/ide/ahci.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index babdd7b458..7269dabbdb 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -1254,10 +1254,26 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
>          case STATE_RUN:
>              if (cmd_fis[15] & ATA_SRST) {
>                  s->dev[port].port_state = STATE_RESET;
> +                /*
> +                 * When setting SRST in the first H2D FIS in the reset sequence,
> +                 * the device does not send a D2H FIS. Host software thus has to
> +                 * set the "Clear Busy upon R_OK" bit such that PxCI (and BUSY)
> +                 * gets cleared. See AHCI 1.3.1, section 10.4.1 Software Reset.
> +                 */
> +                if (opts & AHCI_CMD_CLR_BUSY) {
> +                    ahci_clear_cmd_issue(ad, slot);
> +                }

I suspect that AHCI_CMD_CLR_BUSY really should be checked in a more
generic place, but this will do for fixing software reset.

>              }
>              break;
>          case STATE_RESET:
>              if (!(cmd_fis[15] & ATA_SRST)) {
> +                /*
> +                 * When clearing SRST in the second H2D FIS in the reset
> +                 * sequence, the device will send a D2H FIS. See SATA 3.5a Gold,
> +                 * section 11.4 Software reset protocol.
> +                 */
> +                ahci_clear_cmd_issue(ad, slot);
> +                ahci_write_fis_d2h(ad, false);
>                  ahci_reset_port(s, port);

This part isn't mentioned in the commit message at all, and I don't see
how it's related to commit e2a5d9b3d9c3 either. Is this supposed to be a
bonus fix?

ahci_reset_port() already calls ahci_init_d2h() -> ahci_write_fis_d2h().
So I think this new ahci_write_fis_d2h() only sets some state that will
immediately be overwritten again. Which is good, because we didn't set
the signature as described in the SATA software reset protocol yet, that
is only done in ahci_reset_port().

Am I missing something? Why do we need this ahci_write_fis_d2h() call
here?

As for the ahci_clear_cmd_issue(), I'm surprised that ahci_reset_port()
doesn't already clear the register. Wouldn't it make more sense there
than just in this one caller?

Kevin
Re: [PATCH v2] hw/ide/ahci: fix legacy software reset
Posted by Niklas Cassel 6 months ago
On Tue, Nov 07, 2023 at 07:14:07PM +0100, Kevin Wolf wrote:
> Am 05.10.2023 um 12:04 hat Niklas Cassel geschrieben:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> > 
> > Legacy software contains a standard mechanism for generating a reset to a
> > Serial ATA device - setting the SRST (software reset) bit in the Device
> > Control register.
> > 
> > Serial ATA has a more robust mechanism called COMRESET, also referred to
> > as port reset. A port reset is the preferred mechanism for error
> > recovery and should be used in place of software reset.
> > 
> > Commit e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> > improved the handling of PxCI, such that PxCI gets cleared after handling
> > a non-NCQ, or NCQ command (instead of incorrectly clearing PxCI after
> > receiving an arbitrary FIS).
> > 
> > However, simply clearing PxCI after a non-NCQ, or NCQ command, is not
> > enough, we also need to clear PxCI when receiving a SRST in the Device
> > Control register.
> > 
> > This fixes an issue for FreeBSD where the device would fail to reset.
> > The problem was not noticed in Linux, because Linux uses a COMRESET
> > instead of a legacy software reset by default.
> > 
> > Fixes: e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> > Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> > Changes since v1: write the D2H FIS before clearing PxCI.
> > 
> >  hw/ide/ahci.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> > index babdd7b458..7269dabbdb 100644
> > --- a/hw/ide/ahci.c
> > +++ b/hw/ide/ahci.c
> > @@ -1254,10 +1254,26 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
> >          case STATE_RUN:
> >              if (cmd_fis[15] & ATA_SRST) {
> >                  s->dev[port].port_state = STATE_RESET;
> > +                /*
> > +                 * When setting SRST in the first H2D FIS in the reset sequence,
> > +                 * the device does not send a D2H FIS. Host software thus has to
> > +                 * set the "Clear Busy upon R_OK" bit such that PxCI (and BUSY)
> > +                 * gets cleared. See AHCI 1.3.1, section 10.4.1 Software Reset.
> > +                 */
> > +                if (opts & AHCI_CMD_CLR_BUSY) {
> > +                    ahci_clear_cmd_issue(ad, slot);
> > +                }
> 
> I suspect that AHCI_CMD_CLR_BUSY really should be checked in a more
> generic place, but this will do for fixing software reset.

This weirdo option "Clear Busy upon R_OK" is for the HBA itself to clear
the bit for the command in PxCI, when it gets the R_OK that the Host to Device
(H2D) FIS was transmitted successfully to the device.
The normal way is that the device sends back a Device to Host (D2H) FIS
which then causes the bit in PxCI to be cleared in the HBA.

Yes, theoretically you could probably build a FIS that has the
"Clear Busy upon R_OK" even for e.g. a regular non-NCQ I/O command
(where PxCI/BUSY is first supposed to be cleared once the command is done...),
however this would surely cause problems as PxCI would then no longer shadow
the state of the device.

If you search for "Clear Busy upon R_OK", the only usage seems
to be for legacy software reset.

I'm quite sure that the option was specifically created for legacy
software reset, since to do a legacy software reset, you need to
create and issue two FISes.
The first FIS has SRST set (assert reset), the second FIS has SRST
cleared (deassert reset).

For regular I/O non-NCQ commands, PxCI will be cleared when the
I/O has completed (by the device sending back a D2H FIS).
For NCQ commands, PxCI will be cleared when the drive has
successfully queued the command (by the device sending back a D2H FIS).

The spec states that there should NOT be a D2H FIS sent back to
the HBA when getting the first H2D FIS (which asserts reset).
(The device will still send back a R_OK that the FIS was transmitted
successfully.)

For the second H2D FIS (that deasserts reset), the spec says that the
device SHOULD send a D2H FIS back (after the device diagnostic is done).


> 
> >              }
> >              break;
> >          case STATE_RESET:
> >              if (!(cmd_fis[15] & ATA_SRST)) {
> > +                /*
> > +                 * When clearing SRST in the second H2D FIS in the reset
> > +                 * sequence, the device will send a D2H FIS. See SATA 3.5a Gold,
> > +                 * section 11.4 Software reset protocol.
> > +                 */
> > +                ahci_clear_cmd_issue(ad, slot);
> > +                ahci_write_fis_d2h(ad, false);
> >                  ahci_reset_port(s, port);
> 
> This part isn't mentioned in the commit message at all, and I don't see
> how it's related to commit e2a5d9b3d9c3 either. Is this supposed to be a
> bonus fix?

There are two ways to reset.
COMRESET which is a simple register write.
or
legacy software reset, where you create and issue two H2D FISes
(just like when submitting an I/O command).
For legacy software reset, the first H2D FIS should not send back a
D2H FIS. The second H2D FIS should.

ahci_reset_port() is used by both ways.



Previously, the QEMU model didn't handle a legacy software according
to spec, it simply cleared PxCI blindly without checking "Clear Busy
upon R_OK" for non-I/O commands.


In SATA 3.5a Gold, section 11.4 Software reset protocol.
When SRST goes from 1->0, it should execute a device diagnostic.
At the end of the diagnostic, it should transfer a D2H FIS with good
status.


Reading other chapters again, the execution of device diagnostic,
and sending of good status, is also done when a reset is done via
a COMRESET.

So even while a D2H FIS is usually sent back as a response to a H2D FIS,
it appears that also a COMRESET (which is a register write), triggers
a D2H FIS.


I did know that the device sends a D2H FIS on power up, in order to update
PxSIG. (This is why we have the ahci_init_d2h() function.)

However, it turns out that the D2H FIS that updates PxSIG and the D2H FIS
that is send out after a SRST 1->0 (and a COMRESET), is the same as the
D2H FIS that updates PxSIG.

But if we remove:

> > +                ahci_clear_cmd_issue(ad, slot);
> > +                ahci_write_fis_d2h(ad, false);

Then PxCI for the second H2D FIS (that is executed like) a normal command
will not get cleared here...


So where should it get cleared?
In ahci_init_d2h() is the correct place... ahci_init_d2h() does
ahci_write_fis_d2h(), but is currently the only ahci_write_fis_d2h()
that doesn't also do a ahci_clear_cmd_issue().


However, simply adding a ahci_clear_cmd_issue() in ahci_init_d2h()
is not trivial. The whole ahci_init_d2h() is a big hack, as the comments
in the code already says... And ahci_init_d2h() is not only called from
ahci_reset_port()...

I think I would prefer to keep the ahci_clear_cmd_issue() above,
drop the ahci_write_fis_d2h(ad, false), and modify the comment to
say that the matching D2H FIS is really the ahci_write_fis_d2h()
done by ahci_init_d2h()... which is the _singular_ D2H FIS sent out
after the device diagnostic triggered by SRST 1->0 has completed,
which send the good status (which is what the HBA initializes PxSIG
to), and additionally clears PxCI.

(For a reset triggered by a COMRESET, everything is the same, except
it doesn't clear PxCI, since there was no command created and issued.)


> ahci_reset_port() already calls ahci_init_d2h() -> ahci_write_fis_d2h().
> So I think this new ahci_write_fis_d2h() only sets some state that will
> immediately be overwritten again. Which is good, because we didn't set
> the signature as described in the SATA software reset protocol yet, that
> is only done in ahci_reset_port().
> 
> Am I missing something? Why do we need this ahci_write_fis_d2h() call
> here?
> 
> As for the ahci_clear_cmd_issue(), I'm surprised that ahci_reset_port()
> doesn't already clear the register. Wouldn't it make more sense there
> than just in this one caller?

A start/stop of the engine (PxCMD.ST) does clear PxCI, as I implemented in:
https://github.com/qemu/qemu/commit/d73b84d0b664e60fffb66f46e84d0db4a8e1c713

But according to AHCI 1.3.1, section:
3.3.14 Offset 38h: PxCI – Port x Command Issue

PxCI does also have a reset value of 0h.

So I would assume that it is okay to clear PxCI to zero also in
ahci_reset_port().

By doing so, we could avoid both the ahci_clear_cmd_issue(),
and the ahci_write_fis_d2h(ad, false).

Perhaps you can drop the "bonus" part of my patch and simply
add a pr->cmd_issue to ahci_reset_port()?

Thank you for your review!


Kind regards,
Niklas

Re: [PATCH v2] hw/ide/ahci: fix legacy software reset
Posted by Kevin Wolf 6 months ago
Am 08.11.2023 um 00:57 hat Niklas Cassel geschrieben:
> On Tue, Nov 07, 2023 at 07:14:07PM +0100, Kevin Wolf wrote:
> > Am 05.10.2023 um 12:04 hat Niklas Cassel geschrieben:
> > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > > 
> > > Legacy software contains a standard mechanism for generating a reset to a
> > > Serial ATA device - setting the SRST (software reset) bit in the Device
> > > Control register.
> > > 
> > > Serial ATA has a more robust mechanism called COMRESET, also referred to
> > > as port reset. A port reset is the preferred mechanism for error
> > > recovery and should be used in place of software reset.
> > > 
> > > Commit e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> > > improved the handling of PxCI, such that PxCI gets cleared after handling
> > > a non-NCQ, or NCQ command (instead of incorrectly clearing PxCI after
> > > receiving an arbitrary FIS).
> > > 
> > > However, simply clearing PxCI after a non-NCQ, or NCQ command, is not
> > > enough, we also need to clear PxCI when receiving a SRST in the Device
> > > Control register.
> > > 
> > > This fixes an issue for FreeBSD where the device would fail to reset.
> > > The problem was not noticed in Linux, because Linux uses a COMRESET
> > > instead of a legacy software reset by default.
> > > 
> > > Fixes: e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> > > Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> > > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > > ---
> > > Changes since v1: write the D2H FIS before clearing PxCI.
> > > 
> > >  hw/ide/ahci.c | 16 ++++++++++++++++
> > >  1 file changed, 16 insertions(+)
> > > 
> > > diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> > > index babdd7b458..7269dabbdb 100644
> > > --- a/hw/ide/ahci.c
> > > +++ b/hw/ide/ahci.c
> > > @@ -1254,10 +1254,26 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
> > >          case STATE_RUN:
> > >              if (cmd_fis[15] & ATA_SRST) {
> > >                  s->dev[port].port_state = STATE_RESET;
> > > +                /*
> > > +                 * When setting SRST in the first H2D FIS in the reset sequence,
> > > +                 * the device does not send a D2H FIS. Host software thus has to
> > > +                 * set the "Clear Busy upon R_OK" bit such that PxCI (and BUSY)
> > > +                 * gets cleared. See AHCI 1.3.1, section 10.4.1 Software Reset.
> > > +                 */
> > > +                if (opts & AHCI_CMD_CLR_BUSY) {
> > > +                    ahci_clear_cmd_issue(ad, slot);
> > > +                }
> > 
> > I suspect that AHCI_CMD_CLR_BUSY really should be checked in a more
> > generic place, but this will do for fixing software reset.
> 
> This weirdo option "Clear Busy upon R_OK" is for the HBA itself to
> clear the bit for the command in PxCI, when it gets the R_OK that the
> Host to Device (H2D) FIS was transmitted successfully to the device.
> The normal way is that the device sends back a Device to Host (D2H) FIS
> which then causes the bit in PxCI to be cleared in the HBA.
> 
> Yes, theoretically you could probably build a FIS that has the "Clear
> Busy upon R_OK" even for e.g. a regular non-NCQ I/O command (where
> PxCI/BUSY is first supposed to be cleared once the command is
> done...), however this would surely cause problems as PxCI would then
> no longer shadow the state of the device.
> 
> If you search for "Clear Busy upon R_OK", the only usage seems
> to be for legacy software reset.
> [...]

Yes, looks like ignoring it otherwise is harmless.

> > ahci_reset_port() already calls ahci_init_d2h() -> ahci_write_fis_d2h().
> > So I think this new ahci_write_fis_d2h() only sets some state that will
> > immediately be overwritten again. Which is good, because we didn't set
> > the signature as described in the SATA software reset protocol yet, that
> > is only done in ahci_reset_port().
> > 
> > Am I missing something? Why do we need this ahci_write_fis_d2h() call
> > here?
> > 
> > As for the ahci_clear_cmd_issue(), I'm surprised that ahci_reset_port()
> > doesn't already clear the register. Wouldn't it make more sense there
> > than just in this one caller?
> 
> A start/stop of the engine (PxCMD.ST) does clear PxCI, as I implemented in:
> https://github.com/qemu/qemu/commit/d73b84d0b664e60fffb66f46e84d0db4a8e1c713
> 
> But according to AHCI 1.3.1, section:
> 3.3.14 Offset 38h: PxCI – Port x Command Issue
> 
> PxCI does also have a reset value of 0h.
> 
> So I would assume that it is okay to clear PxCI to zero also in
> ahci_reset_port().

Yes, that was what I thought.

> By doing so, we could avoid both the ahci_clear_cmd_issue(),
> and the ahci_write_fis_d2h(ad, false).
> 
> Perhaps you can drop the "bonus" part of my patch and simply
> add a pr->cmd_issue to ahci_reset_port()?

I agree that this is probably the best change to make.

If you also want to add that comment that there already is a D2H FIS
sent during reset, and maybe change the commit message to explain why
we're touching ahci_reset_port(), I think I would prefer if you sent a
v3. I'm usually happy to make small changes while applying, but it feels
like this would be a larger change than I would be comfortable making
without having it on the mailing list again.

Kevin


Re: [PATCH v2] hw/ide/ahci: fix legacy software reset
Posted by Kevin Wolf 6 months ago
Am 07.11.2023 um 19:14 hat Kevin Wolf geschrieben:
> Am 05.10.2023 um 12:04 hat Niklas Cassel geschrieben:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> > 
> > Legacy software contains a standard mechanism for generating a reset to a
> > Serial ATA device - setting the SRST (software reset) bit in the Device
> > Control register.
> > 
> > Serial ATA has a more robust mechanism called COMRESET, also referred to
> > as port reset. A port reset is the preferred mechanism for error
> > recovery and should be used in place of software reset.
> > 
> > Commit e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> > improved the handling of PxCI, such that PxCI gets cleared after handling
> > a non-NCQ, or NCQ command (instead of incorrectly clearing PxCI after
> > receiving an arbitrary FIS).
> > 
> > However, simply clearing PxCI after a non-NCQ, or NCQ command, is not
> > enough, we also need to clear PxCI when receiving a SRST in the Device
> > Control register.
> > 
> > This fixes an issue for FreeBSD where the device would fail to reset.
> > The problem was not noticed in Linux, because Linux uses a COMRESET
> > instead of a legacy software reset by default.
> > 
> > Fixes: e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> > Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> > Changes since v1: write the D2H FIS before clearing PxCI.
> > 
> >  hw/ide/ahci.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> > index babdd7b458..7269dabbdb 100644
> > --- a/hw/ide/ahci.c
> > +++ b/hw/ide/ahci.c
> > @@ -1254,10 +1254,26 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
> >          case STATE_RUN:
> >              if (cmd_fis[15] & ATA_SRST) {
> >                  s->dev[port].port_state = STATE_RESET;
> > +                /*
> > +                 * When setting SRST in the first H2D FIS in the reset sequence,
> > +                 * the device does not send a D2H FIS. Host software thus has to
> > +                 * set the "Clear Busy upon R_OK" bit such that PxCI (and BUSY)
> > +                 * gets cleared. See AHCI 1.3.1, section 10.4.1 Software Reset.
> > +                 */
> > +                if (opts & AHCI_CMD_CLR_BUSY) {
> > +                    ahci_clear_cmd_issue(ad, slot);
> > +                }
> 
> I suspect that AHCI_CMD_CLR_BUSY really should be checked in a more
> generic place, but this will do for fixing software reset.
> 
> >              }
> >              break;
> >          case STATE_RESET:
> >              if (!(cmd_fis[15] & ATA_SRST)) {
> > +                /*
> > +                 * When clearing SRST in the second H2D FIS in the reset
> > +                 * sequence, the device will send a D2H FIS. See SATA 3.5a Gold,
> > +                 * section 11.4 Software reset protocol.
> > +                 */
> > +                ahci_clear_cmd_issue(ad, slot);
> > +                ahci_write_fis_d2h(ad, false);
> >                  ahci_reset_port(s, port);
> 
> This part isn't mentioned in the commit message at all, and I don't see
> how it's related to commit e2a5d9b3d9c3 either. Is this supposed to be a
> bonus fix?
> 
> ahci_reset_port() already calls ahci_init_d2h() -> ahci_write_fis_d2h().
> So I think this new ahci_write_fis_d2h() only sets some state that will
> immediately be overwritten again. Which is good, because we didn't set
> the signature as described in the SATA software reset protocol yet, that
> is only done in ahci_reset_port().
> 
> Am I missing something? Why do we need this ahci_write_fis_d2h() call
> here?
> 
> As for the ahci_clear_cmd_issue(), I'm surprised that ahci_reset_port()
> doesn't already clear the register. Wouldn't it make more sense there
> than just in this one caller?

The other thing I wondered and forgot to actually write is if we should
extend ahci-test to include port and software resets.

Kevin
Re: [PATCH v2] hw/ide/ahci: fix legacy software reset
Posted by Michael Tokarev 6 months, 2 weeks ago
05.10.2023 13:04, Niklas Cassel:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Legacy software contains a standard mechanism for generating a reset to a
> Serial ATA device - setting the SRST (software reset) bit in the Device
> Control register.
> 
> Serial ATA has a more robust mechanism called COMRESET, also referred to
> as port reset. A port reset is the preferred mechanism for error
> recovery and should be used in place of software reset.
> 
> Commit e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> improved the handling of PxCI, such that PxCI gets cleared after handling
> a non-NCQ, or NCQ command (instead of incorrectly clearing PxCI after
> receiving an arbitrary FIS).
> 
> However, simply clearing PxCI after a non-NCQ, or NCQ command, is not
> enough, we also need to clear PxCI when receiving a SRST in the Device
> Control register.
> 
> This fixes an issue for FreeBSD where the device would fail to reset.
> The problem was not noticed in Linux, because Linux uses a COMRESET
> instead of a legacy software reset by default.
> 
> Fixes: e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>

Ping?  This bugfix hasn't landed in master still.. :(

/mjt
Re: [PATCH v2] hw/ide/ahci: fix legacy software reset
Posted by Michael Tokarev 7 months ago
05.10.2023 13:04, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Legacy software contains a standard mechanism for generating a reset to a
> Serial ATA device - setting the SRST (software reset) bit in the Device
> Control register.
> 
> Serial ATA has a more robust mechanism called COMRESET, also referred to
> as port reset. A port reset is the preferred mechanism for error
> recovery and should be used in place of software reset.
> 
> Commit e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> improved the handling of PxCI, such that PxCI gets cleared after handling
> a non-NCQ, or NCQ command (instead of incorrectly clearing PxCI after
> receiving an arbitrary FIS).
> 
> However, simply clearing PxCI after a non-NCQ, or NCQ command, is not
> enough, we also need to clear PxCI when receiving a SRST in the Device
> Control register.
> 
> This fixes an issue for FreeBSD where the device would fail to reset.
> The problem was not noticed in Linux, because Linux uses a COMRESET
> instead of a legacy software reset by default.

I don't know neither this area of qemu nor how hardware works, but as
far as I can tell, this change fixes the reported FreeBSD ISO failure, -
it works fine before e2a5d9b3d9c3, it fails to see the connected drives
after, and it works again with this patch applied.  I can't say this is
a good testing, since obviously Niklas did the same testing locally too :)

John, can you send pullreq for this to master please?

Thank you Niklas for the good work!

/mjt

> Fixes: e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
> Changes since v1: write the D2H FIS before clearing PxCI.
> 
>   hw/ide/ahci.c | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index babdd7b458..7269dabbdb 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -1254,10 +1254,26 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
>           case STATE_RUN:
>               if (cmd_fis[15] & ATA_SRST) {
>                   s->dev[port].port_state = STATE_RESET;
> +                /*
> +                 * When setting SRST in the first H2D FIS in the reset sequence,
> +                 * the device does not send a D2H FIS. Host software thus has to
> +                 * set the "Clear Busy upon R_OK" bit such that PxCI (and BUSY)
> +                 * gets cleared. See AHCI 1.3.1, section 10.4.1 Software Reset.
> +                 */
> +                if (opts & AHCI_CMD_CLR_BUSY) {
> +                    ahci_clear_cmd_issue(ad, slot);
> +                }
>               }
>               break;
>           case STATE_RESET:
>               if (!(cmd_fis[15] & ATA_SRST)) {
> +                /*
> +                 * When clearing SRST in the second H2D FIS in the reset
> +                 * sequence, the device will send a D2H FIS. See SATA 3.5a Gold,
> +                 * section 11.4 Software reset protocol.
> +                 */
> +                ahci_clear_cmd_issue(ad, slot);
> +                ahci_write_fis_d2h(ad, false);
>                   ahci_reset_port(s, port);
>               }
>               break;