[PATCH] ata: libata-eh: Fix the low data link negotiation

Guoliang Zhang posted 1 patch 2 months, 2 weeks ago
drivers/ata/libata-eh.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] ata: libata-eh: Fix the low data link negotiation
Posted by Guoliang Zhang 2 months, 2 weeks ago
Large-capacity hard drives, such as the 16TB Seagate drive, negotiate a SATA
link speed of only 3.0 Gbps when hot-plugged.
The cause is that, after power-on, such a drive takes a relatively long time
to complete a hardware reset. After the driver sends the hardreset, the
timeout for the first two reset tries is too short, so the reset sequence
falls into a vicious cycle.

The log is as follows:
[ 959.461875] ata7: found unknown device (class 0)
[ 963.686830] ata7: softreset failed (1st FIS failed)
[ 969.442516] ata7: found unknown device (class 0)
[ 973.686229] ata7: softreset failed (1st FIS failed)
[ 979.426704] ata7: found unknown device (class 0)
[1008.687432] ata7: softreset failed (1st FIS failed)
[1008.687447] ata7: limiting SATA link speed to 3.0 Gbps
[1009.566733] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[1009.567405] ata7.00: ATA-11: ST16000NT001-3LV101, EN01, max UDMA/133
[10009.613694] ata7.00: 31251759104 sectors, multi 16: LBA48 NCQ (depth 32), AA
[10009.614223] ata7.00: Features: NCQ-sndrcv
[10009.639149] ata7.00: configured for UDMA/133
[10009.639366] scsi 6:0:0:0: Direct-Access  ATA ST16000NT001-3LV EN01 PQ: 0 ANSI: 5
[10009.639779] sd 6:0:0:0: Attached scsi generic sg2 type 0
[10009.639989] sd 6:0:0:0: [sdc] 31251759104 512-byte logical blocks: (16.0 TB/14.6 TiB)
[10009.639999] sd 6:0:0:0: [sdc] 4096-byte physical blocks
[10009.640028] sd 6:0:0:0: [sdc] Write Protect is off
[10009.640038] sd 6:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[10009.640082] sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[10009.717866]  sdc: sdc1 sdc2 sdc3 sdc4
[10009.739038] sd 6:0:0:0: [sdc] Attached SCSI disk
========
Logs after the modification:
[  661.023298] ata7: found unknown device (class 0)
[  675.253714] ata7: softreset failed (1st FIS failed)
[  680.996545] ata7: found unknown device (class 0)
[  695.251101] ata7: softreset failed (1st FIS failed)
[  696.131404] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  696.132140] ata7.00: ATA-11: ST16000NT001-3LV101, EN01, max UDMA/133
[  696.172742] ata7.00: 31251759104 sectors, multi 16: LBA48 NCQ (depth 32), AA
[  696.173327] ata7.00: Features: NCQ-sndrcv
[  696.198155] ata7.00: configured for UDMA/133

Signed-off-by: Guoliang Zhang <zhangguoliang@zspace.cn>
---
 drivers/ata/libata-eh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 214b935c2ced..9e0f17e93e73 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -80,7 +80,7 @@ enum {
  */
 static const unsigned int ata_eh_reset_timeouts[] = {
 	10000,	/* most drives spin up by 10sec */
-	10000,	/* > 99% working drives spin up before 20sec */
+	20000,	/* > 99% working drives spin up before 30sec */
 	35000,	/* give > 30 secs of idleness for outlier devices */
 	 5000,	/* and sweet one last chance */
 	UINT_MAX, /* > 1 min has elapsed, give up */
-- 
2.43.0
Re: [PATCH] ata: libata-eh: Fix the low data link negotiation
Posted by Damien Le Moal 2 months, 1 week ago
On 2024/09/16 16:42, Guoliang Zhang wrote:
> Large-capacity hard drives, such as the 16TB Seagate drive, negotiate a SATA
> link speed of only 3.0 Gbps when hot-plugged.
> The cause is that, after power-on, such a drive takes a relatively long time
> to complete a hardware reset. After the driver sends the hardreset, the
> timeout for the first two reset tries is too short, so the reset sequence
> falls into a vicious cycle.
> 
> The log is as follows:
> [ 959.461875] ata7: found unknown device (class 0)
> [ 963.686830] ata7: softreset failed (1st FIS failed)
> [ 969.442516] ata7: found unknown device (class 0)
> [ 973.686229] ata7: softreset failed (1st FIS failed)
> [ 979.426704] ata7: found unknown device (class 0)
> [1008.687432] ata7: softreset failed (1st FIS failed)

These messages are not normal and do not correspond to a drive that is slow to
spin up. They generally indicate bad communication with the device.
I ran into this the other day and solved the issue by using a better SATA cable...

If the drive is slow to spin up and takes time to start responding to commands,
you should see the message:

"link is slow to respond, please be patient "

which is issued by ata_wait_after_reset() -> ata_wait_ready(). The "1st FIS
failed" error indicates that the first H2D FIS sent to the device with the SRST
bit set does not get the expected result, that is, the SRST bit is NOT set to
one in the Device Control register. Software reset should always work,
regardless of the power state of the drive (spun down or not). So this error
has nothing to do with the timeout length. That timeout is used for this first
phase of the reset sequence as a convenience, but it is really intended as the
time to wait for the link to become ready, which may take a while if the device
is spinning up.

> [1008.687447] ata7: limiting SATA link speed to 3.0 Gbps
> [1009.566733] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)

This is the result of the softreset failing due to the "1st FIS failed" errors,
and NOT of the drive being slow to spin up.

> [1009.567405] ata7.00: ATA-11: ST16000NT001-3LV101, EN01, max UDMA/133
> [10009.613694] ata7.00: 31251759104 sectors, multi 16: LBA48 NCQ (depth 32), AA
> [10009.614223] ata7.00: Features: NCQ-sndrcv
> [10009.639149] ata7.00: configured for UDMA/133
> [10009.639366] scsi 6:0:0:0: Direct-Access  ATA ST16000NT001-3LV EN01 PQ: 0 ANSI: 5
> [10009.639779] sd 6:0:0:0: Attached scsi generic sg2 type 0
> [10009.639989] sd 6:0:0:0: [sdc] 31251759104 512-byte logical blocks: (16.0 TB/14.6 TiB)
> [10009.639999] sd 6:0:0:0: [sdc] 4096-byte physical blocks
> [10009.640028] sd 6:0:0:0: [sdc] Write Protect is off
> [10009.640038] sd 6:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> [10009.640082] sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [10009.717866]  sdc: sdc1 sdc2 sdc3 sdc4
> [10009.739038] sd 6:0:0:0: [sdc] Attached SCSI disk
> ========
> Logs after the modification:
> [  661.023298] ata7: found unknown device (class 0)
> [  675.253714] ata7: softreset failed (1st FIS failed)
> [  680.996545] ata7: found unknown device (class 0)
> [  695.251101] ata7: softreset failed (1st FIS failed)

The errors are still here... So either you have a bad SATA cable, or the drive
reset sequence is not per spec. Changing the timeouts to a longer interval does
not remove these errors, so you are not actually fixing anything here.

Please check your cable. If that does nothing, I suggest contacting the drive
vendor about this.

I have plenty of drives that are slow to spin up and they do NOT generate that
FIS error.

> [  696.131404] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [  696.132140] ata7.00: ATA-11: ST16000NT001-3LV101, EN01, max UDMA/133
> [  696.172742] ata7.00: 31251759104 sectors, multi 16: LBA48 NCQ (depth 32), AA
> [  696.173327] ata7.00: Features: NCQ-sndrcv
> [  696.198155] ata7.00: configured for UDMA/133
> 
> Signed-off-by: Guoliang Zhang <zhangguoliang@zspace.cn>
> ---
>  drivers/ata/libata-eh.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 214b935c2ced..9e0f17e93e73 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -80,7 +80,7 @@ enum {
>   */
>  static const unsigned int ata_eh_reset_timeouts[] = {
>  	10000,	/* most drives spin up by 10sec */
> -	10000,	/* > 99% working drives spin up before 20sec */
> +	20000,	/* > 99% working drives spin up before 30sec */
>  	35000,	/* give > 30 secs of idleness for outlier devices */
>  	 5000,	/* and sweet one last chance */
>  	UINT_MAX, /* > 1 min has elapsed, give up */

-- 
Damien Le Moal
Western Digital Research