[v2] two fixes for pNFS SCSI device handling

[PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure

Posted by Benjamin Coddington 1 year, 2 months ago

If we're unable to register a SCSI device, ensure we mark the device as
unavailable so that it will timeout and be re-added via GETDEVINFO.  This
avoids repeated doomed attempts to register a device in the IO path.

Add some clarifying comments as well.

Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 0becdec12970..b36bc2f4f7e2 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server,
 	if (!node)
 		return ERR_PTR(-ENODEV);
 
+	/*
+	 * Devices that are marked unavailable are left in the cache with a
+	 * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or
+	 * constantly attempting to register the device.  Once marked as
+	 * unavailable they must be deleted and never reused.
+	 */
 	if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) {
 		unsigned long end = jiffies;
 		unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT;
 
 		if (!time_in_range(node->timestamp_unavailable, start, end)) {
+			/* Force a new GETDEVINFO for this LAYOUT */
 			nfs4_delete_deviceid(node->ld, node->nfs_client, id);
 			goto retry;
 		}
 		goto out_put;
 	}
 
-	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node)))
+	/* If we cannot register, treat this device as transient */
+	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) {
+		nfs4_mark_deviceid_unavailable(node);
 		goto out_put;
+	}
 
 	return node;
 
-- 
2.47.0

Re: [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure

Posted by Chuck Lever 1 year, 2 months ago

On Wed, Nov 20, 2024 at 09:09:35AM -0500, Benjamin Coddington wrote:
> If we're unable to register a SCSI device, ensure we mark the device as
> unavailable so that it will timeout and be re-added via GETDEVINFO.  This
> avoids repeated doomed attempts to register a device in the IO path.
> 
> Add some clarifying comments as well.
> 
> Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration")
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 0becdec12970..b36bc2f4f7e2 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server,
>  	if (!node)
>  		return ERR_PTR(-ENODEV);
>  
> +	/*
> +	 * Devices that are marked unavailable are left in the cache with a
> +	 * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or
> +	 * constantly attempting to register the device.  Once marked as
> +	 * unavailable they must be deleted and never reused.
> +	 */
>  	if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) {
>  		unsigned long end = jiffies;
>  		unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT;
>  
>  		if (!time_in_range(node->timestamp_unavailable, start, end)) {
> +			/* Force a new GETDEVINFO for this LAYOUT */

Or perhaps: "Uncork subsequent GETDEVINFO operations for this device"
<shrug>

>  			nfs4_delete_deviceid(node->ld, node->nfs_client, id);
>  			goto retry;
>  		}
>  		goto out_put;
>  	}
>  
> -	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node)))
> +	/* If we cannot register, treat this device as transient */

How about "Make a negative cache entry for this device"


> +	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) {
> +		nfs4_mark_deviceid_unavailable(node);
>  		goto out_put;
> +	}
>  
>  	return node;
>  
> -- 
> 2.47.0
> 

It took me a bit to understand what this patch does. It is like
setting up a negative dentry so the local device cache absorbs
bursts of checks for the device. OK.

Just an observation: Negative caching has some consequences too.
For instance, there will now be a period where, if the device
happens to become available, the layout is still unusable. I wonder
if that's going to have some undesirable operational effects.


-- 
Chuck Lever

Re: [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure

Posted by Benjamin Coddington 1 year, 2 months ago

On 20 Nov 2024, at 10:22, Chuck Lever wrote:

> On Wed, Nov 20, 2024 at 09:09:35AM -0500, Benjamin Coddington wrote:
>> If we're unable to register a SCSI device, ensure we mark the device as
>> unavailable so that it will timeout and be re-added via GETDEVINFO.  This
>> avoids repeated doomed attempts to register a device in the IO path.
>>
>> Add some clarifying comments as well.
>>
>> Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration")
>> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>  fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>> index 0becdec12970..b36bc2f4f7e2 100644
>> --- a/fs/nfs/blocklayout/blocklayout.c
>> +++ b/fs/nfs/blocklayout/blocklayout.c
>> @@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server,
>>  	if (!node)
>>  		return ERR_PTR(-ENODEV);
>>
>> +	/*
>> +	 * Devices that are marked unavailable are left in the cache with a
>> +	 * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or
>> +	 * constantly attempting to register the device.  Once marked as
>> +	 * unavailable they must be deleted and never reused.
>> +	 */
>>  	if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) {
>>  		unsigned long end = jiffies;
>>  		unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT;
>>
>>  		if (!time_in_range(node->timestamp_unavailable, start, end)) {
>> +			/* Force a new GETDEVINFO for this LAYOUT */
>
> Or perhaps: "Uncork subsequent GETDEVINFO operations for this device"
> <shrug>

Sure, ok!

>>  			nfs4_delete_deviceid(node->ld, node->nfs_client, id);
>>  			goto retry;
>>  		}
>>  		goto out_put;
>>  	}
>>
>> -	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node)))
>> +	/* If we cannot register, treat this device as transient */
>
> How about "Make a negative cache entry for this device"

Hmm - that's closer to the dentry language rather than how we refer to
temporary error cases in device land.  For me the "transient" has some
hopeful meaning as in we expect this might work in the future - but I'm ok
changing this comment.  There will be some NFS clients that might try to do
pNFS SCSI but will never actually have the devices locally, and so that's
not a "transient" situation.  This can only fixed today with export policy.

>
>> +	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) {
>> +		nfs4_mark_deviceid_unavailable(node);
>>  		goto out_put;
>> +	}
>>
>>  	return node;
>>
>> -- 
>> 2.47.0
>>
>
> It took me a bit to understand what this patch does. It is like
> setting up a negative dentry so the local device cache absorbs
> bursts of checks for the device. OK.

Yes, its like the layout error handling, but for devices.

Its not obvious at this layer, but every IO wants to do LAYOUTGET, then
figure out which device GETDEVINFO, then here we need to prep the device
with a reservation.  Its a lot of slow work that makes a mess of IO
latencies if one of the later steps is going to fail for awhile.

> Just an observation: Negative caching has some consequences too.
> For instance, there will now be a period where, if the device
> happens to become available, the layout is still unusable. I wonder
> if that's going to have some undesirable operational effects.

It sure does, but I don't think there's a simple way to get notified that a
SCSI device has re-appeared or has started supporting persistent
reservations.

Ben

Re: [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure

Posted by Chuck Lever 1 year, 2 months ago

On Thu, Nov 21, 2024 at 10:11:33AM -0500, Benjamin Coddington wrote:
> On 20 Nov 2024, at 10:22, Chuck Lever wrote:
> 
> > On Wed, Nov 20, 2024 at 09:09:35AM -0500, Benjamin Coddington wrote:
> >> If we're unable to register a SCSI device, ensure we mark the device as
> >> unavailable so that it will timeout and be re-added via GETDEVINFO.  This
> >> avoids repeated doomed attempts to register a device in the IO path.
> >>
> >> Add some clarifying comments as well.
> >>
> >> Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration")
> >> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> >> Reviewed-by: Christoph Hellwig <hch@lst.de>
> >> ---
> >>  fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> >> index 0becdec12970..b36bc2f4f7e2 100644
> >> --- a/fs/nfs/blocklayout/blocklayout.c
> >> +++ b/fs/nfs/blocklayout/blocklayout.c
> >> @@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server,
> >>  	if (!node)
> >>  		return ERR_PTR(-ENODEV);
> >>
> >> +	/*
> >> +	 * Devices that are marked unavailable are left in the cache with a
> >> +	 * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or
> >> +	 * constantly attempting to register the device.  Once marked as
> >> +	 * unavailable they must be deleted and never reused.
> >> +	 */
> >>  	if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) {
> >>  		unsigned long end = jiffies;
> >>  		unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT;
> >>
> >>  		if (!time_in_range(node->timestamp_unavailable, start, end)) {
> >> +			/* Force a new GETDEVINFO for this LAYOUT */
> >
> > Or perhaps: "Uncork subsequent GETDEVINFO operations for this device"
> > <shrug>
> 
> Sure, ok!
> 
> >>  			nfs4_delete_deviceid(node->ld, node->nfs_client, id);
> >>  			goto retry;
> >>  		}
> >>  		goto out_put;
> >>  	}
> >>
> >> -	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node)))
> >> +	/* If we cannot register, treat this device as transient */
> >
> > How about "Make a negative cache entry for this device"
> 
> Hmm - that's closer to the dentry language rather than how we refer to
> temporary error cases in device land.  For me the "transient" has some
> hopeful meaning as in we expect this might work in the future - but I'm ok
> changing this comment.  There will be some NFS clients that might try to do
> pNFS SCSI but will never actually have the devices locally, and so that's
> not a "transient" situation.  This can only fixed today with export policy.

No big deal!


> >> +	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) {
> >> +		nfs4_mark_deviceid_unavailable(node);
> >>  		goto out_put;
> >> +	}
> >>
> >>  	return node;
> >>
> >> -- 
> >> 2.47.0
> >>
> >
> > It took me a bit to understand what this patch does. It is like
> > setting up a negative dentry so the local device cache absorbs
> > bursts of checks for the device. OK.
> 
> Yes, its like the layout error handling, but for devices.
> 
> Its not obvious at this layer, but every IO wants to do LAYOUTGET, then
> figure out which device GETDEVINFO, then here we need to prep the device
> with a reservation.  Its a lot of slow work that makes a mess of IO
> latencies if one of the later steps is going to fail for awhile.

Thanks! It would be nice to add this bit of context to the patch
description.


> > Just an observation: Negative caching has some consequences too.
> > For instance, there will now be a period where, if the device
> > happens to become available, the layout is still unusable. I wonder
> > if that's going to have some undesirable operational effects.
> 
> It sure does, but I don't think there's a simple way to get notified that a
> SCSI device has re-appeared or has started supporting persistent
> reservations.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>


-- 
Chuck Lever