[PATCH] NFSv4: clear exception state on successful mkdir retry

Igor Raits posted 1 patch 1 month, 2 weeks ago
After a server returns NFS4ERR_DELAY for an NFSv4 CREATE issued by
mkdir(2), the client correctly waits and retries.  When the retry
succeeds, however, mkdir(2) can still surface -EEXIST to userspace
even though the directory was just created on the server.

Reproducer (random 16-hex names so collisions are not the cause)
against an in-kernel Linux nfsd; reproduces under both NFSv4.0 and
NFSv4.2:

  N=2000000; base=/var/gdc/export
  for ((i=1; i<=N; i++)); do
      d=$base/$(openssl rand -hex 8)
      mkdir "$d" 2>/dev/null || echo "$(date +%T) failed loop=$i $d"
      rmdir "$d" 2>/dev/null
  done

Failures cluster at the cadence at which the server-side auth/export
cache refresh path causes nfsd to return NFS4ERR_DELAY for CREATE.

A wire trace of one failure (the three CREATE RPCs all come from a
single mkdir(2), generated by the do-while in nfs4_proc_mkdir()):

  client -> server  CREATE name=...  -> NFS4ERR_DELAY
  ~100 ms later
  client -> server  CREATE name=...  -> NFS4_OK         (dir created)
  ~80 us later
  client -> server  CREATE name=...  -> NFS4ERR_EXIST   (correct)

Since commit dd862da61e91 ("nfs: fix incorrect handling of large-number
NFS errors in nfs4_do_mkdir()"), nfs4_handle_exception() is called only
when _nfs4_proc_mkdir() returned an error.  That gate breaks retry-state
hygiene: nfs4_do_handle_exception() resets exception.{delay,recovering,
retry} to 0 on entry, so calling it on success is what previously
cleared the retry flag set by the preceding NFS4ERR_DELAY iteration.
With the gate in place, exception.retry stays at 1 after the successful
retry, the loop runs once more, and the resulting CREATE for an
already-created name yields NFS4ERR_EXIST -> -EEXIST to userspace.

Drop the conditional and call nfs4_handle_exception() unconditionally,
matching every other do-while in fs/nfs/nfs4proc.c (nfs4_proc_symlink(),
nfs4_proc_link(), etc.).  The dentry/status separation introduced by
that commit is preserved.

Fixes: dd862da61e91 ("nfs: fix incorrect handling of large-number NFS errors in nfs4_do_mkdir()")
Reported-and-tested-by: Jan Čípa <jan.cipa@gooddata.com>
Closes: https://lore.kernel.org/linux-nfs/CA+9S74hSp_tJu2Ffe2BPNC2T25gfkhgjjDkdgSsF5c2rnJq_wA@mail.gmail.com/
Reviewed-by: NeilBrown <neil@brown.name>
Cc: stable@vger.kernel.org
Signed-off-by: Igor Raits <igor.raits@gmail.com>
---
 fs/nfs/nfs4proc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a0885ae55abc..ffd14141ea1d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5393,10 +5393,9 @@ static struct dentry *nfs4_proc_mkdir(struct inode *dir, struct dentry *dentry,
 	do {
 		alias = _nfs4_proc_mkdir(dir, dentry, sattr, label, &err);
 		trace_nfs4_mkdir(dir, &dentry->d_name, err);
+		err = nfs4_handle_exception(NFS_SERVER(dir), err, &exception);
 		if (err)
-			alias = ERR_PTR(nfs4_handle_exception(NFS_SERVER(dir),
-							      err,
-							      &exception));
+			alias = ERR_PTR(err);
 	} while (exception.retry);
 	nfs4_label_release_security(label);
 
-- 
2.53.0
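
The retry-state problem described in the commit message above can be
modelled in userspace. The following is a minimal sketch, not the kernel
code: every `model_*` name is hypothetical, the fake server answers a
configurable number of CREATEs with a DELAY before succeeding, and the
handler is assumed to do only what the commit message states (reset
`retry` on entry) plus return 0 with `retry` set when it sees a DELAY.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for -NFS4ERR_DELAY; only the control flow matters. */
enum { MODEL_ERR_DELAY = -10008 };

struct model_exception {
	int retry;		/* models exception.retry */
};

struct model_server {
	int delays_left;	/* CREATEs to answer with DELAY first */
	int dir_exists;		/* set once a CREATE has succeeded */
	int creates_sent;	/* total CREATEs seen by the server */
};

/* Fake CREATE: DELAY while delays_left > 0, then OK once, then EXIST. */
int model_create(struct model_server *srv)
{
	assert(srv != NULL);
	srv->creates_sent++;
	if (srv->delays_left > 0) {
		srv->delays_left--;
		return MODEL_ERR_DELAY;
	}
	if (srv->dir_exists)
		return -EEXIST;
	srv->dir_exists = 1;
	return 0;
}

/* Assumed handler behaviour: clear retry on entry, request a retry on DELAY. */
int model_handle_exception(int err, struct model_exception *exc)
{
	exc->retry = 0;
	if (err == MODEL_ERR_DELAY) {
		exc->retry = 1;
		return 0;
	}
	return err;
}

/* Gated loop: the handler runs only when the call failed. */
int model_mkdir_gated(struct model_server *srv)
{
	struct model_exception exc = { 0 };
	int result;

	do {
		result = model_create(srv);
		if (result)
			result = model_handle_exception(result, &exc);
	} while (exc.retry);	/* stale after a successful retry */
	return result;
}

/* Unconditional loop: the handler runs every iteration and clears retry. */
int model_mkdir_unconditional(struct model_server *srv)
{
	struct model_exception exc = { 0 };
	int err;

	do {
		err = model_create(srv);
		err = model_handle_exception(err, &exc);
	} while (exc.retry);
	return err;
}

/* Convenience printer for comparing the two variants by hand. */
void model_report(const char *label, int result, const struct model_server *srv)
{
	printf("%s: result=%d creates_sent=%d\n", label, result, srv->creates_sent);
}
```

With `delays_left = 1`, the gated loop sends three CREATEs and returns
-EEXIST, mirroring the wire trace in the commit message, while the
unconditional loop sends two and returns 0. With no delay, both send a
single CREATE and return 0, which is consistent with the failure only
showing up when the server answers with NFS4ERR_DELAY.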

Re: [PATCH] NFSv4: clear exception state on successful mkdir retry
Posted by Thorsten Leemhuis 1 month ago
[top-posting to facilitate processing]

@NFSv4 maintainers, just wondering, did this patch maybe fall through
the cracks? It fixes a regression, that's why it's on my radar. Or was
there some progress and I missed it?

Ciao, Thorsten

On 4/29/26 12:49, Igor Raits wrote:
> [...]

Re: [PATCH] NFSv4: clear exception state on successful mkdir retry
Posted by Thorsten Leemhuis 1 week ago
On 5/13/26 09:18, Thorsten Leemhuis wrote:
> [top-posting to facilitate processing]
> 
> @NFSv4 maintainers, just wondering, did this patch maybe fall through
> the cracks? It fixes a regression, that's why it's on my radar. Or was
> there some progress and I missed it?

Still no progress afaics. Feels like I'm missing something obvious or
like I'm totally off track.

Igor, Neil, is that the case? Or are you also waiting for the fix to
make progress?

Ciao, Thorsten


Re: [PATCH] NFSv4: clear exception state on successful mkdir retry
Posted by Anna Schumaker 6 days, 5 hours ago
Hi Thorsten,

On Tue, Jun 9, 2026, at 6:05 AM, Thorsten Leemhuis wrote:
> On 5/13/26 09:18, Thorsten Leemhuis wrote:
>> [top-posting to facilitate processing]
>> 
>> @NFSv4 maintainers, just wondering, did this patch maybe fall through
>> the cracks? It fixes a regression, that's why it's on my radar. Or was
>> there some progress and I missed it?

The patch is in my linux-next branch here: https://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commit;h=238e9b51aa29f48b6243212a3b75c8e48d6b96fd

It'll be included when the merge window opens this weekend.

Anna

Re: [PATCH] NFSv4: clear exception state on successful mkdir retry
Posted by Thorsten Leemhuis 6 days, 3 hours ago
On 6/10/26 16:28, Anna Schumaker wrote:
> On Tue, Jun 9, 2026, at 6:05 AM, Thorsten Leemhuis wrote:
>> On 5/13/26 09:18, Thorsten Leemhuis wrote:
>>> [top-posting to facilitate processing]
>>> 
>>> @NFSv4 maintainers, just wondering, did this patch maybe fall
>>> through the cracks? It fixes a regression, that's why it's on my
>>> radar. Or was there some progress and I missed it?
> 
> The patch is in my linux-next branch here: https://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commit;h=238e9b51aa29f48b6243212a3b75c8e48d6b96fd
> 
> It'll be included when the merge window opens this weekend.

Great, thx Anna. Was a bit confused why I could not see it in -next 90
minutes ago (that's where I checked yesterday before prodding, too), but
it turned up there in the new -next release that was published a few
minutes ago. :-D

Ciao, Thorsten
