[PATCH v5 0/6] nvme-fabrics: short-circuit connect retries

Daniel Wagner posted 6 patches 1 year, 10 months ago
There is a newer version of this series
drivers/nvme/host/core.c               |  6 ++--
drivers/nvme/host/fabrics.c            | 25 ++++++-------
drivers/nvme/host/fc.c                 |  4 +--
drivers/nvme/host/nvme.h               | 26 +++++++++++++-
drivers/nvme/host/rdma.c               | 22 ++++++++----
drivers/nvme/host/tcp.c                | 23 +++++++-----
drivers/nvme/target/auth.c             | 22 ++++++------
drivers/nvme/target/configfs.c         | 22 +++++++++---
drivers/nvme/target/fabrics-cmd-auth.c | 49 +++++++++++++-------------
drivers/nvme/target/fabrics-cmd.c      | 11 +++---
drivers/nvme/target/nvmet.h            |  8 ++---
11 files changed, 134 insertions(+), 84 deletions(-)
[PATCH v5 0/6] nvme-fabrics: short-circuit connect retries
Posted by Daniel Wagner 1 year, 10 months ago
The first patch returns only kernel error codes now and avoids overwriting error
codes later. Thje newly introduced helper for deciding if a reconnect should be
attempted is the only place where we have the logic (and documentation).

On the target side I've separate the nvme status from the dhchap status handling
which made it a bit clearer. I was tempted to refactor the code in
nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't came up
with something nice yet. So let's keep this change at a minimum before any
refactoring attempts.

I've tested with blktests and also an real hardware for nvme-fc.

changes:
v5:
  - nvme: do not mix kernel error code with nvme status
  - nvmet: separate nvme status from dhchap status
  - https://lore.kernel.org/all/20240404154500.2101-1-dwagner@suse.de/

v4:
  - rebased
  - added 'nvme: fixes for authentication errors' series
    https://lore.kernel.org/linux-nvme/20240301112823.132570-1-hare@kernel.org/

v3:
  - added my SOB tag
  - fixed indention
  - https://lore.kernel.org/linux-nvme/20240305080005.3638-1-dwagner@suse.de/

v2:
  - refresh/rebase on current head
  - extended blktests (nvme/045) to cover this case
    (see separate post)
  - https://lore.kernel.org/linux-nvme/20240304161006.19328-1-dwagner@suse.de/
  
v1:
  - initial version
  - https://lore.kernel.org/linux-nvme/20210623143250.82445-1-hare@suse.de/


Daniel Wagner (1):
  nvme-fc: use nvme_ctrl_reconnect to decide reconnect attempts

Hannes Reinecke (5):
  nvme: authentication error are always non-retryable
  nvmet: lock config semaphore when accessing DH-HMAC-CHAP key
  nvme-tcp: short-circuit reconnect retries
  nvme-rdma: short-circuit reconnect retries
  nvmet: return DHCHAP status codes from nvmet_setup_auth()

 drivers/nvme/host/core.c               |  6 ++--
 drivers/nvme/host/fabrics.c            | 25 ++++++-------
 drivers/nvme/host/fc.c                 |  4 +--
 drivers/nvme/host/nvme.h               | 26 +++++++++++++-
 drivers/nvme/host/rdma.c               | 22 ++++++++----
 drivers/nvme/host/tcp.c                | 23 +++++++-----
 drivers/nvme/target/auth.c             | 22 ++++++------
 drivers/nvme/target/configfs.c         | 22 +++++++++---
 drivers/nvme/target/fabrics-cmd-auth.c | 49 +++++++++++++-------------
 drivers/nvme/target/fabrics-cmd.c      | 11 +++---
 drivers/nvme/target/nvmet.h            |  8 ++---
 11 files changed, 134 insertions(+), 84 deletions(-)

-- 
2.44.0
Re: [PATCH v5 0/6] nvme-fabrics: short-circuit connect retries
Posted by Keith Busch 1 year, 10 months ago
On Tue, Apr 09, 2024 at 11:35:04AM +0200, Daniel Wagner wrote:
> The first patch returns only kernel error codes now and avoids overwriting error
> codes later. Thje newly introduced helper for deciding if a reconnect should be
> attempted is the only place where we have the logic (and documentation).
> 
> On the target side I've separate the nvme status from the dhchap status handling
> which made it a bit clearer. I was tempted to refactor the code in
> nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't came up
> with something nice yet. So let's keep this change at a minimum before any
> refactoring attempts.
> 
> I've tested with blktests and also an real hardware for nvme-fc.

Thanks, series applied to nvme-6.9.
Re: [PATCH v5 0/6] nvme-fabrics: short-circuit connect retries
Posted by Daniel Wagner 1 year, 10 months ago
On Thu, Apr 11, 2024 at 06:35:25PM -0600, Keith Busch wrote:
> On Tue, Apr 09, 2024 at 11:35:04AM +0200, Daniel Wagner wrote:
> > The first patch returns only kernel error codes now and avoids overwriting error
> > codes later. Thje newly introduced helper for deciding if a reconnect should be
> > attempted is the only place where we have the logic (and documentation).
> > 
> > On the target side I've separate the nvme status from the dhchap status handling
> > which made it a bit clearer. I was tempted to refactor the code in
> > nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't came up
> > with something nice yet. So let's keep this change at a minimum before any
> > refactoring attempts.
> > 
> > I've tested with blktests and also an real hardware for nvme-fc.
> 
> Thanks, series applied to nvme-6.9.

Thanks! I have an updated version here which addresses some of Sagi's
feedback, e.g. using only one helper function. Sorry I didn't send out
it earlier, I got a bit side tracked in testing because of the 'funky'
results with RDMA.

Do you want me to send a complete fresh series or patches on top of this
series? I'm fine either way.
Re: [PATCH v5 0/6] nvme-fabrics: short-circuit connect retries
Posted by Keith Busch 1 year, 10 months ago
On Fri, Apr 12, 2024 at 09:24:04AM +0200, Daniel Wagner wrote:
> On Thu, Apr 11, 2024 at 06:35:25PM -0600, Keith Busch wrote:
> > On Tue, Apr 09, 2024 at 11:35:04AM +0200, Daniel Wagner wrote:
> > > The first patch returns only kernel error codes now and avoids overwriting error
> > > codes later. Thje newly introduced helper for deciding if a reconnect should be
> > > attempted is the only place where we have the logic (and documentation).
> > > 
> > > On the target side I've separate the nvme status from the dhchap status handling
> > > which made it a bit clearer. I was tempted to refactor the code in
> > > nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't came up
> > > with something nice yet. So let's keep this change at a minimum before any
> > > refactoring attempts.
> > > 
> > > I've tested with blktests and also an real hardware for nvme-fc.
> > 
> > Thanks, series applied to nvme-6.9.
> 
> Thanks! I have an updated version here which addresses some of Sagi's
> feedback, e.g. using only one helper function. Sorry I didn't send out
> it earlier, I got a bit side tracked in testing because of the 'funky'
> results with RDMA.
> 
> Do you want me to send a complete fresh series or patches on top of this
> series? I'm fine either way.

Oh sorry, I didn't notice the discussion carried on after the "review"
tag. Please send me the update, I'll force push.
Re: [PATCH v5 0/6] nvme-fabrics: short-circuit connect retries
Posted by Sagi Grimberg 1 year, 9 months ago

On 12/04/2024 18:24, Keith Busch wrote:
> On Fri, Apr 12, 2024 at 09:24:04AM +0200, Daniel Wagner wrote:
>> On Thu, Apr 11, 2024 at 06:35:25PM -0600, Keith Busch wrote:
>>> On Tue, Apr 09, 2024 at 11:35:04AM +0200, Daniel Wagner wrote:
>>>> The first patch returns only kernel error codes now and avoids overwriting error
>>>> codes later. Thje newly introduced helper for deciding if a reconnect should be
>>>> attempted is the only place where we have the logic (and documentation).
>>>>
>>>> On the target side I've separate the nvme status from the dhchap status handling
>>>> which made it a bit clearer. I was tempted to refactor the code in
>>>> nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't came up
>>>> with something nice yet. So let's keep this change at a minimum before any
>>>> refactoring attempts.
>>>>
>>>> I've tested with blktests and also an real hardware for nvme-fc.
>>> Thanks, series applied to nvme-6.9.
>> Thanks! I have an updated version here which addresses some of Sagi's
>> feedback, e.g. using only one helper function. Sorry I didn't send out
>> it earlier, I got a bit side tracked in testing because of the 'funky'
>> results with RDMA.
>>
>> Do you want me to send a complete fresh series or patches on top of this
>> series? I'm fine either way.
> Oh sorry, I didn't notice the discussion carried on after the "review"
> tag. Please send me the update, I'll force push.

I think that what we want is to have a special handling in the very specific
connect failure when the host/ctrl credentials mismatch, because out of all
the reasons to a connection failure, this _could_ be transient.