[PATCH v2 net-next 0/10] pds_core: Various improvements and AQ race condition cleanup

Brett Creeley posted 10 patches 1 year, 11 months ago
drivers/net/ethernet/amd/pds_core/adminq.c  |  74 +++++++----
drivers/net/ethernet/amd/pds_core/core.c    | 130 ++++++++++++--------
drivers/net/ethernet/amd/pds_core/core.h    |   3 +-
drivers/net/ethernet/amd/pds_core/debugfs.c |  12 +-
drivers/net/ethernet/amd/pds_core/dev.c     |  30 +++--
drivers/net/ethernet/amd/pds_core/devlink.c |   3 +-
drivers/net/ethernet/amd/pds_core/fw.c      |   3 +
drivers/net/ethernet/amd/pds_core/main.c    |  26 +++-
8 files changed, 187 insertions(+), 94 deletions(-)
[PATCH v2 net-next 0/10] pds_core: Various improvements and AQ race condition cleanup
Posted by Brett Creeley 1 year, 11 months ago
This series includes the following changes:

There can be many users of pds_core's adminq, including pds_core itself
and any clients that depend on it. When the pds_core device goes through
a reset for any reason, the adminq is freed and reconfigured. There are
some gaps in the current implementation that will cause crashes during
reset if any of these users attempt to use the adminq after it has been
freed.
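For readers less familiar with this class of bug, a common guard is a
reference count that every user must take before touching the queue and
that the reset path drains before freeing it. Below is a minimal
userspace sketch of that general idea using C11 atomics. It is not taken
from this series; all names (struct adminq_model, adminq_get_if_up(),
adminq_put(), adminq_teardown()) are hypothetical. In kernel code,
refcount_inc_not_zero() would play the role of adminq_get_if_up().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a shared admin queue (not pds_core code).
 * refcnt == 0: queue torn down; refcnt == 1: queue up, no users;
 * refcnt > 1: queue up with (refcnt - 1) in-flight users.
 */
struct adminq_model {
	atomic_int refcnt;
};

static void adminq_init(struct adminq_model *aq)
{
	atomic_store(&aq->refcnt, 1);	/* base reference: queue is up */
}

/* Take a user reference only if the queue is still up, i.e. only if
 * refcnt is non-zero (same idea as refcount_inc_not_zero()).
 */
static bool adminq_get_if_up(struct adminq_model *aq)
{
	int old = atomic_load(&aq->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&aq->refcnt, &old, old + 1))
			return true;
	}
	return false;	/* torn down: caller must not touch the queue */
}

/* Drop a user reference taken by adminq_get_if_up(). */
static void adminq_put(struct adminq_model *aq)
{
	int old = atomic_fetch_sub(&aq->refcnt, 1);

	assert(old > 1);	/* users never drop the base reference */
}

/* Reset/remove path: spin until only the base reference remains, then
 * drop it. Afterwards adminq_get_if_up() fails, so the queue could be
 * freed and rebuilt without racing against users.
 * Must only be called while the queue is up (refcnt >= 1).
 */
static void adminq_teardown(struct adminq_model *aq)
{
	int expected = 1;

	while (!atomic_compare_exchange_weak(&aq->refcnt, &expected, 0))
		expected = 1;	/* users still in flight (or spurious failure) */
}
```

In this sketch, new users can still take references while
adminq_teardown() is waiting, so a real driver would typically also need
a "stopping" state that makes adminq_get_if_up() fail before draining
begins.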

Fixes for issues around how resets are handled, specifically in the
driver's error handlers.

Some general cleanups.

v1:
https://lore.kernel.org/netdev/20240104171221.31399-1-brett.creeley@amd.com/

v2:
- Combined the RCT clean-ups with an incorrect goto label fix
- Added a couple more patches related to reset flows
- Slightly updated the cover letter to mention the extra patches that
  were added
- Changed a function used only once to be static
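RCT here presumably refers to the netdev "reverse Christmas tree"
convention, in which local variable declarations are ordered from the
longest line to the shortest. A small illustration follows; the function
count_nonempty() is hypothetical and unrelated to pds_core.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, written only to illustrate reverse Christmas
 * tree (RCT) ordering: the local declarations below are sorted from
 * the longest line to the shortest.
 */
static size_t count_nonempty(const char *const *names, size_t n)
{
	const char *const *end = names + n;
	size_t count = 0;
	size_t len;

	assert(names != NULL);	/* callers must pass a valid array */

	for (; names < end; names++) {
		len = strlen(*names);
		if (len)
			count++;
	}
	return count;
}
```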

Brett Creeley (10):
  pds_core: Prevent health thread from running during reset/remove
  pds_core: Cancel AQ work on teardown
  pds_core: Use struct pdsc for the pdsc_adminq_isr private data
  pds_core: Prevent race issues involving the adminq
  pds_core: Clear BARs on reset
  pds_core: Don't assign interrupt index/bound_intr to notifyq
  pds_core: Unmask adminq interrupt in work thread
  pds_core: Fix up some minor issues
  pds_core: Rework teardown/setup flow to be more common
  pds_core: Clean up init/uninit flows to be more readable

-- 
2.17.1
Re: [PATCH v2 net-next 0/10] pds_core: Various improvements and AQ race condition cleanup
Posted by Jakub Kicinski 1 year, 11 months ago
On Fri, 26 Jan 2024 09:42:45 -0800 Brett Creeley wrote:
> This series includes the following changes:
> 
> There can be many users of the pds_core's adminq. This includes
> pds_core's uses and any clients that depend on it. When the pds_core
> device goes through a reset for any reason the adminq is freed
> and reconfigured. There are some gaps in the current implementation
> that will cause crashes during reset if any of the previously mentioned
> users of the adminq attempt to use it after it's been freed.
> 
> Issues around how resets are handled, specifically regarding the driver's
> error handlers.

Patches 1, 2 and 4 look like fixes. Is there any reason these are
targeting net-next? If someone deploys this device at scale rare
things will happen a lot..
Re: [PATCH v2 net-next 0/10] pds_core: Various improvements and AQ race condition cleanup
Posted by Brett Creeley 1 year, 10 months ago
On 1/26/2024 8:44 PM, Jakub Kicinski wrote:
> On Fri, 26 Jan 2024 09:42:45 -0800 Brett Creeley wrote:
>> This series includes the following changes:
>>
>> There can be many users of the pds_core's adminq. This includes
>> pds_core's uses and any clients that depend on it. When the pds_core
>> device goes through a reset for any reason the adminq is freed
>> and reconfigured. There are some gaps in the current implementation
>> that will cause crashes during reset if any of the previously mentioned
>> users of the adminq attempt to use it after it's been freed.
>>
>> Issues around how resets are handled, specifically regarding the driver's
>> error handlers.
> 
> Patches 1, 2 and 4 look like fixes. Is there any reason these are
> targeting net-next? If someone deploys this device at scale rare
> things will happen a lot..

No reason, just an oversight on my part. I actually think patches 1, 2,
3, 4, 5, and 9 could all go to net. Unfortunately, some of these patches
are intertwined (e.g. patch 10 depends on patch 9).

If I push the previously mentioned patches to net and they get accepted, 
how soon are fixes typically added to the net-next tree so I can 
rebase/re-push the remaining patches?

Thanks for the review,

Brett
Re: [PATCH v2 net-next 0/10] pds_core: Various improvements and AQ race condition cleanup
Posted by Jakub Kicinski 1 year, 10 months ago
On Mon, 29 Jan 2024 09:27:21 -0800 Brett Creeley wrote:
> > On Fri, 26 Jan 2024 09:42:45 -0800 Brett Creeley wrote:  
> >> This series includes the following changes:
> >>
> >> There can be many users of the pds_core's adminq. This includes
> >> pds_core's uses and any clients that depend on it. When the pds_core
> >> device goes through a reset for any reason the adminq is freed
> >> and reconfigured. There are some gaps in the current implementation
> >> that will cause crashes during reset if any of the previously mentioned
> >> users of the adminq attempt to use it after it's been freed.
> >>
> >> Issues around how resets are handled, specifically regarding the driver's
> >> error handlers.  
> > 
> > Patches 1, 2 and 4 look like fixes. Is there any reason these are
> > targeting net-next? If someone deploys this device at scale rare
> > things will happen a lot..  
> 
> No reason, just an oversight on my part. I actually think patches 1, 2, 
> 3, 4, 5, and 9 could all go to net. Unfortunately some of these patches 
> are intertwined (i.e. patch 10 depends on patch 9).
> 
> If I push the previously mentioned patches to net and they get accepted, 
> how soon are fixes typically added to the net-next tree so I can 
> rebase/re-push the remaining patches?

net gets merged into net-next every Thursday, exact timing depends on how
quickly Linus pulls from us.
Re: [PATCH v2 net-next 0/10] pds_core: Various improvements and AQ race condition cleanup
Posted by Brett Creeley 1 year, 10 months ago
On 1/29/2024 12:05 PM, Jakub Kicinski wrote:
> net gets merged into net-next every Thursday, exact timing depends on how
> quickly Linus pulls from us

Okay, then I will work on splitting this series up between net and net-next.

Thanks again,

Brett