[PATCH net-next v7 0/5] netconsole: support automatic target recovery
Posted by Andre Carvalho 5 days, 1 hour ago
This patchset introduces a target resume capability to netconsole, allowing
it to recover targets when the underlying low-level interface comes back
online.

The patchset starts by refactoring netconsole's state representation so
that it can represent deactivated targets (targets that are disabled
because their interface went down).
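To make the refactor concrete, here is a minimal user-space model of the state representation. The STATE_DISABLED and STATE_DEACTIVATED names come from this series (patch titles and the v5 changelog), STATE_ENABLED is assumed from the 'enabled' flag conversion, and the helper functions and the `enslaved` flag are hypothetical. It is a sketch, not the code in drivers/net/netconsole.c.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; helper names below are hypothetical. */
enum target_state {
	STATE_DISABLED,    /* not in use: off by user request or never enabled */
	STATE_ENABLED,     /* netpoll is set up and messages are being sent */
	STATE_DEACTIVATED, /* disabled because the low-level interface went away */
};

/* Hypothetical transition when a target's interface disappears.
 * Per the v5 changelog, a target whose device was (de)enslaved becomes
 * DISABLED rather than DEACTIVATED so that it is never resumed. */
static enum target_state on_iface_gone(enum target_state s, bool enslaved)
{
	if (s != STATE_ENABLED)
		return s;
	return enslaved ? STATE_DISABLED : STATE_DEACTIVATED;
}

/* Only targets deactivated by the low level are candidates for resume;
 * a target the user disabled stays disabled. */
static bool is_resumable(enum target_state s)
{
	return s == STATE_DEACTIVATED;
}
```

Keeping DEACTIVATED separate from DISABLED is what lets the resume path distinguish "the device vanished" from "the user turned this off" without any extra flag.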

It then modifies netconsole to handle NETDEV_UP events for such targets
and set up netpoll for them. Targets are matched with incoming interfaces
depending on how they were initially bound in netconsole (by MAC address or
interface name).
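A rough sketch of the matching rule, assuming (as a simplification) that a target with a non-zero MAC address was bound by MAC and any other target by interface name. The structs and `target_matches()` are hypothetical stand-ins rather than netconsole or netpoll structures; the 6- and 16-byte sizes mirror the kernel's ETH_ALEN and IFNAMSIZ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_ETH_ALEN 6
#define MODEL_IFNAMSIZ 16

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_netdev {
	char name[MODEL_IFNAMSIZ];
	unsigned char dev_addr[MODEL_ETH_ALEN];
};

struct model_target {
	char dev_name[MODEL_IFNAMSIZ];        /* used when bound by name */
	unsigned char dev_mac[MODEL_ETH_ALEN]; /* non-zero when bound by MAC */
};

static bool mac_is_zero(const unsigned char *mac)
{
	static const unsigned char zero[MODEL_ETH_ALEN];

	return memcmp(mac, zero, MODEL_ETH_ALEN) == 0;
}

/* A MAC-bound target matches any device with that address; otherwise
 * the target matches only a device with the same interface name. */
static bool target_matches(const struct model_target *t,
			   const struct model_netdev *dev)
{
	if (!mac_is_zero(t->dev_mac))
		return memcmp(t->dev_mac, dev->dev_addr, MODEL_ETH_ALEN) == 0;
	return strncmp(t->dev_name, dev->name, MODEL_IFNAMSIZ) == 0;
}
```

In this model a MAC-bound target still matches if the recreated device comes back under a different name, while a name-bound target follows the name regardless of the hardware address.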

The patchset includes a selftest that validates netconsole target state
transitions and that the target is functional after being resumed.

Signed-off-by: Andre Carvalho <asantostc@gmail.com>
---
Changes in v7:
- selftest: use ${EXIT_STATUS} instead of ${ksft_pass} to avoid
  shellcheck warning
- Link to v6: https://lore.kernel.org/r/20251121-netcons-retrigger-v6-0-9c03f5a2bd6f@gmail.com

Changes in v6:
- Rebase on top of net-next to resolve conflicts, no functional changes.
- Link to v5: https://lore.kernel.org/r/20251119-netcons-retrigger-v5-0-2c7dda6055d6@gmail.com

Changes in v5:
- patch 3: Mark (de)enslaved targets as DISABLED instead of DEACTIVATED to
  prevent resuming them.
- selftest: Fix test cleanup by moving the trap line outside of the loop, and
  remove the unneeded 'local' keyword.
- Rename maybe_resume_target to resume_target, add netconsole_ prefix to
  process_resumable_targets.
- Hold a device reference before calling __netpoll_setup.
- Link to v4: https://lore.kernel.org/r/20251116-netcons-retrigger-v4-0-5290b5f140c2@gmail.com
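The reference handling in the v5 item above can be pictured with a toy model: take the device reference first, then run setup, and drop the reference again if setup fails. The refcount, both structs and all functions are hypothetical stand-ins for netdev reference tracking and __netpoll_setup; the assumption that setup expects the caller to already hold a reference is ours, based on how netpoll_setup() wraps __netpoll_setup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: a refcounted device and a netpoll handle. */
struct model_netdev { int refcnt; };
struct model_netpoll { struct model_netdev *dev; };

static void model_dev_hold(struct model_netdev *d) { d->refcnt++; }

static void model_dev_put(struct model_netdev *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* Stand-in for __netpoll_setup(): assumes the caller already holds a
 * reference on the device. `fail` simulates a setup error. */
static int model_netpoll_setup(struct model_netpoll *np, bool fail)
{
	assert(np->dev != NULL && np->dev->refcnt > 0);
	return fail ? -1 : 0;
}

/* Hypothetical resume path: hold first, set up, drop the hold on error. */
static int model_resume_target(struct model_netpoll *np,
			       struct model_netdev *dev, bool fail)
{
	int err;

	model_dev_hold(dev);
	np->dev = dev;
	err = model_netpoll_setup(np, fail);
	if (err) {
		np->dev = NULL;
		model_dev_put(dev);
	}
	return err;
}
```

On success the reference stays with the netpoll handle, so a later teardown is what releases it.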

Changes in v4:
- Simplify selftest cleanup, removing the trap setup from the loop.
- Drop netpoll helper (__setup_netpoll_hold) and manage the reference inside
  netconsole.
- Move resume_list processing logic to a separate function.
- Link to v3: https://lore.kernel.org/r/20251109-netcons-retrigger-v3-0-1654c280bbe6@gmail.com

Changes in v3:
- Resume by MAC address or interface name depending on how the target was
  created.
- Attempt to resume the target without holding the target list lock by moving
  it to a temporary list. This is required because netpoll may attempt to
  allocate memory.
- Link to v2: https://lore.kernel.org/r/20250921-netcons-retrigger-v2-0-a0e84006237f@gmail.com
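The v3 approach of resuming targets off-list can be sketched as a three-phase pattern: unlink candidates onto a private list under the lock, do the work that may allocate with the lock dropped, then splice them back. In netconsole, target_list_lock is a spinlock, so code that may sleep to allocate memory cannot run under it. This single-threaded model replaces the lock with a flag so the invariant can be asserted; all names are hypothetical, device matching is omitted, and leaving a failed target DEACTIVATED is this sketch's choice, not necessarily the patch's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum target_state { STATE_DISABLED, STATE_ENABLED, STATE_DEACTIVATED };

struct target {
	enum target_state state;
	struct target *next;
};

static struct target *target_list;
static bool list_locked; /* models target_list_lock in a single thread */

static void list_lock(void)   { assert(!list_locked); list_locked = true; }
static void list_unlock(void) { assert(list_locked); list_locked = false; }

/* Stand-in for netpoll setup, which may allocate memory and therefore
 * must never run with the list lock held. Always succeeds here. */
static int model_setup(struct target *t)
{
	(void)t;
	assert(!list_locked);
	return 0;
}

static void model_process_resumable_targets(void)
{
	struct target *resume = NULL, **pp, *t;

	/* Phase 1: under the lock, move DEACTIVATED targets to a private list. */
	list_lock();
	pp = &target_list;
	while ((t = *pp) != NULL) {
		if (t->state == STATE_DEACTIVATED) {
			*pp = t->next;
			t->next = resume;
			resume = t;
		} else {
			pp = &t->next;
		}
	}
	list_unlock();

	/* Phase 2: with the lock dropped, try to bring each target back.
	 * A failed target stays DEACTIVATED in this sketch. */
	for (t = resume; t != NULL; t = t->next)
		if (model_setup(t) == 0)
			t->state = STATE_ENABLED;

	/* Phase 3: splice the private list back under the lock. */
	list_lock();
	while (resume != NULL) {
		t = resume;
		resume = t->next;
		t->next = target_list;
		target_list = t;
	}
	list_unlock();
}
```

While the targets sit on the private list, other walkers of target_list do not see them, which is what makes dropping the lock safe in this model.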

Changes in v2:
- Attempt to resume the target in the same thread instead of using a
  workqueue.
- Add a wrapper around __netpoll_setup (patch 4).
- Rename resume_target to maybe_resume_target and move the conditionals
  inside its implementation, keeping the code clearer.
- Verify that the device address matches the target MAC address when the
  target was set up using a MAC address.
- Update selftest to cover targets bound by mac and interface name.
- Fix typo in selftest comment and sort tests alphabetically in
  Makefile.
- Link to v1: https://lore.kernel.org/r/20250909-netcons-retrigger-v1-0-3aea904926cf@gmail.com

---
Andre Carvalho (3):
      netconsole: convert 'enabled' flag to enum for clearer state management
      netconsole: resume previously deactivated target
      selftests: netconsole: validate target resume

Breno Leitao (2):
      netconsole: add target_state enum
      netconsole: add STATE_DEACTIVATED to track targets disabled by low level

 drivers/net/netconsole.c                           | 155 +++++++++++++++++----
 tools/testing/selftests/drivers/net/Makefile       |   1 +
 .../selftests/drivers/net/lib/sh/lib_netcons.sh    |  35 ++++-
 .../selftests/drivers/net/netcons_resume.sh        |  97 +++++++++++++
 4 files changed, 254 insertions(+), 34 deletions(-)
---
base-commit: ab084f0b8d6d2ee4b1c6a28f39a2a7430bdfa7f0
change-id: 20250816-netcons-retrigger-a4f547bfc867

Best regards,
-- 
Andre Carvalho <asantostc@gmail.com>
Re: [PATCH net-next v7 0/5] netconsole: support automatic target recovery
Posted by Jakub Kicinski 4 days, 20 hours ago
On Wed, 26 Nov 2025 20:22:52 +0000 Andre Carvalho wrote:
> This patchset introduces target resume capability to netconsole allowing
> it to recover targets when underlying low-level interface comes back
> online.
> 
> The patchset starts by refactoring netconsole state representation in
> order to allow representing deactivated targets (targets that are
> disabled due to interfaces going down).
> 
> It then modifies netconsole to handle NETDEV_UP events for such targets
> and setups netpoll. Targets are matched with incoming interfaces
> depending on how they were initially bound in netconsole (by mac or
> interface name).

Netpoll does not seem to handle DOWN events, so I'm guessing your
primary use case is that the device had a HW fault and the netdev was
recreated after a device reset?

Should we not be listening for the REGISTER event then? On boot
we force the device UP if we find it; theoretically, there may
be a case where user space is not configured to bring the device UP,
and then we'd never resume the target?
Re: [PATCH net-next v7 0/5] netconsole: support automatic target recovery
Posted by Andre Carvalho 3 days, 22 hours ago
Hi Jakub!

On Wed, Nov 26, 2025 at 05:36:46PM -0800, Jakub Kicinski wrote:
> Netpoll does not seem to handle DOWN events, so I'm guessing your
> primary use case is that the device had a HW fault and netdev was
> recreated after device reset?

Correct, this is the intended use case for this series: handling cases where
the device was unregistered and then brought back up.

> Should we not be listening for the REGISTER event then? On boot
> we force UP the device if we find it, theoretically there may
> be a case where user space is not configured to UP the device,
> and then we'd never resume the target?

This is indeed a limitation of the current implementation. Based on
your feedback, I'm working on a new version of this series that handles REGISTER
instead of UP and ensures we force the device UP.
This will make it consistent with the boot behavior you described.

Based on my tests, I can't force the device UP while handling the REGISTER event.
I believe this is because dev_open() attempts to take the device lock, which is already held.
For this reason, I'm resorting to deferring this to a workqueue, similar to my approach
in v1 [1] (but correctly handling the target_list lock).
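The deferral can be modelled in user space as below: the REGISTER handler runs with a lock held, so it only queues work, and the worker later forces the device UP and resumes the target. Everything here is hypothetical: the `locked` flag stands in for the lock the thread says dev_open() needs, `model_queue_work()` mimics queue_work() returning false for already-queued work (the fixed queue size is a limit of the model only), and netpoll setup is elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum target_state { STATE_DISABLED, STATE_ENABLED, STATE_DEACTIVATED };

struct model_dev {
	bool locked; /* lock held around notifier callbacks in this model */
	bool up;
};

struct model_target {
	enum target_state state;
	struct model_dev *dev;
	bool work_pending;
};

#define MODEL_MAX_WORK 8
static struct model_target *work_queue[MODEL_MAX_WORK];
static size_t work_count;

/* Stand-in for dev_open(): takes the device lock itself, so calling it
 * while that lock is already held would deadlock. */
static void model_dev_open(struct model_dev *dev)
{
	assert(!dev->locked);
	dev->locked = true;
	dev->up = true;
	dev->locked = false;
}

/* Stand-in for queue_work(): returns false if the item is already queued. */
static bool model_queue_work(struct model_target *t)
{
	if (t->work_pending || work_count == MODEL_MAX_WORK)
		return false;
	t->work_pending = true;
	work_queue[work_count++] = t;
	return true;
}

/* REGISTER handler: runs with the device lock held, so it only queues. */
static void model_on_register(struct model_target *t, struct model_dev *dev)
{
	assert(dev->locked);
	if (t->state != STATE_DEACTIVATED)
		return;
	t->dev = dev;
	model_queue_work(t);
}

/* Worker: runs after the notifier has returned and the lock is free. */
static void model_run_workqueue(void)
{
	for (size_t i = 0; i < work_count; i++) {
		struct model_target *t = work_queue[i];

		t->work_pending = false;
		model_dev_open(t->dev);   /* force the device UP, as on boot */
		t->state = STATE_ENABLED; /* netpoll setup elided */
	}
	work_count = 0;
}
```

A duplicate REGISTER notification for the same target is harmless here, because queueing already-pending work is a no-op.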

Let me know if this approach makes sense or if I'm missing something.

Thanks for the review!

[1] https://lore.kernel.org/all/20250909-netcons-retrigger-v1-4-3aea904926cf@gmail.com
-- 
Andre Carvalho
Re: [PATCH net-next v7 0/5] netconsole: support automatic target recovery
Posted by Jakub Kicinski 3 days, 21 hours ago
On Thu, 27 Nov 2025 23:07:02 +0000 Andre Carvalho wrote:
> > Should we not be listening for the REGISTER event then? On boot
> > we force UP the device if we find it, theoretically there may
> > be a case where user space is not configured to UP the device,
> > and then we'd never resume the target?  
> 
> This is indeed a limitation on the current implementation. Based on
> your feedback, I'm working on a new version of this series handling REGISTER
> instead of UP and ensuring we force UP the device.
> This will make it consistent with the boot behavior you described.
> 
> Based on my tests, I can't force the device UP while handling the REGISTER event.
> I believe this is due to dev_open attempting to lock the device which is already held.
> For this reason, I'm resorting to defering this to a workqueue, similar to my approach 
> on v1 [1] (but correctly handling target_list lock).
> 
> Let me know if this approach makes sense or if I'm missing something.

SG, that's probably the most resilient solution.