[PATCH net-next V7 02/14] devlink: introduce shared devlink instance for PFs on same chip

Tariq Toukan posted 14 patches 1 week, 4 days ago
[PATCH net-next V7 02/14] devlink: introduce shared devlink instance for PFs on same chip
Posted by Tariq Toukan 1 week, 4 days ago
From: Jiri Pirko <jiri@nvidia.com>

Multiple PFs may reside on the same physical chip, running a single
firmware. Some of the resources and configurations may be shared among
these PFs. Currently, there is no good object to pin the configuration
knobs on.

Introduce a shared devlink instance, instantiated upon probe of the
first PF and removed during remove of the last PF. The shared devlink
instance is backed by a faux device, as there is no PCI device related
to it. The implementation uses reference counting to manage the
lifecycle: each PF that probes calls devlink_shd_get() to get or create
the shared instance, and calls devlink_shd_put() when it removes. The
shared instance is automatically destroyed when the last PF removes.

Example:

$ devlink dev
pci/0000:08:00.0:
  nested_devlink:
    auxiliary/mlx5_core.eth.0
faux/mlx5_core_83013c12b77faa1a30000c82a1045c91:
  nested_devlink:
    pci/0000:08:00.0
    pci/0000:08:00.1
auxiliary/mlx5_core.eth.0
pci/0000:08:00.1:
  nested_devlink:
    auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 include/net/devlink.h |   6 ++
 net/devlink/Makefile  |   2 +-
 net/devlink/sh_dev.c  | 163 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100644 net/devlink/sh_dev.c

diff --git a/include/net/devlink.h b/include/net/devlink.h
index cb839e0435a1..c453faec8ebf 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1644,6 +1644,12 @@ void devlink_register(struct devlink *devlink);
 void devlink_unregister(struct devlink *devlink);
 void devlink_free(struct devlink *devlink);
 
+struct devlink *devlink_shd_get(const char *id,
+				const struct devlink_ops *ops,
+				size_t priv_size);
+void devlink_shd_put(struct devlink *devlink);
+void *devlink_shd_get_priv(struct devlink *devlink);
+
 /**
  * struct devlink_port_ops - Port operations
  * @port_split: Callback used to split the port into multiple ones.
diff --git a/net/devlink/Makefile b/net/devlink/Makefile
index 000da622116a..8f2adb5e5836 100644
--- a/net/devlink/Makefile
+++ b/net/devlink/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-y := core.o netlink.o netlink_gen.o dev.o port.o sb.o dpipe.o \
-	 resource.o param.o region.o health.o trap.o rate.o linecard.o
+	 resource.o param.o region.o health.o trap.o rate.o linecard.o sh_dev.o
diff --git a/net/devlink/sh_dev.c b/net/devlink/sh_dev.c
new file mode 100644
index 000000000000..acc8d549aaae
--- /dev/null
+++ b/net/devlink/sh_dev.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include <linux/device/faux.h>
+#include <net/devlink.h>
+
+#include "devl_internal.h"
+
+static LIST_HEAD(shd_list);
+static DEFINE_MUTEX(shd_mutex); /* Protects shd_list and shd->list */
+
+/* This structure represents a shared devlink instance,
+ * there is one created per identifier (e.g., serial number).
+ */
+struct devlink_shd {
+	struct list_head list; /* Node in shd list */
+	const char *id; /* Identifier string (e.g., serial number) */
+	struct faux_device *faux_dev; /* Related faux device */
+	refcount_t refcount; /* Reference count */
+	char priv[] __aligned(NETDEV_ALIGN); /* Driver private data */
+};
+
+static struct devlink_shd *devlink_shd_lookup(const char *id)
+{
+	struct devlink_shd *shd;
+
+	list_for_each_entry(shd, &shd_list, list) {
+		if (!strcmp(shd->id, id))
+			return shd;
+	}
+
+	return NULL;
+}
+
+static struct devlink_shd *devlink_shd_create(const char *id,
+					      const struct devlink_ops *ops,
+					      size_t priv_size)
+{
+	struct faux_device *faux_dev;
+	struct devlink_shd *shd;
+	struct devlink *devlink;
+
+	/* Create faux device - probe will be called synchronously */
+	faux_dev = faux_device_create(id, NULL, NULL);
+	if (!faux_dev)
+		return NULL;
+
+	devlink = devlink_alloc(ops, sizeof(struct devlink_shd) + priv_size,
+				&faux_dev->dev);
+	if (!devlink)
+		goto err_devlink_alloc;
+	shd = devlink_priv(devlink);
+
+	shd->id = kstrdup(id, GFP_KERNEL);
+	if (!shd->id)
+		goto err_kstrdup_id;
+	shd->faux_dev = faux_dev;
+	refcount_set(&shd->refcount, 1);
+
+	devl_lock(devlink);
+	devl_register(devlink);
+	devl_unlock(devlink);
+
+	list_add_tail(&shd->list, &shd_list);
+
+	return shd;
+
+err_kstrdup_id:
+	devlink_free(devlink);
+
+err_devlink_alloc:
+	faux_device_destroy(faux_dev);
+	return NULL;
+}
+
+static void devlink_shd_destroy(struct devlink_shd *shd)
+{
+	struct devlink *devlink = priv_to_devlink(shd);
+	struct faux_device *faux_dev = shd->faux_dev;
+
+	list_del(&shd->list);
+	devl_lock(devlink);
+	devl_unregister(devlink);
+	devl_unlock(devlink);
+	kfree(shd->id);
+	devlink_free(devlink);
+	faux_device_destroy(faux_dev);
+}
+
+/**
+ * devlink_shd_get - Get or create a shared devlink instance
+ * @id: Identifier string (e.g., serial number) for the shared instance
+ * @ops: Devlink operations structure
+ * @priv_size: Size of private data structure
+ *
+ * Get an existing shared devlink instance identified by @id, or create
+ * a new one if it doesn't exist. The device is automatically added to
+ * the shared instance's device list. Return the devlink instance
+ * with a reference held. The caller must call devlink_shd_put() when done.
+ *
+ * Return: Pointer to the shared devlink instance on success,
+ *         NULL on failure
+ */
+struct devlink *devlink_shd_get(const char *id,
+				const struct devlink_ops *ops,
+				size_t priv_size)
+{
+	struct devlink_shd *shd;
+
+	if (WARN_ON(!id || !ops))
+		return NULL;
+
+	mutex_lock(&shd_mutex);
+
+	shd = devlink_shd_lookup(id);
+	if (!shd)
+		shd = devlink_shd_create(id, ops, priv_size);
+	else
+		refcount_inc(&shd->refcount);
+
+	mutex_unlock(&shd_mutex);
+	return shd ? priv_to_devlink(shd) : NULL;
+}
+EXPORT_SYMBOL_GPL(devlink_shd_get);
+
+/**
+ * devlink_shd_put - Release a reference on a shared devlink instance
+ * @devlink: Shared devlink instance
+ *
+ * Release a reference on a shared devlink instance obtained via
+ * devlink_shd_get().
+ */
+void devlink_shd_put(struct devlink *devlink)
+{
+	struct devlink_shd *shd;
+
+	if (WARN_ON(!devlink))
+		return;
+
+	mutex_lock(&shd_mutex);
+	shd = devlink_priv(devlink);
+	if (refcount_dec_and_test(&shd->refcount))
+		devlink_shd_destroy(shd);
+	mutex_unlock(&shd_mutex);
+}
+EXPORT_SYMBOL_GPL(devlink_shd_put);
+
+/**
+ * devlink_shd_get_priv - Get private data from shared devlink instance
+ * @devlink: Devlink instance
+ *
+ * Returns a pointer to the driver's private data structure within
+ * the shared devlink instance.
+ *
+ * Return: Pointer to private data
+ */
+void *devlink_shd_get_priv(struct devlink *devlink)
+{
+	struct devlink_shd *shd = devlink_priv(devlink);
+
+	return shd->priv;
+}
+EXPORT_SYMBOL_GPL(devlink_shd_get_priv);
-- 
2.44.0
Re: [PATCH net-next V7 02/14] devlink: introduce shared devlink instance for PFs on same chip
Posted by Jakub Kicinski 5 days, 20 hours ago
On Wed, 28 Jan 2026 13:25:32 +0200 Tariq Toukan wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> Multiple PFs may reside on the same physical chip, running a single
> firmware. Some of the resources and configurations may be shared among
> these PFs. Currently, there is no good object to pin the configuration
> knobs on.
> 
> Introduce a shared devlink instance, instantiated upon probe of the
> first PF and removed during remove of the last PF. The shared devlink
> instance is backed by a faux device, as there is no PCI device related
> to it. The implementation uses reference counting to manage the
> lifecycle: each PF that probes calls devlink_shd_get() to get or create
> the shared instance, and calls devlink_shd_put() when it removes. The
> shared instance is automatically destroyed when the last PF removes.

> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index cb839e0435a1..c453faec8ebf 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -1644,6 +1644,12 @@ void devlink_register(struct devlink *devlink);
>  void devlink_unregister(struct devlink *devlink);
>  void devlink_free(struct devlink *devlink);
>  
> +struct devlink *devlink_shd_get(const char *id,
> +				const struct devlink_ops *ops,
> +				size_t priv_size);
> +void devlink_shd_put(struct devlink *devlink);
> +void *devlink_shd_get_priv(struct devlink *devlink);

Would Cosmin or someone else be willing to take on co-maintainership 
of this API (including reviews of other drivers using it)?
We could add a maintainers entry with:

K:	devlink_shd_

So y'all get CCed.

> +#include <linux/device/faux.h>
> +#include <net/devlink.h>

> +/* This structure represents a shared devlink instance,
> + * there is one created per identifier (e.g., serial number).
> + */
> +struct devlink_shd {
> +	struct list_head list; /* Node in shd list */
> +	const char *id; /* Identifier string (e.g., serial number) */

Why does this have to be a string? The identifier should be irrelevant,
and if something like serial number exists it can be reported in dev
info for the shared instance?

> +	struct faux_device *faux_dev; /* Related faux device */
> +	refcount_t refcount; /* Reference count */
> +	char priv[] __aligned(NETDEV_ALIGN); /* Driver private data */

size member annotated with __counted_by() is missing here

> +};

> +static struct devlink_shd *devlink_shd_create(const char *id,
> +					      const struct devlink_ops *ops,
> +					      size_t priv_size)
> +{
> +	struct faux_device *faux_dev;
> +	struct devlink_shd *shd;
> +	struct devlink *devlink;
> +
> +	/* Create faux device - probe will be called synchronously */
> +	faux_dev = faux_device_create(id, NULL, NULL);
> +	if (!faux_dev)
> +		return NULL;
> +
> +	devlink = devlink_alloc(ops, sizeof(struct devlink_shd) + priv_size,
> +				&faux_dev->dev);
> +	if (!devlink)
> +		goto err_devlink_alloc;

error labels should be named after the target not the source in new code

> +	shd = devlink_priv(devlink);
> +
> +	shd->id = kstrdup(id, GFP_KERNEL);
> +	if (!shd->id)
> +		goto err_kstrdup_id;
> +	shd->faux_dev = faux_dev;
> +	refcount_set(&shd->refcount, 1);
> +
> +	devl_lock(devlink);
> +	devl_register(devlink);
> +	devl_unlock(devlink);
> +
> +	list_add_tail(&shd->list, &shd_list);
> +
> +	return shd;
> +
> +err_kstrdup_id:
> +	devlink_free(devlink);
> +
> +err_devlink_alloc:
> +	faux_device_destroy(faux_dev);
> +	return NULL;
> +}

> +struct devlink *devlink_shd_get(const char *id,
> +				const struct devlink_ops *ops,
> +				size_t priv_size)
> +{
> +	struct devlink_shd *shd;
> +
> +	if (WARN_ON(!id || !ops))
> +		return NULL;

Seems a little too defensive to check input attrs against NULL.
Let the kernel crash if someone is foolish enough..

> +	mutex_lock(&shd_mutex);
> +
> +	shd = devlink_shd_lookup(id);
> +	if (!shd)
> +		shd = devlink_shd_create(id, ops, priv_size);
> +	else
> +		refcount_inc(&shd->refcount);
> +
> +	mutex_unlock(&shd_mutex);
> +	return shd ? priv_to_devlink(shd) : NULL;
> +}
> +EXPORT_SYMBOL_GPL(devlink_shd_get);
> +
> +/**
> + * devlink_shd_put - Release a reference on a shared devlink instance
> + * @devlink: Shared devlink instance
> + *
> + * Release a reference on a shared devlink instance obtained via
> + * devlink_shd_get().
> + */
> +void devlink_shd_put(struct devlink *devlink)
> +{
> +	struct devlink_shd *shd;
> +
> +	if (WARN_ON(!devlink))
> +		return;

ditto

> +	mutex_lock(&shd_mutex);
> +	shd = devlink_priv(devlink);
> +	if (refcount_dec_and_test(&shd->refcount))
> +		devlink_shd_destroy(shd);
> +	mutex_unlock(&shd_mutex);
> +}
> +EXPORT_SYMBOL_GPL(devlink_shd_put);
Re: [PATCH net-next V7 02/14] devlink: introduce shared devlink instance for PFs on same chip
Posted by Jiri Pirko 5 days, 14 hours ago
Tue, Feb 03, 2026 at 04:49:46AM +0100, kuba@kernel.org wrote:
>On Wed, 28 Jan 2026 13:25:32 +0200 Tariq Toukan wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>> 
>> Multiple PFs may reside on the same physical chip, running a single
>> firmware. Some of the resources and configurations may be shared among
>> these PFs. Currently, there is no good object to pin the configuration
>> knobs on.
>> 
>> Introduce a shared devlink instance, instantiated upon probe of the
>> first PF and removed during remove of the last PF. The shared devlink
>> instance is backed by a faux device, as there is no PCI device related
>> to it. The implementation uses reference counting to manage the
>> lifecycle: each PF that probes calls devlink_shd_get() to get or create
>> the shared instance, and calls devlink_shd_put() when it removes. The
>> shared instance is automatically destroyed when the last PF removes.
>
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> index cb839e0435a1..c453faec8ebf 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -1644,6 +1644,12 @@ void devlink_register(struct devlink *devlink);
>>  void devlink_unregister(struct devlink *devlink);
>>  void devlink_free(struct devlink *devlink);
>>  
>> +struct devlink *devlink_shd_get(const char *id,
>> +				const struct devlink_ops *ops,
>> +				size_t priv_size);
>> +void devlink_shd_put(struct devlink *devlink);
>> +void *devlink_shd_get_priv(struct devlink *devlink);
>
>Would Cosmin or someone else be willing to take on co-maintainership 
>of this API (including reviews of other drivers using it)?
>We could add a maintainers entry with:
>
>K:	devlink_shd_
>
>So y'all get CCed.
>
>> +#include <linux/device/faux.h>
>> +#include <net/devlink.h>
>
>> +/* This structure represents a shared devlink instance,
>> + * there is one created per identifier (e.g., serial number).
>> + */
>> +struct devlink_shd {
>> +	struct list_head list; /* Node in shd list */
>> +	const char *id; /* Identifier string (e.g., serial number) */
>
>Why does this have to be a string? The identifier should be irrelevant,
>and if something like serial number exists it can be reported in dev
>info for the shared instance?

String gives drivers flexibility to use anything. Perhaps I'm missing
your point. Are you againts free-form or just string and buf+buf_len
would be fine?


>
>> +	struct faux_device *faux_dev; /* Related faux device */
>> +	refcount_t refcount; /* Reference count */
>> +	char priv[] __aligned(NETDEV_ALIGN); /* Driver private data */
>
>size member annotated with __counted_by() is missing here

Will add.


>
>> +};
>
>> +static struct devlink_shd *devlink_shd_create(const char *id,
>> +					      const struct devlink_ops *ops,
>> +					      size_t priv_size)
>> +{
>> +	struct faux_device *faux_dev;
>> +	struct devlink_shd *shd;
>> +	struct devlink *devlink;
>> +
>> +	/* Create faux device - probe will be called synchronously */
>> +	faux_dev = faux_device_create(id, NULL, NULL);
>> +	if (!faux_dev)
>> +		return NULL;
>> +
>> +	devlink = devlink_alloc(ops, sizeof(struct devlink_shd) + priv_size,
>> +				&faux_dev->dev);
>> +	if (!devlink)
>> +		goto err_devlink_alloc;
>
>error labels should be named after the target not the source in new code

Okay. Tried to be consistent with the rest of the code. But as you wish.


>
>> +	shd = devlink_priv(devlink);
>> +
>> +	shd->id = kstrdup(id, GFP_KERNEL);
>> +	if (!shd->id)
>> +		goto err_kstrdup_id;
>> +	shd->faux_dev = faux_dev;
>> +	refcount_set(&shd->refcount, 1);
>> +
>> +	devl_lock(devlink);
>> +	devl_register(devlink);
>> +	devl_unlock(devlink);
>> +
>> +	list_add_tail(&shd->list, &shd_list);
>> +
>> +	return shd;
>> +
>> +err_kstrdup_id:
>> +	devlink_free(devlink);
>> +
>> +err_devlink_alloc:
>> +	faux_device_destroy(faux_dev);
>> +	return NULL;
>> +}
>
>> +struct devlink *devlink_shd_get(const char *id,
>> +				const struct devlink_ops *ops,
>> +				size_t priv_size)
>> +{
>> +	struct devlink_shd *shd;
>> +
>> +	if (WARN_ON(!id || !ops))
>> +		return NULL;
>
>Seems a little too defensive to check input attrs against NULL.
>Let the kernel crash if someone is foolish enough..

Okay.


>
>> +	mutex_lock(&shd_mutex);
>> +
>> +	shd = devlink_shd_lookup(id);
>> +	if (!shd)
>> +		shd = devlink_shd_create(id, ops, priv_size);
>> +	else
>> +		refcount_inc(&shd->refcount);
>> +
>> +	mutex_unlock(&shd_mutex);
>> +	return shd ? priv_to_devlink(shd) : NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(devlink_shd_get);
>> +
>> +/**
>> + * devlink_shd_put - Release a reference on a shared devlink instance
>> + * @devlink: Shared devlink instance
>> + *
>> + * Release a reference on a shared devlink instance obtained via
>> + * devlink_shd_get().
>> + */
>> +void devlink_shd_put(struct devlink *devlink)
>> +{
>> +	struct devlink_shd *shd;
>> +
>> +	if (WARN_ON(!devlink))
>> +		return;
>
>ditto

Okay.


>
>> +	mutex_lock(&shd_mutex);
>> +	shd = devlink_priv(devlink);
>> +	if (refcount_dec_and_test(&shd->refcount))
>> +		devlink_shd_destroy(shd);
>> +	mutex_unlock(&shd_mutex);
>> +}
>> +EXPORT_SYMBOL_GPL(devlink_shd_put);
Re: [PATCH net-next V7 02/14] devlink: introduce shared devlink instance for PFs on same chip
Posted by Jakub Kicinski 4 days, 21 hours ago
On Tue, 3 Feb 2026 10:44:16 +0100 Jiri Pirko wrote:
> >> +/* This structure represents a shared devlink instance,
> >> + * there is one created per identifier (e.g., serial number).
> >> + */
> >> +struct devlink_shd {
> >> +	struct list_head list; /* Node in shd list */
> >> +	const char *id; /* Identifier string (e.g., serial number) */  
> >
> >Why does this have to be a string? The identifier should be irrelevant,
> >and if something like serial number exists it can be reported in dev
> >info for the shared instance?  
> 
> String gives drivers flexibility to use anything. Perhaps I'm missing
> your point. Are you againts free-form or just string and buf+buf_len
> would be fine?

I was thinking binary buf+len is fine, and we shouldn't really expose
this to user space in any shape or form (hence no concern about free
form).
Re: [PATCH net-next V7 02/14] devlink: introduce shared devlink instance for PFs on same chip
Posted by Jiri Pirko 4 days, 16 hours ago
Wed, Feb 04, 2026 at 03:42:00AM +0100, kuba@kernel.org wrote:
>On Tue, 3 Feb 2026 10:44:16 +0100 Jiri Pirko wrote:
>> >> +/* This structure represents a shared devlink instance,
>> >> + * there is one created per identifier (e.g., serial number).
>> >> + */
>> >> +struct devlink_shd {
>> >> +	struct list_head list; /* Node in shd list */
>> >> +	const char *id; /* Identifier string (e.g., serial number) */  
>> >
>> >Why does this have to be a string? The identifier should be irrelevant,
>> >and if something like serial number exists it can be reported in dev
>> >info for the shared instance?  
>> 
>> String gives drivers flexibility to use anything. Perhaps I'm missing
>> your point. Are you againts free-form or just string and buf+buf_len
>> would be fine?
>
>I was thinking binary buf+len is fine, and we shouldn't really expose
>this to user space in any shape or form (hence no concern about free
>form).

How you imagine to name faux device then? I'm sensing that you want to
get rid of busname/devname handle for things like this and rely on some
randomly generated index. But the whole ecosystem is bases on
busname/devname handle. Any idea how to overcome that?
Re: [PATCH net-next V7 02/14] devlink: introduce shared devlink instance for PFs on same chip
Posted by Jakub Kicinski 3 days, 21 hours ago
On Wed, 4 Feb 2026 08:15:21 +0100 Jiri Pirko wrote:
> >> String gives drivers flexibility to use anything. Perhaps I'm missing
> >> your point. Are you againts free-form or just string and buf+buf_len
> >> would be fine?  
> >
> >I was thinking binary buf+len is fine, and we shouldn't really expose
> >this to user space in any shape or form (hence no concern about free
> >form).  
> 
> How you imagine to name faux device then? I'm sensing that you want to
> get rid of busname/devname handle for things like this and rely on some
> randomly generated index. But the whole ecosystem is bases on
> busname/devname handle. Any idea how to overcome that?

Quoting myself form the other sub-thread:

  FWIW using devlink day to day, the bus/device is not at all useful as
  an identifier. Most of code touching devlink at Meta either matches
  on devlink dev info or assumes there's one instance on the system.

IOW I think you over-estimate the value of the bus/dev in real life
systems. And that's for instances that have a real bus/dev. Having
a faux bus/dev is completely pointless, right? The only question is
whether some code will straight up crash when seeing a device without
bus/dev.