[PATCH net-next v2 02/14] i40e: support generic devlink param "max_mac_per_vf"

Jacob Keller posted 14 patches 2 months ago
[PATCH net-next v2 02/14] i40e: support generic devlink param "max_mac_per_vf"
Posted by Jacob Keller 2 months ago
From: Mohammad Heib <mheib@redhat.com>

Currently the i40e driver enforces its own internally calculated per-VF MAC
filter limit, derived from the number of allocated VFs and available
hardware resources. This limit is not configurable by the administrator,
which makes it difficult to control how many MAC addresses each VF may
use.

This patch adds support for the new generic devlink runtime parameter
"max_mac_per_vf" which provides administrators with a way to cap the
number of MAC addresses a VF can use:

- When the parameter is set to 0 (default), the driver continues to use
  its internally calculated limit.

- When set to a non-zero value, the driver applies this value as a strict
  cap for VFs, overriding the internal calculation.

Important notes:

- The configured value is a theoretical maximum. Hardware limits may
  still prevent additional MAC addresses from being added, even if the
  parameter allows it.

- Since MAC filters are a shared hardware resource across all VFs,
  setting a high value may cause resource contention and starve other
  VFs.

- This change gives administrators predictable and flexible control over
  VF resource allocation, while still respecting hardware limitations.

- Previous discussion about this change:
  https://lore.kernel.org/netdev/20250805134042.2604897-2-dhill@redhat.com
  https://lore.kernel.org/netdev/20250823094952.182181-1-mheib@redhat.com

Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |  4 ++
 drivers/net/ethernet/intel/i40e/i40e_devlink.c     | 48 +++++++++++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 31 ++++++++++----
 Documentation/networking/devlink/i40e.rst          | 32 +++++++++++++++
 4 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 801a57a925da..d2d03db2acec 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -574,6 +574,10 @@ struct i40e_pf {
 	struct i40e_vf *vf;
 	int num_alloc_vfs;	/* actual number of VFs allocated */
 	u32 vf_aq_requests;
+	/* If set to non-zero, the device uses this value
+	 * as maximum number of MAC filters per VF.
+	 */
+	u32 max_mac_per_vf;
 	u32 arq_overflows;	/* Not fatal, possibly indicative of problems */
 	struct ratelimit_state mdd_message_rate_limit;
 	/* DCBx/DCBNL capability for PF that indicates
diff --git a/drivers/net/ethernet/intel/i40e/i40e_devlink.c b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
index cc4e9e2addb7..cd01e35da94e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_devlink.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
@@ -5,6 +5,35 @@
 #include "i40e.h"
 #include "i40e_devlink.h"
 
+static int i40e_max_mac_per_vf_set(struct devlink *devlink,
+				   u32 id,
+				   struct devlink_param_gset_ctx *ctx,
+				   struct netlink_ext_ack *extack)
+{
+	struct i40e_pf *pf = devlink_priv(devlink);
+
+	pf->max_mac_per_vf = ctx->val.vu32;
+	return 0;
+}
+
+static int i40e_max_mac_per_vf_get(struct devlink *devlink,
+				   u32 id,
+				   struct devlink_param_gset_ctx *ctx)
+{
+	struct i40e_pf *pf = devlink_priv(devlink);
+
+	ctx->val.vu32 = pf->max_mac_per_vf;
+	return 0;
+}
+
+static const struct devlink_param i40e_dl_params[] = {
+	DEVLINK_PARAM_GENERIC(MAX_MAC_PER_VF,
+			      BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			      i40e_max_mac_per_vf_get,
+			      i40e_max_mac_per_vf_set,
+			      NULL),
+};
+
 static void i40e_info_get_dsn(struct i40e_pf *pf, char *buf, size_t len)
 {
 	u8 dsn[8];
@@ -165,7 +194,18 @@ void i40e_free_pf(struct i40e_pf *pf)
  **/
 void i40e_devlink_register(struct i40e_pf *pf)
 {
-	devlink_register(priv_to_devlink(pf));
+	struct devlink *dl = priv_to_devlink(pf);
+	struct device *dev = &pf->pdev->dev;
+	int err;
+
+	err = devlink_params_register(dl, i40e_dl_params,
+				      ARRAY_SIZE(i40e_dl_params));
+	if (err)
+		dev_err(dev,
+			"devlink params register failed with error %d", err);
+
+	devlink_register(dl);
+
 }
 
 /**
@@ -176,7 +216,11 @@ void i40e_devlink_register(struct i40e_pf *pf)
  **/
 void i40e_devlink_unregister(struct i40e_pf *pf)
 {
-	devlink_unregister(priv_to_devlink(pf));
+	struct devlink *dl = priv_to_devlink(pf);
+
+	devlink_unregister(dl);
+	devlink_params_unregister(dl, i40e_dl_params,
+				  ARRAY_SIZE(i40e_dl_params));
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 081a4526a2f0..6e154a8aa474 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2935,33 +2935,48 @@ static inline int i40e_check_vf_permission(struct i40e_vf *vf,
 		if (!f)
 			++mac_add_cnt;
 	}
-
-	/* If this VF is not privileged, then we can't add more than a limited
-	 * number of addresses.
+	/* Determine the maximum number of MAC addresses this VF may use.
 	 *
-	 * If this VF is trusted, it can use more resources than untrusted.
-	 * However to ensure that every trusted VF has appropriate number of
-	 * resources, divide whole pool of resources per port and then across
-	 * all VFs.
+	 * - For untrusted VFs: use a fixed small limit.
+	 *
+	 * - For trusted VFs: limit is calculated by dividing total MAC
+	 *  filter pool across all VFs/ports.
+	 *
+	 * - User can override this by devlink param "max_mac_per_vf".
+	 *   If set its value is used as a strict cap for both trusted and
+	 *   untrusted VFs.
+	 *   Note:
+	 *    even when overridden, this is a theoretical maximum; hardware
+	 *    may reject additional MACs if the absolute HW limit is reached.
 	 */
 	if (!vf_trusted)
 		mac_add_max = I40E_VC_MAX_MAC_ADDR_PER_VF;
 	else
 		mac_add_max = I40E_VC_MAX_MACVLAN_PER_TRUSTED_VF(pf->num_alloc_vfs, hw->num_ports);
 
+	if (pf->max_mac_per_vf > 0)
+		mac_add_max = pf->max_mac_per_vf;
+
 	/* VF can replace all its filters in one step, in this case mac_add_max
 	 * will be added as active and another mac_add_max will be in
 	 * a to-be-removed state. Account for that.
 	 */
 	if ((i40e_count_active_filters(vsi) + mac_add_cnt) > mac_add_max ||
 	    (i40e_count_all_filters(vsi) + mac_add_cnt) > 2 * mac_add_max) {
+		if (pf->max_mac_per_vf == mac_add_max && mac_add_max > 0) {
+			dev_err(&pf->pdev->dev,
+				"Cannot add more MAC addresses: VF reached its maximum allowed limit (%d)\n",
+				mac_add_max);
+				return -EPERM;
+		}
 		if (!vf_trusted) {
 			dev_err(&pf->pdev->dev,
 				"Cannot add more MAC addresses, VF is not trusted, switch the VF to trusted to add more functionality\n");
 			return -EPERM;
 		} else {
 			dev_err(&pf->pdev->dev,
-				"Cannot add more MAC addresses, trusted VF exhausted it's resources\n");
+				"Cannot add more MAC addresses: trusted VF reached its maximum allowed limit (%d)\n",
+				mac_add_max);
 			return -EPERM;
 		}
 	}
diff --git a/Documentation/networking/devlink/i40e.rst b/Documentation/networking/devlink/i40e.rst
index d3cb5bb5197e..7480f0300fdb 100644
--- a/Documentation/networking/devlink/i40e.rst
+++ b/Documentation/networking/devlink/i40e.rst
@@ -7,6 +7,38 @@ i40e devlink support
 This document describes the devlink features implemented by the ``i40e``
 device driver.
 
+Parameters
+==========
+
+.. list-table:: Generic parameters implemented
+    :widths: 5 5 90
+
+    * - Name
+      - Mode
+      - Notes
+    * - ``max_mac_per_vf``
+      - runtime
+      - Controls the maximum number of MAC addresses a VF can use
+        on i40e devices.
+
+        By default (``0``), the driver enforces its internally calculated per-VF
+        MAC filter limit, which is based on the number of allocated VFS.
+
+        If set to a non-zero value, this parameter acts as a strict cap:
+        the driver will use the user-provided value instead of its internal
+        calculation.
+
+        **Important notes:**
+
+        - MAC filters are a **shared hardware resource** across all VFs.
+          Setting a high value may cause other VFs to be starved of filters.
+        - This value is a **theoretical maximum**. The hardware may return
+          errors when its absolute limit is reached, regardless of the value
+          set here.
+
+        The default value is ``0`` (internal calculation is used).
+
+
 Info versions
 =============
 

-- 
2.51.0.rc1.197.g6d975e95c9d7
Re: [PATCH net-next v2 02/14] i40e: support generic devlink param "max_mac_per_vf"
Posted by Jakub Kicinski 1 month, 4 weeks ago
On Thu, 16 Oct 2025 23:08:31 -0700 Jacob Keller wrote:
> - The configured value is a theoretical maximum. Hardware limits may
>   still prevent additional MAC addresses from being added, even if the
>   parameter allows it.

Is "administrative policy" better than "theoretical max" ?

Also -- should we be scanning the existing state to check if some VM
hasn't violated the new setting and error or at least return a extack
to the user to warn that the policy is not currently adhered to?
Re: [PATCH net-next v2 02/14] i40e: support generic devlink param "max_mac_per_vf"
Posted by Jacob Keller 1 month, 4 weeks ago

On 10/20/2025 6:25 PM, Jakub Kicinski wrote:
> On Thu, 16 Oct 2025 23:08:31 -0700 Jacob Keller wrote:
>> - The configured value is a theoretical maximum. Hardware limits may
>>   still prevent additional MAC addresses from being added, even if the
>>   parameter allows it.
> 
> Is "administrative policy" better than "theoretical max" ?
> 

That could be a bit more accurate.

> Also -- should we be scanning the existing state to check if some VM
> hasn't violated the new setting and error or at least return a extack
> to the user to warn that the policy is not currently adhered to?

My understanding here is that this enforces the VF to never go *above*
this value, but its possible some other hardware restriction (i.e. out
of filters) could prevent a VF from adding more filters even if the
value is set higher.

Basically, this sets the maximum allowed number of filters, but doesn't
guarantee that many filters are actually available, at least on X710
where filters are a shared resource and we do not have a good mechanism
to coordinate across PFs to confirm how many have been made available or
reserved already. (Until firmware rejects adding a filter because
resources are capped)

Thus, I don't think we need to scan to check anything here. VFs should
be unable to exceed this limit, and thats checked on filter add.
Re: [PATCH net-next v2 02/14] i40e: support generic devlink param "max_mac_per_vf"
Posted by Jakub Kicinski 1 month, 4 weeks ago
On Tue, 21 Oct 2025 13:39:27 -0700 Jacob Keller wrote:
> On 10/20/2025 6:25 PM, Jakub Kicinski wrote:
> > On Thu, 16 Oct 2025 23:08:31 -0700 Jacob Keller wrote:  
> >> - The configured value is a theoretical maximum. Hardware limits may
> >>   still prevent additional MAC addresses from being added, even if the
> >>   parameter allows it.  
> > 
> > Is "administrative policy" better than "theoretical max" ?
> 
> That could be a bit more accurate.
> 
> > Also -- should we be scanning the existing state to check if some VM
> > hasn't violated the new setting and error or at least return a extack
> > to the user to warn that the policy is not currently adhered to?  
> 
> My understanding here is that this enforces the VF to never go *above*
> this value, but its possible some other hardware restriction (i.e. out
> of filters) could prevent a VF from adding more filters even if the
> value is set higher.
> 
> Basically, this sets the maximum allowed number of filters, but doesn't
> guarantee that many filters are actually available, at least on X710
> where filters are a shared resource and we do not have a good mechanism
> to coordinate across PFs to confirm how many have been made available or
> reserved already. (Until firmware rejects adding a filter because
> resources are capped)
> 
> Thus, I don't think we need to scan to check anything here. VFs should
> be unable to exceed this limit, and thats checked on filter add.

Sorry, just to be clear -- this comment is independent on the comment
about "policy" vs "theoretical".

What if:
 - max is set to 4
 - VF 1 adds 4 filters
 - (some time later) user asks to decrease max to 2

The devlink param is CMODE_RUNTIME so I'm assuming it can be tweaked 
at any point in time.

We probably don't want to prevent lowering the max as admin has no way
to flush the filters. Either we don't let the knob be turned when SRIOV
is enabled or we should warn if some VF has more filters than the new
max?
Re: [PATCH net-next v2 02/14] i40e: support generic devlink param "max_mac_per_vf"
Posted by Jacob Keller 1 month, 3 weeks ago

On 10/21/2025 4:07 PM, Jakub Kicinski wrote:
> On Tue, 21 Oct 2025 13:39:27 -0700 Jacob Keller wrote:
>> On 10/20/2025 6:25 PM, Jakub Kicinski wrote:
>>> On Thu, 16 Oct 2025 23:08:31 -0700 Jacob Keller wrote:  
>>>> - The configured value is a theoretical maximum. Hardware limits may
>>>>   still prevent additional MAC addresses from being added, even if the
>>>>   parameter allows it.  
>>>
>>> Is "administrative policy" better than "theoretical max" ?
>>
>> That could be a bit more accurate.
>>
>>> Also -- should we be scanning the existing state to check if some VM
>>> hasn't violated the new setting and error or at least return a extack
>>> to the user to warn that the policy is not currently adhered to?  
>>
>> My understanding here is that this enforces the VF to never go *above*
>> this value, but its possible some other hardware restriction (i.e. out
>> of filters) could prevent a VF from adding more filters even if the
>> value is set higher.
>>
>> Basically, this sets the maximum allowed number of filters, but doesn't
>> guarantee that many filters are actually available, at least on X710
>> where filters are a shared resource and we do not have a good mechanism
>> to coordinate across PFs to confirm how many have been made available or
>> reserved already. (Until firmware rejects adding a filter because
>> resources are capped)
>>
>> Thus, I don't think we need to scan to check anything here. VFs should
>> be unable to exceed this limit, and thats checked on filter add.
> 
> Sorry, just to be clear -- this comment is independent on the comment
> about "policy" vs "theoretical".
> 
> What if:
>  - max is set to 4
>  - VF 1 adds 4 filters
>  - (some time later) user asks to decrease max to 2
> 
> The devlink param is CMODE_RUNTIME so I'm assuming it can be tweaked 
> at any point in time.
> 
> We probably don't want to prevent lowering the max as admin has no way
> to flush the filters. Either we don't let the knob be turned when SRIOV
> is enabled or we should warn if some VF has more filters than the new
> max?

Ah, yes that makes sense to me. I think the best approach is just return
-EBUSY if there are active VFs. We could implement warning logic
instead, but I think most of the time the administrator should be
expected to configure this once during setup (i.e. a boot up script or
something), and not during runtime.
Re: [PATCH net-next v2 02/14] i40e: support generic devlink param "max_mac_per_vf"
Posted by mohammad heib 1 month, 3 weeks ago
Thanks Jakub and Jacob for the feedback.

On 10/23/25 1:11 AM, Jacob Keller wrote:
> 
> 
> On 10/21/2025 4:07 PM, Jakub Kicinski wrote:
>> On Tue, 21 Oct 2025 13:39:27 -0700 Jacob Keller wrote:
>>> On 10/20/2025 6:25 PM, Jakub Kicinski wrote:
>>>> On Thu, 16 Oct 2025 23:08:31 -0700 Jacob Keller wrote:
>>>>> - The configured value is a theoretical maximum. Hardware limits may
>>>>>    still prevent additional MAC addresses from being added, even if the
>>>>>    parameter allows it.
>>>>
>>>> Is "administrative policy" better than "theoretical max" ?
>>>
>>> That could be a bit more accurate.
>>>
>>>> Also -- should we be scanning the existing state to check if some VM
>>>> hasn't violated the new setting and error or at least return a extack
>>>> to the user to warn that the policy is not currently adhered to?
>>>
>>> My understanding here is that this enforces the VF to never go *above*
>>> this value, but its possible some other hardware restriction (i.e. out
>>> of filters) could prevent a VF from adding more filters even if the
>>> value is set higher.
>>>
>>> Basically, this sets the maximum allowed number of filters, but doesn't
>>> guarantee that many filters are actually available, at least on X710
>>> where filters are a shared resource and we do not have a good mechanism
>>> to coordinate across PFs to confirm how many have been made available or
>>> reserved already. (Until firmware rejects adding a filter because
>>> resources are capped)
>>>
>>> Thus, I don't think we need to scan to check anything here. VFs should
>>> be unable to exceed this limit, and thats checked on filter add.
>>
>> Sorry, just to be clear -- this comment is independent on the comment
>> about "policy" vs "theoretical".
>>
>> What if:
>>   - max is set to 4
>>   - VF 1 adds 4 filters
>>   - (some time later) user asks to decrease max to 2
>>
>> The devlink param is CMODE_RUNTIME so I'm assuming it can be tweaked
>> at any point in time.
>>
>> We probably don't want to prevent lowering the max as admin has no way
>> to flush the filters. Either we don't let the knob be turned when SRIOV
>> is enabled or we should warn if some VF has more filters than the new
>> max?
> 
> Ah, yes that makes sense to me. I think the best approach is just return
> -EBUSY if there are active VFs. We could implement warning logic
> instead, but I think most of the time the administrator should be
> expected to configure this once during setup (i.e. a boot up script or
> something), and not during runtime.

To make sure I understood correctly before sending the next version:
  - I need to update the parameter documentation to describe it as an 
   administrative policy limit instead of a “theoretical maximum.

  - I need to modify the set callback to return -EBUSY if VFs are 
already allocated, so the parameter can only be changed before enabling 
SR-IOV.

  - I need to mention this behavior explicitly in the i40e devlink 
parameter description.

Also, just to confirm — should I resend the updated patch directly to 
the upstream netdev mailing list, or should it go through the IWL list 
first?

Thanks,

Re: [PATCH net-next v2 02/14] i40e: support generic devlink param "max_mac_per_vf"
Posted by mohammad heib 1 month, 4 weeks ago
Thank you for the review.

As Jacob Keller mentioned, this change enforces that a VF can never go 
above the maximum allowed value. However, there could still be other 
hardware-related restrictions.

Regarding the scenario you described, if the maximum is decreased to 2 
after VF1 has already added 4 filters, then the next time the user tries 
to add a new MAC address to VF1 (or to any VF that already has 2 or more 
MAC filters), they will see an error message in the kernel log:
  "Cannot add more MAC addresses: VF reached its maximum allowed limit 2"

I didn’t really consider the decreasing scenario, since this change is 
intended to be configured by the system administrator once, before 
setting up the VFs. If for some reason they decide to reduce the limit 
during the VF’s lifetime, I believe it’s the user’s responsibility to 
first remove the old MAC addresses and filters from the VF.


On 10/22/25 2:07 AM, Jakub Kicinski wrote:
> On Tue, 21 Oct 2025 13:39:27 -0700 Jacob Keller wrote:
>> On 10/20/2025 6:25 PM, Jakub Kicinski wrote:
>>> On Thu, 16 Oct 2025 23:08:31 -0700 Jacob Keller wrote:
>>>> - The configured value is a theoretical maximum. Hardware limits may
>>>>    still prevent additional MAC addresses from being added, even if the
>>>>    parameter allows it.
>>>
>>> Is "administrative policy" better than "theoretical max" ?
>>
>> That could be a bit more accurate.
>>
>>> Also -- should we be scanning the existing state to check if some VM
>>> hasn't violated the new setting and error or at least return a extack
>>> to the user to warn that the policy is not currently adhered to?
>>
>> My understanding here is that this enforces the VF to never go *above*
>> this value, but its possible some other hardware restriction (i.e. out
>> of filters) could prevent a VF from adding more filters even if the
>> value is set higher.
>>
>> Basically, this sets the maximum allowed number of filters, but doesn't
>> guarantee that many filters are actually available, at least on X710
>> where filters are a shared resource and we do not have a good mechanism
>> to coordinate across PFs to confirm how many have been made available or
>> reserved already. (Until firmware rejects adding a filter because
>> resources are capped)
>>
>> Thus, I don't think we need to scan to check anything here. VFs should
>> be unable to exceed this limit, and thats checked on filter add.
> 
> Sorry, just to be clear -- this comment is independent on the comment
> about "policy" vs "theoretical".
> 
> What if:
>   - max is set to 4
>   - VF 1 adds 4 filters
>   - (some time later) user asks to decrease max to 2
> 
> The devlink param is CMODE_RUNTIME so I'm assuming it can be tweaked
> at any point in time.
> 
> We probably don't want to prevent lowering the max as admin has no way
> to flush the filters. Either we don't let the knob be turned when SRIOV
> is enabled or we should warn if some VF has more filters than the new
> max?
>