[PATCH net-next v2 1/3] gve: skip error logging for retryable AdminQ commands

Harshitha Ramamurthy posted 3 patches 6 days, 20 hours ago
[PATCH net-next v2 1/3] gve: skip error logging for retryable AdminQ commands
Posted by Harshitha Ramamurthy 6 days, 20 hours ago
From: Jordan Rhee <jordanrhee@google.com>

AdminQ commands may return -EAGAIN under certain transient conditions.
These commands are intended to be retried by the driver, so logging
a formal error to the system log is misleading and creates
unnecessary noise.

Modify the logging logic to skip the error message when the result
is -EAGAIN.

Reviewed-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jordan Rhee <jordanrhee@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
---
 drivers/net/ethernet/google/gve/gve_adminq.c | 26 +++++++++++++++-----
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 08587bf40ed4..c7834614c5f0 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -416,11 +416,6 @@ static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
 
 static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
 {
-	if (status != GVE_ADMINQ_COMMAND_PASSED &&
-	    status != GVE_ADMINQ_COMMAND_UNSET) {
-		dev_err(&priv->pdev->dev, "AQ command failed with status %d\n", status);
-		priv->adminq_cmd_fail++;
-	}
 	switch (status) {
 	case GVE_ADMINQ_COMMAND_PASSED:
 		return 0;
@@ -455,6 +450,16 @@ static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
 	}
 }
 
+static bool gve_adminq_is_retryable(enum gve_adminq_opcodes opcode)
+{
+	switch (opcode) {
+	case GVE_ADMINQ_REPORT_NIC_TIMESTAMP:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /* Flushes all AQ commands currently queued and waits for them to complete.
  * If there are failures, it will return the first error.
  */
@@ -482,9 +487,18 @@ static int gve_adminq_kick_and_wait(struct gve_priv *priv)
 		cmd = &priv->adminq[i & priv->adminq_mask];
 		status = be32_to_cpu(READ_ONCE(cmd->status));
 		err = gve_adminq_parse_err(priv, status);
-		if (err)
+		if (err) {
+			enum gve_adminq_opcodes opcode =
+				be32_to_cpu(READ_ONCE(cmd->opcode));
+			priv->adminq_cmd_fail++;
+			if (!gve_adminq_is_retryable(opcode) || err != -EAGAIN)
+				dev_err_ratelimited(&priv->pdev->dev,
+						    "AQ command %d failed with status %d\n",
+						    opcode, status);
+
 			// Return the first error if we failed.
 			return err;
+		}
 	}
 
 	return 0;
-- 
2.53.0.1018.g2bb0e51243-goog
Re: [PATCH net-next v2 1/3] gve: skip error logging for retryable AdminQ commands
Posted by Li Xiasong 6 days, 16 hours ago
Hi,

On 3/27/2026 6:45 AM, Harshitha Ramamurthy wrote:
> From: Jordan Rhee <jordanrhee@google.com>
> 
> AdminQ commands may return -EAGAIN under certain transient conditions.
> These commands are intended to be retried by the driver, so logging
> a formal error to the system log is misleading and creates
> unnecessary noise.
> 
> Modify the logging logic to skip the error message when the result
> is -EAGAIN.
> 
> Reviewed-by: Joshua Washington <joshwash@google.com>
> Signed-off-by: Jordan Rhee <jordanrhee@google.com>
> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
> ---
>  drivers/net/ethernet/google/gve/gve_adminq.c | 26 +++++++++++++++-----
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> index 08587bf40ed4..c7834614c5f0 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> @@ -416,11 +416,6 @@ static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
>  
>  static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
>  {
> -	if (status != GVE_ADMINQ_COMMAND_PASSED &&
> -	    status != GVE_ADMINQ_COMMAND_UNSET) {
> -		dev_err(&priv->pdev->dev, "AQ command failed with status %d\n", status);
> -		priv->adminq_cmd_fail++;
> -	}
>  	switch (status) {
>  	case GVE_ADMINQ_COMMAND_PASSED:
>  		return 0;
> @@ -455,6 +450,16 @@ static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
>  	}
>  }
>  
> +static bool gve_adminq_is_retryable(enum gve_adminq_opcodes opcode)
> +{
> +	switch (opcode) {
> +	case GVE_ADMINQ_REPORT_NIC_TIMESTAMP:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
>  /* Flushes all AQ commands currently queued and waits for them to complete.
>   * If there are failures, it will return the first error.
>   */
> @@ -482,9 +487,18 @@ static int gve_adminq_kick_and_wait(struct gve_priv *priv)
>  		cmd = &priv->adminq[i & priv->adminq_mask];
>  		status = be32_to_cpu(READ_ONCE(cmd->status));
>  		err = gve_adminq_parse_err(priv, status);
> -		if (err)
> +		if (err) {
> +			enum gve_adminq_opcodes opcode =
> +				be32_to_cpu(READ_ONCE(cmd->opcode));
> +			priv->adminq_cmd_fail++;
> +			if (!gve_adminq_is_retryable(opcode) || err != -EAGAIN)

In gve_adminq_kick_and_wait(), the condition is:

	if (!gve_adminq_is_retryable(opcode) || err != -EAGAIN)
		dev_err_ratelimited(...);

Based on the commit log, the goal is to skip logging when the result is
-EAGAIN for transient conditions. However, when gve_adminq_is_retryable()
returns false (e.g., GVE_ADMINQ_COMMAND_ERROR_ABORTED), even if err is
-EAGAIN, the condition evaluates to true and the error would still be logged.

Would it be more appropriate to use && instead of || here?

	if (!gve_adminq_is_retryable(opcode) && err != -EAGAIN)

I may be missing something, so please let me know if I've misunderstood.

> +				dev_err_ratelimited(&priv->pdev->dev,
> +						    "AQ command %d failed with status %d\n",
> +						    opcode, status);
> +
>  			// Return the first error if we failed.
>  			return err;
> +		}
>  	}
>  
>  	return 0;

Best regards,
Li Xiasong
Re: [PATCH net-next v2 1/3] gve: skip error logging for retryable AdminQ commands
Posted by Jordan Rhee 6 days, 15 hours ago
Hi Li, thank you very much for the review. The intent is only to skip
logging for *retryable* commands that return -EAGAIN. If a
non-retryable command fails, we do want to log, even if it returns
-EAGAIN.
Jordan


On Thu, Mar 26, 2026 at 7:28 PM Li Xiasong <lixiasong1@huawei.com> wrote:
>
> Hi,
>
> On 3/27/2026 6:45 AM, Harshitha Ramamurthy wrote:
> > From: Jordan Rhee <jordanrhee@google.com>
> >
> > AdminQ commands may return -EAGAIN under certain transient conditions.
> > These commands are intended to be retried by the driver, so logging
> > a formal error to the system log is misleading and creates
> > unnecessary noise.
> >
> > Modify the logging logic to skip the error message when the result
> > is -EAGAIN.
> >
> > Reviewed-by: Joshua Washington <joshwash@google.com>
> > Signed-off-by: Jordan Rhee <jordanrhee@google.com>
> > Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
> > ---
> >  drivers/net/ethernet/google/gve/gve_adminq.c | 26 +++++++++++++++-----
> >  1 file changed, 20 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> > index 08587bf40ed4..c7834614c5f0 100644
> > --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> > +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> > @@ -416,11 +416,6 @@ static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
> >
> >  static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
> >  {
> > -     if (status != GVE_ADMINQ_COMMAND_PASSED &&
> > -         status != GVE_ADMINQ_COMMAND_UNSET) {
> > -             dev_err(&priv->pdev->dev, "AQ command failed with status %d\n", status);
> > -             priv->adminq_cmd_fail++;
> > -     }
> >       switch (status) {
> >       case GVE_ADMINQ_COMMAND_PASSED:
> >               return 0;
> > @@ -455,6 +450,16 @@ static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
> >       }
> >  }
> >
> > +static bool gve_adminq_is_retryable(enum gve_adminq_opcodes opcode)
> > +{
> > +     switch (opcode) {
> > +     case GVE_ADMINQ_REPORT_NIC_TIMESTAMP:
> > +             return true;
> > +     default:
> > +             return false;
> > +     }
> > +}
> > +
> >  /* Flushes all AQ commands currently queued and waits for them to complete.
> >   * If there are failures, it will return the first error.
> >   */
> > @@ -482,9 +487,18 @@ static int gve_adminq_kick_and_wait(struct gve_priv *priv)
> >               cmd = &priv->adminq[i & priv->adminq_mask];
> >               status = be32_to_cpu(READ_ONCE(cmd->status));
> >               err = gve_adminq_parse_err(priv, status);
> > -             if (err)
> > +             if (err) {
> > +                     enum gve_adminq_opcodes opcode =
> > +                             be32_to_cpu(READ_ONCE(cmd->opcode));
> > +                     priv->adminq_cmd_fail++;
> > +                     if (!gve_adminq_is_retryable(opcode) || err != -EAGAIN)
>
> In gve_adminq_kick_and_wait(), the condition is:
>
>         if (!gve_adminq_is_retryable(opcode) || err != -EAGAIN)
>                 dev_err_ratelimited(...);
>
> Based on the commit log, the goal is to skip logging when the result is
> -EAGAIN for transient conditions. However, when gve_adminq_is_retryable()
> returns false (e.g., GVE_ADMINQ_COMMAND_ERROR_ABORTED), even if err is
> -EAGAIN, the condition evaluates to true and the error would still be logged.
>
> Would it be more appropriate to use && instead of || here?
>
>         if (!gve_adminq_is_retryable(opcode) && err != -EAGAIN)
>
> I may be missing something, so please let me know if I've misunderstood.
>
> > +                             dev_err_ratelimited(&priv->pdev->dev,
> > +                                                 "AQ command %d failed with status %d\n",
> > +                                                 opcode, status);
> > +
> >                       // Return the first error if we failed.
> >                       return err;
> > +             }
> >       }
> >
> >       return 0;
>
> Best regards,
> Li Xiasong
Re: [PATCH net-next v2 1/3] gve: skip error logging for retryable AdminQ commands
Posted by Li Xiasong 6 days, 13 hours ago
Hi, Jordan

On 3/27/2026 11:56 AM, Jordan Rhee wrote:
> Hi Li, thank you very much for the review. The intent is only to skip
> logging for *retryable* commands that return -EAGAIN. If a
> non-retryable command fails, we do want to log, even if it returns
> -EAGAIN.
> Jordan
>

Thanks for the explanation! I totally misread the condition - I confused
opcode with err because they both start with GVE_ADMINQ_.

The || makes perfect sense now. Sorry for the noise!

Best regards,
Li Xiasong

> 
> On Thu, Mar 26, 2026 at 7:28 PM Li Xiasong <lixiasong1@huawei.com> wrote:
>>
>> Hi,
>>
>> On 3/27/2026 6:45 AM, Harshitha Ramamurthy wrote:
>>> From: Jordan Rhee <jordanrhee@google.com>
>>>
>>> AdminQ commands may return -EAGAIN under certain transient conditions.
>>> These commands are intended to be retried by the driver, so logging
>>> a formal error to the system log is misleading and creates
>>> unnecessary noise.
>>>
>>> Modify the logging logic to skip the error message when the result
>>> is -EAGAIN.
>>>
>>> Reviewed-by: Joshua Washington <joshwash@google.com>
>>> Signed-off-by: Jordan Rhee <jordanrhee@google.com>
>>> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
>>> ---
>>>  drivers/net/ethernet/google/gve/gve_adminq.c | 26 +++++++++++++++-----
>>>  1 file changed, 20 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
>>> index 08587bf40ed4..c7834614c5f0 100644
>>> --- a/drivers/net/ethernet/google/gve/gve_adminq.c
>>> +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
>>> @@ -416,11 +416,6 @@ static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
>>>
>>>  static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
>>>  {
>>> -     if (status != GVE_ADMINQ_COMMAND_PASSED &&
>>> -         status != GVE_ADMINQ_COMMAND_UNSET) {
>>> -             dev_err(&priv->pdev->dev, "AQ command failed with status %d\n", status);
>>> -             priv->adminq_cmd_fail++;
>>> -     }
>>>       switch (status) {
>>>       case GVE_ADMINQ_COMMAND_PASSED:
>>>               return 0;
>>> @@ -455,6 +450,16 @@ static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
>>>       }
>>>  }
>>>
>>> +static bool gve_adminq_is_retryable(enum gve_adminq_opcodes opcode)
>>> +{
>>> +     switch (opcode) {
>>> +     case GVE_ADMINQ_REPORT_NIC_TIMESTAMP:
>>> +             return true;
>>> +     default:
>>> +             return false;
>>> +     }
>>> +}
>>> +
>>>  /* Flushes all AQ commands currently queued and waits for them to complete.
>>>   * If there are failures, it will return the first error.
>>>   */
>>> @@ -482,9 +487,18 @@ static int gve_adminq_kick_and_wait(struct gve_priv *priv)
>>>               cmd = &priv->adminq[i & priv->adminq_mask];
>>>               status = be32_to_cpu(READ_ONCE(cmd->status));
>>>               err = gve_adminq_parse_err(priv, status);
>>> -             if (err)
>>> +             if (err) {
>>> +                     enum gve_adminq_opcodes opcode =
>>> +                             be32_to_cpu(READ_ONCE(cmd->opcode));
>>> +                     priv->adminq_cmd_fail++;
>>> +                     if (!gve_adminq_is_retryable(opcode) || err != -EAGAIN)
>>
>> In gve_adminq_kick_and_wait(), the condition is:
>>
>>         if (!gve_adminq_is_retryable(opcode) || err != -EAGAIN)
>>                 dev_err_ratelimited(...);
>>
>> Based on the commit log, the goal is to skip logging when the result is
>> -EAGAIN for transient conditions. However, when gve_adminq_is_retryable()
>> returns false (e.g., GVE_ADMINQ_COMMAND_ERROR_ABORTED), even if err is
>> -EAGAIN, the condition evaluates to true and the error would still be logged.
>>
>> Would it be more appropriate to use && instead of || here?
>>
>>         if (!gve_adminq_is_retryable(opcode) && err != -EAGAIN)
>>
>> I may be missing something, so please let me know if I've misunderstood.
>>
>>> +                             dev_err_ratelimited(&priv->pdev->dev,
>>> +                                                 "AQ command %d failed with status %d\n",
>>> +                                                 opcode, status);
>>> +
>>>                       // Return the first error if we failed.
>>>                       return err;
>>> +             }
>>>       }
>>>
>>>       return 0;
>>
>> Best regards,
>> Li Xiasong