[PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()

Sven van Ashbrook posted 1 patch 3 years, 5 months ago
drivers/platform/x86/intel/pmc/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()
Posted by Sven van Ashbrook 3 years, 5 months ago
The "failure to enter S0ix" warning is critically important for monitoring
and debugging power regressions, both in the field and in the test lab.

Promote from lower-case warn() to upper-case WARN() so that it becomes
more prominent, and gets picked up as part of existing monitoring
infrastructure, which typically focuses on WARN() and ignores warn()
type log messages.

Signed-off-by: Sven van Ashbrook <svenva@chromium.org>
---
Against v6.1-rc2

 drivers/platform/x86/intel/pmc/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
index a1fe1e0dcf4a5..834f0352c0edf 100644
--- a/drivers/platform/x86/intel/pmc/core.c
+++ b/drivers/platform/x86/intel/pmc/core.c
@@ -2125,7 +2125,7 @@ static __maybe_unused int pmc_core_resume(struct device *dev)
 	}
 
 	/* The real interesting case - S0ix failed - lets ask PMC why. */
-	dev_warn(dev, "CPU did not enter SLP_S0!!! (S0ix cnt=%llu)\n",
+	dev_WARN(dev, "CPU did not enter SLP_S0!!! (S0ix cnt=%llu)\n",
 		 pmcdev->s0ix_counter);
 	if (pmcdev->map->slps0_dbg_maps)
 		pmc_core_slps0_display(pmcdev, dev, NULL);
-- 
2.38.0.135.g90850a2211-goog
Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()
Posted by Hans de Goede 3 years, 5 months ago
Hi,

On 10/27/22 17:19, Sven van Ashbrook wrote:
> The "failure to enter S0ix" warning is critically important for monitoring
> and debugging power regressions, both in the field and in the test lab.
> 
> Promote from lower-case warn() to upper-case WARN() so that it becomes
> more prominent, and gets picked up as part of existing monitoring
> infrastructure, which typically focuses on WARN() and ignores warn()
> type log messages.
> 
> Signed-off-by: Sven van Ashbrook <svenva@chromium.org>

WARN() is really only intended for internal kernel bugs and not for
hw misbehaving, so I'm not a fan of the change you are suggesting here.

Intel folks, do you have an opinion on this ?

Regards,

Hans


> ---
> Against v6.1-rc2
> 
>  drivers/platform/x86/intel/pmc/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
> index a1fe1e0dcf4a5..834f0352c0edf 100644
> --- a/drivers/platform/x86/intel/pmc/core.c
> +++ b/drivers/platform/x86/intel/pmc/core.c
> @@ -2125,7 +2125,7 @@ static __maybe_unused int pmc_core_resume(struct device *dev)
>  	}
>  
>  	/* The real interesting case - S0ix failed - lets ask PMC why. */
> -	dev_warn(dev, "CPU did not enter SLP_S0!!! (S0ix cnt=%llu)\n",
> +	dev_WARN(dev, "CPU did not enter SLP_S0!!! (S0ix cnt=%llu)\n",
>  		 pmcdev->s0ix_counter);
>  	if (pmcdev->map->slps0_dbg_maps)
>  		pmc_core_slps0_display(pmcdev, dev, NULL);
Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()
Posted by David E. Box 3 years, 5 months ago
On Thu, 2022-10-27 at 17:40 +0200, Hans de Goede wrote:
> Hi,
> 
> On 10/27/22 17:19, Sven van Ashbrook wrote:
> > The "failure to enter S0ix" warning is critically important for monitoring
> > and debugging power regressions, both in the field and in the test lab.
> > 
> > Promote from lower-case warn() to upper-case WARN() so that it becomes
> > more prominent, and gets picked up as part of existing monitoring
> > infrastructure, which typically focuses on WARN() and ignores warn()
> > type log messages.
> > 
> > Signed-off-by: Sven van Ashbrook <svenva@chromium.org>
> 
> WARN() is really only intended for internal kernel bugs and not for
> hw misbehaving, so I'm not a fan of the change you are suggesting here.
> 
> Intel folks, do you have an opinion on this ?

I agree that not entering s0ix is a critical failure, but this is a hardware
suspend failure. How we treat this should be akin to how we treat failure to
enter S3 or deeper. S0ix support is indicated by the S0 Low Power Idle bit in
the ACPI FADT table. It's better IMO to create some framework in the suspend or
ACPI core that allows platforms to report whether they have achieved the
hardware state indicated by having this bit set. Rafael, your thoughts?

David

> 
> Regards,
> 
> Hans
> 
> 
> > ---
> > Against v6.1-rc2
> > 
> >  drivers/platform/x86/intel/pmc/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/platform/x86/intel/pmc/core.c
> > b/drivers/platform/x86/intel/pmc/core.c
> > index a1fe1e0dcf4a5..834f0352c0edf 100644
> > --- a/drivers/platform/x86/intel/pmc/core.c
> > +++ b/drivers/platform/x86/intel/pmc/core.c
> > @@ -2125,7 +2125,7 @@ static __maybe_unused int pmc_core_resume(struct
> > device *dev)
> >         }
> >  
> >         /* The real interesting case - S0ix failed - lets ask PMC why. */
> > -       dev_warn(dev, "CPU did not enter SLP_S0!!! (S0ix cnt=%llu)\n",
> > +       dev_WARN(dev, "CPU did not enter SLP_S0!!! (S0ix cnt=%llu)\n",
> >                  pmcdev->s0ix_counter);
> >         if (pmcdev->map->slps0_dbg_maps)
> >                 pmc_core_slps0_display(pmcdev, dev, NULL);
> 
RE: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()
Posted by Limonciello, Mario 3 years, 5 months ago
[Public]

+Shyam & Raul

> -----Original Message-----
> From: Hans de Goede <hdegoede@redhat.com>
> Sent: Thursday, October 27, 2022 10:41
> To: Sven van Ashbrook <svenva@chromium.org>; LKML <linux-
> kernel@vger.kernel.org>
> Cc: platform-driver-x86@vger.kernel.org; Rajneesh Bhardwaj
> <rajneesh.bhardwaj@intel.com>; Rafael J Wysocki <rjw@rjwysocki.net>;
> Rajat Jain <rajatja@google.com>; David E Box <david.e.box@intel.com>;
> Mark Gross <markgross@kernel.org>; Rajneesh Bhardwaj
> <irenic.rajneesh@gmail.com>
> Subject: Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure
> warn() to WARN()
> 
> Hi,
> 
> On 10/27/22 17:19, Sven van Ashbrook wrote:
> > The "failure to enter S0ix" warning is critically important for monitoring
> > and debugging power regressions, both in the field and in the test lab.
> >
> > Promote from lower-case warn() to upper-case WARN() so that it becomes
> > more prominent, and gets picked up as part of existing monitoring
> > infrastructure, which typically focuses on WARN() and ignores warn()
> > type log messages.
> >
> > Signed-off-by: Sven van Ashbrook <svenva@chromium.org>
> 
> WARN() is really only intended for internal kernel bugs and not for
> hw misbehaving, so I'm not a fan of the change you are suggesting here.
> 
> Intel folks, do you have an opinion on this ?

For a reference point; on AMD's implementation of a similar driver (platform/x86/amd/pmc.c)
this "type" of message is also "dev_warn":
https://github.com/torvalds/linux/blob/master/drivers/platform/x86/amd/pmc.c#L365

If we do make changes to this message level so that other infrastructure picks up I suggest
we do it for both drivers.

Are we maybe at the point now it should be dev_err instead?

> 
> Regards,
> 
> Hans
> 
> 
> > ---
> > Against v6.1-rc2
> >
> >  drivers/platform/x86/intel/pmc/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/platform/x86/intel/pmc/core.c
> b/drivers/platform/x86/intel/pmc/core.c
> > index a1fe1e0dcf4a5..834f0352c0edf 100644
> > --- a/drivers/platform/x86/intel/pmc/core.c
> > +++ b/drivers/platform/x86/intel/pmc/core.c
> > @@ -2125,7 +2125,7 @@ static __maybe_unused int
> pmc_core_resume(struct device *dev)
> >  	}
> >
> >  	/* The real interesting case - S0ix failed - lets ask PMC why. */
> > -	dev_warn(dev, "CPU did not enter SLP_S0!!! (S0ix cnt=%llu)\n",
> > +	dev_WARN(dev, "CPU did not enter SLP_S0!!! (S0ix cnt=%llu)\n",
> >  		 pmcdev->s0ix_counter);
> >  	if (pmcdev->map->slps0_dbg_maps)
> >  		pmc_core_slps0_display(pmcdev, dev, NULL);
Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()
Posted by Sven van Ashbrook 3 years, 5 months ago
On Thu, Oct 27, 2022 at 11:47 AM Limonciello, Mario
<Mario.Limonciello@amd.com> wrote:
>
> If we do make changes to this message level so that other infrastructure picks up I suggest
> we do it for both drivers.

Sounds good, patch feedback willing.

> WARN() is really only intended for internal kernel bugs and not for
> hw misbehaving, so I'm not a fan of the change you are suggesting here.

Thanks, I was not aware of this distinction. But it does look like upper-case
WARN is already used in many places to indicate hw misbehaving?

Here for example:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/chrome/cros_ec.c?h=v6.1-rc2#n142