[PATCH] Revert "clk: Fix invalid execution of clk_set_rate"

Johan Hovold posted 1 patch 1 year, 2 months ago
drivers/clk/clk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] Revert "clk: Fix invalid execution of clk_set_rate"
Posted by Johan Hovold 1 year, 2 months ago
This reverts commit 25f1c96a0e841013647d788d4598e364e5c2ebb7.

The offending commit results in errors like

	cpu cpu0: _opp_config_clk_single: failed to set clock rate: -22

spamming the logs on the Lenovo ThinkPad X13s and other Qualcomm
machines when cpufreq tries to update the CPUFreq HW Engine clocks.

As mentioned in commit 4370232c727b ("cpufreq: qcom-hw: Add CPU clock
provider support"):

	[T]he frequency supplied by the driver is the actual frequency
	that comes out of the EPSS/OSM block after the DCVS operation.
	This frequency is not same as what the CPUFreq framework has set
	but it is the one that gets supplied to the CPUs after
	throttling by LMh.

which seems to suggest that the driver relies on the previous behaviour
of clk_set_rate().

Since this affects many Qualcomm machines, let's revert for now.

Fixes: 25f1c96a0e84 ("clk: Fix invalid execution of clk_set_rate")
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Link: https://lore.kernel.org/all/e2d83e57-ad07-411b-99f6-a4fc3c4534fa@arm.com/
Cc: Chuan Liu <chuan.liu@amlogic.com>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---

#regzbot introduced: 25f1c96a0e84


 drivers/clk/clk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 0d5a2293a8b3..936c0b79c169 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2539,7 +2539,7 @@ static int clk_core_set_rate_nolock(struct clk_core *core,
 	rate = clk_core_req_round_rate_nolock(core, req_rate);
 
 	/* bail early if nothing to do */
-	if (rate == clk_core_get_rate_recalc(core))
+	if (rate == clk_core_get_rate_nolock(core))
 		return 0;
 
 	/* fail on a direct rate set of a protected provider */
-- 
2.45.2
Re: [PATCH] Revert "clk: Fix invalid execution of clk_set_rate"
Posted by Stephen Boyd 1 year, 2 months ago
Quoting Johan Hovold (2024-12-02 02:06:21)
> This reverts commit 25f1c96a0e841013647d788d4598e364e5c2ebb7.
> 
> The offending commit results in errors like
> 
>         cpu cpu0: _opp_config_clk_single: failed to set clock rate: -22
> 
> spamming the logs on the Lenovo ThinkPad X13s and other Qualcomm
> machines when cpufreq tries to update the CPUFreq HW Engine clocks.
> 
> As mentioned in commit 4370232c727b ("cpufreq: qcom-hw: Add CPU clock
> provider support"):
> 
>         [T]he frequency supplied by the driver is the actual frequency
>         that comes out of the EPSS/OSM block after the DCVS operation.
>         This frequency is not same as what the CPUFreq framework has set
>         but it is the one that gets supplied to the CPUs after
>         throttling by LMh.
> 
> which seems to suggest that the driver relies on the previous behaviour
> of clk_set_rate().

I don't understand why a clk provider is needed there. Is anyone looking
into the real problem?

> 
> Since this affects many Qualcomm machines, let's revert for now.
> 
> Fixes: 25f1c96a0e84 ("clk: Fix invalid execution of clk_set_rate")
> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> Link: https://lore.kernel.org/all/e2d83e57-ad07-411b-99f6-a4fc3c4534fa@arm.com/
> Cc: Chuan Liu <chuan.liu@amlogic.com>
> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---

Applied to clk-fixes
Re: [PATCH] Revert "clk: Fix invalid execution of clk_set_rate"
Posted by Johan Hovold 1 year, 2 months ago
[ +CC: Viresh and Sudeep ]

On Mon, Dec 02, 2024 at 05:20:06PM -0800, Stephen Boyd wrote:
> Quoting Johan Hovold (2024-12-02 02:06:21)
> > This reverts commit 25f1c96a0e841013647d788d4598e364e5c2ebb7.
> > 
> > The offending commit results in errors like
> > 
> >         cpu cpu0: _opp_config_clk_single: failed to set clock rate: -22
> > 
> > spamming the logs on the Lenovo ThinkPad X13s and other Qualcomm
> > machines when cpufreq tries to update the CPUFreq HW Engine clocks.
> > 
> > As mentioned in commit 4370232c727b ("cpufreq: qcom-hw: Add CPU clock
> > provider support"):
> > 
> >         [T]he frequency supplied by the driver is the actual frequency
> >         that comes out of the EPSS/OSM block after the DCVS operation.
> >         This frequency is not same as what the CPUFreq framework has set
> >         but it is the one that gets supplied to the CPUs after
> >         throttling by LMh.
> > 
> > which seems to suggest that the driver relies on the previous behaviour
> > of clk_set_rate().
> 
> I don't understand why a clk provider is needed there. Is anyone looking
> into the real problem?

I mentioned this to Mani yesterday, but I'm not sure if he has had time
to look into it yet. And I forgot to CC Viresh who was involved in
implementing this. There is comment of his in the thread where this
feature was added:

	Most likely no one will ever do clk_set_rate() on this new
	clock, which is fine, though OPP core will likely do
	clk_get_rate() here.

which may suggest that some underlying assumption has changed. [1]

There are some more details in that thread that should explain why
things were implemented the way they were:

	https://lore.kernel.org/linux-arm-msm/20221117053145.10409-1-manivannan.sadhasivam@linaro.org/

> > Since this affects many Qualcomm machines, let's revert for now.
> > 
> > Fixes: 25f1c96a0e84 ("clk: Fix invalid execution of clk_set_rate")
> > Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> > Link: https://lore.kernel.org/all/e2d83e57-ad07-411b-99f6-a4fc3c4534fa@arm.com/
> > Cc: Chuan Liu <chuan.liu@amlogic.com>
> > Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> > ---
> 
> Applied to clk-fixes

Thanks.

Johan

[1] https://lore.kernel.org/linux-arm-msm/20221118055730.yrzpuih3zfko5c2q@vireshk-i7/
Re: [PATCH] Revert "clk: Fix invalid execution of clk_set_rate"
Posted by Manivannan Sadhasivam 1 year, 2 months ago
On Tue, Dec 03, 2024 at 09:25:01AM +0100, Johan Hovold wrote:
> [ +CC: Viresh and Sudeep ]
> 
> On Mon, Dec 02, 2024 at 05:20:06PM -0800, Stephen Boyd wrote:
> > Quoting Johan Hovold (2024-12-02 02:06:21)
> > > This reverts commit 25f1c96a0e841013647d788d4598e364e5c2ebb7.
> > > 
> > > The offending commit results in errors like
> > > 
> > >         cpu cpu0: _opp_config_clk_single: failed to set clock rate: -22
> > > 
> > > spamming the logs on the Lenovo ThinkPad X13s and other Qualcomm
> > > machines when cpufreq tries to update the CPUFreq HW Engine clocks.
> > > 
> > > As mentioned in commit 4370232c727b ("cpufreq: qcom-hw: Add CPU clock
> > > provider support"):
> > > 
> > >         [T]he frequency supplied by the driver is the actual frequency
> > >         that comes out of the EPSS/OSM block after the DCVS operation.
> > >         This frequency is not same as what the CPUFreq framework has set
> > >         but it is the one that gets supplied to the CPUs after
> > >         throttling by LMh.
> > > 
> > > which seems to suggest that the driver relies on the previous behaviour
> > > of clk_set_rate().
> > 
> > I don't understand why a clk provider is needed there. Is anyone looking
> > into the real problem?
> 
> I mentioned this to Mani yesterday, but I'm not sure if he has had time
> to look into it yet. And I forgot to CC Viresh who was involved in
> implementing this. There is comment of his in the thread where this
> feature was added:
> 
> 	Most likely no one will ever do clk_set_rate() on this new
> 	clock, which is fine, though OPP core will likely do
> 	clk_get_rate() here.
> 
> which may suggest that some underlying assumption has changed. [1]
> 

I just looked into the issue this morning. The commit that triggered the errors
seem to be doing the right thing (although the commit message was a bit hard to
understand), but the problem is this check which gets triggered now:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/clk.c?h=v6.13-rc1#n2319

Since the qcom-cpufreq* clocks doesn't have parents now (they should've been
defined anyway) and there is no CLK_SET_RATE_PARENT flag set, the check returns
NULL for the 'top' clock. Then clk_core_set_rate_nolock() returns -EINVAL,
causing the reported error.

But I don't quite understand why clk_core_set_rate_nolock() fails if there is no
parent or CLK_SET_RATE_PARENT is not set. The API is supposed to set the rate of
the passed clock irrespective of the parent. Propagating the rate change to
parent is not strictly needed and doesn't make sense if the parent is a fixed
clock like XO.

Stephen, thoughts?

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH] Revert "clk: Fix invalid execution of clk_set_rate"
Posted by Stephen Boyd 1 year, 2 months ago
Quoting Manivannan Sadhasivam (2024-12-03 01:21:51)
> On Tue, Dec 03, 2024 at 09:25:01AM +0100, Johan Hovold wrote:
> > [ +CC: Viresh and Sudeep ]
> > 
> > On Mon, Dec 02, 2024 at 05:20:06PM -0800, Stephen Boyd wrote:
> > > Quoting Johan Hovold (2024-12-02 02:06:21)
> > > > This reverts commit 25f1c96a0e841013647d788d4598e364e5c2ebb7.
> > > > 
> > > > The offending commit results in errors like
> > > > 
> > > >         cpu cpu0: _opp_config_clk_single: failed to set clock rate: -22
> > > > 
> > > > spamming the logs on the Lenovo ThinkPad X13s and other Qualcomm
> > > > machines when cpufreq tries to update the CPUFreq HW Engine clocks.
> > > > 
> > > > As mentioned in commit 4370232c727b ("cpufreq: qcom-hw: Add CPU clock
> > > > provider support"):
> > > > 
> > > >         [T]he frequency supplied by the driver is the actual frequency
> > > >         that comes out of the EPSS/OSM block after the DCVS operation.
> > > >         This frequency is not same as what the CPUFreq framework has set
> > > >         but it is the one that gets supplied to the CPUs after
> > > >         throttling by LMh.
> > > > 
> > > > which seems to suggest that the driver relies on the previous behaviour
> > > > of clk_set_rate().
> > > 
> > > I don't understand why a clk provider is needed there. Is anyone looking
> > > into the real problem?
> > 
> > I mentioned this to Mani yesterday, but I'm not sure if he has had time
> > to look into it yet. And I forgot to CC Viresh who was involved in
> > implementing this. There is comment of his in the thread where this
> > feature was added:
> > 
> >       Most likely no one will ever do clk_set_rate() on this new
> >       clock, which is fine, though OPP core will likely do
> >       clk_get_rate() here.
> > 
> > which may suggest that some underlying assumption has changed. [1]
> > 

Yikes.

> 
> I just looked into the issue this morning. The commit that triggered the errors
> seem to be doing the right thing (although the commit message was a bit hard to
> understand), but the problem is this check which gets triggered now:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/clk.c?h=v6.13-rc1#n2319
> 
> Since the qcom-cpufreq* clocks doesn't have parents now (they should've been
> defined anyway) and there is no CLK_SET_RATE_PARENT flag set, the check returns
> NULL for the 'top' clock. Then clk_core_set_rate_nolock() returns -EINVAL,
> causing the reported error.
> 
> But I don't quite understand why clk_core_set_rate_nolock() fails if there is no
> parent or CLK_SET_RATE_PARENT is not set. The API is supposed to set the rate of
> the passed clock irrespective of the parent. Propagating the rate change to
> parent is not strictly needed and doesn't make sense if the parent is a fixed
> clock like XO.

The recalc_rate clk_op is telling the framework that the clk is at a
different rate than is requested by the clk consumer _and_ than what the
framework thinks the clk is currently running at. The clk_set_rate()
call is going to attempt to satisfy that request, and because there
isn't a determine_rate/round_rate clk_op it assumes the clk can't change
rate so it looks to see if there's a parent that can be changed to
satisfy the rate. There isn't a parent either, so the clk_set_rate()
call fails because the rate can't be achieved on this clk.

It may work to have a determine_rate clk_op that is like the recalc_rate
one that says "this rate you requested is going to turn into whatever
the hardware is running at" by simply returning the rate that the clk is
running at.
Re: [PATCH] Revert "clk: Fix invalid execution of clk_set_rate"
Posted by Manivannan Sadhasivam 1 year, 2 months ago
On Tue, Dec 03, 2024 at 11:30:07AM -0800, Stephen Boyd wrote:
> Quoting Manivannan Sadhasivam (2024-12-03 01:21:51)
> > On Tue, Dec 03, 2024 at 09:25:01AM +0100, Johan Hovold wrote:
> > > [ +CC: Viresh and Sudeep ]
> > > 
> > > On Mon, Dec 02, 2024 at 05:20:06PM -0800, Stephen Boyd wrote:
> > > > Quoting Johan Hovold (2024-12-02 02:06:21)
> > > > > This reverts commit 25f1c96a0e841013647d788d4598e364e5c2ebb7.
> > > > > 
> > > > > The offending commit results in errors like
> > > > > 
> > > > >         cpu cpu0: _opp_config_clk_single: failed to set clock rate: -22
> > > > > 
> > > > > spamming the logs on the Lenovo ThinkPad X13s and other Qualcomm
> > > > > machines when cpufreq tries to update the CPUFreq HW Engine clocks.
> > > > > 
> > > > > As mentioned in commit 4370232c727b ("cpufreq: qcom-hw: Add CPU clock
> > > > > provider support"):
> > > > > 
> > > > >         [T]he frequency supplied by the driver is the actual frequency
> > > > >         that comes out of the EPSS/OSM block after the DCVS operation.
> > > > >         This frequency is not same as what the CPUFreq framework has set
> > > > >         but it is the one that gets supplied to the CPUs after
> > > > >         throttling by LMh.
> > > > > 
> > > > > which seems to suggest that the driver relies on the previous behaviour
> > > > > of clk_set_rate().
> > > > 
> > > > I don't understand why a clk provider is needed there. Is anyone looking
> > > > into the real problem?
> > > 
> > > I mentioned this to Mani yesterday, but I'm not sure if he has had time
> > > to look into it yet. And I forgot to CC Viresh who was involved in
> > > implementing this. There is comment of his in the thread where this
> > > feature was added:
> > > 
> > >       Most likely no one will ever do clk_set_rate() on this new
> > >       clock, which is fine, though OPP core will likely do
> > >       clk_get_rate() here.
> > > 
> > > which may suggest that some underlying assumption has changed. [1]
> > > 
> 
> Yikes.
> 
> > 
> > I just looked into the issue this morning. The commit that triggered the errors
> > seem to be doing the right thing (although the commit message was a bit hard to
> > understand), but the problem is this check which gets triggered now:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/clk.c?h=v6.13-rc1#n2319
> > 
> > Since the qcom-cpufreq* clocks doesn't have parents now (they should've been
> > defined anyway) and there is no CLK_SET_RATE_PARENT flag set, the check returns
> > NULL for the 'top' clock. Then clk_core_set_rate_nolock() returns -EINVAL,
> > causing the reported error.
> > 
> > But I don't quite understand why clk_core_set_rate_nolock() fails if there is no
> > parent or CLK_SET_RATE_PARENT is not set. The API is supposed to set the rate of
> > the passed clock irrespective of the parent. Propagating the rate change to
> > parent is not strictly needed and doesn't make sense if the parent is a fixed
> > clock like XO.
> 
> The recalc_rate clk_op is telling the framework that the clk is at a
> different rate than is requested by the clk consumer _and_ than what the
> framework thinks the clk is currently running at. The clk_set_rate()
> call is going to attempt to satisfy that request, and because there
> isn't a determine_rate/round_rate clk_op it assumes the clk can't change
> rate so it looks to see if there's a parent that can be changed to
> satisfy the rate. There isn't a parent either, so the clk_set_rate()
> call fails because the rate can't be achieved on this clk.
> 
> It may work to have a determine_rate clk_op that is like the recalc_rate
> one that says "this rate you requested is going to turn into whatever
> the hardware is running at" by simply returning the rate that the clk is
> running at.

Sounds reasonable to me. Fix submitted incorporating your suggestion, thanks!

- Mani

-- 
மணிவண்ணன் சதாசிவம்