[PATCH v4 0/7] Support for GPU ACD feature on Adreno X1-85

Akhil P Oommen posted 7 patches 11 months, 2 weeks ago
There is a newer version of this series
[PATCH v4 0/7] Support for GPU ACD feature on Adreno X1-85
Posted by Akhil P Oommen 11 months, 2 weeks ago
This series adds support for the ACD (Adaptive Clock Distribution) feature
on Adreno GPUs. ACD helps lower power consumption on the GX rail and is
sometimes required to enable higher GPU frequencies. At a high level, the
ACD feature requires the following sequence:
	1. Identify the ACD level data for each regulator corner
	2. Send a message to AOSS to switch the voltage plan
	3. Send a table with ACD level information to the GMU on every
	GPU wake-up
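
As a rough illustration of steps (1) and (3), here is a minimal,
self-contained C sketch that collects per-corner ACD levels into a table
which could then be resent on each wake-up. The struct layouts, the names
(opp_entry, acd_table, acd_table_build), the 16-level cap and the bit
numbering are all assumptions made for this sketch; they are not the
driver's HFI message format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACD_MAX_LEVELS 16 /* hypothetical cap, chosen for this sketch */

/* Hypothetical per-OPP record: one entry per regulator corner. */
struct opp_entry {
	unsigned long freq_hz;
	bool has_acd;       /* true if the corner carries ACD level data */
	uint32_t acd_level; /* opaque, chipset-specific value */
};

/* Illustrative table layout only; not the real GMU message. */
struct acd_table {
	uint32_t enable_by_level; /* bit i set => OPP index i has ACD data */
	uint32_t num_levels;      /* number of valid entries in data[] */
	uint32_t data[ACD_MAX_LEVELS];
};

/*
 * Walk the OPP list and pack the ACD level of every corner that has one.
 * Returns the number of levels stored in the table.
 */
static uint32_t acd_table_build(struct acd_table *t,
				const struct opp_entry *opps, size_t n)
{
	assert(t != NULL && (opps != NULL || n == 0));

	t->enable_by_level = 0;
	t->num_levels = 0;

	for (size_t i = 0; i < n && i < ACD_MAX_LEVELS; i++) {
		if (!opps[i].has_acd)
			continue;
		t->enable_by_level |= UINT32_C(1) << i;
		t->data[t->num_levels++] = opps[i].acd_level;
	}

	return t->num_levels;
}
```

In this sketch a return value of 0 would mean that no corner carries ACD
data, so there would be nothing to send to the GMU.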

For (1), it is better to keep the ACD level data in devicetree because
this value depends on the process node, voltage margins, etc., which are
chipset-specific. For instance, the same GPU HW IP on a different chipset
would have a different set of values. So, a new schema which extends
opp-v2 is created to add a new property called "qcom,opp-acd-level".

ACD support is dynamically detected based on the presence of the
"qcom,opp-acd-level" property in the GPU's OPP table. Also, a qmp node
must be present under the GMU node in devicetree for communication with
AOSS.
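
The detection rule above can be sketched in C as follows. This is only an
illustration: acd_detect(), the acd_status enum, and the choice to report a
missing qmp node as a distinct state are assumptions made for the sketch,
not the driver's actual code or error handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for the sketch. */
enum acd_status {
	ACD_UNSUPPORTED, /* no OPP carries "qcom,opp-acd-level" */
	ACD_SUPPORTED,   /* ACD data present and AOSS is reachable via qmp */
	ACD_MISSING_QMP, /* ACD data present, but no qmp node under GMU */
};

/*
 * opp_has_acd_level[i] is true when OPP i has the "qcom,opp-acd-level"
 * property; has_qmp_node says whether the GMU node has a qmp reference.
 */
static enum acd_status acd_detect(const bool *opp_has_acd_level, size_t n,
				  bool has_qmp_node)
{
	bool any = false;

	for (size_t i = 0; i < n; i++) {
		if (opp_has_acd_level[i]) {
			any = true;
			break;
		}
	}

	if (!any)
		return ACD_UNSUPPORTED;

	return has_qmp_node ? ACD_SUPPORTED : ACD_MISSING_QMP;
}
```

The point of the sketch is that a devicetree without any ACD level data
simply leaves the feature off, while ACD data without a way to reach AOSS
is treated as a separate condition.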

The devicetree patch in this series adds the acd-level data for the X1-85
GPU present in the Snapdragon X1 Elite chipset.

The last two devicetree patches are for Bjorn and the rest are for
Rob Clark.

---
Changes in v4:
- Send correct ACD data via HFI (Neil)
- Fix dt-bindings error
- Fix IB vote for the 1.1 GHz OPP
- New patch#2 to fix the HFI timeout error seen when ACD is enabled
- Link to v3: https://lore.kernel.org/r/20241231-gpu-acd-v3-0-3ba73660e9ca@quicinc.com

Changes in v3:
- Rebased on top of v6.13-rc4 since X1E doesn't boot properly with msm-next
- Update patternProperties regex (Krzysztof)
- Update MAINTAINERS file to include the new opp-v2-qcom-adreno.yaml
- Update the new dt properties' description
- Do not move qmp_get() to acd probe (Konrad)
- New patches: patch#2, #3 and #6
- Link to v2: https://lore.kernel.org/r/20241021-gpu-acd-v2-0-9c25a62803bc@quicinc.com

Changes in v2:
- Removed RFC tag for the series
- Improve documentation for the new dt bindings (Krzysztof)
- Add fallback compatible string for opp-table (Krzysztof)
- Link to v1: https://lore.kernel.org/r/20241012-gpu-acd-v1-0-1e5e91aa95b6@quicinc.com

---
Akhil P Oommen (7):
      drm/msm/adreno: Add support for ACD
      drm/msm/a6xx: Increase HFI response timeout
      drm/msm: a6x: Rework qmp_get() error handling
      drm/msm/adreno: Add module param to disable ACD
      dt-bindings: opp: Add v2-qcom-adreno vendor bindings
      arm64: dts: qcom: x1e80100: Add ACD levels for GPU
      arm64: dts: qcom: x1e80100: Add OPPs up to Turbo L3 for GPU

 .../bindings/opp/opp-v2-qcom-adreno.yaml           | 97 ++++++++++++++++++++++
 MAINTAINERS                                        |  1 +
 arch/arm64/boot/dts/qcom/x1e80100.dtsi             | 27 +++++-
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c              | 96 ++++++++++++++++++---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h              |  1 +
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c              | 38 ++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_hfi.h              | 21 +++++
 drivers/gpu/drm/msm/adreno/adreno_device.c         |  4 +
 8 files changed, 270 insertions(+), 15 deletions(-)
---
base-commit: dbfac60febfa806abb2d384cb6441e77335d2799
change-id: 20240724-gpu-acd-6c1dc5dcf516

Best regards,
-- 
Akhil P Oommen <quic_akhilpo@quicinc.com>
Re: [PATCH v4 0/7] Support for GPU ACD feature on Adreno X1-85
Posted by Anthony Ruhier 9 months ago
Using this patch series on 6.14-rc (tested over multiple RCs, up to rc7) on a
Yoga Slim 7x (x1e80100), I often get a video output freeze a few seconds after
my Wayland compositor loads. I can still ssh into the laptop. I get these
kernel errors in a loop:

	msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
	msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 777
	msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 778

Rob Clark recommended that I remove the higher GPU frequencies added by this
series (1.25 GHz and 1.175 GHz). The lockups then happen less often, but are
still present. The problem is easily reproducible.

One way to mitigate the problem is to keep moving my cursor for a few seconds
after my Wayland session starts; then no freeze happens. Reverting this
patch series fixes the problem.

Thanks,

--
Anthony Ruhier
Re: [PATCH v4 0/7] Support for GPU ACD feature on Adreno X1-85
Posted by Konrad Dybcio 8 months, 2 weeks ago
On 3/18/25 2:12 PM, Anthony Ruhier wrote:
> Using this patch series on 6.14-rc (tested over multiple RCs, up to rc7) on a
> Yoga Slim 7x (x1e80100), I often get a video output freeze a few seconds after
> my Wayland compositor loads. I can still ssh into the laptop. I get these
> kernel errors in a loop:
> 
> 	msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
> 	msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 777
> 	msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 778
> 
> Rob Clark recommended that I remove the higher GPU frequencies added by this
> series (1.25 GHz and 1.175 GHz). The lockups then happen less often, but are
> still present. The problem is easily reproducible.
> 
> One way to mitigate the problem is to keep moving my cursor for a few seconds
> after my Wayland session starts; then no freeze happens. Reverting this
> patch series fixes the problem.

What firmware files are you using? ZAP surely comes from the Windows
package, but what about GMU and SQE? Linux-firmware?

Specifically, please provide the GMU version which is printed to dmesg
on first GPU open.

Konrad
Re: [PATCH v4 0/7] Support for GPU ACD feature on Adreno X1-85
Posted by Anthony Ruhier 8 months, 1 week ago
Hi,

Sorry, I should have given an update on this: I don't think the lockups are
caused by this patch series, since the same problem happens without these
patches applied. However, the series does seem to greatly increase the chance
of a GPU lockup at startup, so I could use that to debug the real problem.

> What firmware files are you using? ZAP surely comes from the Windows
> package, but what about GMU and SQE? Linux-firmware?
>
> Specifically, please provide the GMU version which is printed to dmesg
> on first GPU open

I'm using the firmware imported from Windows; the Yoga Slim 7x is not in
linux-firmware. As I understand it, the firmware files used by the Slim 7x
are quite old, which may be the problem.

The GMU version:

> [drm] Loaded GMU firmware v4.3.17

Thanks

--
Anthony Ruhier
Re: [PATCH v4 0/7] Support for GPU ACD feature on Adreno X1-85
Posted by Dmitry Baryshkov 8 months, 1 week ago
On Thu, Apr 10, 2025 at 05:51:38PM +0200, Anthony Ruhier wrote:
> Hi,
> 
> Sorry, I should have given an update on this: I don't think the lockups are
> caused by this patch series, since the same problem happens without these
> patches applied. However, the series does seem to greatly increase the chance
> of a GPU lockup at startup, so I could use that to debug the real problem.
> 
> > What firmware files are you using? ZAP surely comes from the Windows
> > package, but what about GMU and SQE? Linux-firmware?
> >
> > Specifically, please provide the GMU version which is printed to dmesg
> > on first GPU open
> 
> I'm using the firmware imported from Windows; the Yoga Slim 7x is not in
> linux-firmware. As I understand it, the firmware files used by the Slim 7x
> are quite old, which may be the problem.

Firmware for the Yoga Slim 7x was recently merged into linux-firmware. Could
you please check whether it helps?

> 
> The GMU version:
> 
> > [drm] Loaded GMU firmware v4.3.17
> 
> Thanks
> 
> --
> Anthony Ruhier

-- 
With best wishes
Dmitry
Re: [PATCH v4 0/7] Support for GPU ACD feature on Adreno X1-85
Posted by Anthony Ruhier 8 months, 1 week ago
Hi,

Tested-by: Anthony Ruhier <aruhier@mailbox.org>

--
Anthony Ruhier
Re: [PATCH v4 0/7] Support for GPU ACD feature on Adreno X1-85
Posted by Maya Matuszczyk 11 months, 2 weeks ago
Thanks,

Tested-by: Maya Matuszczyk <maccraft123mc@gmail.com>

On Wed, 8 Jan 2025 at 21:40, Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> This series adds support for the ACD feature on Adreno GPUs. ACD helps
> lower power consumption on the GX rail and is sometimes required to
> enable higher GPU frequencies.
> [...]