[PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding

Posted by ming.qian@oss.nxp.com 1 week, 1 day ago
From: Ming Qian <ming.qian@oss.nxp.com>

The VPU G2 clock was reduced from 600MHz to 300MHz in commit
b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock") to address
pixel errors with high-resolution HEVC postprocessor output.

However, testing shows the 300MHz clock rate is insufficient for
4K60fps decoding and the original pixel errors no longer occur at
600MHz with current drivers.

Test results with 3840x2160@60fps HEVC stream decoded to NV12
(the same scenario that exhibited pixel errors previously):

300MHz performance:
- Severe frame dropping throughout playback
- Only 336 frames rendered in 11:53 (0.471 fps)
- Continuous "A lot of buffers are being dropped" warnings
- Completely unusable for 4K video

600MHz performance:
- Smooth playback with only 1 frame dropped at startup
- 37981 frames rendered in 10:34 (59.857 fps)
- Achieves target 60fps performance
- No pixel errors or artifacts observed

Restore the clock to 600MHz to enable proper 4K60fps decoding
capability while maintaining stability.
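
For reference, the frame-rate figures above and the pixel throughput a 3840x2160@60fps stream needs per clock cycle can be re-derived with a few lines of Python. This is only an arithmetic sketch: the helper names are illustrative, the elapsed times are the rounded mm:ss values quoted above (so the 600MHz result differs slightly from the reported 59.857 fps), and the pixels-per-cycle number says nothing about what the G2 hardware can actually sustain.

```python
def average_fps(frames: int, minutes: int, seconds: int) -> float:
    """Average rendered frame rate over a run whose duration is given as mm:ss."""
    return frames / (minutes * 60 + seconds)


def pixels_per_cycle(width: int, height: int, fps: float, clock_hz: float) -> float:
    """Pixels that must be produced per clock cycle to sustain the frame rate."""
    return width * height * fps / clock_hz


if __name__ == "__main__":
    # Frame counts and durations are the ones quoted in the test results.
    print(f"300MHz run: {average_fps(336, 11, 53):.3f} fps")
    print(f"600MHz run: {average_fps(37981, 10, 34):.3f} fps")
    for clock in (300e6, 600e6):
        need = pixels_per_cycle(3840, 2160, 60, clock)
        print(f"3840x2160@60 at {clock / 1e6:.0f}MHz needs {need:.2f} pixels/cycle")
```

At 300MHz the decoder would have to average more than one pixel per cycle to reach 60fps, while at 600MHz it needs less than one.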

Test pipeline:
  gst-launch-1.0 filesrc location=<4K60_HEVC.mkv> ! \
    video/x-matroska ! aiurdemux ! h265parse ! \
    v4l2slh265dec ! video/x-raw,format=NV12 ! \
    queue ! waylandsink

Fixes: b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock")
Signed-off-by: Ming Qian <ming.qian@oss.nxp.com>
---
 arch/arm64/boot/dts/freescale/imx8mq.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/freescale/imx8mq.dtsi b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
index 607962f807be..731142176625 100644
--- a/arch/arm64/boot/dts/freescale/imx8mq.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
@@ -960,7 +960,7 @@ pgc_vpu: power-domain@6 {
 									 <&clk IMX8MQ_SYS1_PLL_800M>,
 									 <&clk IMX8MQ_VPU_PLL>;
 						assigned-clock-rates = <600000000>,
-								       <300000000>,
+								       <600000000>,
 								       <800000000>,
 								       <0>;
 					};

base-commit: c824345288d11e269ce41b36c105715bc2286050
prerequisite-patch-id: 0000000000000000000000000000000000000000
-- 
2.52.0
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Nicolas Dufresne 1 week, 1 day ago
Hi,

On Friday, 30 January 2026 at 16:41 +0800, ming.qian@oss.nxp.com wrote:
> From: Ming Qian <ming.qian@oss.nxp.com>
> 
> The VPU G2 clock was reduced from 600MHz to 300MHz in commit
> b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock") to address
> pixel errors with high-resolution HEVC postprocessor output.
> 
> However, testing shows the 300MHz clock rate is insufficient for
> 4K60fps decoding and the original pixel errors no longer occur at
> 600MHz with current drivers.

Tested on the EVK with the downstream DCSS driver, and this change triggers DCSS
underruns (which are related to the DRAM QoS errata on this SoC). It also
sometimes triggers the "not all macroblock decoded" warning I added recently, and
we can get empty IRQs, but these are handled now.

> 
> Test results with 3840x2160@60fps HEVC stream decoded to NV12
> (the same scenario that exhibited pixel errors previously):
> 
> 300MHz performance:
> - Severe frame dropping throughout playback
> - Only 336 frames rendered in 11:53 (0.471 fps)
> - Continuous "A lot of buffers are being dropped" warnings
> - Completely unusable for 4K video
> 
> 600MHz performance:
> - Smooth playback with only 1 frame dropped at startup
> - 37981 frames rendered in 10:34 (59.857 fps)
> - Achieves target 60fps performance
> - No pixel errors or artifacts observed

That is probably only true with the upstream DCSS plus a small-resolution embedded
panel? Can you clarify this setup, because the mainline display drivers are
very minimal. It would be nice to show your average DDR read/write bandwidth
utilization during this run for comparison.

Another piece of information that bugs me: in the BSP code, the G2 voltage is increased
too, which you didn't do here. They also use thermal zone 2 to kick it down
to 300 MHz until it cools down.

Nicolas

Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Ming Qian(OSS) 5 days, 7 hours ago
Hi Nicolas,

On 1/30/2026 10:47 PM, Nicolas Dufresne wrote:
> Hi,
> 
> On Friday, 30 January 2026 at 16:41 +0800, ming.qian@oss.nxp.com wrote:
>> From: Ming Qian <ming.qian@oss.nxp.com>
>>
>> The VPU G2 clock was reduced from 600MHz to 300MHz in commit
>> b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock") to address
>> pixel errors with high-resolution HEVC postprocessor output.
>>
>> However, testing shows the 300MHz clock rate is insufficient for
>> 4K60fps decoding and the original pixel errors no longer occur at
>> 600MHz with current drivers.
> 
> Tested on the EVK with the downstream DCSS driver, and this change triggers DCSS
> underruns (which are related to the DRAM QoS errata on this SoC). It also
> sometimes triggers the "not all macroblock decoded" warning I added recently, and
> we can get empty IRQs, but these are handled now.
> 

This doesn't sound like just a VPU issue; it's related to the display or 
DDR.
Without display output, do the fluster test cases yield different results at
600MHz and 300MHz?

>>
>> Test results with 3840x2160@60fps HEVC stream decoded to NV12
>> (the same scenario that exhibited pixel errors previously):
>>
>> 300MHz performance:
>> - Severe frame dropping throughout playback
>> - Only 336 frames rendered in 11:53 (0.471 fps)
>> - Continuous "A lot of buffers are being dropped" warnings
>> - Completely unusable for 4K video
>>
>> 600MHz performance:
>> - Smooth playback with only 1 frame dropped at startup
>> - 37981 frames rendered in 10:34 (59.857 fps)
>> - Achieves target 60fps performance
>> - No pixel errors or artifacts observed
> 
> That is probably only true with the upstream DCSS plus a small-resolution embedded
> panel? Can you clarify this setup, because the mainline display drivers are
> very minimal. It would be nice to show your average DDR read/write bandwidth
> utilization during this run for comparison.

My display is HDMI; I'll try the DCSS.
And the DDR bandwidth results measured by perf are as follows:

  Performance counter stats for 'system wide':

       113303664278      imx8_ddr0/read-cycles/
        82457075530      imx8_ddr0/write-cycles/

      634.892101865 seconds time elapsed
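
As a rough way to turn the counters above into averages, the cycle counts can be divided by the elapsed time. A minimal sketch follows; the function names are illustrative, and the bytes-per-cycle factor needed to express the result as bandwidth is deliberately left as a caller-supplied parameter, because it depends on the DDR configuration and is not stated in this thread.

```python
def cycles_per_second(cycles: int, elapsed_s: float) -> float:
    """Average counter rate over the measurement window."""
    return cycles / elapsed_s


def bandwidth_bytes_per_second(cycles: int, elapsed_s: float,
                               bytes_per_cycle: float) -> float:
    """Average bandwidth, given a bytes-per-cycle factor.

    bytes_per_cycle is an assumption the caller must supply from the
    SoC documentation; it is not derived here.
    """
    return cycles_per_second(cycles, elapsed_s) * bytes_per_cycle


if __name__ == "__main__":
    elapsed = 634.892101865  # seconds, from the perf output
    counters = {"read": 113303664278, "write": 82457075530}
    for name, cycles in counters.items():
        rate = cycles_per_second(cycles, elapsed)
        print(f"{name}: {rate / 1e6:.1f} M cycles/s")
```

Comparing the same averages between a 300MHz and a 600MHz run would show how much extra DDR load the faster decode adds.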

> 
> Another piece of information that bugs me: in the BSP code, the G2 voltage is increased
> too, which you didn't do here. They also use thermal zone 2 to kick it down
> to 300 MHz until it cools down.
> 

In our internal code, whenever the frequency of either G1 or G2 reaches
600MHz, the voltage is adjusted to 1.0V. Since G1 is already set to 600
MHz in the upstream DTS, I believe the default voltage is already 1.0V.

And do you mean vpu-thermal? It doesn't define a cooling-map, so I'm
not sure how it works.

		vpu-thermal {
			polling-delay-passive = <250>;
			polling-delay = <2000>;
			thermal-sensors = <&tmu 2>;

			trips {
				vpu-crit {
					temperature = <90000>;
					hysteresis = <2000>;
					type = "critical";
				};
			};
		};

Regards,
Ming

Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Nicolas Dufresne 4 days, 20 hours ago
On Monday, 02 February 2026 at 15:44 +0800, Ming Qian(OSS) wrote:
> Hi Nicolas,
> 
> On 1/30/2026 10:47 PM, Nicolas Dufresne wrote:
> > Hi,
> > 
> > On Friday, 30 January 2026 at 16:41 +0800, ming.qian@oss.nxp.com wrote:
> > > From: Ming Qian <ming.qian@oss.nxp.com>
> > > 
> > > The VPU G2 clock was reduced from 600MHz to 300MHz in commit
> > > b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock") to address
> > > pixel errors with high-resolution HEVC postprocessor output.
> > > 
> > > However, testing shows the 300MHz clock rate is insufficient for
> > > 4K60fps decoding and the original pixel errors no longer occur at
> > > 600MHz with current drivers.
> > 
> > Tested on the EVK with the downstream DCSS driver, and this change triggers DCSS
> > underruns (which are related to the DRAM QoS errata on this SoC). It also
> > sometimes triggers the "not all macroblock decoded" warning I added recently, and
> > we can get empty IRQs, but these are handled now.
> > 
> 
> This doesn't sound like just a VPU issue; it's related to the display or 
> DDR.
> Without display output, do the fluster test cases yield different results at
> 600MHz and 300MHz?

Didn't you run these tests before sending? I can try again, but in my internal
notes, I wrote:

  > Tested that, and everything becomes unstable

That was before I figured out that the IRQ handler didn't handle exception bits that
didn't stop the decoder (or dry IRQs, which strangely are common from the G2).

> 
> > > 
> > > Test results with 3840x2160@60fps HEVC stream decoded to NV12
> > > (the same scenario that exhibited pixel errors previously):
> > > 
> > > 300MHz performance:
> > > - Severe frame dropping throughout playback
> > > - Only 336 frames rendered in 11:53 (0.471 fps)
> > > - Continuous "A lot of buffers are being dropped" warnings
> > > - Completely unusable for 4K video
> > > 
> > > 600MHz performance:
> > > - Smooth playback with only 1 frame dropped at startup
> > > - 37981 frames rendered in 10:34 (59.857 fps)
> > > - Achieves target 60fps performance
> > > - No pixel errors or artifacts observed
> > 
> > That is probably only true with the upstream DCSS plus a small-resolution embedded
> > panel? Can you clarify this setup, because the mainline display drivers are
> > very minimal. It would be nice to show your average DDR read/write bandwidth
> > utilization during this run for comparison.
> 
> My display is HDMI; I'll try the DCSS.
> And the DDR bandwidth results measured by perf are as follows:
> 
>   Performance counter stats for 'system wide':
> 
>        113303664278      imx8_ddr0/read-cycles/
>         82457075530      imx8_ddr0/write-cycles/
> 
>       634.892101865 seconds time elapsed
> 
> > 
> > Another piece of information that bugs me: in the BSP code, the G2 voltage is increased
> > too, which you didn't do here. They also use thermal zone 2 to kick it down
> > to 300 MHz until it cools down.
> > 
> 
> In our internal code, whenever the frequency of either G1 or G2 reaches
> 600MHz, the voltage is adjusted to 1.0V. Since G1 is already set to 600
> MHz in the upstream DTS, I believe the default voltage is already 1.0V.
> 
> And do you mean vpu-thermal? It doesn't define a cooling-map, so I'm
> not sure how it works.
> 
> 		vpu-thermal {
> 			polling-delay-passive = <250>;
> 			polling-delay = <2000>;
> 			thermal-sensors = <&tmu 2>;
> 
> 			trips {
> 				vpu-crit {
> 					temperature = <90000>;
> 					hysteresis = <2000>;
> 					type = "critical";
> 				};
> 			};
> 		};

It's not:

 $> cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
 SW1C                             1    1      0 unknown   900mV     0mA   825mV  1100


Before I gave up on 60Hz on this SoC, I did test raising it to 1v with this
patch (hopefully there is a way to do that in DT, would be more elegant):


diff --git a/drivers/pmdomain/imx/gpcv2.c b/drivers/pmdomain/imx/gpcv2.c
index 4b828d74a606..2f2b85ca6fd2 100644
--- a/drivers/pmdomain/imx/gpcv2.c
+++ b/drivers/pmdomain/imx/gpcv2.c
@@ -639,6 +639,7 @@ static const struct imx_pgc_domain imx8m_pgc_domains[] = {
 		},
 		.pgc   = BIT(IMX8M_PGC_VPU),
 		.keep_clocks = true,
+		.voltage   = 1000000,
 	},
 
 	[IMX8M_POWER_DOMAIN_DISP] = {

I would also like to remind you of your own errata: in the errata document you
state that DRAM QoS is broken, and all transactions are treated with the same
priority. If you overload the bandwidth, it becomes fatal for the display
controller. We tried to work around this by changing the NoC configuration, but it
did not work. It feels like the NoC granularity is not enough to prevent
underruns of the display controller (where the QoS would work, since it's done at
the transaction level, not by measuring bandwidth).

Nicolas

Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Nicolas Dufresne 4 days, 19 hours ago
Hi,

On Monday, 02 February 2026 at 13:44 -0500, Nicolas Dufresne wrote:
> > This doesn't sound like just a VPU issue; it's related to the display or
> > DDR.
> > Without display output, do the fluster test cases yield different results at
> > 600MHz and 300MHz?
> 
> Didn't you run these tests before sending? I can try again, but in my internal
> notes, I wrote:
> 
>   > Tested that, and everything becomes unstable
> 
> That was before I figured out that the IRQ handler didn't handle exception bits that
> didn't stop the decoder (or dry IRQs, which strangely are common from the G2).

I ran some fluster tests now. With this patch the results are no longer
consistent. Then I ran it with weston started, and in the middle of the test
the display turned black. This matches my past observation. We did reproduce this on
the BSP kernel too. When the display goes black, the recent hantro driver reports:

[  827.581586] hantro-vpu 38310000.video-codec: frame decode timed out.
[  827.720201] hantro-vpu 38310000.video-codec: not all macroblocks were decoded.


I have local patches to reduce the cascade of errors, so it likely survived
longer than last time. I will send these patches soon. The "not all macroblocks
were decoded." message is triggered by a bit in the status register that is not
documented in the NXP TRM. I found that bit in some VC8000D documentation (the
successor of the G2). I concluded it had the same meaning after looking at the failed
buffer visually: it is indeed missing a couple of macroblocks near the end. Each
time we see this error, the DCSS gives up and turns either black or sometimes
another color. The second case has been tracked to a DCSS Scaler underrun; the
cause of the first we don't know.

Fluster command run (two threads, never completes):

./fluster.py run -d GStreamer-H.265-V4L2SL-Gst1.0 -ts JCT-VC-HEVC_V1 -j2 -t90

Nicolas
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Ming Qian(OSS) 4 days, 7 hours ago
Hi Nicolas,

On 2/3/2026 3:12 AM, Nicolas Dufresne wrote:
> Hi,
> 
> On Monday, 02 February 2026 at 13:44 -0500, Nicolas Dufresne wrote:
>>> This doesn't sound like just a VPU issue; it's related to the display or
>>> DDR.
>>> Without display output, do the fluster test cases yield different results at
>>> 600MHz and 300MHz?
>>
>> Didn't you run these tests before sending? I can try again, but in my internal
>> notes, I wrote:
>>
>>    > Tested that, and everything becomes unstable
>>
>> That was before I figured out that the IRQ handler didn't handle exception bits that
>> didn't stop the decoder (or dry IRQs, which strangely are common from the G2).
> 
> I ran some fluster tests now. With this patch the results are no longer
> consistent. Then I ran it with weston started, and in the middle of the test
> the display turned black. This matches my past observation. We did reproduce this on
> the BSP kernel too. When the display goes black, the recent hantro driver reports:
> 
> [  827.581586] hantro-vpu 38310000.video-codec: frame decode timed out.
> [  827.720201] hantro-vpu 38310000.video-codec: not all macroblocks were decoded.
> 
> 
> I have local patches to reduce the cascade of errors, so it likely survived
> longer than last time. I will send these patches soon. The "not all macroblocks
> were decoded." message is triggered by a bit in the status register that is not
> documented in the NXP TRM. I found that bit in some VC8000D documentation (the
> successor of the G2). I concluded it had the same meaning after looking at the failed
> buffer visually: it is indeed missing a couple of macroblocks near the end. Each
> time we see this error, the DCSS gives up and turns either black or sometimes
> another color. The second case has been tracked to a DCSS Scaler underrun; the
> cause of the first we don't know.
> 
> Fluster command run (two threads, never completes):
> 
> ./fluster.py run -d GStreamer-H.265-V4L2SL-Gst1.0 -ts JCT-VC-HEVC_V1 -j2 -t90
> 
> Nicolas

My test results for fluster differ from yours.
On my end, the results for JCT-VC-HEVC_V1 are consistent at both 300MHz 
and 600MHz.
And results remained unchanged after multiple tests.

I'm not sure what caused the differences between us.

Below are my test results:

600MHz, 0.9V
	cat /sys/kernel/debug/regulator/regulator_summary  |grep SW1C
	 SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
	cat /sys/kernel/debug/clk/vpu_g2/clk_rate
	600000000

	./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
	****************************************************************************************************
	Running test suite JCT-VC-HEVC_V1 with decoder GStreamer-H.265-V4L2SL-Gst1.0
	Using 2 parallel job(s)
	****************************************************************************************************

	Ran 139/147 tests successfully               in 505.434 secs
	Ran 139/147 tests successfully               in 505.350 secs
	Ran 139/147 tests successfully               in 507.540 secs

600MHz, 1.0V
	cat /sys/kernel/debug/regulator/regulator_summary  |grep SW1C
	 SW1C                             0    1      0 unknown  1000mV     0mA   825mV  1100mV
	cat /sys/kernel/debug/clk/vpu_g2/clk_rate
	600000000

	./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
	Ran 139/147 tests successfully               in 506.901 secs

300MHz, 0.9V
	cat /sys/kernel/debug/regulator/regulator_summary  |grep SW1C
	 SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
	cat /sys/kernel/debug/clk/vpu_g2/clk_rate
	300000000

	./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
	Ran 139/147 tests successfully               in 506.063 secs

Downstream v4l2 driver
	cat /sys/kernel/debug/regulator/regulator_summary  |grep SW1C
	 SW1C                             0    2      0 unknown  1000mV     0mA   825mV  1100mV
	cat /sys/kernel/debug/clk/vpu_g2/clk_rate
	600000000

	./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2-Gst1.0 -j2 -t 90
	Ran 136/147 tests successfully               in 460.435 secs
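
To compare runs like these without eyeballing the output, the summary lines can be parsed and checked for agreement. The sketch below assumes the summary format shown above ("Ran N/M tests successfully in S secs"); the function names are illustrative and are not part of fluster.

```python
import re

# Matches e.g. "Ran 139/147 tests successfully               in 505.434 secs"
SUMMARY_RE = re.compile(r"Ran (\d+)/(\d+) tests successfully\s+in ([\d.]+) secs")


def parse_summary(line: str) -> tuple[int, int, float]:
    """Return (passed, total, seconds) from one summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def runs_agree(lines: list[str]) -> bool:
    """True if every run reports the same passed/total counts."""
    counts = {parse_summary(line)[:2] for line in lines}
    return len(counts) == 1


if __name__ == "__main__":
    runs = [
        "Ran 139/147 tests successfully               in 505.434 secs",
        "Ran 139/147 tests successfully               in 505.350 secs",
        "Ran 139/147 tests successfully               in 507.540 secs",
    ]
    print("consistent" if runs_agree(runs) else "inconsistent")
```

Equal pass counts do not prove that the same test vectors passed in each run, so a per-test comparison would still be needed to rule out flakiness.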

Regards,
Ming
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Lucas Stach 4 days, 6 hours ago
Hi,

On Tuesday, 03 February 2026 at 15:13 +0800, Ming Qian(OSS) wrote:
> Below are my test results:
> 
> 600MHz, 0.9V
> 	cat /sys/kernel/debug/regulator/regulator_summary  |grep SW1C
> 	 SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
> 	cat /sys/kernel/debug/clk/vpu_g2/clk_rate
> 	600000000

You are driving the SoC out of spec. The datasheet clearly states that
you need a 1000mV typical voltage for 600MHz VPU clock.

If you drive the SoC outside of those ratings it squarely depends on
the individual SoC if it will tolerate the too low voltage without
errors. Some SoCs land on the better side of PVT curve and will run at
the higher speed without issues, but some will not and will exhibit
random issues outside of the datasheet provided specs.

There isn't much to discuss here. The upstream DT for the i.MX8MQ runs
all the clocks at a rate to meet the nominal drive voltage specs. If
some peripheral clock does violate this, this is a bug not a feature to
replicate in new patches.
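
The check described above can be automated on a running board. The sketch below parses a regulator_summary line and a vpu_g2 clock rate in the form shown earlier in this thread and flags the 600MHz at 900mV combination. The 1000mV figure comes from the datasheet statement above; treating every rate at or above 600MHz as requiring it, reading the first mV column as the current voltage, and the function names are all assumptions of this example.

```python
import re

# Required supply for a 600MHz VPU clock, per the datasheet statement above.
REQUIRED_MV_AT_600MHZ = 1000
OVERDRIVE_CLOCK_HZ = 600_000_000


def regulator_millivolts(summary_line: str) -> int:
    """Return the first '<n>mV' field of a regulator_summary line.

    Assumption: that first field is the current voltage column, as in
    the SW1C lines quoted in this thread.
    """
    match = re.search(r"(\d+)mV", summary_line)
    if match is None:
        raise ValueError(f"no voltage found in: {summary_line!r}")
    return int(match.group(1))


def undervolted(clk_rate_hz: int, voltage_mv: int) -> bool:
    """True if the clock is at overdrive rate but the supply is below 1000mV."""
    return clk_rate_hz >= OVERDRIVE_CLOCK_HZ and voltage_mv < REQUIRED_MV_AT_600MHZ


if __name__ == "__main__":
    line = " SW1C   0    1      0 unknown   900mV     0mA   825mV  1100mV"
    print(undervolted(600_000_000, regulator_millivolts(line)))
```

The inputs would come from `grep SW1C /sys/kernel/debug/regulator/regulator_summary` and `cat /sys/kernel/debug/clk/vpu_g2/clk_rate`, the two commands used in the test results in this thread.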

Regards,
Lucas
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Ming Qian(OSS) 4 days, 5 hours ago
Hi Lucas,

On 2/3/2026 5:04 PM, Lucas Stach wrote:
> Hi,
> 
> You are driving the SoC out of spec. The datasheet clearly states that
> you need a 1000mV typical voltage for 600MHz VPU clock.
> 
> If you drive the SoC outside of those ratings it squarely depends on
> the individual SoC if it will tolerate the too low voltage without
> errors. Some SoCs land on the better side of PVT curve and will run at
> the higher speed without issues, but some will not and will exhibit
> random issues outside of the datasheet provided specs.
> 
> There isn't much to discuss here. The upstream DT for the i.MX8MQ runs
> all the clocks at a rate to meet the nominal drive voltage specs. If
> some peripheral clock does violate this, this is a bug not a feature to
> replicate in new patches.
> 
> Regards,
> Lucas

I agree with you; it is meaningless to test the VPU at the overdrive clock
frequency with the nominal drive voltage.
We should focus on the overdrive mode at a frequency of 600 MHz and a
voltage of 1.0 V.

It was my mistake not to adjust the voltage in this patch.

Regards,
Ming
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Marco Felsch 4 days, 6 hours ago
Hi,

sorry for jumping in.

On 26-02-03, Ming Qian(OSS) wrote:
> My test results for fluster differ from yours.
> On my end, the results for JCT-VC-HEVC_V1 are consistent at both 300MHz and
> 600MHz.
> And results remained unchanged after multiple tests.
> 
> I'm not sure what caused the differences between us.

When it comes to system stability, you need to ensure that your
boot stacks are aligned, e.g. same TF-A version and sometimes same
bootloader, since there might be workarounds/errata applied by the boot
firmware.

Regards,
  Marco

> 
> Below are my test results:
> 
> 600MHz, 0.9V
> 	cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
> 	 SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
> 	cat /sys/kernel/debug/clk/vpu_g2/clk_rate
> 	600000000
> 
> 	./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
> 	****************************************************************************************************
> 	Running test suite JCT-VC-HEVC_V1 with decoder GStreamer-H.265-V4L2SL-Gst1.0
> 	Using 2 parallel job(s)
> 	****************************************************************************************************
> 
> 	Ran 139/147 tests successfully               in 505.434 secs
> 	Ran 139/147 tests successfully               in 505.350 secs
> 	Ran 139/147 tests successfully               in 507.540 secs
> 
> 600MHz, 1.0V
> 	cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
> 	 SW1C                             0    1      0 unknown  1000mV     0mA   825mV  1100mV
> 	cat /sys/kernel/debug/clk/vpu_g2/clk_rate
> 	600000000
> 
> 	./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
> 	Ran 139/147 tests successfully               in 506.901 secs
> 
> 300MHz, 0.9V
> 	cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
> 	 SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
> 	cat /sys/kernel/debug/clk/vpu_g2/clk_rate
> 	300000000
> 
> 	./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
> 	Ran 139/147 tests successfully               in 506.063 secs
> 
> Downstream v4l2 driver
> 	cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
> 	 SW1C                             0    2      0 unknown  1000mV     0mA   825mV  1100mV
> 	cat /sys/kernel/debug/clk/vpu_g2/clk_rate
> 	600000000
> 
> 	./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2-Gst1.0 -j2 -t 90
> 	Ran 136/147 tests successfully               in 460.435 secs
> 
> Regards,
> Ming
> 
> 

-- 
#gernperDu 
#CallMeByMyFirstName

Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | https://www.pengutronix.de/ |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-9    |
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Ming Qian(OSS) 4 days, 6 hours ago
Hi Marco,

On 2/3/2026 4:31 PM, Marco Felsch wrote:
> 
> Hi,
> 
> sorry for jumping in.
> 
> On 26-02-03, Ming Qian(OSS) wrote:
>> Hi Nicolas,
>>
>> On 2/3/2026 3:12 AM, Nicolas Dufresne wrote:
>>> Hi,
>>>
>>> On Monday, 2 February 2026 at 13:44 -0500, Nicolas Dufresne wrote:
>>>>> This doesn't sound like just a VPU issue; it's related to the display or
>>>>> DDR.
>>>>> If not displayed, do the fluster test cases yield different results at
>>>>> 600MHz and 300MHz?
>>>>
>>>> Didn't you run these tests before sending ? I can try again, but in my
>>>> internal
>>>> notes, I wrote:
>>>>
>>>>     > Tested that, and everything becomes unstable
>>>>
>>>> That was before I figured out the IRQ handler didn't handle exception bits that
>>>> didn't stop the decoder (or dry IRQ, which strangely is common from the G2).
>>>
>>> Ran some fluster tests now. With this patch the results are not consistent
>>> anymore. Then I ran it with weston started, and in the middle of the test
>>> the display turned black. Matches my past observation. We did reproduce this on
>>> the BSP kernel too. When the display goes black, the recent hantro driver reports:
>>>
>>> [  827.581586] hantro-vpu 38310000.video-codec: frame decode timed out.
>>> [  827.720201] hantro-vpu 38310000.video-codec: not all macroblocks were decoded.
>>>
>>>
>>> I have local patches to reduce the cascade of errors, so it likely survived
>>> longer than last time. I will send these patches soon. The "not all macroblocks
>>> were decoded." message is triggered by a bit in the status register that is not
>>> documented in the NXP TRM. I found that bit in some VC8000D documentation (the
>>> successor of G2). I concluded it had the same meaning after looking at the failed
>>> buffer visually; it is indeed missing a couple of macroblocks near the end. Each
>>> time we see this error, the DCSS gives up and turns either black or sometimes
>>> another color. The second case has been tracked to a DCSS Scaler underrun; the
>>> first we don't know.
>>>
>>> Fluster command ran (two threads, never completes):
>>>
>>> ./fluster.py run -d GStreamer-H.265-V4L2SL-Gst1.0 -ts JCT-VC-HEVC_V1 -j2 -t90
>>>
>>> Nicolas
>>
>> My test results for fluster differ from yours.
>> On my end, the results for JCT-VC-HEVC_V1 are consistent at both 300MHz and
>> 600MHz.
>> And results remained unchanged after multiple tests.
>>
>> I'm not sure what caused the differences between us.
> 
> When it comes to system stability, you need to ensure that your
> boot stacks are aligned, e.g. same TF-A version and sometimes same
> bootloader, since there might be workarounds/errata applied by the boot
> firmware.
> 
> Regards,
>    Marco
> 

Thanks for the reminder, and I agree.
I think we need to align our board environment first.

Regards,
Ming

>>
>> Below are my test results:
>>
>> 600MHz, 0.9V
>>        cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
>>         SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
>>        cat /sys/kernel/debug/clk/vpu_g2/clk_rate
>>        600000000
>>
>>        ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
>>        ****************************************************************************************************
>>        Running test suite JCT-VC-HEVC_V1 with decoder GStreamer-H.265-V4L2SL-Gst1.0
>>        Using 2 parallel job(s)
>>        ****************************************************************************************************
>>
>>        Ran 139/147 tests successfully               in 505.434 secs
>>        Ran 139/147 tests successfully               in 505.350 secs
>>        Ran 139/147 tests successfully               in 507.540 secs
>>
>> 600MHz, 1.0V
>>        cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
>>         SW1C                             0    1      0 unknown  1000mV     0mA   825mV  1100mV
>>        cat /sys/kernel/debug/clk/vpu_g2/clk_rate
>>        600000000
>>
>>        ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
>>        Ran 139/147 tests successfully               in 506.901 secs
>>
>> 300MHz, 0.9V
>>        cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
>>         SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
>>        cat /sys/kernel/debug/clk/vpu_g2/clk_rate
>>        300000000
>>
>>        ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
>>        Ran 139/147 tests successfully               in 506.063 secs
>>
>> Downstream v4l2 driver
>>        cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
>>         SW1C                             0    2      0 unknown  1000mV     0mA   825mV  1100mV
>>        cat /sys/kernel/debug/clk/vpu_g2/clk_rate
>>        600000000
>>
>>        ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2-Gst1.0 -j2 -t 90
>>        Ran 136/147 tests successfully               in 460.435 secs
>>
>> Regards,
>> Ming
>>
>>
> 

Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Nicolas Dufresne 3 days, 23 hours ago
On Tuesday, 3 February 2026 at 16:53 +0800, Ming Qian(OSS) wrote:
> Hi Marco,
> 
> On 2/3/2026 4:31 PM, Marco Felsch wrote:
> > 
> > Hi,
> > 
> > sorry for jumping in.
> > 
> > On 26-02-03, Ming Qian(OSS) wrote:
> > > Hi Nicolas,
> > > 
> > > On 2/3/2026 3:12 AM, Nicolas Dufresne wrote:
> > > > Hi,
> > > > 
> > > > On Monday, 2 February 2026 at 13:44 -0500, Nicolas Dufresne wrote:
> > > > > > This doesn't sound like just a VPU issue; it's related to the display or
> > > > > > DDR.
> > > > > > If not displayed, do the fluster test cases yield different results at
> > > > > > 600MHz and 300MHz?
> > > > > 
> > > > > Didn't you run these tests before sending ? I can try again, but in my
> > > > > internal
> > > > > notes, I wrote:
> > > > > 
> > > > >     > Tested that, and everything becomes unstable
> > > > > 
> > > > > That was before I figured out the IRQ handler didn't handle exception bits that
> > > > > didn't stop the decoder (or dry IRQ, which strangely is common from the G2).
> > > > 
> > > > Ran some fluster tests now. With this patch the results are not consistent
> > > > anymore. Then I ran it with weston started, and in the middle of the test
> > > > the display turned black. Matches my past observation. We did reproduce this on
> > > > the BSP kernel too. When the display goes black, the recent hantro driver reports:
> > > > 
> > > > [  827.581586] hantro-vpu 38310000.video-codec: frame decode timed out.
> > > > [  827.720201] hantro-vpu 38310000.video-codec: not all macroblocks were decoded.
> > > > 
> > > > 
> > > > I have local patches to reduce the cascade of errors, so it likely survived
> > > > longer than last time. I will send these patches soon. The "not all macroblocks
> > > > were decoded." message is triggered by a bit in the status register that is not
> > > > documented in the NXP TRM. I found that bit in some VC8000D documentation (the
> > > > successor of G2). I concluded it had the same meaning after looking at the failed
> > > > buffer visually; it is indeed missing a couple of macroblocks near the end. Each
> > > > time we see this error, the DCSS gives up and turns either black or sometimes
> > > > another color. The second case has been tracked to a DCSS Scaler underrun; the
> > > > first we don't know.
> > > > 
> > > > Fluster command ran (two threads, never completes):
> > > > 
> > > > ./fluster.py run -d GStreamer-H.265-V4L2SL-Gst1.0 -ts JCT-VC-HEVC_V1 -j2 -t90
> > > > 
> > > > Nicolas
> > > 
> > > My test results for fluster differ from yours.
> > > On my end, the results for JCT-VC-HEVC_V1 are consistent at both 300MHz and
> > > 600MHz.
> > > And results remained unchanged after multiple tests.

After more testing, the fluster test is stable for NV12/NV15 tiled output for me
too. I'm running the tests with linear NV12/P010, which implies an extra set of
buffers. I will check if I can give you an easy way to test the linear formats. I
also have a couple of streams that systematically break at specific spots (high
complexity scenes) with the provided patch. Like most licensed content, these are
not shareable as-is. I will try to find a way to share something.

> > > 
> > > I'm not sure what caused the differences between us.
> > 
> > When it comes to system stability, you need to ensure that your
> > boot stacks are aligned, e.g. same TF-A version and sometimes same
> > bootloader, since there might be workarounds/errata applied by the boot
> > firmware.
> > 
> > Regards,
> >    Marco
> > 
> 
> Thanks for the reminder, and I agree.
> I think we need to align our board environment first.

I likely do have a slightly different boot chain, and of course all the HDMI
components are downstream, but I can't really isolate the dramatic issue with this
overclock without a display component of some sort. It's a huge differentiator in
bandwidth consumption, which is the main challenge on this SoC so far. 10-bit
videos make things a lot worse, fwiw.
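
To put rough numbers on the 8-bit vs 10-bit difference, here is a minimal
sketch of the write rate for the final output frames only. It assumes 4:2:0
subsampling, 1 byte per sample for NV12, 2 bytes per sample for P010, and no
stride padding or alignment. It ignores reference frame reads, the tiled
reference buffers, and display scanout, so real DDR traffic is higher. The
function names are made up for illustration.

```python
# Rough write bandwidth for decoded 4K60 output frames.
# Assumptions (not measured): 4:2:0 subsampling, NV12 at 1 byte per sample,
# P010 at 2 bytes per sample, no padding or alignment.

WIDTH, HEIGHT, FPS = 3840, 2160, 60


def frame_bytes(width, height, bytes_per_sample):
    # 4:2:0: a full-resolution luma plane plus chroma totalling half of it
    samples = width * height * 3 // 2
    return samples * bytes_per_sample


def write_rate_mb_s(width, height, fps, bytes_per_sample):
    # Bytes written per second for one output stream, in MB/s (1 MB = 1e6 bytes)
    return frame_bytes(width, height, bytes_per_sample) * fps / 1e6


nv12 = write_rate_mb_s(WIDTH, HEIGHT, FPS, 1)
p010 = write_rate_mb_s(WIDTH, HEIGHT, FPS, 2)
print(f"NV12: {nv12:.1f} MB/s, P010: {p010:.1f} MB/s")
```

Under these assumptions the linear P010 output alone costs twice the write
bandwidth of NV12, before any display traffic is counted.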

We did review the latest i.MX vendor firmware package and can confirm we are running
the latest memory training blob and HDMI firmware.

Nicolas

> Regards,
> Ming
> 
> > > 
> > > Below are my test results:
> > > 
> > > 600MHz, 0.9V
> > >        cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
> > >         SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
> > >        cat /sys/kernel/debug/clk/vpu_g2/clk_rate
> > >        600000000
> > > 
> > >        ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
> > >        ****************************************************************************************************
> > >        Running test suite JCT-VC-HEVC_V1 with decoder GStreamer-H.265-V4L2SL-Gst1.0
> > >        Using 2 parallel job(s)
> > >        ****************************************************************************************************
> > > 
> > >        Ran 139/147 tests successfully               in 505.434 secs
> > >        Ran 139/147 tests successfully               in 505.350 secs
> > >        Ran 139/147 tests successfully               in 507.540 secs
> > > 
> > > 600MHz, 1.0V
> > >        cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
> > >         SW1C                             0    1      0 unknown  1000mV     0mA   825mV  1100mV
> > >        cat /sys/kernel/debug/clk/vpu_g2/clk_rate
> > >        600000000
> > > 
> > >        ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
> > >        Ran 139/147 tests successfully               in 506.901 secs
> > > 
> > > 300MHz, 0.9V
> > >        cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
> > >         SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
> > >        cat /sys/kernel/debug/clk/vpu_g2/clk_rate
> > >        300000000
> > > 
> > >        ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
> > >        Ran 139/147 tests successfully               in 506.063 secs
> > > 
> > > Downstream v4l2 driver
> > >        cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
> > >         SW1C                             0    2      0 unknown  1000mV     0mA   825mV  1100mV
> > >        cat /sys/kernel/debug/clk/vpu_g2/clk_rate
> > >        600000000
> > > 
> > >        ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2-Gst1.0 -j2 -t 90
> > >        Ran 136/147 tests successfully               in 460.435 secs
> > > 
> > > Regards,
> > > Ming
> > > 
> > > 
> > 
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Ming Qian(OSS) 3 days, 12 hours ago
Hi Nicolas,

On 2/3/2026 11:40 PM, Nicolas Dufresne wrote:
> On Tuesday, 3 February 2026 at 16:53 +0800, Ming Qian(OSS) wrote:
>> Hi Marco,
>>
>> On 2/3/2026 4:31 PM, Marco Felsch wrote:
>>>
>>> Hi,
>>>
>>> sorry for jumping in.
>>>
>>> On 26-02-03, Ming Qian(OSS) wrote:
>>>> Hi Nicolas,
>>>>
>>>> On 2/3/2026 3:12 AM, Nicolas Dufresne wrote:
>>>>> Hi,
>>>>>
>>>>> On Monday, 2 February 2026 at 13:44 -0500, Nicolas Dufresne wrote:
>>>>>>> This doesn't sound like just a VPU issue; it's related to the display or
>>>>>>> DDR.
>>>>>>> If not displayed, do the fluster test cases yield different results at
>>>>>>> 600MHz and 300MHz?
>>>>>>
>>>>>> Didn't you run these tests before sending ? I can try again, but in my
>>>>>> internal
>>>>>> notes, I wrote:
>>>>>>
>>>>>>      > Tested that, and everything becomes unstable
>>>>>>
>>>>>> That was before I figured out the IRQ handler didn't handle exception bits that
>>>>>> didn't stop the decoder (or dry IRQ, which strangely is common from the G2).
>>>>>
>>>>> Ran some fluster tests now. With this patch the results are not consistent
>>>>> anymore. Then I ran it with weston started, and in the middle of the test
>>>>> the display turned black. Matches my past observation. We did reproduce this on
>>>>> the BSP kernel too. When the display goes black, the recent hantro driver reports:
>>>>>
>>>>> [  827.581586] hantro-vpu 38310000.video-codec: frame decode timed out.
>>>>> [  827.720201] hantro-vpu 38310000.video-codec: not all macroblocks were decoded.
>>>>>
>>>>>
>>>>> I have local patches to reduce the cascade of errors, so it likely survived
>>>>> longer than last time. I will send these patches soon. The "not all macroblocks
>>>>> were decoded." message is triggered by a bit in the status register that is not
>>>>> documented in the NXP TRM. I found that bit in some VC8000D documentation (the
>>>>> successor of G2). I concluded it had the same meaning after looking at the failed
>>>>> buffer visually; it is indeed missing a couple of macroblocks near the end. Each
>>>>> time we see this error, the DCSS gives up and turns either black or sometimes
>>>>> another color. The second case has been tracked to a DCSS Scaler underrun; the
>>>>> first we don't know.
>>>>>
>>>>> Fluster command ran (two threads, never completes):
>>>>>
>>>>> ./fluster.py run -d GStreamer-H.265-V4L2SL-Gst1.0 -ts JCT-VC-HEVC_V1 -j2 -t90
>>>>>
>>>>> Nicolas
>>>>
>>>> My test results for fluster differ from yours.
>>>> On my end, the results for JCT-VC-HEVC_V1 are consistent at both 300MHz and
>>>> 600MHz.
>>>> And results remained unchanged after multiple tests.
> 
> After more testing, the fluster test is stable for NV12/NV15 tiled output for me
> too. I'm running the tests with linear NV12/P010, which implies an extra set of
> buffers. I will check if I can give you an easy way to test the linear formats. I
> also have a couple of streams that systematically break at specific spots (high
> complexity scenes) with the provided patch. Like most licensed content, these are
> not shareable as-is. I will try to find a way to share something.
> 

That would be very helpful. Thank you very much.

>>>>
>>>> I'm not sure what caused the differences between us.
>>>
>>> When it comes to system stability, you need to ensure that your
>>> boot stacks are aligned, e.g. same TF-A version and sometimes same
>>> bootloader, since there might be workarounds/errata applied by the boot
>>> firmware.
>>>
>>> Regards,
>>>     Marco
>>>
>>
>> Thanks for the reminder, and I agree.
>> I think we need to align our board environment first.
> 
> I likely do have a slightly different boot chain, and of course all the HDMI
> components are downstream, but I can't really isolate the dramatic issue with this
> overclock without a display component of some sort. It's a huge differentiator in
> bandwidth consumption, which is the main challenge on this SoC so far. 10-bit
> videos make things a lot worse, fwiw.
> 
> We did review the latest i.MX vendor firmware package and can confirm we are running
> the latest memory training blob and HDMI firmware.
> 
> Nicolas

Since the datasheet clearly states that the VPU G2 requires a voltage
increase to 1.0V to run at 600MHz (I thought it was already set that way,
but it actually wasn't), I think we can align our test conditions on this.

I will increase the G2 voltage in V2.
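
As a sketch of how that test condition could be checked on a board, the
snippet below parses the same debugfs output quoted in the results
(regulator_summary for SW1C and clk_rate for vpu_g2). It assumes the first
"mV" field on the SW1C line is the currently applied voltage and that SW1C is
the rail feeding the VPU on the board under test; the helper names are
hypothetical.

```python
import re

# Condition stated in this message: 600 MHz on vpu_g2 needs at least 1.0 V.
G2_OVERDRIVE_HZ = 600_000_000
MIN_OVERDRIVE_MV = 1000


def sw1c_millivolts(summary_line):
    # Assumption: the first "<n>mV" field on the line is the applied voltage
    match = re.search(r"(\d+)mV", summary_line)
    if match is None:
        raise ValueError("no voltage field found")
    return int(match.group(1))


def conditions_aligned(summary_line, clk_rate_text):
    # Below the overdrive rate, no voltage requirement is checked here
    rate_hz = int(clk_rate_text.strip())
    if rate_hz < G2_OVERDRIVE_HZ:
        return True
    return sw1c_millivolts(summary_line) >= MIN_OVERDRIVE_MV


# Sample lines copied from the results quoted in this thread
LINE_900 = " SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV"
LINE_1000 = " SW1C                             0    1      0 unknown  1000mV     0mA   825mV  1100mV"

print(conditions_aligned(LINE_900, "600000000\n"))
print(conditions_aligned(LINE_1000, "600000000\n"))
```

With the quoted results, only the 600MHz run at 900mV fails this check.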

Regards,
Ming

> 
>> Regards,
>> Ming
>>
>>>>
>>>> Below are my test results:
>>>>
>>>> 600MHz, 0.9V
>>>>         cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
>>>>          SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
>>>>         cat /sys/kernel/debug/clk/vpu_g2/clk_rate
>>>>         600000000
>>>>
>>>>         ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
>>>>         ****************************************************************************************************
>>>>         Running test suite JCT-VC-HEVC_V1 with decoder GStreamer-H.265-V4L2SL-Gst1.0
>>>>         Using 2 parallel job(s)
>>>>         ****************************************************************************************************
>>>>
>>>>         Ran 139/147 tests successfully               in 505.434 secs
>>>>         Ran 139/147 tests successfully               in 505.350 secs
>>>>         Ran 139/147 tests successfully               in 507.540 secs
>>>>
>>>> 600MHz, 1.0V
>>>>         cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
>>>>          SW1C                             0    1      0 unknown  1000mV     0mA   825mV  1100mV
>>>>         cat /sys/kernel/debug/clk/vpu_g2/clk_rate
>>>>         600000000
>>>>
>>>>         ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
>>>>         Ran 139/147 tests successfully               in 506.901 secs
>>>>
>>>> 300MHz, 0.9V
>>>>         cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
>>>>          SW1C                             0    1      0 unknown   900mV     0mA   825mV  1100mV
>>>>         cat /sys/kernel/debug/clk/vpu_g2/clk_rate
>>>>         300000000
>>>>
>>>>         ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2SL-Gst1.0 -j2 -t 90
>>>>         Ran 139/147 tests successfully               in 506.063 secs
>>>>
>>>> Downstream v4l2 driver
>>>>         cat /sys/kernel/debug/regulator/regulator_summary | grep SW1C
>>>>          SW1C                             0    2      0 unknown  1000mV     0mA   825mV  1100mV
>>>>         cat /sys/kernel/debug/clk/vpu_g2/clk_rate
>>>>         600000000
>>>>
>>>>         ./fluster.py run -ts JCT-VC-HEVC_V1 -d GStreamer-H.265-V4L2-Gst1.0 -j2 -t 90
>>>>         Ran 136/147 tests successfully               in 460.435 secs
>>>>
>>>> Regards,
>>>> Ming
>>>>
>>>>
>>>
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Alexander Stein 1 week, 1 day ago
On Friday, 30 January 2026, 09:41:31 CET, ming.qian@oss.nxp.com wrote:
> From: Ming Qian <ming.qian@oss.nxp.com>
> 
> The VPU G2 clock was reduced from 600MHz to 300MHz in commit
> b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock") to address
> pixel errors with high-resolution HEVC postprocessor output.
> 
> However, testing shows the 300MHz clock rate is insufficient for
> 4K60fps decoding and the original pixel errors no longer occur at
> 600MHz with current drivers.
> 
> Test results with 3840x2160@60fps HEVC stream decoded to NV12
> (the same scenario that exhibited pixel errors previously):
> 
> 300MHz performance:
> - Severe frame dropping throughout playback
> - Only 336 frames rendered in 11:53 (0.471 fps)
> - Continuous "A lot of buffers are being dropped" warnings
> - Completely unusable for 4K video
> 
> 600MHz performance:
> - Smooth playback with only 1 frame dropped at startup
> - 37981 frames rendered in 10:34 (59.857 fps)
> - Achieves target 60fps performance
> - No pixel errors or artifacts observed
> 
> Restore the clock to 600MHz to enable proper 4K60fps decoding
> capability while maintaining stability.
> 
> Test pipeline:
>   gst-launch-1.0 filesrc location=<4K60_HEVC.mkv> ! \
>     video/x-matroska ! aiurdemux ! h265parse ! \
>     v4l2slh265dec ! video/x-raw,format=NV12 ! \
>     queue ! waylandsink
> 
> Fixes: b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock")
> Signed-off-by: Ming Qian <ming.qian@oss.nxp.com>
> ---
>  arch/arm64/boot/dts/freescale/imx8mq.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/boot/dts/freescale/imx8mq.dtsi b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
> index 607962f807be..731142176625 100644
> --- a/arch/arm64/boot/dts/freescale/imx8mq.dtsi
> +++ b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
> @@ -960,7 +960,7 @@ pgc_vpu: power-domain@6 {
>  									 <&clk IMX8MQ_SYS1_PLL_800M>,
>  									 <&clk IMX8MQ_VPU_PLL>;
>  						assigned-clock-rates = <600000000>,
> -								       <300000000>,
> +								       <600000000>,

If I read the datasheet correctly, 600 MHz is only supported in overdrive
mode (also depending on VDD_VPU).
Is this frequency really correct?

Best regards,
Alexander

>  								       <800000000>,
>  								       <0>;
>  					};
> 
> base-commit: c824345288d11e269ce41b36c105715bc2286050
> prerequisite-patch-id: 0000000000000000000000000000000000000000
>
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Peng Fan 5 days, 12 hours ago
On Fri, Jan 30, 2026 at 10:09:46AM +0100, Alexander Stein wrote:
>On Friday, 30 January 2026, 09:41:31 CET, ming.qian@oss.nxp.com wrote:
>> From: Ming Qian <ming.qian@oss.nxp.com>
>> 
>> The VPU G2 clock was reduced from 600MHz to 300MHz in commit
>> b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock") to address
>> pixel errors with high-resolution HEVC postprocessor output.
>> 
>> However, testing shows the 300MHz clock rate is insufficient for
>> 4K60fps decoding and the original pixel errors no longer occur at
>> 600MHz with current drivers.
>> 
>> Test results with 3840x2160@60fps HEVC stream decoded to NV12
>> (the same scenario that exhibited pixel errors previously):
>> 
>> 300MHz performance:
>> - Severe frame dropping throughout playback
>> - Only 336 frames rendered in 11:53 (0.471 fps)
>> - Continuous "A lot of buffers are being dropped" warnings
>> - Completely unusable for 4K video
>> 
>> 600MHz performance:
>> - Smooth playback with only 1 frame dropped at startup
>> - 37981 frames rendered in 10:34 (59.857 fps)
>> - Achieves target 60fps performance
>> - No pixel errors or artifacts observed
>> 
>> Restore the clock to 600MHz to enable proper 4K60fps decoding
>> capability while maintaining stability.
>> 
>> Test pipeline:
>>   gst-launch-1.0 filesrc location=<4K60_HEVC.mkv> ! \
>>     video/x-matroska ! aiurdemux ! h265parse ! \
>>     v4l2slh265dec ! video/x-raw,format=NV12 ! \
>>     queue ! waylandsink
>> 
>> Fixes: b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock")
>> Signed-off-by: Ming Qian <ming.qian@oss.nxp.com>
>> ---
>>  arch/arm64/boot/dts/freescale/imx8mq.dtsi | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm64/boot/dts/freescale/imx8mq.dtsi b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
>> index 607962f807be..731142176625 100644
>> --- a/arch/arm64/boot/dts/freescale/imx8mq.dtsi
>> +++ b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
>> @@ -960,7 +960,7 @@ pgc_vpu: power-domain@6 {
>>  									 <&clk IMX8MQ_SYS1_PLL_800M>,
>>  									 <&clk IMX8MQ_VPU_PLL>;
>>  						assigned-clock-rates = <600000000>,
>> -								       <300000000>,
>> +								       <600000000>,
>
>If I read the datasheet correctly, 600 MHz is only supported in overdrive
>mode (also depending on VDD_VPU).
>Is this frequency really correct?

The G1 and BUS clocks were already set to overdrive frequencies.

This change only upgrades G2 from 300M to 600M.

So if your question is whether we should downgrade everything to nominal mode,
I think not. The frequency could be overridden in the board dts, or a new dtsi
could be added, like arch/arm64/boot/dts/freescale/imx8mp-nominal.dtsi

Regards
Peng

>
>Best regards,
>Alexander
>
>>  								       <800000000>,
>>  								       <0>;
>>  					};
>> 
>> base-commit: c824345288d11e269ce41b36c105715bc2286050
>> prerequisite-patch-id: 0000000000000000000000000000000000000000
>> 
>
>
>
>
Re: [PATCH] arm64: dts: imx8mq: Restore VPU G2 clock to 600MHz for 4K60fps decoding
Posted by Ming Qian(OSS) 5 days, 9 hours ago
Hi Alexander,

On 2/2/2026 10:41 AM, Peng Fan wrote:
> On Fri, Jan 30, 2026 at 10:09:46AM +0100, Alexander Stein wrote:
>> On Friday, 30 January 2026, 09:41:31 CET, ming.qian@oss.nxp.com wrote:
>>> From: Ming Qian <ming.qian@oss.nxp.com>
>>>
>>> The VPU G2 clock was reduced from 600MHz to 300MHz in commit
>>> b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock") to address
>>> pixel errors with high-resolution HEVC postprocessor output.
>>>
>>> However, testing shows the 300MHz clock rate is insufficient for
>>> 4K60fps decoding and the original pixel errors no longer occur at
>>> 600MHz with current drivers.
>>>
>>> Test results with 3840x2160@60fps HEVC stream decoded to NV12
>>> (the same scenario that exhibited pixel errors previously):
>>>
>>> 300MHz performance:
>>> - Severe frame dropping throughout playback
>>> - Only 336 frames rendered in 11:53 (0.471 fps)
>>> - Continuous "A lot of buffers are being dropped" warnings
>>> - Completely unusable for 4K video
>>>
>>> 600MHz performance:
>>> - Smooth playback with only 1 frame dropped at startup
>>> - 37981 frames rendered in 10:34 (59.857 fps)
>>> - Achieves target 60fps performance
>>> - No pixel errors or artifacts observed
>>>
>>> Restore the clock to 600MHz to enable proper 4K60fps decoding
>>> capability while maintaining stability.
>>>
>>> Test pipeline:
>>>    gst-launch-1.0 filesrc location=<4K60_HEVC.mkv> ! \
>>>      video/x-matroska ! aiurdemux ! h265parse ! \
>>>      v4l2slh265dec ! video/x-raw,format=NV12 ! \
>>>      queue ! waylandsink
>>>
>>> Fixes: b27bfc5103c7 ("arm64: dts: freescale: Fix VPU G2 clock")
>>> Signed-off-by: Ming Qian <ming.qian@oss.nxp.com>
>>> ---
>>>   arch/arm64/boot/dts/freescale/imx8mq.dtsi | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/boot/dts/freescale/imx8mq.dtsi b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
>>> index 607962f807be..731142176625 100644
>>> --- a/arch/arm64/boot/dts/freescale/imx8mq.dtsi
>>> +++ b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
>>> @@ -960,7 +960,7 @@ pgc_vpu: power-domain@6 {
>>>   									 <&clk IMX8MQ_SYS1_PLL_800M>,
>>>   									 <&clk IMX8MQ_VPU_PLL>;
>>>   						assigned-clock-rates = <600000000>,
>>> -								       <300000000>,
>>> +								       <600000000>,
>>
>> If I read the datasheet correctly, 600 MHz is only supported in overdrive
>> mode (also depending on VDD_VPU).
>> Is this frequency really correct?
> 
> The G1 and BUS clocks were already set to overdrive frequencies.
> 
> This change only upgrades G2 from 300M to 600M.
> 
> So if your question is whether we should downgrade everything to nominal mode,
> I think not. The frequency could be overridden in the board dts, or a new dtsi
> could be added, like arch/arm64/boot/dts/freescale/imx8mp-nominal.dtsi
> 
> Regards
> Peng
> 
>>
>> Best regards,
>> Alexander
>>

Yes, you are right, 600MHz is the overdrive frequency.
However, to achieve the 4K 60fps target, we set the VPU to run in
overdrive mode by default, just as Peng said.
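
For reference, the frame rates in the patch description can be recomputed from
the frame counts and elapsed times it reports. This is a minimal sketch; the
elapsed times are only given to the second, so the 600MHz result comes out
near 59.9 rather than exactly the reported 59.857. The function name is made
up for illustration.

```python
def fps(frames, minutes, seconds):
    # Average rendered frame rate over the whole playback run
    return frames / (minutes * 60 + seconds)


# Figures from the patch description
fps_300mhz = fps(336, 11, 53)    # 300MHz run: 336 frames in 11:53
fps_600mhz = fps(37981, 10, 34)  # 600MHz run: 37981 frames in 10:34

print(f"300MHz: {fps_300mhz:.3f} fps, 600MHz: {fps_600mhz:.3f} fps")
```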

Regards,
Ming

>>>   								       <800000000>,
>>>   								       <0>;
>>>   					};
>>>
>>> base-commit: c824345288d11e269ce41b36c105715bc2286050
>>> prerequisite-patch-id: 0000000000000000000000000000000000000000
>>>
>>
>>
>>
>>