The calls rockchip_pd_power makes to rockchip_pmu_set_idle_request lack
any return error handling, causing device drivers to incorrectly believe
the hardware idle requests succeed when they may have failed. This leads
to software possibly accessing hardware that is powered off and the
subsequent SError panic that follows.
Add error checking and return errors to the calling function to prevent
such crashes.
gst-launch-1.0 videotestsrc num-buffers=2000 ! v4l2jpegenc ! fakesink
Setting pipeline to PAUSED ...er-x64
Pipeline is PREROLLING ...
Redistribute latency...
rockchip-pm-domain ff100000.syscon:power-controller: failed to get ack on
domain 'hevc', val=0x98260, idle = 0
SError Interrupt on CPU2, code 0x00000000bf000002 -- SError
CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+ #54
Hardware name: Firefly roc-rk3328-cc (DT)
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
lr : device_run+0xb0/0x128
sp : ffff800082143a20
x29: ffff800082143a20 x28: 0000000000000140 x27: 0000000000000000
x26: ffff582c47a313e8 x25: ffff582c53e95000 x24: ffff582c53e92800
x23: ffff582c5bbe0000 x22: 0000000000000000 x21: ffff582c47a31080
x20: ffffa0d78cfa4168 x19: ffffa0d78cfa4168 x18: ffffb755b0519000
x17: 000000040044ffff x16: 00500072b5503510 x15: a7a6a5a4a3a29a99
x14: 989796959493928a x13: 0000000051eb851f x12: 00000000000000ff
x11: ffffa0d78d812880 x10: ffffa0d78d7fbca0 x9 : 000000000000003f
x8 : 0000000000000063 x7 : 000000000000003f x6 : 0000000000000040
x5 : ffff80008010d000 x4 : ffffa0d78cfa4168 x3 : ffffa0d78cfbfdd8
x2 : ffff80008010d0f4 x1 : 0000000000000020 x0 : 0000000000000140
Kernel panic - not syncing: Asynchronous SError Interrupt
CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+ #54
Hardware name: Firefly roc-rk3328-cc (DT)
Call trace:
dump_backtrace+0xa0/0x128
show_stack+0x20/0x38
dump_stack_lvl+0xc8/0xf8
dump_stack+0x18/0x28
panic+0x3ec/0x428
nmi_panic+0x48/0xa0
arm64_serror_panic+0x6c/0x88
do_serror+0x30/0x70
el1h_64_error_handler+0x38/0x60
el1h_64_error+0x7c/0x80
rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
device_run+0xb0/0x128
v4l2_m2m_try_run+0xac/0x230
v4l2_m2m_ioctl_streamon+0x70/0x90
v4l_streamon+0x2c/0x40
__video_do_ioctl+0x194/0x400
video_usercopy+0x10c/0x808
video_ioctl2+0x20/0x80
v4l2_ioctl+0x48/0x70
__arm64_sys_ioctl+0xb0/0x100
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x38/0x100
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x1a8/0x1b0
SMP: stopping secondary CPUs
Kernel Offset: 0x20d70c000000 from 0xffff800080000000
PHYS_OFFSET: 0xffffa7d3c0000000
CPU features: 0x00,00000090,00200000,0200421b
Memory Limit: none
---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
Fixes: 7c696693a4f5 ("soc: rockchip: power-domain: Add power domain driver")
Signed-off-by: Peter Geis <pgwipeout@gmail.com>
---
drivers/pmdomain/rockchip/pm-domains.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/pmdomain/rockchip/pm-domains.c b/drivers/pmdomain/rockchip/pm-domains.c
index cb0f93800138..57e8fa25d2bd 100644
--- a/drivers/pmdomain/rockchip/pm-domains.c
+++ b/drivers/pmdomain/rockchip/pm-domains.c
@@ -590,14 +590,18 @@ static int rockchip_pd_power(struct rockchip_pm_domain *pd, bool power_on)
rockchip_pmu_save_qos(pd);
/* if powering down, idle request to NIU first */
- rockchip_pmu_set_idle_request(pd, true);
+ ret = rockchip_pmu_set_idle_request(pd, true);
+ if (ret < 0)
+ return ret;
}
rockchip_do_pmu_set_power_domain(pd, power_on);
if (power_on) {
/* if powering up, leave idle mode */
- rockchip_pmu_set_idle_request(pd, false);
+ ret = rockchip_pmu_set_idle_request(pd, false);
+ if (ret < 0)
+ return ret;
rockchip_pmu_restore_qos(pd);
}
--
2.39.5
Hi Peter,
kernel test robot noticed the following build warnings:
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Peter-Geis/pmdomain-rockchip-fix-rockchip_pd_power-error-handling/20241210-093424
base: https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git for-next
patch link: https://lore.kernel.org/r/20241210013010.81257-2-pgwipeout%40gmail.com
patch subject: [PATCH 1/6] pmdomain: rockchip: fix rockchip_pd_power error handling
config: powerpc-randconfig-r072-20241223 (https://download.01.org/0day-ci/archive/20241224/202412240015.MfjYhpNz-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 14.2.0
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202412240015.MfjYhpNz-lkp@intel.com/
smatch warnings:
drivers/pmdomain/rockchip/pm-domains.c:614 rockchip_pd_power() warn: inconsistent returns '&pmu->mutex'.
vim +614 drivers/pmdomain/rockchip/pm-domains.c
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 572 static int rockchip_pd_power(struct rockchip_pm_domain *pd, bool power_on)
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 573 {
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 574 struct rockchip_pmu *pmu = pd->pmu;
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 575 int ret;
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 576
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 577 mutex_lock(&pmu->mutex);
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 578
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 579 if (rockchip_pmu_domain_is_on(pd) != power_on) {
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 580 ret = clk_bulk_enable(pd->num_clks, pd->clks);
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 581 if (ret < 0) {
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 582 dev_err(pmu->dev, "failed to enable clocks\n");
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 583 mutex_unlock(&pmu->mutex);
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 584 return ret;
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 585 }
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 586
8b579881de295d drivers/pmdomain/rockchip/pm-domains.c Detlev Casanova 2024-08-29 587 rockchip_pmu_ungate_clk(pd, true);
8b579881de295d drivers/pmdomain/rockchip/pm-domains.c Detlev Casanova 2024-08-29 588
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 589 if (!power_on) {
074c6a422d86ff drivers/soc/rockchip/pm_domains.c Elaine Zhang 2016-04-14 590 rockchip_pmu_save_qos(pd);
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 591
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 592 /* if powering down, idle request to NIU first */
a15817772c0ae6 drivers/pmdomain/rockchip/pm-domains.c Peter Geis 2024-12-10 593 ret = rockchip_pmu_set_idle_request(pd, true);
a15817772c0ae6 drivers/pmdomain/rockchip/pm-domains.c Peter Geis 2024-12-10 594 if (ret < 0)
a15817772c0ae6 drivers/pmdomain/rockchip/pm-domains.c Peter Geis 2024-12-10 595 return ret;
mutex_unlock(&pmu->mutex);
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 596 }
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 597
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 598 rockchip_do_pmu_set_power_domain(pd, power_on);
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 599
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 600 if (power_on) {
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 601 /* if powering up, leave idle mode */
a15817772c0ae6 drivers/pmdomain/rockchip/pm-domains.c Peter Geis 2024-12-10 602 ret = rockchip_pmu_set_idle_request(pd, false);
a15817772c0ae6 drivers/pmdomain/rockchip/pm-domains.c Peter Geis 2024-12-10 603 if (ret < 0)
a15817772c0ae6 drivers/pmdomain/rockchip/pm-domains.c Peter Geis 2024-12-10 604 return ret;
mutex_unlock(&pmu->mutex);
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 605
074c6a422d86ff drivers/soc/rockchip/pm_domains.c Elaine Zhang 2016-04-14 606 rockchip_pmu_restore_qos(pd);
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 607 }
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 608
8b579881de295d drivers/pmdomain/rockchip/pm-domains.c Detlev Casanova 2024-08-29 609 rockchip_pmu_ungate_clk(pd, false);
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 610 clk_bulk_disable(pd->num_clks, pd->clks);
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 611 }
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 612
d909072d0521a8 drivers/soc/rockchip/pm_domains.c Jeffy Chen 2018-02-28 613 mutex_unlock(&pmu->mutex);
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 @614 return 0;
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 615 }
7c696693a4f54d drivers/soc/rockchip/pm_domains.c Caesar Wang 2015-09-08 616
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hello Peter,
On 2024-12-10 02:30, Peter Geis wrote:
> The calls rockchip_pd_power makes to rockchip_pmu_set_idle_request lack
> any return error handling, causing device drivers to incorrectly
> believe
> the hardware idle requests succeed when they may have failed. This
> leads
> to software possibly accessing hardware that is powered off and the
> subsequent SError panic that follows.
>
> Add error checking and return errors to the calling function to prevent
> such crashes.
>
> gst-launch-1.0 videotestsrc num-buffers=2000 ! v4l2jpegenc ! fakesink
> Setting pipeline to PAUSED ...er-x64
> Pipeline is PREROLLING ...
> Redistribute latency...
> rockchip-pm-domain ff100000.syscon:power-controller: failed to get ack
> on
> domain 'hevc', val=0x98260, idle = 0
> SError Interrupt on CPU2, code 0x00000000bf000002 -- SError
> CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+
> #54
> Hardware name: Firefly roc-rk3328-cc (DT)
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
> lr : device_run+0xb0/0x128
> sp : ffff800082143a20
> x29: ffff800082143a20 x28: 0000000000000140 x27: 0000000000000000
> x26: ffff582c47a313e8 x25: ffff582c53e95000 x24: ffff582c53e92800
> x23: ffff582c5bbe0000 x22: 0000000000000000 x21: ffff582c47a31080
> x20: ffffa0d78cfa4168 x19: ffffa0d78cfa4168 x18: ffffb755b0519000
> x17: 000000040044ffff x16: 00500072b5503510 x15: a7a6a5a4a3a29a99
> x14: 989796959493928a x13: 0000000051eb851f x12: 00000000000000ff
> x11: ffffa0d78d812880 x10: ffffa0d78d7fbca0 x9 : 000000000000003f
> x8 : 0000000000000063 x7 : 000000000000003f x6 : 0000000000000040
> x5 : ffff80008010d000 x4 : ffffa0d78cfa4168 x3 : ffffa0d78cfbfdd8
> x2 : ffff80008010d0f4 x1 : 0000000000000020 x0 : 0000000000000140
> Kernel panic - not syncing: Asynchronous SError Interrupt
> CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+
> #54
> Hardware name: Firefly roc-rk3328-cc (DT)
> Call trace:
> dump_backtrace+0xa0/0x128
> show_stack+0x20/0x38
> dump_stack_lvl+0xc8/0xf8
> dump_stack+0x18/0x28
> panic+0x3ec/0x428
> nmi_panic+0x48/0xa0
> arm64_serror_panic+0x6c/0x88
> do_serror+0x30/0x70
> el1h_64_error_handler+0x38/0x60
> el1h_64_error+0x7c/0x80
> rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
> device_run+0xb0/0x128
> v4l2_m2m_try_run+0xac/0x230
> v4l2_m2m_ioctl_streamon+0x70/0x90
> v4l_streamon+0x2c/0x40
> __video_do_ioctl+0x194/0x400
> video_usercopy+0x10c/0x808
> video_ioctl2+0x20/0x80
> v4l2_ioctl+0x48/0x70
> __arm64_sys_ioctl+0xb0/0x100
> invoke_syscall+0x50/0x120
> el0_svc_common.constprop.0+0x48/0xf0
> do_el0_svc+0x24/0x38
> el0_svc+0x38/0x100
> el0t_64_sync_handler+0xc0/0xc8
> el0t_64_sync+0x1a8/0x1b0
> SMP: stopping secondary CPUs
> Kernel Offset: 0x20d70c000000 from 0xffff800080000000
> PHYS_OFFSET: 0xffffa7d3c0000000
> CPU features: 0x00,00000090,00200000,0200421b
> Memory Limit: none
> ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
>
> Fixes: 7c696693a4f5 ("soc: rockchip: power-domain: Add power domain
> driver")
> Signed-off-by: Peter Geis <pgwipeout@gmail.com>
This patch is obviously correct, because not checking what
rockchip_pmu_set_idle_request() returns was simply wrong.
Thanks for the patch!
Though, shouldn't we improve further the way proper error
handling is performed in rockchip_do_pmu_set_power_domain(),
by "rolling back" what rockchip_do_pmu_set_power_domain()
did after powering up fails?
> ---
>
> drivers/pmdomain/rockchip/pm-domains.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pmdomain/rockchip/pm-domains.c
> b/drivers/pmdomain/rockchip/pm-domains.c
> index cb0f93800138..57e8fa25d2bd 100644
> --- a/drivers/pmdomain/rockchip/pm-domains.c
> +++ b/drivers/pmdomain/rockchip/pm-domains.c
> @@ -590,14 +590,18 @@ static int rockchip_pd_power(struct
> rockchip_pm_domain *pd, bool power_on)
> rockchip_pmu_save_qos(pd);
>
> /* if powering down, idle request to NIU first */
> - rockchip_pmu_set_idle_request(pd, true);
> + ret = rockchip_pmu_set_idle_request(pd, true);
> + if (ret < 0)
> + return ret;
> }
>
> rockchip_do_pmu_set_power_domain(pd, power_on);
>
> if (power_on) {
> /* if powering up, leave idle mode */
> - rockchip_pmu_set_idle_request(pd, false);
> + ret = rockchip_pmu_set_idle_request(pd, false);
> + if (ret < 0)
> + return ret;
>
> rockchip_pmu_restore_qos(pd);
> }
On Tue, Dec 10, 2024 at 3:18 AM Dragan Simic <dsimic@manjaro.org> wrote:
>
> Hello Peter,
>
> On 2024-12-10 02:30, Peter Geis wrote:
> > The calls rockchip_pd_power makes to rockchip_pmu_set_idle_request lack
> > any return error handling, causing device drivers to incorrectly
> > believe
> > the hardware idle requests succeed when they may have failed. This
> > leads
> > to software possibly accessing hardware that is powered off and the
> > subsequent SError panic that follows.
> >
> > Add error checking and return errors to the calling function to prevent
> > such crashes.
> >
> > gst-launch-1.0 videotestsrc num-buffers=2000 ! v4l2jpegenc ! fakesink
> > Setting pipeline to PAUSED ...er-x64
> > Pipeline is PREROLLING ...
> > Redistribute latency...
> > rockchip-pm-domain ff100000.syscon:power-controller: failed to get ack
> > on
> > domain 'hevc', val=0x98260, idle = 0
> > SError Interrupt on CPU2, code 0x00000000bf000002 -- SError
> > CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+
> > #54
> > Hardware name: Firefly roc-rk3328-cc (DT)
> > pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
> > lr : device_run+0xb0/0x128
> > sp : ffff800082143a20
> > x29: ffff800082143a20 x28: 0000000000000140 x27: 0000000000000000
> > x26: ffff582c47a313e8 x25: ffff582c53e95000 x24: ffff582c53e92800
> > x23: ffff582c5bbe0000 x22: 0000000000000000 x21: ffff582c47a31080
> > x20: ffffa0d78cfa4168 x19: ffffa0d78cfa4168 x18: ffffb755b0519000
> > x17: 000000040044ffff x16: 00500072b5503510 x15: a7a6a5a4a3a29a99
> > x14: 989796959493928a x13: 0000000051eb851f x12: 00000000000000ff
> > x11: ffffa0d78d812880 x10: ffffa0d78d7fbca0 x9 : 000000000000003f
> > x8 : 0000000000000063 x7 : 000000000000003f x6 : 0000000000000040
> > x5 : ffff80008010d000 x4 : ffffa0d78cfa4168 x3 : ffffa0d78cfbfdd8
> > x2 : ffff80008010d0f4 x1 : 0000000000000020 x0 : 0000000000000140
> > Kernel panic - not syncing: Asynchronous SError Interrupt
> > CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+
> > #54
> > Hardware name: Firefly roc-rk3328-cc (DT)
> > Call trace:
> > dump_backtrace+0xa0/0x128
> > show_stack+0x20/0x38
> > dump_stack_lvl+0xc8/0xf8
> > dump_stack+0x18/0x28
> > panic+0x3ec/0x428
> > nmi_panic+0x48/0xa0
> > arm64_serror_panic+0x6c/0x88
> > do_serror+0x30/0x70
> > el1h_64_error_handler+0x38/0x60
> > el1h_64_error+0x7c/0x80
> > rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
> > device_run+0xb0/0x128
> > v4l2_m2m_try_run+0xac/0x230
> > v4l2_m2m_ioctl_streamon+0x70/0x90
> > v4l_streamon+0x2c/0x40
> > __video_do_ioctl+0x194/0x400
> > video_usercopy+0x10c/0x808
> > video_ioctl2+0x20/0x80
> > v4l2_ioctl+0x48/0x70
> > __arm64_sys_ioctl+0xb0/0x100
> > invoke_syscall+0x50/0x120
> > el0_svc_common.constprop.0+0x48/0xf0
> > do_el0_svc+0x24/0x38
> > el0_svc+0x38/0x100
> > el0t_64_sync_handler+0xc0/0xc8
> > el0t_64_sync+0x1a8/0x1b0
> > SMP: stopping secondary CPUs
> > Kernel Offset: 0x20d70c000000 from 0xffff800080000000
> > PHYS_OFFSET: 0xffffa7d3c0000000
> > CPU features: 0x00,00000090,00200000,0200421b
> > Memory Limit: none
> > ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> >
> > Fixes: 7c696693a4f5 ("soc: rockchip: power-domain: Add power domain
> > driver")
> > Signed-off-by: Peter Geis <pgwipeout@gmail.com>
>
> This patch is obviously correct, because not checking what
> rockchip_pmu_set_idle_request() returns was simply wrong.
> Thanks for the patch!
>
> Though, shouldn't we improve further the way proper error
> handling is performed in rockchip_do_pmu_set_power_domain(),
> by "rolling back" what rockchip_do_pmu_set_power_domain()
> did after powering up fails?
Eventually, but the reality is if we hit this path the hardware is
already broken. I wanted to provide a fix with the least amount of
risk of breaking things further. I'm open to suggestions for further
improvements here.
Current behavior with this patch with the example above causes
gstreamer to wait indefinitely. I'll trace the return path and see if
returning an error other than -ETIMEDOUT triggers more robust
handling.
>
> > ---
> >
> > drivers/pmdomain/rockchip/pm-domains.c | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pmdomain/rockchip/pm-domains.c
> > b/drivers/pmdomain/rockchip/pm-domains.c
> > index cb0f93800138..57e8fa25d2bd 100644
> > --- a/drivers/pmdomain/rockchip/pm-domains.c
> > +++ b/drivers/pmdomain/rockchip/pm-domains.c
> > @@ -590,14 +590,18 @@ static int rockchip_pd_power(struct
> > rockchip_pm_domain *pd, bool power_on)
> > rockchip_pmu_save_qos(pd);
> >
> > /* if powering down, idle request to NIU first */
> > - rockchip_pmu_set_idle_request(pd, true);
> > + ret = rockchip_pmu_set_idle_request(pd, true);
> > + if (ret < 0)
> > + return ret;
> > }
> >
> > rockchip_do_pmu_set_power_domain(pd, power_on);
> >
> > if (power_on) {
> > /* if powering up, leave idle mode */
> > - rockchip_pmu_set_idle_request(pd, false);
> > + ret = rockchip_pmu_set_idle_request(pd, false);
> > + if (ret < 0)
> > + return ret;
> >
> > rockchip_pmu_restore_qos(pd);
> > }
Hello Peter,
On 2024-12-10 21:12, Peter Geis wrote:
> On Tue, Dec 10, 2024 at 3:18 AM Dragan Simic <dsimic@manjaro.org>
> wrote:
>> On 2024-12-10 02:30, Peter Geis wrote:
>> > The calls rockchip_pd_power makes to rockchip_pmu_set_idle_request lack
>> > any return error handling, causing device drivers to incorrectly
>> > believe
>> > the hardware idle requests succeed when they may have failed. This
>> > leads
>> > to software possibly accessing hardware that is powered off and the
>> > subsequent SError panic that follows.
>> >
>> > Add error checking and return errors to the calling function to prevent
>> > such crashes.
>> >
>> > gst-launch-1.0 videotestsrc num-buffers=2000 ! v4l2jpegenc ! fakesink
>> > Setting pipeline to PAUSED ...er-x64
>> > Pipeline is PREROLLING ...
>> > Redistribute latency...
>> > rockchip-pm-domain ff100000.syscon:power-controller: failed to get ack
>> > on
>> > domain 'hevc', val=0x98260, idle = 0
>> > SError Interrupt on CPU2, code 0x00000000bf000002 -- SError
>> > CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+
>> > #54
>> > Hardware name: Firefly roc-rk3328-cc (DT)
>> > pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> > pc : rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
>> > lr : device_run+0xb0/0x128
>> > sp : ffff800082143a20
>> > x29: ffff800082143a20 x28: 0000000000000140 x27: 0000000000000000
>> > x26: ffff582c47a313e8 x25: ffff582c53e95000 x24: ffff582c53e92800
>> > x23: ffff582c5bbe0000 x22: 0000000000000000 x21: ffff582c47a31080
>> > x20: ffffa0d78cfa4168 x19: ffffa0d78cfa4168 x18: ffffb755b0519000
>> > x17: 000000040044ffff x16: 00500072b5503510 x15: a7a6a5a4a3a29a99
>> > x14: 989796959493928a x13: 0000000051eb851f x12: 00000000000000ff
>> > x11: ffffa0d78d812880 x10: ffffa0d78d7fbca0 x9 : 000000000000003f
>> > x8 : 0000000000000063 x7 : 000000000000003f x6 : 0000000000000040
>> > x5 : ffff80008010d000 x4 : ffffa0d78cfa4168 x3 : ffffa0d78cfbfdd8
>> > x2 : ffff80008010d0f4 x1 : 0000000000000020 x0 : 0000000000000140
>> > Kernel panic - not syncing: Asynchronous SError Interrupt
>> > CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+
>> > #54
>> > Hardware name: Firefly roc-rk3328-cc (DT)
>> > Call trace:
>> > dump_backtrace+0xa0/0x128
>> > show_stack+0x20/0x38
>> > dump_stack_lvl+0xc8/0xf8
>> > dump_stack+0x18/0x28
>> > panic+0x3ec/0x428
>> > nmi_panic+0x48/0xa0
>> > arm64_serror_panic+0x6c/0x88
>> > do_serror+0x30/0x70
>> > el1h_64_error_handler+0x38/0x60
>> > el1h_64_error+0x7c/0x80
>> > rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
>> > device_run+0xb0/0x128
>> > v4l2_m2m_try_run+0xac/0x230
>> > v4l2_m2m_ioctl_streamon+0x70/0x90
>> > v4l_streamon+0x2c/0x40
>> > __video_do_ioctl+0x194/0x400
>> > video_usercopy+0x10c/0x808
>> > video_ioctl2+0x20/0x80
>> > v4l2_ioctl+0x48/0x70
>> > __arm64_sys_ioctl+0xb0/0x100
>> > invoke_syscall+0x50/0x120
>> > el0_svc_common.constprop.0+0x48/0xf0
>> > do_el0_svc+0x24/0x38
>> > el0_svc+0x38/0x100
>> > el0t_64_sync_handler+0xc0/0xc8
>> > el0t_64_sync+0x1a8/0x1b0
>> > SMP: stopping secondary CPUs
>> > Kernel Offset: 0x20d70c000000 from 0xffff800080000000
>> > PHYS_OFFSET: 0xffffa7d3c0000000
>> > CPU features: 0x00,00000090,00200000,0200421b
>> > Memory Limit: none
>> > ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
>> >
>> > Fixes: 7c696693a4f5 ("soc: rockchip: power-domain: Add power domain
>> > driver")
>> > Signed-off-by: Peter Geis <pgwipeout@gmail.com>
>>
>> This patch is obviously correct, because not checking what
>> rockchip_pmu_set_idle_request() returns was simply wrong.
>> Thanks for the patch!
>>
>> Though, shouldn't we improve further the way proper error
>> handling is performed in rockchip_do_pmu_set_power_domain(),
>> by "rolling back" what rockchip_do_pmu_set_power_domain()
>> did after powering up fails?
>
> Eventually, but the reality is if we hit this path the hardware is
> already broken. I wanted to provide a fix with the least amount of
> risk of breaking things further. I'm open to suggestions for further
> improvements here.
Perhaps a good approach, at the moment, would be to add no more
code for the "rollback", but to add a TODO note about the need for
that addition. That way we'd keep the amount of changes at the
minimum, to hopefully not cause any regressions, while leaving
a note for the second round of improvements in the future.
> Current behavior with this patch with the example above causes
> gstreamer to wait indefinitely. I'll trace the return path and see if
> returning an error other than -ETIMEDOUT triggers more robust
> handling.
Sounds like a plan to me. Thanks for working on this!
>>
>> > ---
>> >
>> > drivers/pmdomain/rockchip/pm-domains.c | 8 ++++++--
>> > 1 file changed, 6 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/pmdomain/rockchip/pm-domains.c
>> > b/drivers/pmdomain/rockchip/pm-domains.c
>> > index cb0f93800138..57e8fa25d2bd 100644
>> > --- a/drivers/pmdomain/rockchip/pm-domains.c
>> > +++ b/drivers/pmdomain/rockchip/pm-domains.c
>> > @@ -590,14 +590,18 @@ static int rockchip_pd_power(struct
>> > rockchip_pm_domain *pd, bool power_on)
>> > rockchip_pmu_save_qos(pd);
>> >
>> > /* if powering down, idle request to NIU first */
>> > - rockchip_pmu_set_idle_request(pd, true);
>> > + ret = rockchip_pmu_set_idle_request(pd, true);
>> > + if (ret < 0)
>> > + return ret;
>> > }
>> >
>> > rockchip_do_pmu_set_power_domain(pd, power_on);
>> >
>> > if (power_on) {
>> > /* if powering up, leave idle mode */
>> > - rockchip_pmu_set_idle_request(pd, false);
>> > + ret = rockchip_pmu_set_idle_request(pd, false);
>> > + if (ret < 0)
>> > + return ret;
>> >
>> > rockchip_pmu_restore_qos(pd);
>> > }
© 2016 - 2025 Red Hat, Inc.