drivers/irqchip/irq-sifive-plic.c | 296 ++++++++++++------------------ 1 file changed, 117 insertions(+), 179 deletions(-)
Hi Anup,
As described in the thread below[1] I haven't been able to boot my
boards based on the Allwinner D1 SoC since 6.9 where you converted the
SiFive PLIC driver to a platform driver.
This is clearly a regression and there haven't really been much progress
on fixing the issue since then, so here is the revert that fixes it.
If no other fix is found before 6.11 I suggest we apply this.
[1]: https://lore.kernel.org/linux-riscv/CAJM55Z9hGKo4784N3s3DhWw=nMRKZKcmvZ58x7uVBghExhoc9A@mail.gmail.com/
/Emil
Emil Renner Berthing (9):
Revert "irqchip/sifive-plic: Chain to parent IRQ after handlers are
ready"
Revert "irqchip/sifive-plic: Avoid explicit cpumask allocation on
stack"
Revert "irqchip/sifive-plic: Improve locking safety by using
irqsave/irqrestore"
Revert "irqchip/sifive-plic: Parse number of interrupts and contexts
early in plic_probe()"
Revert "irqchip/sifive-plic: Cleanup PLIC contexts upon irqdomain
creation failure"
Revert "irqchip/sifive-plic: Use riscv_get_intc_hwnode() to get parent
fwnode"
Revert "irqchip/sifive-plic: Use devm_xyz() for managed allocation"
Revert "irqchip/sifive-plic: Use dev_xyz() in-place of pr_xyz()"
Revert "irqchip/sifive-plic: Convert PLIC driver into a platform
driver"
drivers/irqchip/irq-sifive-plic.c | 296 ++++++++++++------------------
1 file changed, 117 insertions(+), 179 deletions(-)
--
2.43.0
On Wed, Aug 14 2024 at 16:56, Emil Renner Berthing wrote:
> As described in the thread below[1] I haven't been able to boot my
> boards based on the Allwinner D1 SoC since 6.9 where you converted the
> SiFive PLIC driver to a platform driver.
>
> This is clearly a regression and there haven't really been much progress
> on fixing the issue since then, so here is the revert that fixes it.
>
> If no other fix is found before 6.11 I suggest we apply this.
So this mess has been ignored for two month now?
From the pastebin in the initial report:
[ 0.000000] irq: no irq domain found for interrupt-controller@10000000 !
[ 0.000000] Failed to map interrupt for /soc/timer@2050000
[ 0.000000] Failed to initialize '/soc/timer@2050000': -22
This comes back with -EINVAL. So the timer cannot find an interrupt,
which makes it pretty obvious why the system stops to boot, unless there
is some other timer available.
This is obviously related to the SUN4I_TIMER because that message went
away when it was disabled according to the next pastebin.
Obviously that can't work because the SUN4I timer driver is using
timer_of_init() which cannot handle deferred probing.
Daniel: There was a partial fix for the sun4i driver, which you said you
applied:
https://lore.kernel.org/all/20240312192519.1602493-1-samuel.holland@sifive.com
But that thing never materialized in a pull request.
And of course everyone involved ignored the problem since March 13th
2024, i.e. almost half a year.
Seriously?
Can you RISCV folks get your act together and ensure to fix things you
broke on the way? Especially when Emil reported this nobody pointed him
to this patch and nobody noticed that it's still not merged?
It took me less than 15 minutes to find that patch and the correlation,
but this is absolutely not my job.
I'm seriously grumpy about that. This is not how it works. If you break
stuff, then you take care to fix it before you shove more changes into
the tree and waste my time.
I'm very much inclined to take the reverts right now, send them to Linus
for -rc5 tagged with cc: stable and ignore/nak any irqchip related riscv
patches until the next merge window is over.
Emil, can you give that sun4i fix a test ride please?
Thanks,
tglx
On Wed, 14 Aug 2024 10:30:48 PDT (-0700), tglx@linutronix.de wrote: > On Wed, Aug 14 2024 at 16:56, Emil Renner Berthing wrote: >> As described in the thread below[1] I haven't been able to boot my >> boards based on the Allwinner D1 SoC since 6.9 where you converted the >> SiFive PLIC driver to a platform driver. >> >> This is clearly a regression and there haven't really been much progress >> on fixing the issue since then, so here is the revert that fixes it. >> >> If no other fix is found before 6.11 I suggest we apply this. > > So this mess has been ignored for two month now? > >>From the pastebin in the initial report: > > [ 0.000000] irq: no irq domain found for interrupt-controller@10000000 ! > [ 0.000000] Failed to map interrupt for /soc/timer@2050000 > [ 0.000000] Failed to initialize '/soc/timer@2050000': -22 > > This comes back with -EINVAL. So the timer cannot find an interrupt, > which makes it pretty obvious why the system stops to boot, unless there > is some other timer available. > > This is obviously related to the SUN4I_TIMER because that message went > away when it was disabled according to the next pastebin. > > Obviously that can't work because the SUN4I timer driver is using > timer_of_init() which cannot handle deferred probing. > > Daniel: There was a partial fix for the sun4i driver, which you said you > applied: > > https://lore.kernel.org/all/20240312192519.1602493-1-samuel.holland@sifive.com > > But that thing never materialized in a pull request. > > And of course everyone involved ignored the problem since March 13th > 2024, i.e. almost half a year. > > Seriously? > > Can you RISCV folks get your act together and ensure to fix things you > broke on the way? Especially when Emil reported this nobody pointed him > to this patch and nobody noticed that it's still not merged? > > It took me less than 15 minutes to find that patch and the correlation, > but this is absolutely not my job. Sorry, I guess I'd just sort of been ignoring the platform-specific side of things because it's so frustrating to deal with, but that's led to a bunch of breakages so it's obviously the wrong thing to do. > I'm seriously grumpy about that. This is not how it works. If you break > stuff, then you take care to fix it before you shove more changes into > the tree and waste my time. > > I'm very much inclined to take the reverts right now, send them to Linus > for -rc5 tagged with cc: stable and ignore/nak any irqchip related riscv > patches until the next merge window is over. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> if you want to take the revert. IIUC the patch above doesn't actually fix it, that's what led to just sending the reverts -- at least reverts are better than breaking users. I'll post over there too... And it's no big deal if we're in the doghouse for a bit. Regressions should get fixed faster than this, so we deserve it. Probably also another sign we're way too focused on getting new features merged, as that's coming at the expense of making existing platforms work. IMO we've been way too focused on getting support for specs that don't even have implementations, and not enough on building real working systems. > Emil, can you give that sun4i fix a test ride please? > > Thanks, > > tglx
在 2024-08-15星期四的 10:51 -0700,Palmer Dabbelt写道: > On Wed, 14 Aug 2024 10:30:48 PDT (-0700), tglx@linutronix.de wrote: > > On Wed, Aug 14 2024 at 16:56, Emil Renner Berthing wrote: > > > As described in the thread below[1] I haven't been able to boot > > > my > > > boards based on the Allwinner D1 SoC since 6.9 where you > > > converted the > > > SiFive PLIC driver to a platform driver. > > > > > > This is clearly a regression and there haven't really been much > > > progress > > > on fixing the issue since then, so here is the revert that fixes > > > it. > > > > > > If no other fix is found before 6.11 I suggest we apply this. > > > > So this mess has been ignored for two month now? > > > > > From the pastebin in the initial report: > > > > [ 0.000000] irq: no irq domain found for > > interrupt-controller@10000000 ! > > [ 0.000000] Failed to map interrupt for /soc/timer@2050000 > > [ 0.000000] Failed to initialize '/soc/timer@2050000': -22 > > > > This comes back with -EINVAL. So the timer cannot find an > > interrupt, > > which makes it pretty obvious why the system stops to boot, unless > > there > > is some other timer available. > > > > This is obviously related to the SUN4I_TIMER because that message > > went > > away when it was disabled according to the next pastebin. > > > > Obviously that can't work because the SUN4I timer driver is using > > timer_of_init() which cannot handle deferred probing. > > > > Daniel: There was a partial fix for the sun4i driver, which you > > said you > > applied: > > > > > > https://lore.kernel.org/all/20240312192519.1602493-1-samuel.holland@sifive.com > > > > But that thing never materialized in a pull request. > > > > And of course everyone involved ignored the problem since March > > 13th > > 2024, i.e. almost half a year. > > > > Seriously? > > > > Can you RISCV folks get your act together and ensure to fix things > > you > > broke on the way? Especially when Emil reported this nobody pointed > > him > > to this patch and nobody noticed that it's still not merged? > > > > It took me less than 15 minutes to find that patch and the > > correlation, > > but this is absolutely not my job. > > Sorry, I guess I'd just sort of been ignoring the platform-specific > side > of things because it's so frustrating to deal with, but that's led to > a > bunch of breakages so it's obviously the wrong thing to do. > > > I'm seriously grumpy about that. This is not how it works. If you > > break > > stuff, then you take care to fix it before you shove more changes > > into > > the tree and waste my time. > > > > I'm very much inclined to take the reverts right now, send them to > > Linus > > for -rc5 tagged with cc: stable and ignore/nak any irqchip related > > riscv > > patches until the next merge window is over. > > Acked-by: Palmer Dabbelt <palmer@rivosinc.com> > > if you want to take the revert. > > IIUC the patch above doesn't actually fix it, that's what led to just > sending the reverts -- at least reverts are better than breaking > users. > I'll post over there too... > > And it's no big deal if we're in the doghouse for a bit. Regressions > should get fixed faster than this, so we deserve it. > > Probably also another sign we're way too focused on getting new > features > merged, as that's coming at the expense of making existing platforms > work. IMO we've been way too focused on getting support for specs > that > don't even have implementations, and not enough on building real > working > systems. Well I think all existing platforms are more or less weird (in specification-compatibility, stability, etc). (Maybe FU540 isn't so weird, but it has too few peripherals to be really useful, and it's discontinued; FU740 has some stability issues.) > > > Emil, can you give that sun4i fix a test ride please? > > > > Thanks, > > > > tglx > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On Thu, Aug 15 2024 at 10:51, Palmer Dabbelt wrote:
> On Wed, 14 Aug 2024 10:30:48 PDT (-0700), tglx@linutronix.de wrote:
>> I'm very much inclined to take the reverts right now, send them to Linus
>> for -rc5 tagged with cc: stable and ignore/nak any irqchip related riscv
>> patches until the next merge window is over.
>
> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
>
> if you want to take the revert.
I'm happy to wait a week and see whether someone gets that CLINT hack
working or as I suggested the D1 PLIC early probe quirk.
> IIUC the patch above doesn't actually fix it, that's what led to just
> sending the reverts -- at least reverts are better than breaking users.
> I'll post over there too...
Right. We figured that out by now :)
> And it's no big deal if we're in the doghouse for a bit. Regressions
> should get fixed faster than this, so we deserve it.
For a week I consider you probationers :)
> Probably also another sign we're way too focused on getting new features
> merged, as that's coming at the expense of making existing platforms
> work. IMO we've been way too focused on getting support for specs that
> don't even have implementations, and not enough on building real working
> systems.
RISCV is not alone with that. This whole industry is nuts about features
and forgets the stuff what matters.
Thanks,
tglx
On Thu, 15 Aug 2024 11:10:25 PDT (-0700), tglx@linutronix.de wrote: > On Thu, Aug 15 2024 at 10:51, Palmer Dabbelt wrote: >> On Wed, 14 Aug 2024 10:30:48 PDT (-0700), tglx@linutronix.de wrote: >>> I'm very much inclined to take the reverts right now, send them to Linus >>> for -rc5 tagged with cc: stable and ignore/nak any irqchip related riscv >>> patches until the next merge window is over. >> >> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> >> >> if you want to take the revert. > > I'm happy to wait a week and see whether someone gets that CLINT hack > working or as I suggested the D1 PLIC early probe quirk. > >> IIUC the patch above doesn't actually fix it, that's what led to just >> sending the reverts -- at least reverts are better than breaking users. >> I'll post over there too... > > Right. We figured that out by now :) > >> And it's no big deal if we're in the doghouse for a bit. Regressions >> should get fixed faster than this, so we deserve it. > > For a week I consider you probationers :) Works for me ;) >> Probably also another sign we're way too focused on getting new features >> merged, as that's coming at the expense of making existing platforms >> work. IMO we've been way too focused on getting support for specs that >> don't even have implementations, and not enough on building real working >> systems. > > RISCV is not alone with that. This whole industry is nuts about features > and forgets the stuff what matters. > > Thanks, > > tglx
Thomas Gleixner wrote: > On Wed, Aug 14 2024 at 16:56, Emil Renner Berthing wrote: > > As described in the thread below[1] I haven't been able to boot my > > boards based on the Allwinner D1 SoC since 6.9 where you converted the > > SiFive PLIC driver to a platform driver. > > > > This is clearly a regression and there haven't really been much progress > > on fixing the issue since then, so here is the revert that fixes it. > > > > If no other fix is found before 6.11 I suggest we apply this. > > So this mess has been ignored for two month now? > > From the pastebin in the initial report: > > [ 0.000000] irq: no irq domain found for interrupt-controller@10000000 ! > [ 0.000000] Failed to map interrupt for /soc/timer@2050000 > [ 0.000000] Failed to initialize '/soc/timer@2050000': -22 > > This comes back with -EINVAL. So the timer cannot find an interrupt, > which makes it pretty obvious why the system stops to boot, unless there > is some other timer available. > > This is obviously related to the SUN4I_TIMER because that message went > away when it was disabled according to the next pastebin. > > Obviously that can't work because the SUN4I timer driver is using > timer_of_init() which cannot handle deferred probing. > > Daniel: There was a partial fix for the sun4i driver, which you said you > applied: > > https://lore.kernel.org/all/20240312192519.1602493-1-samuel.holland@sifive.com > > But that thing never materialized in a pull request. > > And of course everyone involved ignored the problem since March 13th > 2024, i.e. almost half a year. > > Seriously? > > Can you RISCV folks get your act together and ensure to fix things you > broke on the way? Especially when Emil reported this nobody pointed him > to this patch and nobody noticed that it's still not merged? > > It took me less than 15 minutes to find that patch and the correlation, > but this is absolutely not my job. > > I'm seriously grumpy about that. This is not how it works. If you break > stuff, then you take care to fix it before you shove more changes into > the tree and waste my time. > > I'm very much inclined to take the reverts right now, send them to Linus > for -rc5 tagged with cc: stable and ignore/nak any irqchip related riscv > patches until the next merge window is over. > > Emil, can you give that sun4i fix a test ride please? Hi Thomas, Thanks for looking at this! Unfortunately the above patch isn't enough to fix the issue: https://termbin.com/7sgc It still hangs after the "[ 0.176451] cpuidle: using governor teo" message until the watchdog reboots the systems. /Emil
On Thu, Aug 15 2024 at 03:29, Emil Renner Berthing wrote:
> Thomas Gleixner wrote:
> Thanks for looking at this! Unfortunately the above patch isn't enough to fix
> the issue:
>
> https://termbin.com/7sgc
>
> It still hangs after the "[ 0.176451] cpuidle: using governor teo" message
> until the watchdog reboots the systems.
So what's puzzling is that there is a timer installed early on:
[ 0.000000] clocksource: riscv_clocksource: ....
That same init function installs the per cpu riscv clockevent, so there
should be a timer available.
The deffered probing of the PLIC driver delays obviously the probing of
the sun4i timer, but that should not matter when another timer is
available. So the sun4i driver might be a red herring.
Can you please add "ignore_loglevel initcall_debug" to the command line
and provide the output of a booting and a failing kernel?
And on the booting kernel please provide the output from:
# cat /sys/devices/system/clockevents/clockevent0/current_device
# cat /sys/devices/system/clockevents/broadcast/current_device
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
Thanks,
tglx
Thomas Gleixner wrote: > On Thu, Aug 15 2024 at 03:29, Emil Renner Berthing wrote: > > Thomas Gleixner wrote: > > Thanks for looking at this! Unfortunately the above patch isn't enough to fix > > the issue: > > > > https://termbin.com/7sgc > > > > It still hangs after the "[ 0.176451] cpuidle: using governor teo" message > > until the watchdog reboots the systems. > > So what's puzzling is that there is a timer installed early on: > > [ 0.000000] clocksource: riscv_clocksource: .... > > That same init function installs the per cpu riscv clockevent, so there > should be a timer available. > > The deffered probing of the PLIC driver delays obviously the probing of > the sun4i timer, but that should not matter when another timer is > available. So the sun4i driver might be a red herring. > > Can you please add "ignore_loglevel initcall_debug" to the command line > and provide the output of a booting and a failing kernel? 6.11-rc3 + these reverts: https://termbin.com/q6wk 6.11-rc3 + Samuel's patch: https://termbin.com/7cgs > And on the booting kernel please provide the output from: > > # cat /sys/devices/system/clockevents/clockevent0/current_device > # cat /sys/devices/system/clockevents/broadcast/current_device > # cat /sys/devices/system/clocksource/clocksource0/current_clocksource On both a 6.8.6 kernel and 6.11-rc3 + reverts I get: # cat /sys/devices/system/clockevents/clockevent0/current_device sun4i_tick # cat /sys/devices/system/clockevents/broadcast/current_device riscv_timer_clockevent # cat /sys/devices/system/clocksource/clocksource0/current_clocksource riscv_clocksource /Emil
Emil Renner Berthing wrote: > Thomas Gleixner wrote: > > On Thu, Aug 15 2024 at 03:29, Emil Renner Berthing wrote: > > > Thomas Gleixner wrote: > > > Thanks for looking at this! Unfortunately the above patch isn't enough to fix > > > the issue: > > > > > > https://termbin.com/7sgc > > > > > > It still hangs after the "[ 0.176451] cpuidle: using governor teo" message > > > until the watchdog reboots the systems. > > > > So what's puzzling is that there is a timer installed early on: > > > > [ 0.000000] clocksource: riscv_clocksource: .... > > > > That same init function installs the per cpu riscv clockevent, so there > > should be a timer available. > > > > The deffered probing of the PLIC driver delays obviously the probing of > > the sun4i timer, but that should not matter when another timer is > > available. So the sun4i driver might be a red herring. > > > > Can you please add "ignore_loglevel initcall_debug" to the command line > > and provide the output of a booting and a failing kernel? > > 6.11-rc3 + these reverts: https://termbin.com/q6wk > 6.11-rc3 + Samuel's patch: https://termbin.com/7cgs I think this confirms what Charlie found here: https://lore.kernel.org/linux-riscv/ZoydV7vad5JWIcZb@ghost/ > > > And on the booting kernel please provide the output from: > > > > # cat /sys/devices/system/clockevents/clockevent0/current_device > > # cat /sys/devices/system/clockevents/broadcast/current_device > > # cat /sys/devices/system/clocksource/clocksource0/current_clocksource > > On both a 6.8.6 kernel and 6.11-rc3 + reverts I get: > > # cat /sys/devices/system/clockevents/clockevent0/current_device > sun4i_tick > # cat /sys/devices/system/clockevents/broadcast/current_device > riscv_timer_clockevent > # cat /sys/devices/system/clocksource/clocksource0/current_clocksource > riscv_clocksource > > /Emil > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On Thu, Aug 15 2024 at 05:14, Emil Renner Berthing wrote:
> Emil Renner Berthing wrote:
>> 6.11-rc3 + these reverts: https://termbin.com/q6wk
>> 6.11-rc3 + Samuel's patch: https://termbin.com/7cgs
>
> I think this confirms what Charlie found here:
> https://lore.kernel.org/linux-riscv/ZoydV7vad5JWIcZb@ghost/
Yes. So the riscv timer is not working on this thing or it stops
somehow.
Can you apply the debug patch below and check whether you see the
'J: ....' output at all and if so whether it stops at some point.
Thanks,
tglx
---
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -2459,6 +2459,9 @@ static void run_local_timers(void)
{
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
+ if (!(jiffies & 0xFF))
+ pr_info("J: %lx\n", jiffies);
+
hrtimer_run_queues();
for (int i = 0; i < NR_BASES; i++, base++) {
Thomas Gleixner wrote: > On Thu, Aug 15 2024 at 05:14, Emil Renner Berthing wrote: > > Emil Renner Berthing wrote: > >> 6.11-rc3 + these reverts: https://termbin.com/q6wk > >> 6.11-rc3 + Samuel's patch: https://termbin.com/7cgs > > > > I think this confirms what Charlie found here: > > https://lore.kernel.org/linux-riscv/ZoydV7vad5JWIcZb@ghost/ > > Yes. So the riscv timer is not working on this thing or it stops > somehow. > > Can you apply the debug patch below and check whether you see the > 'J: ....' output at all and if so whether it stops at some point. No, I don't see the J: ... debug output anywhere: https://termbin.com/3vez /Emil
Hi Thomas, Emil,
On 2024-08-15 8:16 AM, Thomas Gleixner wrote:
> On Thu, Aug 15 2024 at 05:14, Emil Renner Berthing wrote:
>> Emil Renner Berthing wrote:
>>> 6.11-rc3 + these reverts: https://us01.z.antigena.com/l/Er4kZWDmvL5-bLzHHJoZv0k71iwW2jCD5qNpiz0x0XdYY6oORF_nXh7U7jw6oubhi~32HI4i71jUW9v8~NvSvPeUWrdYx3WJBr2GPDUjOu6LYPCOBfR2dVQuMWvlNj4tDjXFp3QEQAmeawZflD4JrIJjtSYIbKfe6v-tgH7SEuHMeSSriU633Lv
>>> 6.11-rc3 + Samuel's patch: https://us01.z.antigena.com/l/EULtAYky6ZvgqZ49KGS-WBsYTg~Ht1NoQtEYmUVb56ymS9jDagqYHLK90WDjnVt69GfB4IX5NSRQXmSfkNsTzB8lJmFvDihHQmGrsCv9FzlorD9yGfXDlQ6rG6vmn5BNDwlipmssGaOGfh9yko8n9ArWR4TLhEf~f9ODqme~NXXwA9DLLc9p
>>
>> I think this confirms what Charlie found here:
>> https://lore.kernel.org/linux-riscv/ZoydV7vad5JWIcZb@ghost/
>
> Yes. So the riscv timer is not working on this thing or it stops
> somehow.
That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI
firmware does not have a timer device, so it does not expose the (optional[1])
SBI time extension, and sbi_set_timer() does nothing.
I wrote a patch (not submitted) to skip registering riscv_clock_event when the
SBI time extension is unavailable, but this doesn't fully solve the issue
either, because then we have no clockevent at all when
check_unaligned_access_all_cpus() is called.
How early in the boot process are we "required" to have a functional clockevent?
Do we need to refactor check_unaligned_access_all_cpus() so it works on systems
where the only clockevent is provided by a platform device?
Regards,
Samuel
[1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/intro.adoc
> Can you apply the debug patch below and check whether you see the
> 'J: ....' output at all and if so whether it stops at some point.
>
> Thanks,
>
> tglx
>
> ---
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -2459,6 +2459,9 @@ static void run_local_timers(void)
> {
> struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
>
> + if (!(jiffies & 0xFF))
> + pr_info("J: %lx\n", jiffies);
> +
> hrtimer_run_queues();
>
> for (int i = 0; i < NR_BASES; i++, base++) {
>
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
On Thu, Aug 15, 2024 at 7:02 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Hi Thomas, Emil,
>
> On 2024-08-15 8:16 AM, Thomas Gleixner wrote:
> > On Thu, Aug 15 2024 at 05:14, Emil Renner Berthing wrote:
> >> Emil Renner Berthing wrote:
> >>> 6.11-rc3 + these reverts: https://us01.z.antigena.com/l/Er4kZWDmvL5-bLzHHJoZv0k71iwW2jCD5qNpiz0x0XdYY6oORF_nXh7U7jw6oubhi~32HI4i71jUW9v8~NvSvPeUWrdYx3WJBr2GPDUjOu6LYPCOBfR2dVQuMWvlNj4tDjXFp3QEQAmeawZflD4JrIJjtSYIbKfe6v-tgH7SEuHMeSSriU633Lv
> >>> 6.11-rc3 + Samuel's patch: https://us01.z.antigena.com/l/EULtAYky6ZvgqZ49KGS-WBsYTg~Ht1NoQtEYmUVb56ymS9jDagqYHLK90WDjnVt69GfB4IX5NSRQXmSfkNsTzB8lJmFvDihHQmGrsCv9FzlorD9yGfXDlQ6rG6vmn5BNDwlipmssGaOGfh9yko8n9ArWR4TLhEf~f9ODqme~NXXwA9DLLc9p
> >>
> >> I think this confirms what Charlie found here:
> >> https://lore.kernel.org/linux-riscv/ZoydV7vad5JWIcZb@ghost/
> >
> > Yes. So the riscv timer is not working on this thing or it stops
> > somehow.
>
> That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI
> firmware does not have a timer device, so it does not expose the (optional[1])
> SBI time extension, and sbi_set_timer() does nothing.
OpenSBI uses platform specific M-mode timer (mtime and mtimecmp) to
provide SBI time extension to Linux.
The RISC-V privileged specification (v1.10 or higher) requires platform to
provide a M-mode timer (mtime and mtimecmp).
This platform not having any M-mode timer is yet another RISC-V spec
violation by this platform.
Regards,
Anup
>
> I wrote a patch (not submitted) to skip registering riscv_clock_event when the
> SBI time extension is unavailable, but this doesn't fully solve the issue
> either, because then we have no clockevent at all when
> check_unaligned_access_all_cpus() is called.
>
> How early in the boot process are we "required" to have a functional clockevent?
> Do we need to refactor check_unaligned_access_all_cpus() so it works on systems
> where the only clockevent is provided by a platform device?
>
> Regards,
> Samuel
>
> [1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/intro.adoc
>
> > Can you apply the debug patch below and check whether you see the
> > 'J: ....' output at all and if so whether it stops at some point.
> >
> > Thanks,
> >
> > tglx
> >
> > ---
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -2459,6 +2459,9 @@ static void run_local_timers(void)
> > {
> > struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
> >
> > + if (!(jiffies & 0xFF))
> > + pr_info("J: %lx\n", jiffies);
> > +
> > hrtimer_run_queues();
> >
> > for (int i = 0; i < NR_BASES; i++, base++) {
> >
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
>
在 2024-08-15星期四的 20:00 +0530,Anup Patel写道:
> On Thu, Aug 15, 2024 at 7:02 PM Samuel Holland
> <samuel.holland@sifive.com> wrote:
> >
> > Hi Thomas, Emil,
> >
> > On 2024-08-15 8:16 AM, Thomas Gleixner wrote:
> > > On Thu, Aug 15 2024 at 05:14, Emil Renner Berthing wrote:
> > > > Emil Renner Berthing wrote:
> > > > > 6.11-rc3 + these reverts:
> > > > > https://us01.z.antigena.com/l/Er4kZWDmvL5-bLzHHJoZv0k71iwW2jCD5qNpiz0x0XdYY6oORF_nXh7U7jw6oubhi~32HI4i71jUW9v8~NvSvPeUWrdYx3WJBr2GPDUjOu6LYPCOBfR2dVQuMWvlNj4tDjXFp3QEQAmeawZflD4JrIJjtSYIbKfe6v-tgH7SEuHMeSSriU633Lv
> > > > > 6.11-rc3 + Samuel's patch:
> > > > > https://us01.z.antigena.com/l/EULtAYky6ZvgqZ49KGS-WBsYTg~Ht1NoQtEYmUVb56ymS9jDagqYHLK90WDjnVt69GfB4IX5NSRQXmSfkNsTzB8lJmFvDihHQmGrsCv9FzlorD9yGfXDlQ6rG6vmn5BNDwlipmssGaOGfh9yko8n9ArWR4TLhEf~f9ODqme~NXXwA9DLLc9p
> > > >
> > > > I think this confirms what Charlie found here:
> > > > https://lore.kernel.org/linux-riscv/ZoydV7vad5JWIcZb@ghost/
> > >
> > > Yes. So the riscv timer is not working on this thing or it stops
> > > somehow.
> >
> > That's correct. With the (firmware) devicetree that Emil is using,
> > the OpenSBI
> > firmware does not have a timer device, so it does not expose the
> > (optional[1])
> > SBI time extension, and sbi_set_timer() does nothing.
>
> OpenSBI uses platform specific M-mode timer (mtime and mtimecmp) to
> provide SBI time extension to Linux.
>
> The RISC-V privileged specification (v1.10 or higher) requires
> platform to
> provide a M-mode timer (mtime and mtimecmp).
The T-Head cores' design is weird, and because of its source code is
available (as OpenC906), we can investigate it further in the RTL
level:
- From software perspective, it has no mtime mmap'ed register, but it
has mtimecmp, which compares with time CSR (a CSR R/O in all priv
levels).
- From SoC integration perspective, the value of the time CSR is
sourced from an external input, pad_cpu_sys_cnt[63:0] [1].
BTW I already added support for this kind of non-standard CLINT to
OpenSBI [2], however I don't think the current firmware DT of D1
utilizes it; and this is also a quite new SBI version I think.
[1]
https://github.com/XUANTIE-RV/openc906/blob/main/C906_RTL_FACTORY/gen_rtl/cpu/rtl/openC906.v#L84
[2]
https://github.com/riscv-software-src/opensbi/commit/b848d8763a737de44b64bfc036c8f51200226440
>
> This platform not having any M-mode timer is yet another RISC-V spec
> violation by this platform.
>
> Regards,
> Anup
>
> >
> > I wrote a patch (not submitted) to skip registering
> > riscv_clock_event when the
> > SBI time extension is unavailable, but this doesn't fully solve the
> > issue
> > either, because then we have no clockevent at all when
> > check_unaligned_access_all_cpus() is called.
> >
> > How early in the boot process are we "required" to have a
> > functional clockevent?
> > Do we need to refactor check_unaligned_access_all_cpus() so it
> > works on systems
> > where the only clockevent is provided by a platform device?
> >
> > Regards,
> > Samuel
> >
> > [1]
> > https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/intro.adoc
> >
> > > Can you apply the debug patch below and check whether you see the
> > > 'J: ....' output at all and if so whether it stops at some point.
> > >
> > > Thanks,
> > >
> > > tglx
> > >
> > > ---
> > > --- a/kernel/time/timer.c
> > > +++ b/kernel/time/timer.c
> > > @@ -2459,6 +2459,9 @@ static void run_local_timers(void)
> > > {
> > > struct timer_base *base =
> > > this_cpu_ptr(&timer_bases[BASE_LOCAL]);
> > >
> > > + if (!(jiffies & 0xFF))
> > > + pr_info("J: %lx\n", jiffies);
> > > +
> > > hrtimer_run_queues();
> > >
> > > for (int i = 0; i < NR_BASES; i++, base++) {
> > >
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> > >
> >
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Hi Anup, On 2024-08-15 9:30 AM, Anup Patel wrote: > On Thu, Aug 15, 2024 at 7:02 PM Samuel Holland > <samuel.holland@sifive.com> wrote: >>> Yes. So the riscv timer is not working on this thing or it stops >>> somehow. >> >> That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI >> firmware does not have a timer device, so it does not expose the (optional[1]) >> SBI time extension, and sbi_set_timer() does nothing. > > OpenSBI uses platform specific M-mode timer (mtime and mtimecmp) to > provide SBI time extension to Linux. > > The RISC-V privileged specification (v1.10 or higher) requires platform to > provide a M-mode timer (mtime and mtimecmp). > > This platform not having any M-mode timer is yet another RISC-V spec > violation by this platform. You've misunderstood here. Allwinner D1 (T-HEAD C906) _does_ have an M-mode timer (a CLINT). It is just omitted from devicetree that Emil happens to be using, so OpenSBI isn't using it. Currently OpenSBI allows the system to boot without a timer device, and the SBI specification does not mandate the time extension. If consensus is that either of these should change, that's fine, but currently I see nothing in either the privileged spec nor the SBI spec that guarantees the availability of some timer to the kernel in S-mode. Regards, Samuel
On Thu, Aug 15, 2024 at 8:33 PM Samuel Holland <samuel.holland@sifive.com> wrote: > > Hi Anup, > > On 2024-08-15 9:30 AM, Anup Patel wrote: > > On Thu, Aug 15, 2024 at 7:02 PM Samuel Holland > > <samuel.holland@sifive.com> wrote: > >>> Yes. So the riscv timer is not working on this thing or it stops > >>> somehow. > >> > >> That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI > >> firmware does not have a timer device, so it does not expose the (optional[1]) > >> SBI time extension, and sbi_set_timer() does nothing. > > > > OpenSBI uses platform specific M-mode timer (mtime and mtimecmp) to > > provide SBI time extension to Linux. > > > > The RISC-V privileged specification (v1.10 or higher) requires platform to > > provide a M-mode timer (mtime and mtimecmp). > > > > This platform not having any M-mode timer is yet another RISC-V spec > > violation by this platform. > > You've misunderstood here. Allwinner D1 (T-HEAD C906) _does_ have an M-mode > timer (a CLINT). It is just omitted from devicetree that Emil happens to be > using, so OpenSBI isn't using it. > > Currently OpenSBI allows the system to boot without a timer device, and the SBI > specification does not mandate the time extension. If consensus is that either > of these should change, that's fine, but currently I see nothing in either the > privileged spec nor the SBI spec that guarantees the availability of some timer > to the kernel in S-mode. > The SBI time is certainly optional hence OpenSBI does not hang or crash if it can't provide SBI time to supervisor software. My comment is from the RISC-V privileged spec perspective: 1) Priv v1.10 says "Platforms provide a real-time counter, exposed as a memory-mapped machine-mode register, mtime." in section "3.1.15 Machine Timer Registers (mtime and mtimecmp)". 2) Similar statement in Priv v1.11 section "3.1.10 Machine Timer Registers (mtime and mtimecmp)" 3) Similar statement in Priv v1.12 section "3.2.1 Machine Timer Registers (mtime and mtimecmp)" But since the M-mode timer was omitted from the DT, I think the DT was always incomplete from the M-mode perspective. Regards, Anup
On Thu, Aug 15 2024 at 08:32, Samuel Holland wrote:
> On 2024-08-15 8:16 AM, Thomas Gleixner wrote:
>> Yes. So the riscv timer is not working on this thing or it stops
>> somehow.
>
> That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI
> firmware does not have a timer device, so it does not expose the (optional[1])
> SBI time extension, and sbi_set_timer() does nothing.
Sigh. Does RISCV really have to repeat all mistakes which have been made
by x86, ARM and others before? It's known for decades that the kernel
relies on a working timer...
> I wrote a patch (not submitted) to skip registering riscv_clock_event when the
> SBI time extension is unavailable, but this doesn't fully solve the issue
> either, because then we have no clockevent at all when
> check_unaligned_access_all_cpus() is called.
check_unaligned_access_all_cpus() is irrelevant.
> How early in the boot process are we "required" to have a functional clockevent?
> Do we need to refactor check_unaligned_access_all_cpus() so it works on systems
> where the only clockevent is provided by a platform device?
Right after init/main::late_time_init() everything can depend on a
working timer and on jiffies increasing.
I'm actually surprised that the boot process gets that far. That's just
by pure luck, really.
Thanks,
tglx
On Thu, Aug 15, 2024 at 7:41 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Thu, Aug 15 2024 at 08:32, Samuel Holland wrote: > > On 2024-08-15 8:16 AM, Thomas Gleixner wrote: > >> Yes. So the riscv timer is not working on this thing or it stops > >> somehow. > > > > That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI > > firmware does not have a timer device, so it does not expose the (optional[1]) > > SBI time extension, and sbi_set_timer() does nothing. > > Sigh. Does RISCV really have to repeat all mistakes which have been made > by x86, ARM and others before? It's known for decades that the kernel > relies on a working timer... My apologies for the delay in finding a fix for this issue. Almost all RISC-V platforms (except this one) have SBI Timer always available and Linux uses a better timer or Sstc extension whenever it is available. When Emil first reported this issue, I did try to help him root cause the issue but unfortunately I don't have this particular platform and PLIC on all other RISC-V platforms works fine. I am also surprised that none of the Allwiner folks tried helping. > > > I wrote a patch (not submitted) to skip registering riscv_clock_event when the > > SBI time extension is unavailable, but this doesn't fully solve the issue > > either, because then we have no clockevent at all when > > check_unaligned_access_all_cpus() is called. > > check_unaligned_access_all_cpus() is irrelevant. > > > How early in the boot process are we "required" to have a functional clockevent? > > Do we need to refactor check_unaligned_access_all_cpus() so it works on systems > > where the only clockevent is provided by a platform device? > > Right after init/main::late_time_init() everything can depend on a > working timer and on jiffies increasing. > > I'm actually surprised that the boot process gets that far. That's just > by pure luck, really. > > Thanks, > > tglx Regards, Anup
On 2024-08-15 9:16 AM, Anup Patel wrote: > On Thu, Aug 15, 2024 at 7:41 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> >> On Thu, Aug 15 2024 at 08:32, Samuel Holland wrote: >>> On 2024-08-15 8:16 AM, Thomas Gleixner wrote: >>>> Yes. So the riscv timer is not working on this thing or it stops >>>> somehow. >>> >>> That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI >>> firmware does not have a timer device, so it does not expose the (optional[1]) >>> SBI time extension, and sbi_set_timer() does nothing. >> >> Sigh. Does RISCV really have to repeat all mistakes which have been made >> by x86, ARM and others before? It's known for decades that the kernel >> relies on a working timer... > > My apologies for the delay in finding a fix for this issue. > > Almost all RISC-V platforms (except this one) have SBI Timer always > available and Linux uses a better timer or Sstc extension whenever > it is available. So this is the immediate solution: add the CLINT to the firmware devicetree so that the SBI time extension works, and Linux will boot without any code changes, albeit with a higher-overhead clockevent device. Additionally merging the sun4i timer patch[1] will allow the system to switch to the better MMIO clocksource later in the boot process. The reason the CLINT was not added to the devicetree already is that the T-HEAD version of the CLINT includes an extension to drive SSIP/STIP from a second S-mode visible set of registers. So it should really have twice as many entries in its interrupts-extended property as the existing CLINT, and I never got around to validating that this would work. The long-term solution would be adding driver support for the T-HEAD CLINT extensions, which provide an even better clockevent than the sun4i timer. [1]: https://lore.kernel.org/all/20240312192519.1602493-1-samuel.holland@sifive.com/ > When Emil first reported this issue, I did try to help him root cause > the issue but unfortunately I don't have this particular platform and > PLIC on all other RISC-V platforms works fine. > > I am also surprised that none of the Allwinner folks tried helping. Allwinner D1 support was upstreamed by unpaid hobbyists with very little first-party assistance. >>> I wrote a patch (not submitted) to skip registering riscv_clock_event when the >>> SBI time extension is unavailable, but this doesn't fully solve the issue >>> either, because then we have no clockevent at all when >>> check_unaligned_access_all_cpus() is called. >> >> check_unaligned_access_all_cpus() is irrelevant. >> >>> How early in the boot process are we "required" to have a functional clockevent? >>> Do we need to refactor check_unaligned_access_all_cpus() so it works on systems >>> where the only clockevent is provided by a platform device? >> >> Right after init/main::late_time_init() everything can depend on a >> working timer and on jiffies increasing. >> >> I'm actually surprised that the boot process gets that far. That's just >> by pure luck, really. Thanks for clearing this up! Regards, Samuel
On Thu, Aug 15 2024 at 09:41, Samuel Holland wrote:
> On 2024-08-15 9:16 AM, Anup Patel wrote:
>> Almost all RISC-V platforms (except this one) have SBI Timer always
>> available and Linux uses a better timer or Sstc extension whenever
>> it is available.
>
> So this is the immediate solution: add the CLINT to the firmware
> devicetree so that the SBI time extension works, and Linux will boot
> without any code changes, albeit with a higher-overhead clockevent
> device.
That does not matter for the boot process when the sun4i timer becomes
available afterwards. But how can this be retrofitted along with the
kernel update?
Thanks,
tglx
Samuel Holland wrote: > On 2024-08-15 9:16 AM, Anup Patel wrote: > > On Thu, Aug 15, 2024 at 7:41 PM Thomas Gleixner <tglx@linutronix.de> wrote: > >> > >> On Thu, Aug 15 2024 at 08:32, Samuel Holland wrote: > >>> On 2024-08-15 8:16 AM, Thomas Gleixner wrote: > >>>> Yes. So the riscv timer is not working on this thing or it stops > >>>> somehow. > >>> > >>> That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI > >>> firmware does not have a timer device, so it does not expose the (optional[1]) > >>> SBI time extension, and sbi_set_timer() does nothing. > >> > >> Sigh. Does RISCV really have to repeat all mistakes which have been made > >> by x86, ARM and others before? It's known for decades that the kernel > >> relies on a working timer... > > > > My apologies for the delay in finding a fix for this issue. > > > > Almost all RISC-V platforms (except this one) have SBI Timer always > > available and Linux uses a better timer or Sstc extension whenever > > it is available. > > So this is the immediate solution: add the CLINT to the firmware devicetree so > that the SBI time extension works, and Linux will boot without any code changes, > albeit with a higher-overhead clockevent device. But this will mean that you can't update your kernel to v6.9 or newer without reflashing OpenSBI and u-boot. That's still a regression right? /Emil
Hi Emil, On 2024-08-15 10:07 AM, Emil Renner Berthing wrote: > Samuel Holland wrote: >> On 2024-08-15 9:16 AM, Anup Patel wrote: >>> On Thu, Aug 15, 2024 at 7:41 PM Thomas Gleixner <tglx@linutronix.de> wrote: >>>> >>>> On Thu, Aug 15 2024 at 08:32, Samuel Holland wrote: >>>>> On 2024-08-15 8:16 AM, Thomas Gleixner wrote: >>>>>> Yes. So the riscv timer is not working on this thing or it stops >>>>>> somehow. >>>>> >>>>> That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI >>>>> firmware does not have a timer device, so it does not expose the (optional[1]) >>>>> SBI time extension, and sbi_set_timer() does nothing. >>>> >>>> Sigh. Does RISCV really have to repeat all mistakes which have been made >>>> by x86, ARM and others before? It's known for decades that the kernel >>>> relies on a working timer... >>> >>> My apologies for the delay in finding a fix for this issue. >>> >>> Almost all RISC-V platforms (except this one) have SBI Timer always >>> available and Linux uses a better timer or Sstc extension whenever >>> it is available. >> >> So this is the immediate solution: add the CLINT to the firmware devicetree so >> that the SBI time extension works, and Linux will boot without any code changes, >> albeit with a higher-overhead clockevent device. > > But this will mean that you can't update your kernel to v6.9 or newer without > reflashing OpenSBI and u-boot. That's still a regression right? I suppose that depends on if you think the SBI time extension is (or should have been) mandatory for platforms without Sstc. If the SBI time extension is mandatory, then this is a firmware bug, and not really Linux's responsibility to work around. If the SBI time extension is not mandatory, then Linux needs to be able to handle platforms where the S-mode visible timer is attached to an external interrupt controller (PLIC or APLIC), so the irqchip driver needs to be loaded before time_init() (timer_probe()). So in that case, the bug is a Linux regression, and we would need to revert the platform driver conversion. Regards, Samuel
On Thu, 15 Aug 2024 08:59:37 PDT (-0700), samuel.holland@sifive.com wrote: > Hi Emil, > > On 2024-08-15 10:07 AM, Emil Renner Berthing wrote: >> Samuel Holland wrote: >>> On 2024-08-15 9:16 AM, Anup Patel wrote: >>>> On Thu, Aug 15, 2024 at 7:41 PM Thomas Gleixner <tglx@linutronix.de> wrote: >>>>> >>>>> On Thu, Aug 15 2024 at 08:32, Samuel Holland wrote: >>>>>> On 2024-08-15 8:16 AM, Thomas Gleixner wrote: >>>>>>> Yes. So the riscv timer is not working on this thing or it stops >>>>>>> somehow. >>>>>> >>>>>> That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI >>>>>> firmware does not have a timer device, so it does not expose the (optional[1]) >>>>>> SBI time extension, and sbi_set_timer() does nothing. >>>>> >>>>> Sigh. Does RISCV really have to repeat all mistakes which have been made >>>>> by x86, ARM and others before? It's known for decades that the kernel >>>>> relies on a working timer... It's even worse than that: RISC-V doesn't even mandate any working _instructions_, much less anything in the platform/firmware. >>>> My apologies for the delay in finding a fix for this issue. >>>> >>>> Almost all RISC-V platforms (except this one) have SBI Timer always >>>> available and Linux uses a better timer or Sstc extension whenever >>>> it is available. >>> >>> So this is the immediate solution: add the CLINT to the firmware devicetree so >>> that the SBI time extension works, and Linux will boot without any code changes, >>> albeit with a higher-overhead clockevent device. >> >> But this will mean that you can't update your kernel to v6.9 or newer without >> reflashing OpenSBI and u-boot. That's still a regression right? Ya, I'd call that a regression. Updating the firmware on these things isn't generally something we can rely on users to do, we've worked around other firmware bugs where we can to avoid forced updates. > I suppose that depends on if you think the SBI time extension is (or should have > been) mandatory for platforms without Sstc. If the SBI time extension is > mandatory, then this is a firmware bug, and not really Linux's responsibility to > work around. > > If the SBI time extension is not mandatory, then Linux needs to be able to > handle platforms where the S-mode visible timer is attached to an external > interrupt controller (PLIC or APLIC), so the irqchip driver needs to be loaded > before time_init() (timer_probe()). So in that case, the bug is a Linux > regression, and we would need to revert the platform driver conversion. It doesn't really matter what the specs say (aka intended to say in RISC-V land): if there's a regression then we have to deal with it. It's not like whatever's written in the specs actually matters, vendors can just do whatever they want, so wer'e just stuck making the known implementations work. So I think if the revert is the best fix then we should revert it. That said: If the CLINT works, could we just add a probing quirk to make it appear on these systems even when it's not in the DT? I'm thinking something like adding a compatibly string to the CLINT driver for the SOC (or core or whatever, just something that's already there). We'd probably need a bit of special-case probing code, but shouldn't be so bad. We've got some other compatibility-oriented DT quirks floating around. > Regards, > Samuel
在 2024-08-15星期四的 10:51 -0700,Palmer Dabbelt写道: > On Thu, 15 Aug 2024 08:59:37 PDT (-0700), > samuel.holland@sifive.com wrote: > > Hi Emil, > > > > On 2024-08-15 10:07 AM, Emil Renner Berthing wrote: > > > Samuel Holland wrote: > > > > On 2024-08-15 9:16 AM, Anup Patel wrote: > > > > > On Thu, Aug 15, 2024 at 7:41 PM Thomas Gleixner > > > > > <tglx@linutronix.de> wrote: > > > > > > > > > > > > On Thu, Aug 15 2024 at 08:32, Samuel Holland wrote: > > > > > > > On 2024-08-15 8:16 AM, Thomas Gleixner wrote: > > > > > > > > Yes. So the riscv timer is not working on this thing or > > > > > > > > it stops > > > > > > > > somehow. > > > > > > > > > > > > > > That's correct. With the (firmware) devicetree that Emil > > > > > > > is using, the OpenSBI > > > > > > > firmware does not have a timer device, so it does not > > > > > > > expose the (optional[1]) > > > > > > > SBI time extension, and sbi_set_timer() does nothing. > > > > > > > > > > > > Sigh. Does RISCV really have to repeat all mistakes which > > > > > > have been made > > > > > > by x86, ARM and others before? It's known for decades that > > > > > > the kernel > > > > > > relies on a working timer... > > It's even worse than that: RISC-V doesn't even mandate any working > _instructions_, much less anything in the platform/firmware. > > > > > > My apologies for the delay in finding a fix for this issue. > > > > > > > > > > Almost all RISC-V platforms (except this one) have SBI Timer > > > > > always > > > > > available and Linux uses a better timer or Sstc extension > > > > > whenever > > > > > it is available. > > > > > > > > So this is the immediate solution: add the CLINT to the > > > > firmware devicetree so > > > > that the SBI time extension works, and Linux will boot without > > > > any code changes, > > > > albeit with a higher-overhead clockevent device. > > > > > > But this will mean that you can't update your kernel to v6.9 or > > > newer without > > > reflashing OpenSBI and u-boot. That's still a regression right? > > Ya, I'd call that a regression. Updating the firmware on these > things > isn't generally something we can rely on users to do, we've worked > around other firmware bugs where we can to avoid forced updates. > > > I suppose that depends on if you think the SBI time extension is > > (or should have > > been) mandatory for platforms without Sstc. If the SBI time > > extension is > > mandatory, then this is a firmware bug, and not really Linux's > > responsibility to > > work around. > > > > If the SBI time extension is not mandatory, then Linux needs to be > > able to > > handle platforms where the S-mode visible timer is attached to an > > external > > interrupt controller (PLIC or APLIC), so the irqchip driver needs > > to be loaded > > before time_init() (timer_probe()). So in that case, the bug is a > > Linux > > regression, and we would need to revert the platform driver > > conversion. > > It doesn't really matter what the specs say (aka intended to say in > RISC-V land): if there's a regression then we have to deal with it. > It's not like whatever's written in the specs actually matters, > vendors > can just do whatever they want, so wer'e just stuck making the known > implementations work. > > So I think if the revert is the best fix then we should revert it. > > That said: If the CLINT works, could we just add a probing quirk to > make CLINT works, but will will never work in S mode by its design -- the register used is all M-mode-only. So it this kind of probing quirk is being added, the target will be OpenSBI instead of Linux, and the problem of updating firmware still exists. > it appear on these systems even when it's not in the DT? I'm > thinking > something like adding a compatibly string to the CLINT driver for the > SOC (or core or whatever, just something that's already there). We'd > probably need a bit of special-case probing code, but shouldn't be so > bad. We've got some other compatibility-oriented DT quirks floating > around. > > > Regards, > > Samuel > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On Thu, Aug 15 2024 at 10:51, Palmer Dabbelt wrote:
> On Thu, 15 Aug 2024 08:59:37 PDT (-0700), samuel.holland@sifive.com wrote:
>>>>>> Sigh. Does RISCV really have to repeat all mistakes which have been made
>>>>>> by x86, ARM and others before? It's known for decades that the kernel
>>>>>> relies on a working timer...
>
> It's even worse than that: RISC-V doesn't even mandate any working
> _instructions_, much less anything in the platform/firmware.
So it's definitely taking the award for architectural disaster and will
probably keep it for a while.
> So I think if the revert is the best fix then we should revert it.
>
> That said: If the CLINT works, could we just add a probing quirk to make
> it appear on these systems even when it's not in the DT? I'm thinking
> something like adding a compatibly string to the CLINT driver for the
> SOC (or core or whatever, just something that's already there). We'd
> probably need a bit of special-case probing code, but shouldn't be so
> bad. We've got some other compatibility-oriented DT quirks floating
> around.
Alternatively, you can have a quirk in the PLIC driver for that
Allwinner D1 chip which probes it via IRQCHIP_DECLARE() as before with a
special probe function and denies the later platform probe.
Thanks,
tglx
On Wed, 14 Aug 2024 07:56:32 PDT (-0700), Renner Berthing wrote: > Hi Anup, > > As described in the thread below[1] I haven't been able to boot my > boards based on the Allwinner D1 SoC since 6.9 where you converted the > SiFive PLIC driver to a platform driver. > > This is clearly a regression and there haven't really been much progress > on fixing the issue since then, so here is the revert that fixes it. > > If no other fix is found before 6.11 I suggest we apply this. > > [1]: https://lore.kernel.org/linux-riscv/CAJM55Z9hGKo4784N3s3DhWw=nMRKZKcmvZ58x7uVBghExhoc9A@mail.gmail.com/ > > /Emil > > Emil Renner Berthing (9): > Revert "irqchip/sifive-plic: Chain to parent IRQ after handlers are > ready" > Revert "irqchip/sifive-plic: Avoid explicit cpumask allocation on > stack" > Revert "irqchip/sifive-plic: Improve locking safety by using > irqsave/irqrestore" > Revert "irqchip/sifive-plic: Parse number of interrupts and contexts > early in plic_probe()" > Revert "irqchip/sifive-plic: Cleanup PLIC contexts upon irqdomain > creation failure" > Revert "irqchip/sifive-plic: Use riscv_get_intc_hwnode() to get parent > fwnode" > Revert "irqchip/sifive-plic: Use devm_xyz() for managed allocation" > Revert "irqchip/sifive-plic: Use dev_xyz() in-place of pr_xyz()" > Revert "irqchip/sifive-plic: Convert PLIC driver into a platform > driver" > > drivers/irqchip/irq-sifive-plic.c | 296 ++++++++++++------------------ > 1 file changed, 117 insertions(+), 179 deletions(-) I'm still only testing on the QEMU-emulated platforms, but this isn't regressing over there so Tested-by: Palmer Dabbelt <palmer@rivosinc.com> # QEMU Thanks!
© 2016 - 2026 Red Hat, Inc.