Hi!
I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
("target/riscv/cpu.c: restrict 'marchid' value")
Reverting that commit, or applying the hack below, solves the boot issue:
--8<--
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8cbfc7e781ad..e18596c8a55a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
     cpu->cfg.ext_xtheadsync = true;
 
     cpu->cfg.mvendorid = THEAD_VENDOR_ID;
+    cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
+                        (QEMU_VERSION_MINOR << 8) |
+                        (QEMU_VERSION_MICRO));
 #ifndef CONFIG_USER_ONLY
     set_satp_mode_max_supported(cpu, VM_1_10_SV39);
 #endif
--8<--
I'm unsure what the correct qemu way of adding a default value is,
or if c906 should have a proper marchid.
Maybe Christoph or Zhiwei can answer?
qemu command-line:
qemu-system-riscv64 -nodefaults -nographic -machine virt,acpi=off \
-cpu thead-c906 ...
Thanks,
Björn
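For reference, the value the hack writes into marchid is just the QEMU version packed into one integer. A minimal sketch of that packing follows; the helper name pack_version_marchid is hypothetical and is not part of QEMU, only the shift amounts are taken from the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): packs a version triple the
 * same way the hunk above combines QEMU_VERSION_MAJOR/MINOR/MICRO.
 */
static inline uint64_t pack_version_marchid(unsigned major, unsigned minor,
                                            unsigned micro)
{
    /* Minor and micro get 8 bits each; larger values would overlap fields. */
    assert(minor < 256 && micro < 256);
    return ((uint64_t)major << 16) | ((uint64_t)minor << 8) | (uint64_t)micro;
}
```

With this packing, version 8.2.0 yields 0x080200, which is non-zero and therefore differs from the zero marchid discussed later in the thread.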
On 2024/1/24 20:49, Björn Töpel wrote:
> I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
> thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
> ("target/riscv/cpu.c: restrict 'marchid' value")
>
> [...]
>
> I'm unsure what the correct qemu way of adding a default value is,
> or if c906 should have a proper marchid.
>
> Maybe Christoph or Zhiwei can answer?

Hi Björn,

I think this is caused by an MMU extension (named XTheadMaee) that is not
implemented in QEMU and that conflicts with Svpbmt; it is the reason for the
errata in Linux.

I will try to fix this in QEMU and, at the same time, provide a way to
implement vendor custom CSRs in QEMU (XTheadMaee is controlled by a CSR
named mexstatus).

Thanks,
Zhiwei
On 1/24/24 09:49, Björn Töpel wrote:
> [...]
> I'm unsure what the correct qemu way of adding a default value is,
> or if c906 should have a proper marchid.

In case you need to set a 'marchid' different from zero for c906, this hack would
be a proper fix. As mentioned in the commit msg of the patch you mentioned:

"Named CPUs should set 'marchid' to a meaningful value instead, and generic
CPUs can set to any valid value."

That means that any specific marchid value that the CPU uses must be set
in its own cpu_init() function.

Thanks,

Daniel
Daniel Henrique Barboza <dbarboza@ventanamicro.com> writes:

> [...]
> That means that any specific marchid value that the CPU uses must be set
> in its own cpu_init() function.

Got it. Thanks, Daniel!

For completeness (since it came up on the weekly PW call); Conor pointed
out that zero *is* indeed the right marchid for c906, and in fact, the
non-zero marchid pre commit d6a427e2c0b2 was incorrect.

Post commit d6a427e2c0b2, the correct alternative is picked up, and
ERRATA_THEAD_PBMT (using non-standard memory type bits in
page-table-entries) kicks in. AFAIU, that's not implemented by qemu's
c906 support, which then traps.

That's the theory. Maybe Christoph knows if the non-standard bits are
implemented or not?

Regardless; I removed booting Qemu T-head c906 from the CI, and the
build/boot passes nicely ;-) [1]


Björn

[1] https://github.com/linux-riscv/linux-riscv/actions/runs/7641764759/job/20819801235?pr=447
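To illustrate the theory above: under plain Sv39, without extensions such as Svpbmt or Svnapot, the privileged spec reserves PTE bits 63:54, and a page-table walker that follows the spec raises a page fault when any of them is set. The sketch below assumes the T-Head memory type lives in the upper PTE bits and uses bits 62, 61 and 60 as an example encoding; that encoding is recalled from the Linux sources rather than taken from this thread, so treat it as an assumption. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 63:54 of an Sv39 PTE are reserved unless an extension defines them. */
#define SV39_PTE_RESERVED_MASK (0x3ffULL << 54)

/*
 * Assumed example of a T-Head memory-type encoding in the upper PTE bits
 * (bits 62, 61, 60). Not taken from the thread; verify against the kernel.
 */
#define EXAMPLE_THEAD_PMA_BITS ((1ULL << 62) | (1ULL << 61) | (1ULL << 60))

/*
 * Hypothetical check: would a spec-following Sv39 walker with no
 * extensions reject this PTE because of reserved bits?
 */
static inline bool pte_sets_reserved_bits(uint64_t pte)
{
    return (pte & SV39_PTE_RESERVED_MASK) != 0;
}
```

For example, `assert(pte_sets_reserved_bits(0xcfULL | EXAMPLE_THEAD_PMA_BITS))` holds, while a PTE carrying only the low permission bits (0xcf) passes the check. That would be consistent with a kernel that writes such bits faulting as soon as paging is turned on, if the emulated MMU does not understand them.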
On 1/24/24 16:26, Björn Töpel wrote:
> [...]
> Post commit d6a427e2c0b2, the correct alternative is picked up, and
> ERRATA_THEAD_PBMT (using non-standard memory type bits in
> page-table-entries) kicks in. AFAIU, that's not implemented by qemu's
> c906 support, which then traps.

This looks like a very good reason to actually push what you called 'hack' as
a fix. Yeah, in theory that commit did nothing wrong, but the side effect
(missing support for non-standard memory type bits) is kind of a QEMU problem.

You're welcome to format that hack into a patch, explaining in the commit msg
why we need to set marchid for c906 to that specific value. I'd even add a TODO
tag in rv64_thead_c906_cpu_init() to remind us that this is a band-aid and that
we should remove it once we implement the needed support.

> That's the theory. Maybe Christoph knows if the non-standard bits are
> implemented or not?
>
> Regardless; I removed booting Qemu T-head c906 from the CI, and the
> build/boot passes nicely ;-) [1]

I vote for setting marchid in c906 cpu_init and re-enabling it in the CI.

Thanks,

Daniel
Daniel Henrique Barboza <dbarboza@ventanamicro.com> writes:

> This looks like a very good reason to actually push what you called 'hack' as
> a fix. Yeah, in theory that commit did nothing wrong, but the side effect
> (missing support for non-standard memory type bits) is kind of a QEMU problem.

For me, it'd be weird to add the hack (setting marchid to non-zero), claiming
that it's a "thead-c906 emulation" in qemu, but w/o the proper page-bit
support. That's just cpu rv64 plus some extra instructions -- not the c906.


Björn
On Wed, Jan 24, 2024 at 01:49:51PM +0100, Björn Töpel wrote:
> [...]
> I'm unsure what the correct qemu way of adding a default value is,
> or if c906 should have a proper marchid.

The "correct" marchid/mimpid values for the c906 are zero.

I haven't looked into the code at all, so I am "assuming" that it is
being zero initialised at present. Linux applies the errata fixups for
the c906 when archid and impid are both zero - so your patch will avoid
these fixups being applied.

Do you think that perhaps the emulation in QEMU does not support what
the kernel uses once the errata fixups are enabled?
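The matching rule described above can be sketched as a simple predicate. This is only an illustration of the condition as stated in this message, not the kernel's actual code: the function name is hypothetical, and the vendor ID value 0x5b7 is an assumption recalled from the QEMU and Linux sources rather than from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of THEAD_VENDOR_ID; check the headers before relying on it. */
#define EXAMPLE_THEAD_VENDOR_ID 0x5b7ULL

/*
 * Hypothetical predicate: the c906 errata fixups are selected only when
 * the vendor matches and both marchid and mimpid read as zero.
 */
static inline bool c906_errata_would_apply(uint64_t mvendorid,
                                           uint64_t marchid,
                                           uint64_t mimpid)
{
    return mvendorid == EXAMPLE_THEAD_VENDOR_ID &&
           marchid == 0 && mimpid == 0;
}
```

For instance, `assert(c906_errata_would_apply(0x5b7, 0, 0))` holds, whereas passing a version-derived marchid such as 0x080200 makes the predicate false. That is why the hack hides the problem: the fixups are never enabled, so the non-standard page-table bits are never written.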
Conor Dooley <conor@kernel.org> writes:

> The "correct" marchid/mimpid values for the c906 are zero.

Ok! Thanks for clearing that up for me.

> I haven't looked into the code at all, so I am "assuming" that it is
> being zero initialised at present. Linux applies the errata fixups for
> the c906 when archid and impid are both zero - so your patch will avoid
> these fixups being applied.

I'm also assuming 0 -- will double-check. Hmm, that means that the
*previous* marchid was incorrect (pre d6a427e2c0b2).

> Do you think that perhaps the emulation in QEMU does not support what
> the kernel uses once the errata fixups are enabled?

Did a quick look at the c906 "in_asm,int" logs:

 | 0x80201040: 12000073 sfence.vma zero,zero
 | 0x80201044: 18051073 csrrw zero,satp,a0
 |
 | riscv_cpu_do_interrupt: hart:0, async:0, cause:000000000000000c, epc:0x0000000080201048, tval:0x0000000080201048, desc=exec_page_fault
 | riscv_cpu_do_interrupt: hart:0, async:0, cause:000000000000000c, epc:0xffffffff80001048, tval:0xffffffff80001048, desc=exec_page_fault
 | ...cont forever

So it looks like we're tripping over the page tables, when we're turning
on paging.

Hmm, maybe it's not qemu, but the c906 that has been broken for a while?

I'll disable it temporarily from CI anyhow, and will continue digging.


Thanks for the pointers/clarifications, Conor!
Björn
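As a reading aid for the log above: the cause field is the RISC-V exception code, and 0xc (12) is the instruction page fault, which matches the desc=exec_page_fault text QEMU prints. A small decoder for the page-fault codes might look like the sketch below; the function names are hypothetical, while the numeric codes come from the privileged specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page-fault exception codes from the RISC-V privileged specification. */
enum {
    CAUSE_INSN_PAGE_FAULT  = 12,
    CAUSE_LOAD_PAGE_FAULT  = 13,
    CAUSE_STORE_PAGE_FAULT = 15,
};

/* Hypothetical helper: true when the cause is an instruction page fault. */
static inline bool cause_is_insn_page_fault(uint64_t cause)
{
    return cause == CAUSE_INSN_PAGE_FAULT;
}

/* Hypothetical helper: short description for the page-fault codes only. */
static inline const char *describe_page_fault(uint64_t cause)
{
    switch (cause) {
    case CAUSE_INSN_PAGE_FAULT:  return "instruction page fault";
    case CAUSE_LOAD_PAGE_FAULT:  return "load page fault";
    case CAUSE_STORE_PAGE_FAULT: return "store/AMO page fault";
    default:                     return "not a page fault (not decoded here)";
    }
}
```

In both log entries epc equals tval, so the faulting access is the instruction fetch itself: the first fetch after the satp write already fails to translate.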
On Wed, Jan 24, 2024 at 02:27:10PM +0100, Björn Töpel wrote:
> Did a quick look at the c906 "in_asm,int" logs:
>
> [...]
>
> So it looks like we're tripping over the page tables, when we're turning
> on paging.
>
> Hmm, maybe it's not qemu, but the c906 that has been broken for a while?

I didn't know what you meant by "not qemu, but the c906", so I went and
boot tested my d1 nezha. On today's next (6.8.0-rc1-next-20240124) it
booted into my initramfs with no problems. Obviously my config is
unlikely to match yours, but that seems like a core thing that should be
hit regardless of config.

So perhaps this is a c906-in-QEMU problem? Lacking emulation for something
the kernel uses perhaps? I know nothing about the capabilities of its
emulation in QEMU, so I am of no help.

Cheers,
Conor.