[PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()

Oleksii Kurochko posted 15 patches 1 month, 2 weeks ago
[PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
Posted by Oleksii Kurochko 1 month, 2 weeks ago
Implement function to initialize VCPU's CSR registers to delegate handling
of some traps to VS-mode ( guest ), enable vstimecmp for VS-mode, and
allow some AIA-related register (thier vs* copies ) for VS-mode.

Add detection of Smstateen extension to properly initialize hstateen0 to
allow guest to access AIA-added state.

Add call of vcpu_csr_init() in arch_vcpu_create().

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
 xen/arch/riscv/cpufeature.c                 |  1 +
 xen/arch/riscv/domain.c                     | 63 +++++++++++++++++++++
 xen/arch/riscv/include/asm/cpufeature.h     |  1 +
 xen/arch/riscv/include/asm/riscv_encoding.h |  2 +
 4 files changed, 67 insertions(+)

diff --git a/xen/arch/riscv/cpufeature.c b/xen/arch/riscv/cpufeature.c
index 02b68aeaa49f..03e27b037be0 100644
--- a/xen/arch/riscv/cpufeature.c
+++ b/xen/arch/riscv/cpufeature.c
@@ -137,6 +137,7 @@ const struct riscv_isa_ext_data __initconst riscv_isa_ext[] = {
     RISCV_ISA_EXT_DATA(zbb),
     RISCV_ISA_EXT_DATA(zbs),
     RISCV_ISA_EXT_DATA(smaia),
+    RISCV_ISA_EXT_DATA(smstateen),
     RISCV_ISA_EXT_DATA(ssaia),
     RISCV_ISA_EXT_DATA(svade),
     RISCV_ISA_EXT_DATA(svpbmt),
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index e5fda1af4ee9..44387d056546 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -3,6 +3,67 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 
+#include <asm/cpufeature.h>
+#include <asm/csr.h>
+#include <asm/riscv_encoding.h>
+
+static void vcpu_csr_init(struct vcpu *v)
+{
+    unsigned long hedeleg, hideleg, hstatus;
+
+    hedeleg = 0;
+    hedeleg |= (1U << CAUSE_MISALIGNED_FETCH);
+    hedeleg |= (1U << CAUSE_FETCH_ACCESS);
+    hedeleg |= (1U << CAUSE_ILLEGAL_INSTRUCTION);
+    hedeleg |= (1U << CAUSE_MISALIGNED_LOAD);
+    hedeleg |= (1U << CAUSE_LOAD_ACCESS);
+    hedeleg |= (1U << CAUSE_MISALIGNED_STORE);
+    hedeleg |= (1U << CAUSE_STORE_ACCESS);
+    hedeleg |= (1U << CAUSE_BREAKPOINT);
+    hedeleg |= (1U << CAUSE_USER_ECALL);
+    hedeleg |= (1U << CAUSE_FETCH_PAGE_FAULT);
+    hedeleg |= (1U << CAUSE_LOAD_PAGE_FAULT);
+    hedeleg |= (1U << CAUSE_STORE_PAGE_FAULT);
+    v->arch.hedeleg = hedeleg;
+
+    hstatus = HSTATUS_SPV | HSTATUS_SPVP;
+    v->arch.hstatus = hstatus;
+
+    hideleg = MIP_VSTIP |  MIP_VSEIP | MIP_VSSIP;
+    v->arch.hideleg = hideleg;
+
+    /*
+     * VS should access only the time counter directly.
+     * Everything else should trap.
+     */
+    v->arch.hcounteren |= HCOUNTEREN_TM;
+
+    if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_svpbmt) )
+        v->arch.henvcfg |= ENVCFG_PBMTE;
+
+    if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_smstateen) )
+    {
+        /*
+         * If the hypervisor extension is implemented, the same three bitsare
+         * defined also in hypervisor CSR hstateen0 but concern only the state
+         * potentially accessible to a virtual machine executing in privilege
+         * modes VS and VU:
+         *      bit 60 CSRs siselect and sireg (really vsiselect and vsireg)
+         *      bit 59 CSRs siph and sieh (RV32 only) and stopi (really vsiph,
+         *             vsieh, and vstopi)
+         *      bit 58 all state of IMSIC guest interrupt files, including CSR
+         *             stopei (really vstopei)
+         * If one of these bits is zero in hstateen0, and the same bit is one
+         * in mstateen0, then an attempt to access the corresponding state from
+         * VS or VU-mode raises a virtual instruction exception.
+        */
+        v->arch.hstateen0 = SMSTATEEN0_AIA | SMSTATEEN0_IMSIC | SMSTATEEN0_SVSLCT;
+
+        /* Allow guest to access CSR_ENVCFG */
+        v->arch.hstateen0 |= SMSTATEEN0_HSENVCFG;
+    }
+}
+
 static void continue_new_vcpu(struct vcpu *prev)
 {
     BUG_ON("unimplemented\n");
@@ -30,6 +91,8 @@ int arch_vcpu_create(struct vcpu *v)
            v->arch.xen_saved_context.sp, v->arch.xen_saved_context.ra,
            (unsigned long)v->arch.cpu_info);
 
+    vcpu_csr_init(v);
+
     /* Idle VCPUs don't need the rest of this setup */
     if ( is_idle_vcpu(v) )
         return rc;
diff --git a/xen/arch/riscv/include/asm/cpufeature.h b/xen/arch/riscv/include/asm/cpufeature.h
index b69616038888..ef02a3e26d2c 100644
--- a/xen/arch/riscv/include/asm/cpufeature.h
+++ b/xen/arch/riscv/include/asm/cpufeature.h
@@ -36,6 +36,7 @@ enum riscv_isa_ext_id {
     RISCV_ISA_EXT_zbb,
     RISCV_ISA_EXT_zbs,
     RISCV_ISA_EXT_smaia,
+    RISCV_ISA_EXT_smstateen,
     RISCV_ISA_EXT_ssaia,
     RISCV_ISA_EXT_svade,
     RISCV_ISA_EXT_svpbmt,
diff --git a/xen/arch/riscv/include/asm/riscv_encoding.h b/xen/arch/riscv/include/asm/riscv_encoding.h
index 1f7e612366f8..dd15731a86fa 100644
--- a/xen/arch/riscv/include/asm/riscv_encoding.h
+++ b/xen/arch/riscv/include/asm/riscv_encoding.h
@@ -228,6 +228,8 @@
 #define ENVCFG_CBIE_INV			_UL(0x3)
 #define ENVCFG_FIOM			_UL(0x1)
 
+#define HCOUNTEREN_TM BIT(1, U)
+
 /* ===== User-level CSRs ===== */
 
 /* User Trap Setup (N-extension) */
-- 
2.52.0
Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
Posted by Jan Beulich 1 month ago
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Implement function to initialize VCPU's CSR registers to delegate handling
> of some traps to VS-mode ( guest ), enable vstimecmp for VS-mode, and
> allow some AIA-related register (thier vs* copies ) for VS-mode.

The henvcfg setting isn't covered here at all, unless I'm failing to make the
respective association. Nor is the setting of SMSTATEEN0_HSENVCFG in hstateen0.

Overall it feels like the description here is too terse anyway, as the bits
set (or not) are a pretty crucial thing for running guests. Then again maybe
this is just me, for not being a RISC-V person ...

> --- a/xen/arch/riscv/domain.c
> +++ b/xen/arch/riscv/domain.c
> @@ -3,6 +3,67 @@
>  #include <xen/mm.h>
>  #include <xen/sched.h>
>  
> +#include <asm/cpufeature.h>
> +#include <asm/csr.h>
> +#include <asm/riscv_encoding.h>
> +
> +static void vcpu_csr_init(struct vcpu *v)
> +{
> +    unsigned long hedeleg, hideleg, hstatus;
> +
> +    hedeleg = 0;
> +    hedeleg |= (1U << CAUSE_MISALIGNED_FETCH);
> +    hedeleg |= (1U << CAUSE_FETCH_ACCESS);
> +    hedeleg |= (1U << CAUSE_ILLEGAL_INSTRUCTION);
> +    hedeleg |= (1U << CAUSE_MISALIGNED_LOAD);
> +    hedeleg |= (1U << CAUSE_LOAD_ACCESS);
> +    hedeleg |= (1U << CAUSE_MISALIGNED_STORE);
> +    hedeleg |= (1U << CAUSE_STORE_ACCESS);
> +    hedeleg |= (1U << CAUSE_BREAKPOINT);
> +    hedeleg |= (1U << CAUSE_USER_ECALL);
> +    hedeleg |= (1U << CAUSE_FETCH_PAGE_FAULT);
> +    hedeleg |= (1U << CAUSE_LOAD_PAGE_FAULT);
> +    hedeleg |= (1U << CAUSE_STORE_PAGE_FAULT);
> +    v->arch.hedeleg = hedeleg;

Wouldn't you better start from setting all of the non-reserved bits, to then
clear the few that you mean to not delegate? Then again I'm not quite sure
whether the set of CAUSE_* in the header file is actually complete: MCAUSE
also can hold the values 16, 18, and 19. (Otoh you have CAUSE_MACHINE_ECALL,
which I don't think can ever be observed outside of M-mode.)

Also, while it may seem to not matter much, sorting the above by their numeric
values would ease comparison against the full set.

> +    hstatus = HSTATUS_SPV | HSTATUS_SPVP;
> +    v->arch.hstatus = hstatus;

Why would these (or in fact any) bits need setting here? Isn't hstatus written
upon exit from guest context?

> +    hideleg = MIP_VSTIP |  MIP_VSEIP | MIP_VSSIP;
> +    v->arch.hideleg = hideleg;

Again I think having MIP_VSTIP in the middle (to establish numeric sorting)
would be slightly better.

Also there's a stray blank after the first |.

> +    /*
> +     * VS should access only the time counter directly.
> +     * Everything else should trap.
> +     */
> +    v->arch.hcounteren |= HCOUNTEREN_TM;

Why are this and ...

> +    if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_svpbmt) )
> +        v->arch.henvcfg |= ENVCFG_PBMTE;

... this using |= but the earlier ones simply = ? Unless there is a specific
reason, consistency is likely preferable.

> +    if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_smstateen) )
> +    {
> +        /*
> +         * If the hypervisor extension is implemented, the same three bitsare
> +         * defined also in hypervisor CSR hstateen0 but concern only the state
> +         * potentially accessible to a virtual machine executing in privilege
> +         * modes VS and VU:
> +         *      bit 60 CSRs siselect and sireg (really vsiselect and vsireg)
> +         *      bit 59 CSRs siph and sieh (RV32 only) and stopi (really vsiph,
> +         *             vsieh, and vstopi)
> +         *      bit 58 all state of IMSIC guest interrupt files, including CSR
> +         *             stopei (really vstopei)
> +         * If one of these bits is zero in hstateen0, and the same bit is one
> +         * in mstateen0, then an attempt to access the corresponding state from
> +         * VS or VU-mode raises a virtual instruction exception.
> +        */
> +        v->arch.hstateen0 = SMSTATEEN0_AIA | SMSTATEEN0_IMSIC | SMSTATEEN0_SVSLCT;

What is SVSLCT? Bit 60 is named CSRIND in the spec I'm looking at, and the
commentary above looks to confirm this.

Also, wouldn't you better keep internal state in line with what hardware
actually supports? CSRIND may be read-only-zero in the real register, in
which case having the bit set in the "cached" copy can be misleading.
(This may similarly apply to at least hedeleg and hideleg, btw.)

As to consistency: Further up you use local helper variables (for imo no real
reason), when here you don't. Instead this line ends up being too long.

Jan
Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
Posted by Oleksii Kurochko 3 weeks, 6 days ago
On 1/7/26 9:46 AM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> Implement function to initialize VCPU's CSR registers to delegate handling
>> of some traps to VS-mode ( guest ), enable vstimecmp for VS-mode, and
>> allow some AIA-related register (thier vs* copies ) for VS-mode.
> The henvcfg setting isn't covered here at all, unless I'm failing to make the
> respective association. Nor is the setting of SMSTATEEN0_HSENVCFG in hstateen0.
>
> Overall it feels like the description here is too terse anyway, as the bits
> set (or not) are a pretty crucial thing for running guests. Then again maybe
> this is just me, for not being a RISC-V person ...

I will add more details to commit message then.

>
>> --- a/xen/arch/riscv/domain.c
>> +++ b/xen/arch/riscv/domain.c
>> @@ -3,6 +3,67 @@
>>   #include <xen/mm.h>
>>   #include <xen/sched.h>
>>   
>> +#include <asm/cpufeature.h>
>> +#include <asm/csr.h>
>> +#include <asm/riscv_encoding.h>
>> +
>> +static void vcpu_csr_init(struct vcpu *v)
>> +{
>> +    unsigned long hedeleg, hideleg, hstatus;
>> +
>> +    hedeleg = 0;
>> +    hedeleg |= (1U << CAUSE_MISALIGNED_FETCH);
>> +    hedeleg |= (1U << CAUSE_FETCH_ACCESS);
>> +    hedeleg |= (1U << CAUSE_ILLEGAL_INSTRUCTION);
>> +    hedeleg |= (1U << CAUSE_MISALIGNED_LOAD);
>> +    hedeleg |= (1U << CAUSE_LOAD_ACCESS);
>> +    hedeleg |= (1U << CAUSE_MISALIGNED_STORE);
>> +    hedeleg |= (1U << CAUSE_STORE_ACCESS);
>> +    hedeleg |= (1U << CAUSE_BREAKPOINT);
>> +    hedeleg |= (1U << CAUSE_USER_ECALL);
>> +    hedeleg |= (1U << CAUSE_FETCH_PAGE_FAULT);
>> +    hedeleg |= (1U << CAUSE_LOAD_PAGE_FAULT);
>> +    hedeleg |= (1U << CAUSE_STORE_PAGE_FAULT);
>> +    v->arch.hedeleg = hedeleg;
> Wouldn't you better start from setting all of the non-reserved bits, to then
> clear the few that you mean to not delegate?

Maybe that would be better, but I don’t see much difference, especially if we
use the following define:
  #define HEDELEG_DEFAULT ( BIT(CAUSE_MISALIGNED_FETCH, U) | ... )
It would still be just one instruction to write the value to|hedeleg|.
(I think the compiler will likely produce the same optimization with the
current implementation.)

> Then again I'm not quite sure
> whether the set of CAUSE_* in the header file is actually complete: MCAUSE
> also can hold the values 16, 18, and 19.

Then 14 and 17 could be added as well. I see the sense in adding 18 and 19,
since they are defined as "software check" and "hardware error." However,
I don’t see much value in providing|CAUSE_*| for 14 and 16–17, as they are
just reserved and have no specific meaning.

I could add something like|CAUSE_RES_14|,|CAUSE_RES_16|,|CAUSE_RES_17|, but
since we aren’t actually handling them, I think it’s fine to update|CAUSE_* |only when there is a real use for them, like with 18 and 19.

>   (Otoh you have CAUSE_MACHINE_ECALL,
> which I don't think can ever be observed outside of M-mode.)

Good point, It seems like you are right and M-ecall can't be observed outside of
M-mode and even more it is marked as read only 0 so it is not expected to be
delegated to lower privilige mode, but then I don't know why it was added to
"Table 29 Bits of hedeleg that must be writable or must be read-only zero.".

>
> Also, while it may seem to not matter much, sorting the above by their numeric
> values would ease comparison against the full set.

I will move "hedeleg |= (1U << CAUSE_BREAKPOINT);" up; all others seems are sorted
properly.

>
>> +    hstatus = HSTATUS_SPV | HSTATUS_SPVP;
>> +    v->arch.hstatus = hstatus;
> Why would these (or in fact any) bits need setting here?

It could be moved to continue_new_vcpu() where now (in downstream) I have:
         csr_write(CSR_HSTATUS, vcpu_guest_cpu_user_regs(current)->hstatus);
         reset_stack_and_jump(return_to_new_vcpu);
But I put it here to have vCPU state (all or as much as possible) initialized
in one place.

> Isn't hstatus written
> upon exit from guest context?

Setting these bits manually is necessary for the initial entry into
a guest context.
While it is true that hardware updates hstatus during a trap from a guest,
software must set these bits to define the destination state for the
subsequent SRET instruction.

When a hypervisor prepares to run a guest for the first time, there has been no
previous "exit" from that guest to automatically populate the CSRs. Setting these
bits is essential for the following reasons:
- Defining the target Virtualization Mode (SPV): The SPV bit is used by the SRET
   instruction to determine the new virtualization mode. If the hypervisor is in
   HS-mode (V=0) and executes SRET, the hardware sets the new V to the current
   value of hstatus.SPV. Without manually setting HSTATUS_SPV, the SRET would
   return the hart to V=0 instead of entering the guest (V=1).
- Defining the target Privilege Level (SPVP): The SPVP bit tracks the nominal
   privilege level (S or U) of the guest. When V=1, this determines if the guest
   is in VS-mode or VU-mode.
- Controlling Hypervisor Load/Store Instructions: SPVP specifically controls
   the effective privilege of explicit memory accesses made by hypervisor
   virtual-machine load/store instructions (HLV, HLVX, and HSV).
   If the hypervisor needs to use these instructions to access guest memory
   as if it were the guest supervisor, SPVP must be set to 1.
   But maybe there is no too much sense in this instructions before guest is
   ran.

>
>> +    hideleg = MIP_VSTIP |  MIP_VSEIP | MIP_VSSIP;
>> +    v->arch.hideleg = hideleg;
> Again I think having MIP_VSTIP in the middle (to establish numeric sorting)
> would be slightly better.
>
> Also there's a stray blank after the first |.
>
>> +    /*
>> +     * VS should access only the time counter directly.
>> +     * Everything else should trap.
>> +     */
>> +    v->arch.hcounteren |= HCOUNTEREN_TM;
> Why are this and ...
>
>> +    if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_svpbmt) )
>> +        v->arch.henvcfg |= ENVCFG_PBMTE;
> ... this using |= but the earlier ones simply = ? Unless there is a specific
> reason, consistency is likely preferable.

This was overlooked during refactoring; it seems I simply used|=| instead of||=|.
The idea is that if it’s the first initialization,|=| should be used; otherwise,
for subsequent writes,||=| is used to avoid clearing previous values.
I will update this part to use the same pattern consistently throughout
this function.

>
>> +    if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_smstateen) )
>> +    {
>> +        /*
>> +         * If the hypervisor extension is implemented, the same three bitsare
>> +         * defined also in hypervisor CSR hstateen0 but concern only the state
>> +         * potentially accessible to a virtual machine executing in privilege
>> +         * modes VS and VU:
>> +         *      bit 60 CSRs siselect and sireg (really vsiselect and vsireg)
>> +         *      bit 59 CSRs siph and sieh (RV32 only) and stopi (really vsiph,
>> +         *             vsieh, and vstopi)
>> +         *      bit 58 all state of IMSIC guest interrupt files, including CSR
>> +         *             stopei (really vstopei)
>> +         * If one of these bits is zero in hstateen0, and the same bit is one
>> +         * in mstateen0, then an attempt to access the corresponding state from
>> +         * VS or VU-mode raises a virtual instruction exception.
>> +        */
>> +        v->arch.hstateen0 = SMSTATEEN0_AIA | SMSTATEEN0_IMSIC | SMSTATEEN0_SVSLCT;
> What is SVSLCT? Bit 60 is named CSRIND in the spec I'm looking at, and the
> commentary above looks to confirm this.

This is how OpenSBI called this bit from where riscv_encoding.h was taken.
SVSLCT stands for Supervisor Virtual Select, referring to the access control of the
siselect and vsiselect registers.

>
> Also, wouldn't you better keep internal state in line with what hardware
> actually supports? CSRIND may be read-only-zero in the real register, in
> which case having the bit set in the "cached" copy can be misleading.

According to the AIA spec:
   If extension Smstateen is implemented together with the Advanced Interrupt Architecture (AIA),
   three bits of state-enable register mstateen0 control access to AIA-added state from privilege modes
   less privileged than M-mode:
     bit 60 CSRs siselect, sireg, vsiselect, and vsireg
     bit 59 all other state added by the AIA and not controlled by bits 60 and 58
     bit 58 all IMSIC state, including CSRs stopei and vstopei
What I read as if Smstateen is supported then all the bits are supported by
hardware, and that is why it is enough to check if Smstateen is supported.

But I decided to check what KVM does in the similar case and it seems that I incorrectly read
the first line of the mentioned about AIA's spec and it is need another one if-condition:
	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN)) {
		cfg->hstateen0 |= SMSTATEEN0_HSENVCFG;
		if (riscv_isa_extension_available(isa, SSAIA))
			cfg->hstateen0 |= SMSTATEEN0_AIA_IMSIC |
					  SMSTATEEN0_AIA |
					  SMSTATEEN0_AIA_ISEL;
		if (riscv_isa_extension_available(isa, SMSTATEEN))
			cfg->hstateen0 |= SMSTATEEN0_SSTATEEN0;
	}

> (This may similarly apply to at least hedeleg and hideleg, btw.)

Regarding the previous bits, I can understand that it would be an issue:
if SSAIA isn’t supported, then it is incorrect to update the corresponding
bits of|hstateen0|.

However, I’m not really sure I understand what the issue is with|h{i,e}deleg|.
All writable bits there don’t depend on hardware support. Am I missing something?

>
> As to consistency: Further up you use local helper variables (for imo no real
> reason), when here you don't. Instead this line ends up being too long.

I will update the code to have consistency.

Thanks.

~ Oleksii


Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
Posted by Jan Beulich 3 weeks, 6 days ago
On 12.01.2026 13:59, Oleksii Kurochko wrote:
> On 1/7/26 9:46 AM, Jan Beulich wrote:
>> Also, wouldn't you better keep internal state in line with what hardware
>> actually supports? CSRIND may be read-only-zero in the real register, in
>> which case having the bit set in the "cached" copy can be misleading.
> 
> [...]
> 
>> (This may similarly apply to at least hedeleg and hideleg, btw.)
> 
> Regarding the previous bits, I can understand that it would be an issue:
> if SSAIA isn’t supported, then it is incorrect to update the corresponding
> bits of|hstateen0|.
> 
> However, I’m not really sure I understand what the issue is with|h{i,e}deleg|.
> All writable bits there don’t depend on hardware support. Am I missing something?

My reading of the doc was that any of the bits can be r/o 0, with - yes -
no dependencies on particular extensions. In which case you'd need to do
the delegation in software. For which it might be helpful to know what
the two registers are actually set to in hardware (i.e. the cached values
wanting to match the real ones).

Jan

Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
Posted by Oleksii Kurochko 3 weeks, 6 days ago
On 1/12/26 3:28 PM, Jan Beulich wrote:
> On 12.01.2026 13:59, Oleksii Kurochko wrote:
>> On 1/7/26 9:46 AM, Jan Beulich wrote:
>>> Also, wouldn't you better keep internal state in line with what hardware
>>> actually supports? CSRIND may be read-only-zero in the real register, in
>>> which case having the bit set in the "cached" copy can be misleading.
>> [...]
>>
>>> (This may similarly apply to at least hedeleg and hideleg, btw.)
>> Regarding the previous bits, I can understand that it would be an issue:
>> if SSAIA isn’t supported, then it is incorrect to update the corresponding
>> bits of|hstateen0|.
>>
>> However, I’m not really sure I understand what the issue is with|h{i,e}deleg|.
>> All writable bits there don’t depend on hardware support. Am I missing something?
> My reading of the doc was that any of the bits can be r/o 0, with - yes -
> no dependencies on particular extensions.

Just to be sure that I get your idea correctly.

Based on the priv. spec:
   Each bit of hedeleg shall be either writable or read-only zero. Many bits of
   hedeleg are required specifically to be writable or zero, as enumerated in
   Table 29.

Now let’s take hedeleg.bit1, which is marked as writable according to Table 29.
Your point is that even though hedeleg.bit1 is defined as writable, it could still
be read-only zero, right?

In general, I agree with that. It is possible that M-mode software decides, for
some reason (for example, because the implementation does not support delegation
of bit1 to a lower mode), not to delegate medeleg.bit1 to HS-mode. In that case,
hedeleg.bit1 would always be read-only zero.

>   In which case you'd need to do
> the delegation in software. For which it might be helpful to know what
> the two registers are actually set to in hardware (i.e. the cached values
> wanting to match the real ones).

Does it make sense then to have the following
   	...
	v->arch.hedeleg = hedeleg;
   	vcpu->arch.hedeleg = csr_read(CSR_HEDELEG);
in arch_vcpu_create()?

Or I can just add the comment that it will be sync-ed with the corresponding
hardware CSR later as ,actually, there is some h{i,e}deleg synchronization
happening during context_switch() (this code is at the moment in downstream),
because restore_csr_regs() is executed and re-reads CSR_H{I,E}DELEG:
   static void restore_csr_regs(struct vcpu *vcpu)
   {
       csr_write(CSR_HEDELEG, vcpu->arch.hedeleg);
       csr_write(CSR_HIDELEG, vcpu->arch.hideleg);
       ...
As a result, vcpu->arch.h{I,E}deleg is kept in sync with the corresponding
hardware CSR.

~ Oleksii


Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
Posted by Jan Beulich 3 weeks, 6 days ago
On 12.01.2026 16:46, Oleksii Kurochko wrote:
> On 1/12/26 3:28 PM, Jan Beulich wrote:
>> On 12.01.2026 13:59, Oleksii Kurochko wrote:
>>> On 1/7/26 9:46 AM, Jan Beulich wrote:
>>>> Also, wouldn't you better keep internal state in line with what hardware
>>>> actually supports? CSRIND may be read-only-zero in the real register, in
>>>> which case having the bit set in the "cached" copy can be misleading.
>>> [...]
>>>
>>>> (This may similarly apply to at least hedeleg and hideleg, btw.)
>>> Regarding the previous bits, I can understand that it would be an issue:
>>> if SSAIA isn’t supported, then it is incorrect to update the corresponding
>>> bits of|hstateen0|.
>>>
>>> However, I’m not really sure I understand what the issue is with|h{i,e}deleg|.
>>> All writable bits there don’t depend on hardware support. Am I missing something?
>> My reading of the doc was that any of the bits can be r/o 0, with - yes -
>> no dependencies on particular extensions.
> 
> Just to be sure that I get your idea correctly.
> 
> Based on the priv. spec:
>    Each bit of hedeleg shall be either writable or read-only zero. Many bits of
>    hedeleg are required specifically to be writable or zero, as enumerated in
>    Table 29.
> 
> Now let’s take hedeleg.bit1, which is marked as writable according to Table 29.
> Your point is that even though hedeleg.bit1 is defined as writable, it could still
> be read-only zero, right?
> 
> In general, I agree with that. It is possible that M-mode software decides, for
> some reason (for example, because the implementation does not support delegation
> of bit1 to a lower mode), not to delegate medeleg.bit1 to HS-mode. In that case,
> hedeleg.bit1 would always be read-only zero.
> 
>>   In which case you'd need to do
>> the delegation in software. For which it might be helpful to know what
>> the two registers are actually set to in hardware (i.e. the cached values
>> wanting to match the real ones).
> 
> Does it make sense then to have the following
>    	...
> 	v->arch.hedeleg = hedeleg;
>    	vcpu->arch.hedeleg = csr_read(CSR_HEDELEG);
> in arch_vcpu_create()?

The above makes no sense to me, with or without s/vcpu/v/.

> Or I can just add the comment that it will be sync-ed with the corresponding
> hardware CSR later as ,actually, there is some h{i,e}deleg synchronization
> happening during context_switch() (this code is at the moment in downstream),
> because restore_csr_regs() is executed and re-reads CSR_H{I,E}DELEG:
>    static void restore_csr_regs(struct vcpu *vcpu)
>    {
>        csr_write(CSR_HEDELEG, vcpu->arch.hedeleg);
>        csr_write(CSR_HIDELEG, vcpu->arch.hideleg);
>        ...
> As a result, vcpu->arch.h{I,E}deleg is kept in sync with the corresponding
> hardware CSR.

No, the r/o bits will continue to be out-of-sync between the hw register and
the struct arch_vcpu field.

Jan

Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
Posted by Oleksii Kurochko 3 weeks, 6 days ago
On 1/12/26 4:54 PM, Jan Beulich wrote:
> On 12.01.2026 16:46, Oleksii Kurochko wrote:
>> On 1/12/26 3:28 PM, Jan Beulich wrote:
>>> On 12.01.2026 13:59, Oleksii Kurochko wrote:
>>>> On 1/7/26 9:46 AM, Jan Beulich wrote:
>>>>> Also, wouldn't you better keep internal state in line with what hardware
>>>>> actually supports? CSRIND may be read-only-zero in the real register, in
>>>>> which case having the bit set in the "cached" copy can be misleading.
>>>> [...]
>>>>
>>>>> (This may similarly apply to at least hedeleg and hideleg, btw.)
>>>> Regarding the previous bits, I can understand that it would be an issue:
>>>> if SSAIA isn’t supported, then it is incorrect to update the corresponding
>>>> bits of|hstateen0|.
>>>>
>>>> However, I’m not really sure I understand what the issue is with|h{i,e}deleg|.
>>>> All writable bits there don’t depend on hardware support. Am I missing something?
>>> My reading of the doc was that any of the bits can be r/o 0, with - yes -
>>> no dependencies on particular extensions.
>> Just to be sure that I get your idea correctly.
>>
>> Based on the priv. spec:
>>     Each bit of hedeleg shall be either writable or read-only zero. Many bits of
>>     hedeleg are required specifically to be writable or zero, as enumerated in
>>     Table 29.
>>
>> Now let’s take hedeleg.bit1, which is marked as writable according to Table 29.
>> Your point is that even though hedeleg.bit1 is defined as writable, it could still
>> be read-only zero, right?
>>
>> In general, I agree with that. It is possible that M-mode software decides, for
>> some reason (for example, because the implementation does not support delegation
>> of bit1 to a lower mode), not to delegate medeleg.bit1 to HS-mode. In that case,
>> hedeleg.bit1 would always be read-only zero.
>>
>>>    In which case you'd need to do
>>> the delegation in software. For which it might be helpful to know what
>>> the two registers are actually set to in hardware (i.e. the cached values
>>> wanting to match the real ones).
>> Does it make sense then to have the following
>>     	...
>> 	v->arch.hedeleg = hedeleg;
>>     	vcpu->arch.hedeleg = csr_read(CSR_HEDELEG);
>> in arch_vcpu_create()?
> The above makes no sense to me, with or without s/vcpu/v/.

Right...

It should be also csr_write() before csr_read():
  csr_write(CSR_HEDELEG, hedeleg);
  v->arch.hedeleg = csr_read(CSR_HEDELEG);

>
>> Or I can just add the comment that it will be sync-ed with the corresponding
>> hardware CSR later as ,actually, there is some h{i,e}deleg synchronization
>> happening during context_switch() (this code is at the moment in downstream),
>> because restore_csr_regs() is executed and re-reads CSR_H{I,E}DELEG:
>>     static void restore_csr_regs(struct vcpu *vcpu)
>>     {
>>         csr_write(CSR_HEDELEG, vcpu->arch.hedeleg);
>>         csr_write(CSR_HIDELEG, vcpu->arch.hideleg);
>>         ...
>> As a result, vcpu->arch.h{I,E}deleg is kept in sync with the corresponding
>> hardware CSR.
> No, the r/o bits will continue to be out-of-sync between the hw register and
> the struct arch_vcpu field.

Yes, it would be out-of-sync until|save_csr_regs()| is called, where
|csr_read(CSR_HEDELEG)| is executed. So the value remains out-of-sync until a
trap to the hypervisor occurs and a vCPU context switch happens, which triggers
|save_csr_regs()|.
So that’s not an option. The best choice is the one mentioned above.

~ Oleksii


Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
Posted by Jan Beulich 3 weeks, 6 days ago
On 12.01.2026 17:39, Oleksii Kurochko wrote:
> 
> On 1/12/26 4:54 PM, Jan Beulich wrote:
>> On 12.01.2026 16:46, Oleksii Kurochko wrote:
>>> On 1/12/26 3:28 PM, Jan Beulich wrote:
>>>> On 12.01.2026 13:59, Oleksii Kurochko wrote:
>>>>> On 1/7/26 9:46 AM, Jan Beulich wrote:
>>>>>> Also, wouldn't you better keep internal state in line with what hardware
>>>>>> actually supports? CSRIND may be read-only-zero in the real register, in
>>>>>> which case having the bit set in the "cached" copy can be misleading.
>>>>> [...]
>>>>>
>>>>>> (This may similarly apply to at least hedeleg and hideleg, btw.)
>>>>> Regarding the previous bits, I can understand that it would be an issue:
>>>>> if SSAIA isn’t supported, then it is incorrect to update the corresponding
>>>>> bits of|hstateen0|.
>>>>>
>>>>> However, I’m not really sure I understand what the issue is with|h{i,e}deleg|.
>>>>> All writable bits there don’t depend on hardware support. Am I missing something?
>>>> My reading of the doc was that any of the bits can be r/o 0, with - yes -
>>>> no dependencies on particular extensions.
>>> Just to be sure that I get your idea correctly.
>>>
>>> Based on the priv. spec:
>>>     Each bit of hedeleg shall be either writable or read-only zero. Many bits of
>>>     hedeleg are required specifically to be writable or zero, as enumerated in
>>>     Table 29.
>>>
>>> Now let’s take hedeleg.bit1, which is marked as writable according to Table 29.
>>> Your point is that even though hedeleg.bit1 is defined as writable, it could still
>>> be read-only zero, right?
>>>
>>> In general, I agree with that. It is possible that M-mode software decides, for
>>> some reason (for example, because the implementation does not support delegation
>>> of bit1 to a lower mode), not to delegate medeleg.bit1 to HS-mode. In that case,
>>> hedeleg.bit1 would always be read-only zero.
>>>
>>>>    In which case you'd need to do
>>>> the delegation in software. For which it might be helpful to know what
>>>> the two registers are actually set to in hardware (i.e. the cached values
>>>> wanting to match the real ones).
>>> Does it make sense then to have the following
>>>     	...
>>> 	v->arch.hedeleg = hedeleg;
>>>     	vcpu->arch.hedeleg = csr_read(CSR_HEDELEG);
>>> in arch_vcpu_create()?
>> The above makes no sense to me, with or without s/vcpu/v/.
> 
> Right...
> 
> It should be also csr_write() before csr_read():
>   csr_write(CSR_HEDELEG, hedeleg);
>   v->arch.hedeleg = csr_read(CSR_HEDELEG);

Ah yes. Alternatively you could obtain a mask of modifiable bits once, and
then simply apply that here in place of the CSR read/write.

Jan