[PATCH] x86/AMD: expose SYSCFG, TOM, and TOM2 to Dom0

Jan Beulich posted 1 patch 2 years, 10 months ago
Test gitlab-ci passed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/c5764274-1257-809e-a2a7-d87b9d0fe675@suse.com
[PATCH] x86/AMD: expose SYSCFG, TOM, and TOM2 to Dom0
Posted by Jan Beulich 2 years, 10 months ago
Sufficiently old Linux (3.12-ish) accesses these MSRs in an unguarded
manner. Furthermore these MSRs, at least on Fam11 and older CPUs, are
also consulted by modern Linux, and their (bogus) built-in zapping of
#GP faults from MSR accesses leads to it effectively reading zero
instead of the intended values, which are relevant for PCI BAR placement
(which ought to all live in MMIO-type space, not in DRAM-type one).

For SYSCFG, only certain bits get exposed. In fact, whether to expose
MtrrVarDramEn is debatable: It controls use of not just TOM, but also
the IORRs. Introduce (consistently named) constants for the bits we're
interested in and use them in pre-existing code as well.

As a welcome side effect, verbosity on/of debug builds gets (perhaps
significantly) reduced.

Note that at least as far as those MSR accesses by Linux are concerned,
there's no similar issue for DomU-s, as the accesses sit behind PCI
device matching logic. The checked for devices would never be exposed to
DomU-s in the first place. Nevertheless I think that at least for HVM we
should return sensible values, not 0 (as svm_msr_read_intercept() does
right now). The intended values may, however, need to be determined by
hvmloader, and then get made known to Xen.

Fixes: 322ec7c89f66 ("x86/pv: disallow access to unknown MSRs")
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -468,14 +468,14 @@ static void check_syscfg_dram_mod_en(voi
 		return;
 
 	rdmsrl(MSR_K8_SYSCFG, syscfg);
-	if (!(syscfg & K8_MTRRFIXRANGE_DRAM_MODIFY))
+	if (!(syscfg & SYSCFG_MTRR_FIX_DRAM_MOD_EN))
 		return;
 
 	if (!test_and_set_bool(printed))
 		printk(KERN_ERR "MTRR: SYSCFG[MtrrFixDramModEn] not "
 			"cleared by BIOS, clearing this bit\n");
 
-	syscfg &= ~K8_MTRRFIXRANGE_DRAM_MODIFY;
+	syscfg &= ~SYSCFG_MTRR_FIX_DRAM_MOD_EN;
 	wrmsrl(MSR_K8_SYSCFG, syscfg);
 }
 
--- a/xen/arch/x86/cpu/mtrr/generic.c
+++ b/xen/arch/x86/cpu/mtrr/generic.c
@@ -224,7 +224,7 @@ static void __init print_mtrr_state(cons
 		uint64_t syscfg, tom2;
 
 		rdmsrl(MSR_K8_SYSCFG, syscfg);
-		if (syscfg & (1 << 21)) {
+		if (syscfg & SYSCFG_MTRR_TOM2_EN) {
 			rdmsrl(MSR_K8_TOP_MEM2, tom2);
 			printk("%sTOM2: %012"PRIx64"%s\n", level, tom2,
 			       syscfg & (1 << 22) ? " (WB)" : "");
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -339,6 +339,19 @@ int guest_rdmsr(struct vcpu *v, uint32_t
         *val = msrs->tsc_aux;
         break;
 
+    case MSR_K8_SYSCFG:
+    case MSR_K8_TOP_MEM1:
+    case MSR_K8_TOP_MEM2:
+        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+            goto gp_fault;
+        if ( !is_hardware_domain(d) )
+            return X86EMUL_UNHANDLEABLE;
+        rdmsrl(msr, *val);
+        if ( msr == MSR_K8_SYSCFG )
+            *val &= (SYSCFG_TOM2_FORCE_WB | SYSCFG_MTRR_TOM2_EN |
+                     SYSCFG_MTRR_VAR_DRAM_EN | SYSCFG_MTRR_FIX_DRAM_EN);
+        break;
+
     case MSR_K8_HWCR:
         if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
             goto gp_fault;
--- a/xen/arch/x86/x86_64/mmconf-fam10h.c
+++ b/xen/arch/x86/x86_64/mmconf-fam10h.c
@@ -69,7 +69,7 @@ static void __init get_fam10h_pci_mmconf
 	rdmsrl(address, val);
 
 	/* TOP_MEM2 is not enabled? */
-	if (!(val & (1<<21))) {
+	if (!(val & SYSCFG_MTRR_TOM2_EN)) {
 		tom2 = 1ULL << 32;
 	} else {
 		/* TOP_MEM2 */
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -116,6 +116,13 @@
 #define  PASID_PASID_MASK                   0x000fffff
 #define  PASID_VALID                        (_AC(1, ULL) << 31)
 
+#define MSR_K8_SYSCFG                       0xc0010010
+#define  SYSCFG_MTRR_FIX_DRAM_EN            (_AC(1, ULL) << 18)
+#define  SYSCFG_MTRR_FIX_DRAM_MOD_EN        (_AC(1, ULL) << 19)
+#define  SYSCFG_MTRR_VAR_DRAM_EN            (_AC(1, ULL) << 20)
+#define  SYSCFG_MTRR_TOM2_EN                (_AC(1, ULL) << 21)
+#define  SYSCFG_TOM2_FORCE_WB               (_AC(1, ULL) << 22)
+
 #define MSR_K8_VM_CR                        0xc0010114
 #define  VM_CR_INIT_REDIRECTION             (_AC(1, ULL) <<  1)
 #define  VM_CR_SVM_DISABLE                  (_AC(1, ULL) <<  4)
@@ -279,10 +286,7 @@
 #define MSR_K8_TOP_MEM1			0xc001001a
 #define MSR_K7_CLK_CTL			0xc001001b
 #define MSR_K8_TOP_MEM2			0xc001001d
-#define MSR_K8_SYSCFG			0xc0010010
 
-#define K8_MTRRFIXRANGE_DRAM_ENABLE	0x00040000 /* MtrrFixDramEn bit    */
-#define K8_MTRRFIXRANGE_DRAM_MODIFY	0x00080000 /* MtrrFixDramModEn bit */
 #define K8_MTRR_RDMEM_WRMEM_MASK	0x18181818 /* Mask: RdMem|WrMem    */
 
 #define MSR_K7_HWCR			0xc0010015

Re: [PATCH] x86/AMD: expose SYSCFG, TOM, and TOM2 to Dom0
Posted by Roger Pau Monné 2 years, 10 months ago
On Wed, May 26, 2021 at 02:59:00PM +0200, Jan Beulich wrote:
> Sufficiently old Linux (3.12-ish) accesses these MSRs in an unguarded
> manner. Furthermore these MSRs, at least on Fam11 and older CPUs, are
> also consulted by modern Linux, and their (bogus) built-in zapping of
> #GP faults from MSR accesses leads to it effectively reading zero
> instead of the intended values, which are relevant for PCI BAR placement
> (which ought to all live in MMIO-type space, not in DRAM-type one).
> 
> For SYSCFG, only certain bits get exposed. In fact, whether to expose
> MtrrVarDramEn is debatable: It controls use of not just TOM, but also
> the IORRs. Introduce (consistently named) constants for the bits we're
> interested in and use them in pre-existing code as well.

I think we should also allow access to the IORRs MSRs for coherency
(c001001{6,9}) for the hardware domain.

> As a welcome side effect, verbosity on/of debug builds gets (perhaps
> significantly) reduced.
> 
> Note that at least as far as those MSR accesses by Linux are concerned,
> there's no similar issue for DomU-s, as the accesses sit behind PCI
> device matching logic. The checked for devices would never be exposed to
> DomU-s in the first place. Nevertheless I think that at least for HVM we
> should return sensible values, not 0 (as svm_msr_read_intercept() does
> right now). The intended values may, however, need to be determined by
> hvmloader, and then get made known to Xen.

Could we maybe come up with a fixed memory layout that hvmloader had
to respect?

Ie: DRAM from 0 to 3G, MMIO from 3G to 4G, and then the remaining
DRAM from 4G in a contiguous single block?

hvmloader would have to place BARs that don't fit in the 3G-4G hole at
the end of DRAM (ie: after TOM2).

> 
> Fixes: 322ec7c89f66 ("x86/pv: disallow access to unknown MSRs")
> Reported-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -468,14 +468,14 @@ static void check_syscfg_dram_mod_en(voi
>  		return;
>  
>  	rdmsrl(MSR_K8_SYSCFG, syscfg);
> -	if (!(syscfg & K8_MTRRFIXRANGE_DRAM_MODIFY))
> +	if (!(syscfg & SYSCFG_MTRR_FIX_DRAM_MOD_EN))
>  		return;
>  
>  	if (!test_and_set_bool(printed))
>  		printk(KERN_ERR "MTRR: SYSCFG[MtrrFixDramModEn] not "
>  			"cleared by BIOS, clearing this bit\n");
>  
> -	syscfg &= ~K8_MTRRFIXRANGE_DRAM_MODIFY;
> +	syscfg &= ~SYSCFG_MTRR_FIX_DRAM_MOD_EN;
>  	wrmsrl(MSR_K8_SYSCFG, syscfg);
>  }
>  
> --- a/xen/arch/x86/cpu/mtrr/generic.c
> +++ b/xen/arch/x86/cpu/mtrr/generic.c
> @@ -224,7 +224,7 @@ static void __init print_mtrr_state(cons
>  		uint64_t syscfg, tom2;
>  
>  		rdmsrl(MSR_K8_SYSCFG, syscfg);
> -		if (syscfg & (1 << 21)) {
> +		if (syscfg & SYSCFG_MTRR_TOM2_EN) {
>  			rdmsrl(MSR_K8_TOP_MEM2, tom2);
>  			printk("%sTOM2: %012"PRIx64"%s\n", level, tom2,
>  			       syscfg & (1 << 22) ? " (WB)" : "");
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -339,6 +339,19 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>          *val = msrs->tsc_aux;
>          break;
>  
> +    case MSR_K8_SYSCFG:
> +    case MSR_K8_TOP_MEM1:
> +    case MSR_K8_TOP_MEM2:
> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> +            goto gp_fault;
> +        if ( !is_hardware_domain(d) )
> +            return X86EMUL_UNHANDLEABLE;

It might be clearer to also handle the !is_hardware_domain case here,
instead of deferring to svm_msr_read_intercept:

if ( is_hardware_domain(d) )
    rdmsrl(msr, *val);
else
    *val = 0;

...

> +        rdmsrl(msr, *val);
> +        if ( msr == MSR_K8_SYSCFG )
> +            *val &= (SYSCFG_TOM2_FORCE_WB | SYSCFG_MTRR_TOM2_EN |
> +                     SYSCFG_MTRR_VAR_DRAM_EN | SYSCFG_MTRR_FIX_DRAM_EN);
> +        break;
> +
>      case MSR_K8_HWCR:
>          if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>              goto gp_fault;
> --- a/xen/arch/x86/x86_64/mmconf-fam10h.c
> +++ b/xen/arch/x86/x86_64/mmconf-fam10h.c
> @@ -69,7 +69,7 @@ static void __init get_fam10h_pci_mmconf
>  	rdmsrl(address, val);
>  
>  	/* TOP_MEM2 is not enabled? */
> -	if (!(val & (1<<21))) {
> +	if (!(val & SYSCFG_MTRR_TOM2_EN)) {
>  		tom2 = 1ULL << 32;
>  	} else {
>  		/* TOP_MEM2 */
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -116,6 +116,13 @@
>  #define  PASID_PASID_MASK                   0x000fffff
>  #define  PASID_VALID                        (_AC(1, ULL) << 31)
>  
> +#define MSR_K8_SYSCFG                       0xc0010010
> +#define  SYSCFG_MTRR_FIX_DRAM_EN            (_AC(1, ULL) << 18)
> +#define  SYSCFG_MTRR_FIX_DRAM_MOD_EN        (_AC(1, ULL) << 19)
> +#define  SYSCFG_MTRR_VAR_DRAM_EN            (_AC(1, ULL) << 20)
> +#define  SYSCFG_MTRR_TOM2_EN                (_AC(1, ULL) << 21)
> +#define  SYSCFG_TOM2_FORCE_WB               (_AC(1, ULL) << 22)
> +
>  #define MSR_K8_VM_CR                        0xc0010114
>  #define  VM_CR_INIT_REDIRECTION             (_AC(1, ULL) <<  1)
>  #define  VM_CR_SVM_DISABLE                  (_AC(1, ULL) <<  4)
> @@ -279,10 +286,7 @@
>  #define MSR_K8_TOP_MEM1			0xc001001a
>  #define MSR_K7_CLK_CTL			0xc001001b
>  #define MSR_K8_TOP_MEM2			0xc001001d
> -#define MSR_K8_SYSCFG			0xc0010010
>  
> -#define K8_MTRRFIXRANGE_DRAM_ENABLE	0x00040000 /* MtrrFixDramEn bit    */
> -#define K8_MTRRFIXRANGE_DRAM_MODIFY	0x00080000 /* MtrrFixDramModEn bit */
>  #define K8_MTRR_RDMEM_WRMEM_MASK	0x18181818 /* Mask: RdMem|WrMem    */

That last one seem to be unused, I wonder if you could also drop it as
part of this cleanup?

Thanks, Roger.

Re: [PATCH] x86/AMD: expose SYSCFG, TOM, and TOM2 to Dom0
Posted by Jan Beulich 2 years, 10 months ago
On 27.05.2021 10:33, Roger Pau Monné wrote:
> On Wed, May 26, 2021 at 02:59:00PM +0200, Jan Beulich wrote:
>> Sufficiently old Linux (3.12-ish) accesses these MSRs in an unguarded
>> manner. Furthermore these MSRs, at least on Fam11 and older CPUs, are
>> also consulted by modern Linux, and their (bogus) built-in zapping of
>> #GP faults from MSR accesses leads to it effectively reading zero
>> instead of the intended values, which are relevant for PCI BAR placement
>> (which ought to all live in MMIO-type space, not in DRAM-type one).
>>
>> For SYSCFG, only certain bits get exposed. In fact, whether to expose
>> MtrrVarDramEn is debatable: It controls use of not just TOM, but also
>> the IORRs. Introduce (consistently named) constants for the bits we're
>> interested in and use them in pre-existing code as well.
> 
> I think we should also allow access to the IORRs MSRs for coherency
> (c001001{6,9}) for the hardware domain.

Hmm, originally I was under the impression that these could conceivably
be written by OSes, and hence would want dealing with separately. But
upon re-reading I see that they are supposed to be set by the BIOS alone.
So yes, let me add them for read access, taking care of the limitation
that I had to spell out.

This raises the question then though whether to also include SMMAddr
and SMMMask in the set - the former does get accessed by Linux as well,
and was one of the reasons for needing 6eef0a99262c ("x86/PV:
conditionally avoid raising #GP for early guest MSR reads").

Especially for SMMAddr, and maybe also for IORR_BASE, returning zero
for DomU-s might be acceptable. The respective masks, however, can
imo not sensibly be returned as zero. Hence even there I'd leave DomU
side handling (see below) for a later time.

>> As a welcome side effect, verbosity on/of debug builds gets (perhaps
>> significantly) reduced.
>>
>> Note that at least as far as those MSR accesses by Linux are concerned,
>> there's no similar issue for DomU-s, as the accesses sit behind PCI
>> device matching logic. The checked for devices would never be exposed to
>> DomU-s in the first place. Nevertheless I think that at least for HVM we
>> should return sensible values, not 0 (as svm_msr_read_intercept() does
>> right now). The intended values may, however, need to be determined by
>> hvmloader, and then get made known to Xen.
> 
> Could we maybe come up with a fixed memory layout that hvmloader had
> to respect?
> 
> Ie: DRAM from 0 to 3G, MMIO from 3G to 4G, and then the remaining
> DRAM from 4G in a contiguous single block?
> 
> hvmloader would have to place BARs that don't fit in the 3G-4G hole at
> the end of DRAM (ie: after TOM2).

Such a fixed scheme may be too limiting, I'm afraid.

>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -339,6 +339,19 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>>          *val = msrs->tsc_aux;
>>          break;
>>  
>> +    case MSR_K8_SYSCFG:
>> +    case MSR_K8_TOP_MEM1:
>> +    case MSR_K8_TOP_MEM2:
>> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>> +            goto gp_fault;
>> +        if ( !is_hardware_domain(d) )
>> +            return X86EMUL_UNHANDLEABLE;
> 
> It might be clearer to also handle the !is_hardware_domain case here,
> instead of deferring to svm_msr_read_intercept:
> 
> if ( is_hardware_domain(d) )
>     rdmsrl(msr, *val);
> else
>     *val = 0;

As said in the post-commit-message remark, I don't think returning 0
here is appropriate. I'd be willing to move DomU handling here, but
only once it's sane.

>> @@ -279,10 +286,7 @@
>>  #define MSR_K8_TOP_MEM1			0xc001001a
>>  #define MSR_K7_CLK_CTL			0xc001001b
>>  #define MSR_K8_TOP_MEM2			0xc001001d
>> -#define MSR_K8_SYSCFG			0xc0010010
>>  
>> -#define K8_MTRRFIXRANGE_DRAM_ENABLE	0x00040000 /* MtrrFixDramEn bit    */
>> -#define K8_MTRRFIXRANGE_DRAM_MODIFY	0x00080000 /* MtrrFixDramModEn bit */
>>  #define K8_MTRR_RDMEM_WRMEM_MASK	0x18181818 /* Mask: RdMem|WrMem    */
> 
> That last one seem to be unused, I wonder if you could also drop it as
> part of this cleanup?

It's for an unrelated set of MSRs, so I thought I'd leave it alone
here. Otoh the value is at best bogus anyway, as it assumes both
halves of fixed-size MTRRs get accessed separately. So I guess
since you ask for it, I'll drop it at this occasion.

Jan

Re: [PATCH] x86/AMD: expose SYSCFG, TOM, and TOM2 to Dom0
Posted by Roger Pau Monné 2 years, 10 months ago
On Thu, May 27, 2021 at 12:41:51PM +0200, Jan Beulich wrote:
> On 27.05.2021 10:33, Roger Pau Monné wrote:
> > On Wed, May 26, 2021 at 02:59:00PM +0200, Jan Beulich wrote:
> >> Sufficiently old Linux (3.12-ish) accesses these MSRs in an unguarded
> >> manner. Furthermore these MSRs, at least on Fam11 and older CPUs, are
> >> also consulted by modern Linux, and their (bogus) built-in zapping of
> >> #GP faults from MSR accesses leads to it effectively reading zero
> >> instead of the intended values, which are relevant for PCI BAR placement
> >> (which ought to all live in MMIO-type space, not in DRAM-type one).
> >>
> >> For SYSCFG, only certain bits get exposed. In fact, whether to expose
> >> MtrrVarDramEn is debatable: It controls use of not just TOM, but also
> >> the IORRs. Introduce (consistently named) constants for the bits we're
> >> interested in and use them in pre-existing code as well.
> > 
> > I think we should also allow access to the IORRs MSRs for coherency
> > (c001001{6,9}) for the hardware domain.
> 
> Hmm, originally I was under the impression that these could conceivably
> be written by OSes, and hence would want dealing with separately. But
> upon re-reading I see that they are supposed to be set by the BIOS alone.
> So yes, let me add them for read access, taking care of the limitation
> that I had to spell out.
> 
> This raises the question then though whether to also include SMMAddr
> and SMMMask in the set - the former does get accessed by Linux as well,
> and was one of the reasons for needing 6eef0a99262c ("x86/PV:
> conditionally avoid raising #GP for early guest MSR reads").

That seems fine, we might also want SMM_BASE?

> 
> Especially for SMMAddr, and maybe also for IORR_BASE, returning zero
> for DomU-s might be acceptable. The respective masks, however, can
> imo not sensibly be returned as zero. Hence even there I'd leave DomU
> side handling (see below) for a later time.

Sure. I think for consistency we should however enable reading the
hardware IORR MSRs for the hardware domain, or else returning
MtrrVarDramEn set is likely to cause trouble as the guest could assume
IORRs to be unconditionally present.

> >> As a welcome side effect, verbosity on/of debug builds gets (perhaps
> >> significantly) reduced.
> >>
> >> Note that at least as far as those MSR accesses by Linux are concerned,
> >> there's no similar issue for DomU-s, as the accesses sit behind PCI
> >> device matching logic. The checked for devices would never be exposed to
> >> DomU-s in the first place. Nevertheless I think that at least for HVM we
> >> should return sensible values, not 0 (as svm_msr_read_intercept() does
> >> right now). The intended values may, however, need to be determined by
> >> hvmloader, and then get made known to Xen.
> > 
> > Could we maybe come up with a fixed memory layout that hvmloader had
> > to respect?
> > 
> > Ie: DRAM from 0 to 3G, MMIO from 3G to 4G, and then the remaining
> > DRAM from 4G in a contiguous single block?
> > 
> > hvmloader would have to place BARs that don't fit in the 3G-4G hole at
> > the end of DRAM (ie: after TOM2).
> 
> Such a fixed scheme may be too limiting, I'm afraid.

Maybe, I guess a possible broken scenario would be for a guest to be
setup with a set of 32bit BARs that cannot possibly fit in the 3-4G
hole, but I think that's unlikely.

> 
> >> --- a/xen/arch/x86/msr.c
> >> +++ b/xen/arch/x86/msr.c
> >> @@ -339,6 +339,19 @@ int guest_rdmsr(struct vcpu *v, uint32_t
> >>          *val = msrs->tsc_aux;
> >>          break;
> >>  
> >> +    case MSR_K8_SYSCFG:
> >> +    case MSR_K8_TOP_MEM1:
> >> +    case MSR_K8_TOP_MEM2:
> >> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> >> +            goto gp_fault;
> >> +        if ( !is_hardware_domain(d) )
> >> +            return X86EMUL_UNHANDLEABLE;
> > 
> > It might be clearer to also handle the !is_hardware_domain case here,
> > instead of deferring to svm_msr_read_intercept:
> > 
> > if ( is_hardware_domain(d) )
> >     rdmsrl(msr, *val);
> > else
> >     *val = 0;
> 
> As said in the post-commit-message remark, I don't think returning 0
> here is appropriate. I'd be willing to move DomU handling here, but
> only once it's sane.

Hm, OK. IMO it's fine to move the current domU behavior here without
fixing it in the same patch, and do the fixing afterwards.

At least we would have the handling all done in a single place, and
you are certainly not making the domU case any worse than what
currently is.

Thanks, Roger.

Re: [PATCH] x86/AMD: expose SYSCFG, TOM, and TOM2 to Dom0
Posted by Jan Beulich 2 years, 10 months ago
On 27.05.2021 15:23, Roger Pau Monné wrote:
> On Thu, May 27, 2021 at 12:41:51PM +0200, Jan Beulich wrote:
>> On 27.05.2021 10:33, Roger Pau Monné wrote:
>>> On Wed, May 26, 2021 at 02:59:00PM +0200, Jan Beulich wrote:
>>>> Sufficiently old Linux (3.12-ish) accesses these MSRs in an unguarded
>>>> manner. Furthermore these MSRs, at least on Fam11 and older CPUs, are
>>>> also consulted by modern Linux, and their (bogus) built-in zapping of
>>>> #GP faults from MSR accesses leads to it effectively reading zero
>>>> instead of the intended values, which are relevant for PCI BAR placement
>>>> (which ought to all live in MMIO-type space, not in DRAM-type one).
>>>>
>>>> For SYSCFG, only certain bits get exposed. In fact, whether to expose
>>>> MtrrVarDramEn is debatable: It controls use of not just TOM, but also
>>>> the IORRs. Introduce (consistently named) constants for the bits we're
>>>> interested in and use them in pre-existing code as well.
>>>
>>> I think we should also allow access to the IORRs MSRs for coherency
>>> (c001001{6,9}) for the hardware domain.
>>
>> Hmm, originally I was under the impression that these could conceivably
>> be written by OSes, and hence would want dealing with separately. But
>> upon re-reading I see that they are supposed to be set by the BIOS alone.
>> So yes, let me add them for read access, taking care of the limitation
>> that I had to spell out.
>>
>> This raises the question then though whether to also include SMMAddr
>> and SMMMask in the set - the former does get accessed by Linux as well,
>> and was one of the reasons for needing 6eef0a99262c ("x86/PV:
>> conditionally avoid raising #GP for early guest MSR reads").
> 
> That seems fine, we might also want SMM_BASE?

That's pretty unrelated to the topic here - there's no memory type
or DRAM vs MMIO decision associated with that register. I'm also
having trouble seeing what an OS would want to use SMM's CS value
for.

>> Especially for SMMAddr, and maybe also for IORR_BASE, returning zero
>> for DomU-s might be acceptable. The respective masks, however, can
>> imo not sensibly be returned as zero. Hence even there I'd leave DomU
>> side handling (see below) for a later time.
> 
> Sure. I think for consistency we should however enable reading the
> hardware IORR MSRs for the hardware domain, or else returning
> MtrrVarDramEn set is likely to cause trouble as the guest could assume
> IORRs to be unconditionally present.

Well, yes, I've added IORRs already, as I was under the impression
that we were agreeing already that we want to expose them to Dom0.

>>>> As a welcome side effect, verbosity on/of debug builds gets (perhaps
>>>> significantly) reduced.
>>>>
>>>> Note that at least as far as those MSR accesses by Linux are concerned,
>>>> there's no similar issue for DomU-s, as the accesses sit behind PCI
>>>> device matching logic. The checked for devices would never be exposed to
>>>> DomU-s in the first place. Nevertheless I think that at least for HVM we
>>>> should return sensible values, not 0 (as svm_msr_read_intercept() does
>>>> right now). The intended values may, however, need to be determined by
>>>> hvmloader, and then get made known to Xen.
>>>
>>> Could we maybe come up with a fixed memory layout that hvmloader had
>>> to respect?
>>>
>>> Ie: DRAM from 0 to 3G, MMIO from 3G to 4G, and then the remaining
>>> DRAM from 4G in a contiguous single block?
>>>
>>> hvmloader would have to place BARs that don't fit in the 3G-4G hole at
>>> the end of DRAM (ie: after TOM2).
>>
>> Such a fixed scheme may be too limiting, I'm afraid.
> 
> Maybe, I guess a possible broken scenario would be for a guest to be
> setup with a set of 32bit BARs that cannot possibly fit in the 3-4G
> hole, but I think that's unlikely.

Can't one (almost) arbitrarily size the amount of VRAM of the emulated
VGA? I wouldn't be surprised if this can't be placed above 4Gb.

Jan