The address translation logic in get_physical_address() will currently
truncate physical addresses to 32 bits unless long mode is enabled.
This is incorrect when using physical address extensions (PAE) outside
of long mode, with the result that a 32-bit operating system using PAE
to access memory above 4G will experience undefined behaviour.
The truncation code was originally introduced in commit 33dfdb5 ("x86:
only allow real mode to access 32bit without LMA"), where it applied
only to translations performed while paging is disabled (and so cannot
affect guests using PAE).
Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
rearranged the code such that the truncation also applied to the use
of MMU_PHYS_IDX and MMU_NESTED_IDX. Commit 4a1e9d4 ("target/i386: Use
atomic operations for pte updates") brought this truncation into scope
for page table entry accesses, and is the first commit for which a
Windows 10 32-bit guest will reliably fail to boot if memory above 4G
is present.
Fix by testing for PAE being enabled via the relevant bit in CR4,
instead of testing for long mode being enabled. PAE must be enabled
as a prerequisite of long mode, and so this is a generalisation of the
current test.
Remove the #ifdef TARGET_X86_64 check since PAE exists in both 32-bit
and 64-bit processors, and both should exhibit the same truncation
behaviour when PAE is disabled.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
Signed-off-by: Michael Brown <mcb30@ipxe.org>
---
target/i386/tcg/sysemu/excp_helper.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 5b86f439ad..3d0d0d78d7 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -582,12 +582,10 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
     /* Translation disabled. */
     out->paddr = addr & x86_get_a20_mask(env);
-#ifdef TARGET_X86_64
-    if (!(env->hflags & HF_LMA_MASK)) {
-        /* Without long mode we can only address 32bits in real mode */
+    if (!(env->cr[4] & CR4_PAE_MASK)) {
+        /* Without PAE we can address only 32 bits */
         out->paddr = (uint32_t)out->paddr;
     }
-#endif
     out->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
     out->page_size = TARGET_PAGE_SIZE;
     return true;
--
2.43.0
On 12/18/23 23:56, Michael Brown wrote:
> The address translation logic in get_physical_address() will currently
> truncate physical addresses to 32 bits unless long mode is enabled.
> This is incorrect when using physical address extensions (PAE) outside
> of long mode, with the result that a 32-bit operating system using PAE
> to access memory above 4G will experience undefined behaviour.
>
> <snip>
>
> diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
> index 5b86f439ad..3d0d0d78d7 100644
> --- a/target/i386/tcg/sysemu/excp_helper.c
> +++ b/target/i386/tcg/sysemu/excp_helper.c
> @@ -582,12 +582,10 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
>
>      /* Translation disabled. */
>      out->paddr = addr & x86_get_a20_mask(env);
> -#ifdef TARGET_X86_64
> -    if (!(env->hflags & HF_LMA_MASK)) {
> -        /* Without long mode we can only address 32bits in real mode */
> +    if (!(env->cr[4] & CR4_PAE_MASK)) {
> +        /* Without PAE we can address only 32 bits */
>          out->paddr = (uint32_t)out->paddr;
>      }
> -#endif

This is not the correct refactoring.

I agree that what we're currently doing is wrong, esp for MMU_PHYS_IDX,
but for the default case, if CR0.PG == 0, then CR4.PAE is ignored (vol
3, section 4.1.1).

I suspect the correct fix is to have MMU_PHYS_IDX pass through the input
address unchanged, and it is the responsibility of the higher level
paging mmu_idx to truncate physical addresses per PG_MODE_* before
recursing.

r~

>      out->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
>      out->page_size = TARGET_PAGE_SIZE;
>      return true;
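The split described in the review above can be sketched as a small
standalone C program. This is illustrative only: the MODE_PAE/MODE_LMA
flags, the mask widths and the function names below are assumptions for
the sake of the example, not QEMU's PG_MODE_* definitions or the code
that was eventually merged.

  #include <stdint.h>
  #include <stdio.h>

  /* Illustrative mode flags -- not the QEMU PG_MODE_* definitions. */
  #define MODE_PAE (1u << 0)
  #define MODE_LMA (1u << 1)

  /* A pure physical access (MMU_PHYS_IDX-style) passes the address through. */
  static uint64_t phys_access(uint64_t addr)
  {
      return addr;
  }

  /* The paging path truncates the physical addresses it produces per mode. */
  static uint64_t paging_paddr(uint64_t paddr, unsigned mode)
  {
      if (mode & MODE_LMA) {
          return paddr;                      /* limited only by MAXPHYADDR */
      }
      if (mode & MODE_PAE) {
          return paddr & ((1ULL << 36) - 1); /* PAE: at least 36 bits */
      }
      return (uint32_t)paddr;                /* legacy paging: 32 bits */
  }

  int main(void)
  {
      uint64_t pte = 0x123456789ULL; /* a page table entry address above 4G */

      printf("physical access : %#jx\n", (uintmax_t)phys_access(pte));
      printf("legacy paging   : %#jx\n", (uintmax_t)paging_paddr(pte, 0));
      printf("PAE paging      : %#jx\n", (uintmax_t)paging_paddr(pte, MODE_PAE));
      printf("long mode       : %#jx\n", (uintmax_t)paging_paddr(pte, MODE_PAE | MODE_LMA));
      return 0;
  }

The point is simply that truncation becomes a property of the paging
mode in use, rather than of long mode, and physical accesses are never
truncated at all.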
On 20/12/2023 04:22, Richard Henderson wrote:
> On 12/18/23 23:56, Michael Brown wrote:
>> The address translation logic in get_physical_address() will currently
>> truncate physical addresses to 32 bits unless long mode is enabled.
>> This is incorrect when using physical address extensions (PAE) outside
>> of long mode, with the result that a 32-bit operating system using PAE
>> to access memory above 4G will experience undefined behaviour.
>>
>> <snip>
>>
>> --- a/target/i386/tcg/sysemu/excp_helper.c
>> +++ b/target/i386/tcg/sysemu/excp_helper.c
>> @@ -582,12 +582,10 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
>>      /* Translation disabled. */
>>      out->paddr = addr & x86_get_a20_mask(env);
>> -#ifdef TARGET_X86_64
>> -    if (!(env->hflags & HF_LMA_MASK)) {
>> -        /* Without long mode we can only address 32bits in real mode */
>> +    if (!(env->cr[4] & CR4_PAE_MASK)) {
>> +        /* Without PAE we can address only 32 bits */
>>          out->paddr = (uint32_t)out->paddr;
>>      }
>> -#endif
>
> This is not the correct refactoring.
>
> I agree that what we're currently doing is wrong, esp for MMU_PHYS_IDX,
> but for the default case, if CR0.PG == 0, then CR4.PAE is ignored (vol
> 3, section 4.1.1).
>
> I suspect the correct fix is to have MMU_PHYS_IDX pass through the input
> address unchanged, and it is the responsibility of the higher level
> paging mmu_idx to truncate physical addresses per PG_MODE_* before
> recursing.

Thank you for reviewing, and for confirming the bug.

For the MMU_PHYS_IDX case, I agree that it makes sense for the address
to be passed through unchanged.

For the default case, I think it would make sense to unconditionally
truncate the address to 32 bits if paging is disabled.  (I am not sure
why the original commit 33dfdb5 included a test for long mode, since I
do not see how it is possible to get the CPU into long mode with paging
disabled.)

I do not know what ought to be done in the MMU_NESTED_IDX case, and
would appreciate your input on this.

Thanks,

Michael
On 12/20/23 22:03, Michael Brown wrote:
> For the default case, I think it would make sense to unconditionally
> truncate the address to 32 bits if paging is disabled.  (I am not sure
> why the original commit 33dfdb5 included a test for long mode, since I
> do not see how it is possible to get the CPU into long mode with paging
> disabled.)

You are correct that paging is mandatory for LMA -- indeed, setting
CR0.PG is the final step in 10.8.5 Initializing IA-32e Mode, which
copies EFER.LME to EFER.LMA.

The commit 33dfdb5 that you reference is definitely wrong.

> I do not know what ought to be done in the MMU_NESTED_IDX case, and
> would appreciate your input on this.

It would be ideal if MMU_NESTED_IDX were only used when virtualization
is enabled and nested_ctl & SVM_NPT_ENABLED, i.e. when use_stage2 is
true.

I can't remember why I added a check for use_stage2 for MMU_NESTED_IDX
in 98281984a37.  It is possible that, as that patch introduced the
feature, I was being cautious.  It's also possible that I did see
something, but then it was cleaned up somewhere in the rest of that
rather large patch series.

In any case, current intended behaviour is that MMU_NESTED_IDX with
!use_stage2 equates to MMU_PHYS_IDX.

r~
On 20/12/2023 21:51, Richard Henderson wrote:
> On 12/20/23 22:03, Michael Brown wrote:
>> For the default case, I think it would make sense to unconditionally
>> truncate the address to 32 bits if paging is disabled.  (I am not sure
>> why the original commit 33dfdb5 included a test for long mode, since I
>> do not see how it is possible to get the CPU into long mode with
>> paging disabled.)
>
> You are correct that paging is mandatory for LMA -- indeed, setting
> CR0.PG is the final step in 10.8.5 Initializing IA-32e Mode, which
> copies EFER.LME to EFER.LMA.
>
> The commit 33dfdb5 that you reference is definitely wrong.

I have done some further investigation, and come to the conclusion that
we should just delete the truncation code entirely.

With paging disabled, there is (as far as I know) no way to execute an
instruction with an address size of 64 bits.  With the instruction
address size being 32 bits, the linear address will already be
truncated to 32 bits anyway.

A quick userspace test program confirms that on a physical CPU a 32-bit
address size will result in the linear address being truncated to 32
bits:

  #include <stdint.h>
  #include <stdio.h>

  uint8_t val = 42;

  int main ( void ) {
      uint8_t *ptr = ( &val + 0x80000001 );
      uint8_t res;

      printf ( "&val = %p\n", &val );  // will be around 0x400000
      printf ( "ptr = %p\n", ptr );

      printf ( "addr32 read via 0x7fffffff(%p)...\n", ptr );
      __asm__ ( "addr32 movb 0x7fffffff(%k1), %b0\n\t"
                : "=r" ( res ) : "r" ( ptr ) );
      printf ( "...got %d\n", res );

      printf ( "addr64 read via 0x7fffffff(%p)...\n", ptr );
      __asm__ ( "movb 0x7fffffff(%1), %b0\n\t"
                : "=r" ( res ) : "r" ( ptr ) );
      printf ( "...got %d\n", res );

      return 0;
  }

produces the expected output:

  $ cc -o test test.c && ./test
  &val = 0x40400c
  ptr = 0x8040400d
  addr32 read via 0x7fffffff(0x8040400d)...
  ...got 42
  addr64 read via 0x7fffffff(0x8040400d)...
  Segmentation fault (core dumped)

which I believe shows that the addr32 instruction experiences wraparound
of the linear address (and so accesses the correct location), while the
addr64 instruction with the same base and offset does not experience
wraparound of the linear address (and so segfaults).

The original commit 33dfdb5 describes 32-bit code relying on the linear
address wraparound to access memory below the segment base.  This is
what iPXE also does: the iPXE code is physically copied to somewhere
high in 32-bit memory, the segment base is set to point to the location
of iPXE (thereby avoiding the need to apply relocation records), and
wraparound is relied upon to access all memory below this.

I have tested removing the truncation code from get_physical_address()
and verified that iPXE continues to work as expected.

I have also verified that removing the truncation code from
get_physical_address() does seem to fix the problem with PAE mode
(i.e. is able to boot Windows 10 with RAM above 4G).

I will send a v2 patch shortly.

Thanks,

Michael
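For reference, deleting the truncation entirely, as described above,
would amount to roughly the following change (a sketch against the hunk
quoted earlier, not necessarily the v2 patch as actually sent):

  --- a/target/i386/tcg/sysemu/excp_helper.c
  +++ b/target/i386/tcg/sysemu/excp_helper.c
  @@ -582,12 +582,6 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
   
       /* Translation disabled. */
       out->paddr = addr & x86_get_a20_mask(env);
  -#ifdef TARGET_X86_64
  -    if (!(env->hflags & HF_LMA_MASK)) {
  -        /* Without long mode we can only address 32bits in real mode */
  -        out->paddr = (uint32_t)out->paddr;
  -    }
  -#endif
       out->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
       out->page_size = TARGET_PAGE_SIZE;
       return true;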