[SeaBIOS] [PATCH RFC] x86: use volatile asm for read/write{b, w, l} implementations

Vitaly Kuznetsov posted 1 patch 6 years, 3 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/seabios tags/patchew/20180103134115.13686-1-vkuznets@redhat.com
src/x86.h | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
[SeaBIOS] [PATCH RFC] x86: use volatile asm for read/write{b, w, l} implementations
Posted by Vitaly Kuznetsov 6 years, 3 months ago
QEMU/KVM guests running nested on top of Hyper-V fail to boot with
virtio-blk-pci disks, the debug log ends with

Booting from Hard Disk...
call32_smm 0x000edd01 e97a0
handle_smi cmd=b5 smbase=0x000a0000
vp notify fe007000 (2) -- 0x0
vp read   fe005000 (1) -> 0x0
handle_smi cmd=b5 smbase=0x000a0000
call32_smm done 0x000edd01 0
Booting from 0000:7c00
call32_smm 0x000edd01 e97a4
handle_smi cmd=b5 smbase=0x000a0000
vp notify fe007000 (2) -- 0x0
In resume (status=0)
In 32bit resume
Attempting a hard reboot
...

I bisected the breakage to the following commit:

commit f46739b1a819750c63fb5849844d99cc2ab001e8
Author: Kevin O'Connor <kevin@koconnor.net>
Date:   Tue Feb 2 22:34:27 2016 -0500

    virtio: Convert to new PCI BAR helper functions

But the commit itself appears to be correct. The problem is in how
writew() function compiles into vp_notify(). For example, if we drop
'volatile' qualifier from the current writew() implementation everything
starts to work. If we disassemble these two versions (as of f46739b1a)
the difference will be:

00000000 <vp_notify>:

   With 'volatile' (current)              Without 'volatile'

   0:   push   %ebx			   0:   push   %ebx
   1:   mov    %eax,%ecx   		   1:   mov    %eax,%ecx
   3:   mov    0x1518(%edx),%eax	   3:   mov    0x1518(%edx),%eax
   9:   cmpb   $0x0,0x2c(%ecx)		   9:   cmpb   $0x0,0x2c(%ecx)
   d:   je     2f <vp_notify+0x2f>	   d:   je     2e <vp_notify+0x2e>
   f:   mov    0x151c(%edx),%edx	   f:   mov    0x151c(%edx),%edx
  15:   mov    0x28(%ecx),%ebx
  18:   imul   %edx,%ebx                  15:   imul   0x28(%ecx),%edx
  1b:   mov    0x8(%ecx),%edx             19:   mov    0x8(%ecx),%ebx
  1e:   add    %ebx,%edx
  20:   cmpb   $0x0,0xe(%ecx)             1c:   cmpb   $0x0,0xe(%ecx)
  24:   je     2a <vp_notify+0x2a>        20:   je     28 <vp_notify+0x28>
                                          22:   add    %ebx,%edx
  26:   out    %ax,(%dx)                  24:   out    %ax,(%dx)
  28:   jmp    48 <vp_notify+0x48>        26:   jmp    47 <vp_notify+0x47>
  2a:   mov    %ax,(%edx)                 28:   mov    %ax,(%ebx,%edx,1)
  2d:   jmp    48 <vp_notify+0x48>        2c:   jmp    47 <vp_notify+0x47>
  2f:   lea    0x20(%ecx),%ebx            2e:   lea    0x20(%ecx),%ebx
  32:   cltd                              31:   cltd
  33:   push   %edx                       32:   push   %edx
  34:   push   %eax                       33:   push   %eax
  35:   mov    $0x2,%ecx                  34:   mov    $0x2,%ecx
  3a:   mov    $0x10,%edx                 39:   mov    $0x10,%edx
  3f:   mov    %ebx,%eax                  3e:   mov    %ebx,%eax
  41:   call   42 <vp_notify+0x42>        40:   call   41 <vp_notify+0x41>
  46:   pop    %eax                       45:   pop    %eax
  47:   pop    %edx                       46:   pop    %edx
  48:   pop    %ebx                       47:   pop    %ebx
  49:   ret                               48:   ret

My eyes fail to see an obvious compiler flaw here but probably the
mov difference (at '2a' old, '28' new) is to blame. Doing some other
subtle changes (e.g. adding local variables to the function) help in
some cases too. At this point I got a bit lost with my debug so I
looked at how Linux does this stuff and it seems we're not using
'*(volatile u16) = ' there. Rewriting write/read{b,w,l} with volatile
asm help.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
RFC: This is rather an ongoing debug as I'm not able to point finger
at the real culprit yet, I'd be grateful for any help and suggestions.
In particular, I don't quite understand why nested virtualization
makes a difference here.
---
 src/x86.h | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/src/x86.h b/src/x86.h
index 53378e9..d45122c 100644
--- a/src/x86.h
+++ b/src/x86.h
@@ -199,30 +199,27 @@ static inline void smp_wmb(void) {
 }
 
 static inline void writel(void *addr, u32 val) {
-    barrier();
-    *(volatile u32 *)addr = val;
+    asm volatile("movl %0, %1" : : "d"(val), "m"(*(u32 *)addr) : "memory");
 }
 static inline void writew(void *addr, u16 val) {
-    barrier();
-    *(volatile u16 *)addr = val;
+    asm volatile("movw %0, %1" : : "d"(val), "m"(*(u16 *)addr) : "memory");
 }
 static inline void writeb(void *addr, u8 val) {
-    barrier();
-    *(volatile u8 *)addr = val;
+    asm volatile("movb %0, %1" : : "d"(val), "m"(*(u8 *)addr) : "memory");
 }
 static inline u32 readl(const void *addr) {
-    u32 val = *(volatile const u32 *)addr;
-    barrier();
+    u32 val;
+    asm volatile("movl %1, %0" : "=d"(val) : "m"(*(u32 *)addr) : "memory");
     return val;
 }
 static inline u16 readw(const void *addr) {
-    u16 val = *(volatile const u16 *)addr;
-    barrier();
+    u16 val;
+    asm volatile("movw %1, %0" : "=d"(val) : "m"(*(u16 *)addr) : "memory");
     return val;
 }
 static inline u8 readb(const void *addr) {
-    u8 val = *(volatile const u8 *)addr;
-    barrier();
+    u8 val;
+    asm volatile("movb %1, %0" : "=d"(val) : "m"(*(u8 *)addr) : "memory");
     return val;
 }
 
-- 
2.14.3


_______________________________________________
SeaBIOS mailing list
SeaBIOS@seabios.org
https://mail.coreboot.org/mailman/listinfo/seabios
Re: [SeaBIOS] [PATCH RFC] x86: use volatile asm for read/write{b, w, l} implementations
Posted by Laszlo Ersek 6 years, 3 months ago
On 01/03/18 14:41, Vitaly Kuznetsov wrote:
> QEMU/KVM guests running nested on top of Hyper-V fail to boot with
> virtio-blk-pci disks, the debug log ends with
>
> Booting from Hard Disk...
> call32_smm 0x000edd01 e97a0
> handle_smi cmd=b5 smbase=0x000a0000
> vp notify fe007000 (2) -- 0x0
> vp read   fe005000 (1) -> 0x0
> handle_smi cmd=b5 smbase=0x000a0000
> call32_smm done 0x000edd01 0
> Booting from 0000:7c00
> call32_smm 0x000edd01 e97a4
> handle_smi cmd=b5 smbase=0x000a0000
> vp notify fe007000 (2) -- 0x0
> In resume (status=0)
> In 32bit resume
> Attempting a hard reboot
> ...
>
> I bisected the breakage to the following commit:
>
> commit f46739b1a819750c63fb5849844d99cc2ab001e8
> Author: Kevin O'Connor <kevin@koconnor.net>
> Date:   Tue Feb 2 22:34:27 2016 -0500
>
>     virtio: Convert to new PCI BAR helper functions
>
> But the commit itself appears to be correct. The problem is in how
> writew() function compiles into vp_notify(). For example, if we drop
> 'volatile' qualifier from the current writew() implementation
> everything starts to work. If we disassemble these two versions (as of
> f46739b1a) the difference will be:
>
> 00000000 <vp_notify>:
>
>    With 'volatile' (current)              Without 'volatile'
>
>    0:   push   %ebx			   0:   push   %ebx
>    1:   mov    %eax,%ecx   		   1:   mov    %eax,%ecx
>    3:   mov    0x1518(%edx),%eax	   3:   mov    0x1518(%edx),%eax
>    9:   cmpb   $0x0,0x2c(%ecx)		   9:   cmpb   $0x0,0x2c(%ecx)
>    d:   je     2f <vp_notify+0x2f>	   d:   je     2e <vp_notify+0x2e>
>    f:   mov    0x151c(%edx),%edx	   f:   mov    0x151c(%edx),%edx
>   15:   mov    0x28(%ecx),%ebx
>   18:   imul   %edx,%ebx                  15:   imul   0x28(%ecx),%edx
>   1b:   mov    0x8(%ecx),%edx             19:   mov    0x8(%ecx),%ebx
>   1e:   add    %ebx,%edx
>   20:   cmpb   $0x0,0xe(%ecx)             1c:   cmpb   $0x0,0xe(%ecx)
>   24:   je     2a <vp_notify+0x2a>        20:   je     28 <vp_notify+0x28>
>                                           22:   add    %ebx,%edx
>   26:   out    %ax,(%dx)                  24:   out    %ax,(%dx)
>   28:   jmp    48 <vp_notify+0x48>        26:   jmp    47 <vp_notify+0x47>
>   2a:   mov    %ax,(%edx)                 28:   mov    %ax,(%ebx,%edx,1)
>   2d:   jmp    48 <vp_notify+0x48>        2c:   jmp    47 <vp_notify+0x47>
>   2f:   lea    0x20(%ecx),%ebx            2e:   lea    0x20(%ecx),%ebx
>   32:   cltd                              31:   cltd
>   33:   push   %edx                       32:   push   %edx
>   34:   push   %eax                       33:   push   %eax
>   35:   mov    $0x2,%ecx                  34:   mov    $0x2,%ecx
>   3a:   mov    $0x10,%edx                 39:   mov    $0x10,%edx
>   3f:   mov    %ebx,%eax                  3e:   mov    %ebx,%eax
>   41:   call   42 <vp_notify+0x42>        40:   call   41 <vp_notify+0x41>
>   46:   pop    %eax                       45:   pop    %eax
>   47:   pop    %edx                       46:   pop    %edx
>   48:   pop    %ebx                       47:   pop    %ebx
>   49:   ret                               48:   ret
>
> My eyes fail to see an obvious compiler flaw here but probably the
> mov difference (at '2a' old, '28' new) is to blame. Doing some other
> subtle changes (e.g. adding local variables to the function) help in
> some cases too. At this point I got a bit lost with my debug so I
> looked at how Linux does this stuff and it seems we're not using
> '*(volatile u16) = ' there. Rewriting write/read{b,w,l} with volatile
> asm help.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> RFC: This is rather an ongoing debug as I'm not able to point finger
> at the real culprit yet, I'd be grateful for any help and suggestions.
> In particular, I don't quite understand why nested virtualization
> makes a difference here.
> ---
>  src/x86.h | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/src/x86.h b/src/x86.h
> index 53378e9..d45122c 100644
> --- a/src/x86.h
> +++ b/src/x86.h
> @@ -199,30 +199,27 @@ static inline void smp_wmb(void) {
>  }
>
>  static inline void writel(void *addr, u32 val) {
> -    barrier();
> -    *(volatile u32 *)addr = val;
> +    asm volatile("movl %0, %1" : : "d"(val), "m"(*(u32 *)addr) : "memory");
>  }
>  static inline void writew(void *addr, u16 val) {
> -    barrier();
> -    *(volatile u16 *)addr = val;
> +    asm volatile("movw %0, %1" : : "d"(val), "m"(*(u16 *)addr) : "memory");
>  }
>  static inline void writeb(void *addr, u8 val) {
> -    barrier();
> -    *(volatile u8 *)addr = val;
> +    asm volatile("movb %0, %1" : : "d"(val), "m"(*(u8 *)addr) : "memory");
>  }
>  static inline u32 readl(const void *addr) {
> -    u32 val = *(volatile const u32 *)addr;
> -    barrier();
> +    u32 val;
> +    asm volatile("movl %1, %0" : "=d"(val) : "m"(*(u32 *)addr) : "memory");
>      return val;
>  }
>  static inline u16 readw(const void *addr) {
> -    u16 val = *(volatile const u16 *)addr;
> -    barrier();
> +    u16 val;
> +    asm volatile("movw %1, %0" : "=d"(val) : "m"(*(u16 *)addr) : "memory");
>      return val;
>  }
>  static inline u8 readb(const void *addr) {
> -    u8 val = *(volatile const u8 *)addr;
> -    barrier();
> +    u8 val;
> +    asm volatile("movb %1, %0" : "=d"(val) : "m"(*(u8 *)addr) : "memory");
>      return val;
>  }
>
>

barrier() is defined like this, in "src/types.h":

> #define barrier() __asm__ __volatile__("": : :"memory")

and "src/x86.h" has some interesting stuff too, just above the code that
you modify:

> /* Compiler barrier is enough as an x86 CPU does not reorder reads or writes */
> static inline void smp_rmb(void) {
>     barrier();
> }
> static inline void smp_wmb(void) {
>     barrier();
> }


Is it possible that the current barrier() is not sufficient for the
intended purpose in an L2 guest?

What happens if you drop your current patch, but replace

  __asm__ __volatile__("": : :"memory")

in the barrier() macro definition, with a real, heavy-weight barrier,
such as

  __asm__ __volatile__("mfence": : :"memory")

(See mb() in "arch/x86/include/asm/barrier.h" in the kernel.)


... I think running in L2 could play a role here; see
"Documentation/memory-barriers.txt", section "VIRTUAL MACHINE GUESTS";
from kernel commit 6a65d26385bf ("asm-generic: implement virt_xxx memory
barriers", 2016-01-12).

See also the commit message.

Thanks,
Laszlo

_______________________________________________
SeaBIOS mailing list
SeaBIOS@seabios.org
https://mail.coreboot.org/mailman/listinfo/seabios
Re: [SeaBIOS] [PATCH RFC] x86: use volatile asm for read/write{b, w, l} implementations
Posted by Vitaly Kuznetsov 6 years, 3 months ago
Laszlo Ersek <lersek@redhat.com> writes:

> Is it possible that the current barrier() is not sufficient for the
> intended purpose in an L2 guest?
>
> What happens if you drop your current patch, but replace
>
>   __asm__ __volatile__("": : :"memory")
>
> in the barrier() macro definition, with a real, heavy-weight barrier,
> such as
>
>   __asm__ __volatile__("mfence": : :"memory")
>
> (See mb() in "arch/x86/include/asm/barrier.h" in the kernel.)
>

Thanks for the suggestion,

unfortunately, it doesn't change anything :-(

> ... I think running in L2 could play a role here; see
> "Documentation/memory-barriers.txt", section "VIRTUAL MACHINE GUESTS";
> from kernel commit 6a65d26385bf ("asm-generic: implement virt_xxx memory
> barriers", 2016-01-12).
>
> See also the commit message.
>

I see, thank you.

It seems, however, that the issue here is not about barriers: first of
all it is 100% reproducible and second, surrounding '*(volatile u32
*)addr = val' with all sorts of barriers doesn't help. I *think* this is
some sort of a mis-assumption about this memory which is handled with
vmexits so both L0 and L1 hypervisors are getting involved. More
debugging ...

-- 
  Vitaly

_______________________________________________
SeaBIOS mailing list
SeaBIOS@seabios.org
https://mail.coreboot.org/mailman/listinfo/seabios
Re: [SeaBIOS] [PATCH RFC] x86: use volatile asm for read/write{b, w, l} implementations
Posted by Laszlo Ersek 6 years, 3 months ago
On 01/04/18 11:24, Vitaly Kuznetsov wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> Is it possible that the current barrier() is not sufficient for the
>> intended purpose in an L2 guest?
>>
>> What happens if you drop your current patch, but replace
>>
>>   __asm__ __volatile__("": : :"memory")
>>
>> in the barrier() macro definition, with a real, heavy-weight barrier,
>> such as
>>
>>   __asm__ __volatile__("mfence": : :"memory")
>>
>> (See mb() in "arch/x86/include/asm/barrier.h" in the kernel.)
>>
> 
> Thanks for the suggestion,
> 
> unfortunately, it doesn't change anything :-(
> 
>> ... I think running in L2 could play a role here; see
>> "Documentation/memory-barriers.txt", section "VIRTUAL MACHINE GUESTS";
>> from kernel commit 6a65d26385bf ("asm-generic: implement virt_xxx memory
>> barriers", 2016-01-12).
>>
>> See also the commit message.
>>
> 
> I see, thank you.
> 
> It seems, however, that the issue here is not about barriers: first of
> all it is 100% reproducible and second, surrounding '*(volatile u32
> *)addr = val' with all sorts of barriers doesn't help. I *think* this is
> some sort of a mis-assumption about this memory which is handled with
> vmexits so both L0 and L1 hypervisors are getting involved. More
> debugging ...

* Do you see the issue with both legacy-only (0.9.5) and modern-only
(1.0) virtio devices?

Asking about this because legacy and modern virtio devices use registers
in different address spaces (IO vs. MMIO).

* Does it make a difference if you disable EPT in the L1 KVM
configuration? (EPT is probably primarily controlled by the CPU features
exposed by L0 Hyper-V, and secondarily by the "ept" parameter of the
"kvm_intel" module in L1.)

Asking about EPT because the virtio rings and descriptors are in RAM,
accessing which in L2 should "normally" never trap to L1/L0. However (I
*guess*), when those pages are accessed for the very first time in L2,
they likely do trap, and then the EPT setting in L1 might make a difference.

* Somewhat relatedly, can you try launching QEMU in L1 with "-realtime
mlock=on"?

(Anyone please correct me if my ideas are bogus.)

Thanks
Laszlo

_______________________________________________
SeaBIOS mailing list
SeaBIOS@seabios.org
https://mail.coreboot.org/mailman/listinfo/seabios
Re: [SeaBIOS] [PATCH RFC] x86: use volatile asm for read/write{b, w, l} implementations
Posted by Vitaly Kuznetsov 6 years, 3 months ago
Laszlo Ersek <lersek@redhat.com> writes:

> On 01/04/18 11:24, Vitaly Kuznetsov wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>> 
>>> Is it possible that the current barrier() is not sufficient for the
>>> intended purpose in an L2 guest?
>>>
>>> What happens if you drop your current patch, but replace
>>>
>>>   __asm__ __volatile__("": : :"memory")
>>>
>>> in the barrier() macro definition, with a real, heavy-weight barrier,
>>> such as
>>>
>>>   __asm__ __volatile__("mfence": : :"memory")
>>>
>>> (See mb() in "arch/x86/include/asm/barrier.h" in the kernel.)
>>>
>> 
>> Thanks for the suggestion,
>> 
>> unfortunately, it doesn't change anything :-(
>> 
>>> ... I think running in L2 could play a role here; see
>>> "Documentation/memory-barriers.txt", section "VIRTUAL MACHINE GUESTS";
>>> from kernel commit 6a65d26385bf ("asm-generic: implement virt_xxx memory
>>> barriers", 2016-01-12).
>>>
>>> See also the commit message.
>>>
>> 
>> I see, thank you.
>> 
>> It seems, however, that the issue here is not about barriers: first of
>> all it is 100% reproducible and second, surrounding '*(volatile u32
>> *)addr = val' with all sorts of barriers doesn't help. I *think* this is
>> some sort of a mis-assumption about this memory which is handled with
>> vmexits so both L0 and L1 hypervisors are getting involved. More
>> debugging ...
>

Thank you for your ideas,

> * Do you see the issue with both legacy-only (0.9.5) and modern-only
> (1.0) virtio devices?
>
> Asking about this because legacy and modern virtio devices use registers
> in different address spaces (IO vs. MMIO).
>


This only affects 'modern' virtio-blk-pci device which I'm using for
boot.

In fact, the only writew() needs patching is in vp_notify(), when I
replace it with 'asm volatile' everything works.

> * Does it make a difference if you disable EPT in the L1 KVM
> configuration? (EPT is probably primarily controlled by the CPU features
> exposed by L0 Hyper-V, and secondarily by the "ept" parameter of the
> "kvm_intel" module in L1.)
>
> Asking about EPT because the virtio rings and descriptors are in RAM,
> accessing which in L2 should "normally" never trap to L1/L0. However (I
> *guess*), when those pages are accessed for the very first time in L2,
> they likely do trap, and then the EPT setting in L1 might make a difference.

Disabling EPT helps!

>
> * Somewhat relatedly, can you try launching QEMU in L1 with "-realtime
> mlock=on"?
>

This doesn't seem to make a difference.

I also tried tracing L1 KVM and the difference between working and
non-working cases seems to be:

1) Working:

...
           <...>-51387 [014] 64765.695019: kvm_page_fault:       address fe007000 error_code 182
           <...>-51387 [014] 64765.695024: kvm_emulate_insn:     0:eca87: 66 89 14 30
           <...>-51387 [014] 64765.695026: vcpu_match_mmio:      gva 0xfe007000 gpa 0xfe007000 Write GPA
           <...>-51387 [014] 64765.695026: kvm_mmio:             mmio write len 2 gpa 0xfe007000 val 0x0
           <...>-51387 [014] 64765.695033: kvm_entry:            vcpu 0
           <...>-51387 [014] 64765.695042: kvm_exit:             reason EPT_VIOLATION rip 0xeae17 info 181 306
           <...>-51387 [014] 64765.695043: kvm_page_fault:       address f0694 error_code 181
           <...>-51387 [014] 64765.695044: kvm_entry:            vcpu 0
...

2) Broken:

...
           <...>-38071 [014] 63385.241117: kvm_page_fault:       address fe007000 error_code 182
           <...>-38071 [014] 63385.241121: kvm_emulate_insn:     0:ecffb: 66 89 06
           <...>-38071 [014] 63385.241123: vcpu_match_mmio:      gva 0xfe007000 gpa 0xfe007000 Write GPA
           <...>-38071 [014] 63385.241124: kvm_mmio:             mmio write len 2 gpa 0xfe007000 val 0x0
           <...>-38071 [014] 63385.241143: kvm_entry:            vcpu 0
           <...>-38071 [014] 63385.241162: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xecffe info 0 800000f6
           <...>-38071 [014] 63385.241162: kvm_entry:            vcpu 0
...

The 'kvm_emulate_insn' difference is actually the diferent versions of
'mov' we get with the current code and with my 'asm volatile'
version. What makes me wonder is where the 'EXTERNAL_INTERRUPT' (only
seen in broken version) comes from.

-- 
  Vitaly

_______________________________________________
SeaBIOS mailing list
SeaBIOS@seabios.org
https://mail.coreboot.org/mailman/listinfo/seabios
Re: [SeaBIOS] [PATCH RFC] x86: use volatile asm for read/write{b, w, l} implementations
Posted by Laszlo Ersek 6 years, 3 months ago
On 01/04/18 15:29, Vitaly Kuznetsov wrote:
> Laszlo Ersek <lersek@redhat.com> writes:

> In fact, the only writew() needs patching is in vp_notify(), when I
> replace it with 'asm volatile' everything works.
> 
>> * Does it make a difference if you disable EPT in the L1 KVM
>> configuration? (EPT is probably primarily controlled by the CPU features
>> exposed by L0 Hyper-V, and secondarily by the "ept" parameter of the
>> "kvm_intel" module in L1.)
>>
>> Asking about EPT because the virtio rings and descriptors are in RAM,
>> accessing which in L2 should "normally" never trap to L1/L0. However (I
>> *guess*), when those pages are accessed for the very first time in L2,
>> they likely do trap, and then the EPT setting in L1 might make a difference.
> 
> Disabling EPT helps!

OK...

> I also tried tracing L1 KVM and the difference between working and
> non-working cases seems to be:
> 
> 1) Working:
> 
> ...
>            <...>-51387 [014] 64765.695019: kvm_page_fault:       address fe007000 error_code 182
>            <...>-51387 [014] 64765.695024: kvm_emulate_insn:     0:eca87: 66 89 14 30
>            <...>-51387 [014] 64765.695026: vcpu_match_mmio:      gva 0xfe007000 gpa 0xfe007000 Write GPA
>            <...>-51387 [014] 64765.695026: kvm_mmio:             mmio write len 2 gpa 0xfe007000 val 0x0
>            <...>-51387 [014] 64765.695033: kvm_entry:            vcpu 0
>            <...>-51387 [014] 64765.695042: kvm_exit:             reason EPT_VIOLATION rip 0xeae17 info 181 306
>            <...>-51387 [014] 64765.695043: kvm_page_fault:       address f0694 error_code 181
>            <...>-51387 [014] 64765.695044: kvm_entry:            vcpu 0
> ...
> 
> 2) Broken:
> 
> ...
>            <...>-38071 [014] 63385.241117: kvm_page_fault:       address fe007000 error_code 182
>            <...>-38071 [014] 63385.241121: kvm_emulate_insn:     0:ecffb: 66 89 06
>            <...>-38071 [014] 63385.241123: vcpu_match_mmio:      gva 0xfe007000 gpa 0xfe007000 Write GPA
>            <...>-38071 [014] 63385.241124: kvm_mmio:             mmio write len 2 gpa 0xfe007000 val 0x0
>            <...>-38071 [014] 63385.241143: kvm_entry:            vcpu 0
>            <...>-38071 [014] 63385.241162: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xecffe info 0 800000f6
>            <...>-38071 [014] 63385.241162: kvm_entry:            vcpu 0
> ...
> 
> The 'kvm_emulate_insn' difference is actually the diferent versions of
> 'mov' we get with the current code and with my 'asm volatile'
> version. What makes me wonder is where the 'EXTERNAL_INTERRUPT' (only
> seen in broken version) comes from.
> 

I don't think said interrupt matters. I also don't think the MOV
differences matter; after all, in both cases we end up with the identical

  vcpu_match_mmio:      gva 0xfe007000 gpa 0xfe007000 Write GPA
  kvm_mmio:             mmio write len 2 gpa 0xfe007000 val 0x0

sequence.


Here's another random idea:

I'll admit that I have no clue how SeaBIOS uses SMM, but I found an
earlier email from Paolo
<886757208.6870637.1484133921200.JavaMail.zimbra@redhat.com> where he
wrote, "the main reason for it [i.e., SMM], is that it provides a safer
way to access a PCI device's memory BARs". (SeaBIOS commit 55215cd425d36
seems to give some background.)

And that kind of access is what vp_notify()/writew() does, and I see
"call32_smm" / "handle_smi" log entries in your thread starter,
intermixed with "vp notify".

Down-stream we disabled SMM in SeaBIOS because we deemed the additional
safety (see above) unnecessary for our limited BIOS service use cases
(=mostly grub), while SMM caused obscure problems:

- https://bugzilla.redhat.com/show_bug.cgi?id=1378006
- https://bugzilla.redhat.com/show_bug.cgi?id=1425516

So... can you rebuild SeaBIOS with "CONFIG_USE_SMM=n"?

(If you originally encountered the strange behavior with downstream
SeaBIOS, which already has CONFIG_USE_SMM=n, then please ignore...)

Thanks,
Laszlo

_______________________________________________
SeaBIOS mailing list
SeaBIOS@seabios.org
https://mail.coreboot.org/mailman/listinfo/seabios
Re: [SeaBIOS] [PATCH RFC] x86: use volatile asm for read/write{b, w, l} implementations
Posted by Vitaly Kuznetsov 6 years, 2 months ago
Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> QEMU/KVM guests running nested on top of Hyper-V fail to boot with
> virtio-blk-pci disks, the debug log ends with
>
> Booting from Hard Disk...
> call32_smm 0x000edd01 e97a0
> handle_smi cmd=b5 smbase=0x000a0000
> vp notify fe007000 (2) -- 0x0
> vp read   fe005000 (1) -> 0x0
> handle_smi cmd=b5 smbase=0x000a0000
> call32_smm done 0x000edd01 0
> Booting from 0000:7c00
> call32_smm 0x000edd01 e97a4
> handle_smi cmd=b5 smbase=0x000a0000
> vp notify fe007000 (2) -- 0x0
> In resume (status=0)
> In 32bit resume
> Attempting a hard reboot
> ...
>
> I bisected the breakage to the following commit:
>
> commit f46739b1a819750c63fb5849844d99cc2ab001e8
> Author: Kevin O'Connor <kevin@koconnor.net>
> Date:   Tue Feb 2 22:34:27 2016 -0500
>
>     virtio: Convert to new PCI BAR helper functions
>
> But the commit itself appears to be correct.

It took me a while to get back to this issue and I *hope* that I solved
the mystery: when we access 0xfe007000 for the second time EPT_MISCONFIG
is triggered but VM_EXIT_INSTRUCTION_LEN in VMCS is set to '1' (!!!)
regardless of the real length. So in kvm we do the following:

... if (... && !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL))
		return kvm_skip_emulated_instruction(vcpu);

and kvm_skip_emulated_instruction() does

        rip = kvm_rip_read(vcpu);
        rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN); <--- this is '1' 
        kvm_rip_write(vcpu, rip);

so we jump one byte forward and try to execute the new 'instruction'
which is actually just the tail of the old instruction as it is 3 bytes
long.

I think Hyper-V host is misbehaving when it gives us such VMCS for our
L2 guest. I'll check and complain.

-- 
  Vitaly

_______________________________________________
SeaBIOS mailing list
SeaBIOS@seabios.org
https://mail.coreboot.org/mailman/listinfo/seabios