[PATCH 0/3] add MEMORY_FAILURE event

zhenwei pi posted 3 patches 3 years, 7 months ago
Test docker-quick@centos7 failed
Test docker-mingw@fedora failed
Test checkpatch failed
Test FreeBSD failed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20200914134321.958079-1-pizhenwei@bytedance.com
Maintainers: Richard Henderson <rth@twiddle.net>, Eric Blake <eblake@redhat.com>, Eduardo Habkost <ehabkost@redhat.com>, Marcelo Tosatti <mtosatti@redhat.com>, Markus Armbruster <armbru@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
There is a newer version of this series
[PATCH 0/3] add MEMORY_FAILURE event
Posted by zhenwei pi 3 years, 7 months ago
Although QEMU can catch SIGBUS to handle a hardware memory corruption
event, it currently just prints a short log message and tries to fix
the error silently.

These patches introduce a 'MEMORY_FAILURE' event that reports one of 4
detailed actions taken by QEMU, so that the upper layer can learn what
situation QEMU hit and what it did. As a further step, a scheduler could
act on the event: if a host server hits 'hypervisor-ignore' or
'guest-mce', the scheduler could migrate the VM to another host; if it
hits 'hypervisor-stop' or 'guest-triple-fault', the scheduler could
select another healthy server to launch the VM.
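
A minimal sketch of the scheduler-side policy described above, assuming the
event arrives as a standard QMP event object and that the chosen action is
carried in a field named "action" under "data". That field name and the
decide() helper are hypothetical; only the event name and the four action
names come from this cover letter.

```python
import json

# Action names are taken from the cover letter. The payload layout
# ("data" -> "action") is an assumption for illustration, not the schema
# added by the patches.
MIGRATE_ACTIONS = {"hypervisor-ignore", "guest-mce"}
RELAUNCH_ACTIONS = {"hypervisor-stop", "guest-triple-fault"}


def decide(event_line: str) -> str:
    """Map one QMP event (a JSON line) to a hypothetical scheduler decision."""
    msg = json.loads(event_line)
    if msg.get("event") != "MEMORY_FAILURE":
        return "none"  # not a memory failure event
    action = msg.get("data", {}).get("action")
    if action in MIGRATE_ACTIONS:
        return "migrate"  # VM still runs: move it off the faulty host
    if action in RELAUNCH_ACTIONS:
        return "relaunch"  # VM is gone: start it on a healthy host
    return "unknown"


if __name__ == "__main__":
    sample = '{"event": "MEMORY_FAILURE", "data": {"action": "guest-mce"}}'
    print(decide(sample))
```

The two sets mirror the split in the text: the first pair of actions leaves a
running guest that can be migrated, the second pair means the guest must be
launched again elsewhere.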

zhenwei pi (3):
  target-i386: seperate MCIP & MCE_MASK error reason
  iqapi/run-state.json: introduce memory failure event
  target-i386: post memory failure event to uplayer

 qapi/run-state.json  | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/helper.c | 30 +++++++++++++++++++++++-------
 target/i386/kvm.c    |  5 ++++-
 3 files changed, 73 insertions(+), 8 deletions(-)

-- 
2.11.0


ping: [PATCH 0/3] add MEMORY_FAILURE event
Posted by zhenwei pi 3 years, 7 months ago
Hi,

This patchset about handling MCE may have been overlooked. Can anyone
tell me whether its purpose is reasonable?

https://patchwork.kernel.org/cover/11773795/

On 9/14/20 9:43 PM, zhenwei pi wrote:
> Although QEMU can catch SIGBUS to handle a hardware memory corruption
> event, it currently just prints a short log message and tries to fix
> the error silently.
> 
> These patches introduce a 'MEMORY_FAILURE' event that reports one of 4
> detailed actions taken by QEMU, so that the upper layer can learn what
> situation QEMU hit and what it did. As a further step, a scheduler could
> act on the event: if a host server hits 'hypervisor-ignore' or
> 'guest-mce', the scheduler could migrate the VM to another host; if it
> hits 'hypervisor-stop' or 'guest-triple-fault', the scheduler could
> select another healthy server to launch the VM.
> 
> zhenwei pi (3):
>    target-i386: seperate MCIP & MCE_MASK error reason
>    iqapi/run-state.json: introduce memory failure event
>    target-i386: post memory failure event to uplayer
> 
>   qapi/run-state.json  | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>   target/i386/helper.c | 30 +++++++++++++++++++++++-------
>   target/i386/kvm.c    |  5 ++++-
>   3 files changed, 73 insertions(+), 8 deletions(-)
> 

-- 
zhenwei pi

Re: ping: [PATCH 0/3] add MEMORY_FAILURE event
Posted by Paolo Bonzini 3 years, 7 months ago
On 21/09/20 04:22, zhenwei pi wrote:
> Hi,
> 
> This patchset about handling MCE may have been overlooked. Can anyone
> tell me whether its purpose is reasonable?
> 
> https://patchwork.kernel.org/cover/11773795/

Yes, it's very useful.  Just one thing, "guest-mce" can be reported for
both AR and AO faults.  Is it worth adding a 'type' field to distinguish
the two?

Paolo



Re: [External] Re: ping: [PATCH 0/3] add MEMORY_FAILURE event
Posted by zhenwei pi 3 years, 7 months ago

On 9/21/20 8:09 PM, Paolo Bonzini wrote:
> On 21/09/20 04:22, zhenwei pi wrote:
>> Hi,
>>
>> This patchset about handling MCE may have been overlooked. Can anyone
>> tell me whether its purpose is reasonable?
>>
>> https://patchwork.kernel.org/cover/11773795/
> 
> Yes, it's very useful.  Just one thing, "guest-mce" can be reported for
> both AR and AO faults.  Is it worth adding a 'type' field to distinguish
> the two?
> 
> Paolo
> 
Sure. How about adding a 'flags' structure, with a field named
'action-required' to indicate whether the fault is AO or AR?
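
A minimal sketch of how a consumer might read such a field, assuming 'flags'
is a nested object in the event data and 'action-required' is a boolean that
is true for AR and false for AO. The payload layout, the "action" key and the
fault_kind() helper are hypothetical; the thread only proposes the names
'flags' and 'action-required'.

```python
from typing import Optional


def fault_kind(data: dict) -> Optional[str]:
    """Return "AR" or "AO" for a guest-mce payload, or None if not applicable.

    Assumes a hypothetical layout:
        {"action": "guest-mce", "flags": {"action-required": true}}
    """
    if data.get("action") != "guest-mce":
        return None  # the AR/AO distinction was only raised for guest-mce
    flags = data.get("flags", {})
    if "action-required" not in flags:
        return None  # sender did not report the flag
    return "AR" if flags["action-required"] else "AO"


if __name__ == "__main__":
    print(fault_kind({"action": "guest-mce", "flags": {"action-required": True}}))
```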

-- 
zhenwei pi