[PATCH] docs: correct x86 MCE command line option info

Jan Beulich posted 1 patch 2 years, 2 months ago
There is a newer version of this series
Posted by Jan Beulich 2 years, 2 months ago
Not even the types were correct, let alone defaults being spelled out or
the purpose of the options actually mentioned in any way.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1681,10 +1681,21 @@ one pending bit to be allocated.
 Defaults to 20 bits (to cover at most 1048576 interrupts).
 
 ### mce (x86)
-> `= <integer>`
+> `= <boolean>`
+
+> Default: `true`
+
+Allows to disable the use of Machine Check Exceptions.  Note that this
+may result in silent shutdown of the system in case an event occurs
+which would have resulted in raising a Machine Check Exception.
 
 ### mce_fb (Intel)
-> `= <integer>`
+> `= <boolean>`
+
+> Default: `false`
+
+Force broadcasting of Machine Check Exceptions, suppressing the use of
+Local MCE functionality available in newer Intel hardware.
 
 ### mce_verbosity (x86)
 > `= verbose`
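
For readers unfamiliar with why the `<integer>` to `<boolean>` type correction matters, here is a minimal sketch of the boolean option semantics as described in the general section of `xen-command-line.pandoc`: a boolean is enabled by `yes`, `on`, `true`, `enable` or `1`, disabled by `no`, `off`, `false`, `disable` or `0`, enabled by stating the bare name, disabled by a `no-` prefix, and ignored when given any other value. The function `parse_boolean_option` and its signature are hypothetical and written for illustration only; this is not Xen's parser, and how a `no-` prefix combines with an explicit value is an assumption of the sketch.

```python
# Illustrative sketch only: approximates the documented behaviour of Xen
# boolean command line options. Names here are hypothetical, not Xen code.
TRUE_WORDS = {"yes", "on", "true", "enable", "1"}
FALSE_WORDS = {"no", "off", "false", "disable", "0"}


def parse_boolean_option(token, name, default):
    """Return the value of boolean option `name` after seeing one token.

    `token` is a single command line word such as "mce=0" or "no-mce".
    `default` is returned when the token does not set the option.
    """
    # A "no-" prefix inverts the meaning of whatever follows.
    negate = token.startswith("no-")
    if negate:
        token = token[len("no-"):]

    key, sep, value = token.partition("=")
    if key != name:
        return default  # token refers to some other option

    if not sep:
        result = True  # bare option name enables it
    elif value in TRUE_WORDS:
        result = True
    elif value in FALSE_WORDS:
        result = False
    else:
        return default  # unrecognised value: option is ignored

    return (not result) if negate else result


if __name__ == "__main__":
    # Defaults taken from the patch: mce defaults to true, mce_fb to false.
    for tok in ("mce=0", "no-mce", "mce=off", "mce=bogus"):
        print(tok, "->", parse_boolean_option(tok, "mce", True))
    for tok in ("mce_fb", "mce_fb=1"):
        print(tok, "->", parse_boolean_option(tok, "mce_fb", False))
```

Under these assumptions, `mce=0`, `mce=off` and `no-mce` all disable Machine Check Exception handling, while a bare `mce_fb` or `mce_fb=1` forces broadcasting.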
Re: [PATCH] docs: correct x86 MCE command line option info
Posted by Andrew Cooper 2 years, 2 months ago
On 28/02/2022 10:20, Jan Beulich wrote:
> Not even the types were correct,

Huh yes.  c/s 97638f08f4 was plain wrong.

>  let alone defaults being spelled out or
> the purpose of the options actually mentioned in any way.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1681,10 +1681,21 @@ one pending bit to be allocated.
>  Defaults to 20 bits (to cover at most 1048576 interrupts).
>  
>  ### mce (x86)
> -> `= <integer>`
> +> `= <boolean>`
> +
> +> Default: `true`
> +
> +Allows to disable the use of Machine Check Exceptions.  Note that this
> +may result in silent shutdown of the system in case an event occurs
> +which would have resulted in raising a Machine Check Exception.

This description appears backwards.  Errors happen irrespective of MCE,
and will by default cause a system shutdown.

MCE offers the OS/VMM some ability to deal with certain
not-totally-fatal errors in a less impactful way than killing the whole
system.  Also, it allows reporting of corrected errors which are
indicative of failing components.

Also, it's not silent - the MCE registers explicitly don't clear on
reset so they can be recovered after warm reset.  Firmware collects
these and is supposed to do something useful with them, although
"useful" is a matter of opinion, and in some cases depends on how much
extra you're willing to pay your OEM.

~Andrew
Re: [PATCH] docs: correct x86 MCE command line option info
Posted by Jan Beulich 2 years, 2 months ago
On 28.02.2022 14:19, Andrew Cooper wrote:
> On 28/02/2022 10:20, Jan Beulich wrote:
>> Not even the types were correct,
> 
> Huh yes.  c/s 97638f08f4 was plain wrong.
> 
>>  let alone defaults being spelled out or
>> the purpose of the options actually mentioned in any way.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1681,10 +1681,21 @@ one pending bit to be allocated.
>>  Defaults to 20 bits (to cover at most 1048576 interrupts).
>>  
>>  ### mce (x86)
>> -> `= <integer>`
>> +> `= <boolean>`
>> +
>> +> Default: `true`
>> +
>> +Allows to disable the use of Machine Check Exceptions.  Note that this
>> +may result in silent shutdown of the system in case an event occurs
>> +which would have resulted in raising a Machine Check Exception.
> 
> This description appears backwards.  Errors happen irrespective of MCE,
> and will by default cause a system shutdown.

Of course. Would s/this/doing so/ make things more clear? It was certainly
meant that way.

> MCE offers the OS/VMM some ability to deal with certain
> not-totally-fatal errors in a less impactful way than killing the whole
> system.  Also, it allows reporting of corrected errors which are
> indicative of failing components.
> 
> Also, it's not silent - the MCE registers explicitly don't clear on
> reset so they can be recovered after warm reset.  Firmware collects
> these and is supposed to do something useful with them, although
> "useful" is a matter of opinion, and in some cases depends on how much
> extra you're willing to pay your OEM.

It's still silent as far as Xen disappearing goes. Whether firmware
properly collects and exposes the data isn't something we ought to be
concerned with. If some _simple_ adjustment is going to meet your
approval, I'll be happy to make such an adjustment. If you suggest I
explain machine check machinery here, then I'll simply withdraw the
patch.

Jan