[PATCH] x86/fault: ignore RSVD flag in error code if P flag is 0

Posted by Vasily Averin 3 years, 9 months ago
Some older Intel CPUs have the following erratum:
"Not-Present Page Faults May Set the RSVD Flag in the Error Code

Problem:
An attempt to access a page that is not marked present causes a page
fault. Such a page fault delivers an error code in which both the
P flag (bit 0) and the RSVD flag (bit 3) are 0. Due to this erratum,
not-present page faults may deliver an error code in which the P flag
is 0 but the RSVD flag is 1.

Implication:
Software may erroneously infer that a page fault was due to a
reserved-bit violation when it was actually due to an attempt
to access a not-present page.

Workaround: Page-fault handlers should ignore the RSVD flag in the error
code if the P flag is 0."

This issue was observed on several nodes, which crashed with messages like:
httpd: Corrupted page table at address 7f62d5b48e68
PGD 80000002e92bf067 PUD 1c99c5067 PMD 195015067 PTE 7fffffffb78b680
Bad pagetable: 000c [#1] SMP

Let's follow the recommendation and ignore the RSVD flag in the
error code if the P flag is 0.
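
For illustration only, the workaround can be tried in user space with a
minimal sketch like the one below. The macro names PF_PROT and PF_RSVD and the
helper pf_sanitize_error_code() are hypothetical and exist only for this
example; the bit positions are the ones given in the erratum text (P is
bit 0, RSVD is bit 3).

```c
/* Page-fault error code bits, positions as given in the erratum text. */
#define PF_PROT (1UL << 0)	/* P flag: 0 means a not-present page */
#define PF_RSVD (1UL << 3)	/* RSVD flag: reserved-bit violation */

/*
 * Hypothetical helper, not a kernel function: drop the RSVD flag
 * when the P flag is clear, as the erratum workaround recommends.
 */
static unsigned long pf_sanitize_error_code(unsigned long error_code)
{
	/* A not-present fault (P = 0) should never report RSVD. */
	if ((error_code & PF_RSVD) && !(error_code & PF_PROT))
		error_code &= ~PF_RSVD;

	return error_code;
}
```

With the error code 0xc from the crash above, this helper returns 0x4, while
an error code with both P and RSVD set (0x9) is left unchanged.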

Link: https://lore.kernel.org/all/aae9c7c6-989c-0261-470a-252537493b53@openvz.org
Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 arch/x86/mm/fault.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fe10c6d76bac..ffc6d6bd2a22 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1481,6 +1481,15 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 	if (unlikely(kmmio_fault(regs, address)))
 		return;
 
+	/*
+	 * Some older Intel CPUs have errata
+	 * "Not-Present Page Faults May Set the RSVD Flag in the Error Code"
+	 * It is recommended to ignore the RSVD flag (bit 3) in the error code
+	 * if the P flag (bit 0) is 0.
+	 */
+	if (unlikely((error_code & X86_PF_RSVD) && !(error_code & X86_PF_PROT)))
+		error_code &= ~X86_PF_RSVD;
+
 	/* Was the fault on kernel-controlled part of the address space? */
 	if (unlikely(fault_in_kernel_space(address))) {
 		do_kern_addr_fault(regs, error_code, address);
-- 
2.36.1
Re: [PATCH] x86/fault: ignore RSVD flag in error code if P flag is 0
Posted by H. Peter Anvin 3 years, 9 months ago
On June 29, 2022 10:58:36 PM PDT, Vasily Averin <vvs@openvz.org> wrote:
>Some older Intel CPUs have the following erratum:
>"Not-Present Page Faults May Set the RSVD Flag in the Error Code
>
>Problem:
>An attempt to access a page that is not marked present causes a page
>fault. Such a page fault delivers an error code in which both the
>P flag (bit 0) and the RSVD flag (bit 3) are 0. Due to this erratum,
>not-present page faults may deliver an error code in which the P flag
>is 0 but the RSVD flag is 1.
>
>Implication:
>Software may erroneously infer that a page fault was due to a
>reserved-bit violation when it was actually due to an attempt
>to access a not-present page.
>
>Workaround: Page-fault handlers should ignore the RSVD flag in the error
>code if the P flag is 0."
>
>This issue was observed on several nodes, which crashed with messages like:
>httpd: Corrupted page table at address 7f62d5b48e68
>PGD 80000002e92bf067 PUD 1c99c5067 PMD 195015067 PTE 7fffffffb78b680
>Bad pagetable: 000c [#1] SMP
>
>Let's follow the recommendation and ignore the RSVD flag in the
>error code if the P flag is 0.
>
>Link: https://lore.kernel.org/all/aae9c7c6-989c-0261-470a-252537493b53@openvz.org
>Signed-off-by: Vasily Averin <vvs@openvz.org>
>---
> arch/x86/mm/fault.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
>diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>index fe10c6d76bac..ffc6d6bd2a22 100644
>--- a/arch/x86/mm/fault.c
>+++ b/arch/x86/mm/fault.c
>@@ -1481,6 +1481,15 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
> 	if (unlikely(kmmio_fault(regs, address)))
> 		return;
> 
>+	/*
>+	 * Some older Intel CPUs have errata
>+	 * "Not-Present Page Faults May Set the RSVD Flag in the Error Code"
>+	 * It is recommended to ignore the RSVD flag (bit 3) in the error code
>+	 * if the P flag (bit 0) is 0.
>+	 */
>+	if (unlikely((error_code & X86_PF_RSVD) && !(error_code & X86_PF_PROT)))
>+		error_code &= ~X86_PF_RSVD;
>+
> 	/* Was the fault on kernel-controlled part of the address space? */
> 	if (unlikely(fault_in_kernel_space(address))) {
> 		do_kern_addr_fault(regs, error_code, address);

Are there other bits we could/should mask out in the case P = 0? The only bits that should be able to appear are ones that are independent of the PTE content.
Re: [PATCH] x86/fault: ignore RSVD flag in error code if P flag is 0
Posted by Vasily Averin 3 years, 9 months ago
On 7/1/22 03:42, H. Peter Anvin wrote:
> On June 29, 2022 10:58:36 PM PDT, Vasily Averin <vvs@openvz.org> wrote:
>> Some older Intel CPUs have the following erratum:
>> "Not-Present Page Faults May Set the RSVD Flag in the Error Code
>>
>> Problem:
>> An attempt to access a page that is not marked present causes a page
>> fault. Such a page fault delivers an error code in which both the
>> P flag (bit 0) and the RSVD flag (bit 3) are 0. Due to this erratum,
>> not-present page faults may deliver an error code in which the P flag
>> is 0 but the RSVD flag is 1.
>>
>> Implication:
>> Software may erroneously infer that a page fault was due to a
>> reserved-bit violation when it was actually due to an attempt
>> to access a not-present page.
>>
>> Workaround: Page-fault handlers should ignore the RSVD flag in the error
>> code if the P flag is 0."
>>
>> This issue was observed on several nodes, which crashed with messages like:
>> httpd: Corrupted page table at address 7f62d5b48e68
>> PGD 80000002e92bf067 PUD 1c99c5067 PMD 195015067 PTE 7fffffffb78b680
>> Bad pagetable: 000c [#1] SMP
>>
>> Let's follow the recommendation and ignore the RSVD flag in the
>> error code if the P flag is 0.
>>
>> Link: https://lore.kernel.org/all/aae9c7c6-989c-0261-470a-252537493b53@openvz.org
>> Signed-off-by: Vasily Averin <vvs@openvz.org>
>> ---
>> arch/x86/mm/fault.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index fe10c6d76bac..ffc6d6bd2a22 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -1481,6 +1481,15 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
>> 	if (unlikely(kmmio_fault(regs, address)))
>> 		return;
>>
>> +	/*
>> +	 * Some older Intel CPUs have errata
>> +	 * "Not-Present Page Faults May Set the RSVD Flag in the Error Code"
>> +	 * It is recommended to ignore the RSVD flag (bit 3) in the error code
>> +	 * if the P flag (bit 0) is 0.
>> +	 */
>> +	if (unlikely((error_code & X86_PF_RSVD) && !(error_code & X86_PF_PROT)))
>> +		error_code &= ~X86_PF_RSVD;
>> +
>> 	/* Was the fault on kernel-controlled part of the address space? */
>> 	if (unlikely(fault_in_kernel_space(address))) {
>> 		do_kern_addr_fault(regs, error_code, address);
> 
> Are there other bits we could/should mask out in the case P = 0? The
> only bits that should be able to appear are ones that are independent
> of the PTE content.
According to the "Intel® 64 and IA-32 Architectures Software Developer’s
Manual Volume 3A: System Programming Guide, Part 1", there are several other
similar bits:
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

"4.7 PAGE-FAULT EXCEPTIONS
...
• HLAT (bit 7).
This flag is 1 if there is no translation for the linear address using HLAT 
paging because, in one of the paging structure entries used to translate that
address, either the P flag was 0 or a reserved bit was set. An error code will
set this flag only if it clears bit 0 or sets bit 3. This flag will not be set
by a page fault resulting from a violation of access rights, nor for one
encountered during ordinary paging, including the case in which there has been
a restart of HLAT paging.

• SGX flag (bit 15).
This flag is 1 if the exception is unrelated to paging and resulted from 
violation of SGX-specific access-control requirements. Because such a violation
can occur only if there is no ordinary page fault, this flag is set only if
the P flag (bit 0) is 1 and the RSVD flag (bit 3) and the PK flag (bit 5)
are both 0."

However, only the RSVD flag has an erratum in real processors,
so I don't think any other bits need to be masked.
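
To make the constraints quoted above concrete, here is a rough sketch of a
checker that reports whether an error code respects them. It is not kernel
code: the PF_* macro names and pf_error_code_consistent() are hypothetical,
and the bit positions are taken from the SDM text and the erratum quoted in
this thread.

```c
#include <stdbool.h>

/* Bit positions as quoted from the SDM and the erratum text. */
#define PF_P	(1UL << 0)
#define PF_RSVD	(1UL << 3)
#define PF_PK	(1UL << 5)
#define PF_HLAT	(1UL << 7)
#define PF_SGX	(1UL << 15)

/* Hypothetical helper: true if the flag combination is architecturally sane. */
static bool pf_error_code_consistent(unsigned long ec)
{
	bool p = (ec & PF_P) != 0;
	bool rsvd = (ec & PF_RSVD) != 0;
	bool pk = (ec & PF_PK) != 0;

	/* A not-present fault must not report a reserved-bit violation. */
	if (!p && rsvd)
		return false;

	/* HLAT may be set only if bit 0 is clear or bit 3 is set. */
	if ((ec & PF_HLAT) && p && !rsvd)
		return false;

	/* SGX may be set only if P is 1 and both RSVD and PK are 0. */
	if ((ec & PF_SGX) && (!p || rsvd || pk))
		return false;

	return true;
}
```

The error code 0xc from the reported crash fails the first check, which is
the only combination that the erratum says real hardware can produce.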

Thank you,
	Vasily Averin
Re: [PATCH] x86/fault: ignore RSVD flag in error code if P flag is 0
Posted by Dave Hansen 3 years, 9 months ago
On 6/29/22 22:58, Vasily Averin wrote:
> Some older Intel CPUs have the following erratum:
> "Not-Present Page Faults May Set the RSVD Flag in the Error Code

Please include a link to the documentation when you cite things like
this.  For example, this is very helpful:

	Several older Intel CPUs have this or a similar erratum.  For
	instance, the "Intel Xeon Processor 5400 Series Specification
	Update" has "AX74 ... Not-Present Page Faults May Set the RSVD
	Flag in the Error Code".

	https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5400-spec-update.pdf

That makes it a *LOT* easier to find the actual erratum and its text.  I
honestly also wouldn't mind if you just copy a chunk of the problem text
verbatim into the changelog.  Intel does have a habit of updating text
in documents like that and it's quite handy to have a snapshot of what
you were reading when you wrote the patch.
[PATCH v2] x86/fault: ignore RSVD flag in error code if P flag is 0
Posted by Vasily Averin 3 years, 9 months ago
Several older Intel CPUs have this or a similar erratum.
For instance, the "Intel Xeon Processor 5400 Series
Specification Update" [1] has

"AX74. Not-Present Page Faults May Set the RSVD Flag in the Error Code

Problem:
 An attempt to access a page that is not marked present causes
 a page fault. Such a page fault delivers an error code in which
 both the P flag (bit 0) and the RSVD flag (bit 3) are 0.
 Due to this erratum, not-present page faults may deliver
 an error code in which the P flag is 0 but the RSVD flag is 1.

Implication:
 Software may erroneously infer that a page fault was due to
 a reserved-bit violation when it was actually due to an attempt
 to access a not-present page. Intel has not observed this erratum
 with any commercially available software.

Workaround:
 Page-fault handlers should ignore the RSVD flag in the error
 code if the P flag is 0"

This problem has been observed several times on several nodes using
Intel Xeon E5450 processors. These nodes crashed after
"Bad pagetable: 000c" messages like this:

Corrupted page table at address 7f62d5b48e68
PGD 80000002e92bf067 PUD 1c99c5067 PMD 195015067 PTE 7fffffffb78b680
Bad pagetable: 000c [#1] SMP

The error code here is 0xc: the RSVD flag (bit 3) is set, while the P flag
(bit 0) is clear.
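
As a quick way to double-check that reading of 0xc, a small user-space
decoder could look like the sketch below. The table and the pf_decode()
helper are hypothetical and written only for this example; the flag names
follow the SDM's page-fault error code description.

```c
#include <stddef.h>
#include <string.h>

struct pf_flag {
	unsigned long bit;
	const char *name;
};

/* Low page-fault error code bits, named as in the SDM. */
static const struct pf_flag pf_flags[] = {
	{ 1UL << 0, "P" },
	{ 1UL << 1, "W/R" },
	{ 1UL << 2, "U/S" },
	{ 1UL << 3, "RSVD" },
	{ 1UL << 4, "I/D" },
	{ 1UL << 5, "PK" },
};

/*
 * Hypothetical helper: write the space-separated names of all set
 * flags into buf, which must hold at least one byte.
 */
static void pf_decode(unsigned long ec, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(pf_flags) / sizeof(pf_flags[0]); i++) {
		if (!(ec & pf_flags[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, pf_flags[i].name, len - strlen(buf) - 1);
	}
}
```

For 0xc this yields "U/S RSVD": a user-mode access with RSVD set and P
clear, which is the combination the erratum describes.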

Let's follow the recommendations and ignore the RSVD flag in the cases
described.

Link: [1] https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5400-spec-update.pdf
Link: https://lore.kernel.org/all/aae9c7c6-989c-0261-470a-252537493b53@openvz.org
Reported-by: Steve Sipes <steve.sipes@comandsolutions.com>
Signed-off-by: Vasily Averin <vvs@openvz.org>
---
v2: added original reporter
    improved patch description, added link to CPU spec update
---
 arch/x86/mm/fault.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fe10c6d76bac..ffc6d6bd2a22 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1481,6 +1481,15 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 	if (unlikely(kmmio_fault(regs, address)))
 		return;
 
+	/*
+	 * Some older Intel CPUs have errata
+	 * "Not-Present Page Faults May Set the RSVD Flag in the Error Code"
+	 * It is recommended to ignore the RSVD flag (bit 3) in the error code
+	 * if the P flag (bit 0) is 0.
+	 */
+	if (unlikely((error_code & X86_PF_RSVD) && !(error_code & X86_PF_PROT)))
+		error_code &= ~X86_PF_RSVD;
+
 	/* Was the fault on kernel-controlled part of the address space? */
 	if (unlikely(fault_in_kernel_space(address))) {
 		do_kern_addr_fault(regs, error_code, address);
-- 
2.36.1