[PATCH 0/2] x86/fred: Fix two problems during the FRED initialization

Hou Wenlong posted 2 patches 1 year, 5 months ago
arch/x86/entry/entry_fred.c     |  2 +-
arch/x86/include/asm/idtentry.h | 15 +++++++++++++--
arch/x86/mm/fault.c             | 19 +++++++++++++++----
3 files changed, 29 insertions(+), 7 deletions(-)
[PATCH 0/2] x86/fred: Fix two problems during the FRED initialization
Posted by Hou Wenlong 1 year, 5 months ago
When I reviewed the FRED code and attempted to implement a FRED-like
event delivery for my PV guest, I encountered two problems which I may
have misunderstood.

One issue is that FRED can be disabled in trap_init(), but
sysvec_install() can be called before trap_init(), thus the system
interrupt handler is not installed into the IDT if FRED is disabled
later. Initially, I attempted to parse the cmdline and decide whether to
enable or disable FRED after parse_early_param(). However, I ultimately
chose to always install the system handler into the IDT in
sysvec_install(), which is simple and should be sufficient.

Another problem is that the page fault handler (exc_page_fault()) is
installed into the IDT before FRED is enabled. Consequently, if a #PF is
triggered in this gap, the handler would receive the wrong CR2 from the
stack if FRED feature is present. To address this, I added a page fault
entry stub for FRED similar to the debug entry. However, I'm uncertain
whether this is enough reason to add a new entry. Perhaps a static key
may suffice to indicate whether FRED setup is completed and the handler
can use it.

Note: I didn't test them on the Intel emulator, as I'm not familiar with
configuring it to boot with my compiled kernel.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Xin Li <xin3.li@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>

Hou Wenlong (2):
  x86/fred: Always install system interrupt handler into IDT
  x86/fred: Add a page fault entry stub for FRED

 arch/x86/entry/entry_fred.c     |  2 +-
 arch/x86/include/asm/idtentry.h | 15 +++++++++++++--
 arch/x86/mm/fault.c             | 19 +++++++++++++++----
 3 files changed, 29 insertions(+), 7 deletions(-)

--
2.31.1
Re: [PATCH 0/2] x86/fred: Fix two problems during the FRED initialization
Posted by Xin Li 1 year, 5 months ago
On 6/21/2024 6:12 AM, Hou Wenlong wrote:
> When I reviewed the FRED code and attempted to implement a FRED-like
> event delivery for my PV guest, I encountered two problems which I may
> have misunderstood.

Hi Wenlong,

Thanks for bringing the issues up.

> 
> One issue is that FRED can be disabled in trap_init(), but
> sysvec_install() can be called before trap_init(), thus the system
> interrupt handler is not installed into the IDT if FRED is disabled
> later. Initially, I attempted to parse the cmdline and decide whether to
> enable or disable FRED after parse_early_param(). However, I ultimately
> chose to always install the system handler into the IDT in
> sysvec_install(), which is simple and should be sufficient.

Which module with a system vector gets initialized before trap_init() is
called?

> Another problem is that the page fault handler (exc_page_fault()) is
> installed into the IDT before FRED is enabled. Consequently, if a #PF is
> triggered in this gap, the handler would receive the wrong CR2 from the
> stack if FRED feature is present. To address this, I added a page fault
> entry stub for FRED similar to the debug entry. However, I'm uncertain
> whether this is enough reason to add a new entry. Perhaps a static key
> may suffice to indicate whether FRED setup is completed and the handler
> can use it.

How could a #PF get triggered during that gap?

Initialization time funnies are really unpleasant.

Thanks!
     Xin
Re: [PATCH 0/2] x86/fred: Fix two problems during the FRED initialization
Posted by Hou Wenlong 1 year, 5 months ago
On Sat, Jun 22, 2024 at 08:31:26AM +0800, Xin Li wrote:
> On 6/21/2024 6:12 AM, Hou Wenlong wrote:
> >When I reviewed the FRED code and attempted to implement a FRED-like
> >event delivery for my PV guest, I encountered two problems which I may
> >have misunderstood.
> 
> Hi Wenlong,
> 
> Thanks for bringing the issues up.
> 
Thanks for your kind reply.
 
> >
> >One issue is that FRED can be disabled in trap_init(), but
> >sysvec_install() can be called before trap_init(), thus the system
> >interrupt handler is not installed into the IDT if FRED is disabled
> >later. Initially, I attempted to parse the cmdline and decide whether to
> >enable or disable FRED after parse_early_param(). However, I ultimately
> >chose to always install the system handler into the IDT in
> >sysvec_install(), which is simple and should be sufficient.
> 
> Which module with a system vector gets initialized before trap_init() is
> called?
>
Sorry, I didn't mention the case here. I see sysvec_install() is used
only in the guest part (KVM, HYPERV) currently. For example, the KVM
guest will register the HYPERVISOR_CALLBACK_VECTOR APF handler in
kvm_guest_init(), which is called before trap_init(). So if only the FRED
handler is registered and FRED is disabled in trap_init() later, then the
IDT handler of the APF handler is missing.

> >Another problem is that the page fault handler (exc_page_fault()) is
> >installed into the IDT before FRED is enabled. Consequently, if a #PF is
> >triggered in this gap, the handler would receive the wrong CR2 from the
> >stack if FRED feature is present. To address this, I added a page fault
> >entry stub for FRED similar to the debug entry. However, I'm uncertain
> >whether this is enough reason to add a new entry. Perhaps a static key
> >may suffice to indicate whether FRED setup is completed and the handler
> >can use it.
> 
> How could a #PF get triggered during that gap?
> 
> Initialization time funnies are really unpleasant.
>
I'm not sure if there will be a #PF during that gap; I just received the
wrong fault address when I made a mistake in that gap and a #PF
occurred. Before idt_setup_early_pf(), the registered page fault handler
is do_early_exception(), which uses native_read_cr2(). However, after
that, the page fault handler had been changed to exc_page_fault(), so if
something bad happened and an unexpected #PF occurred, the fault address
in the error output will be wrong, although the CR2 in __show_regs() is
correct. I'm not sure if this matters or not since the kernel will panic
at that time.

Thanks!
> Thanks!
>     Xin
Re: [PATCH 0/2] x86/fred: Fix two problems during the FRED initialization
Posted by Xin Li 1 year, 5 months ago
On 6/23/2024 11:21 PM, Hou Wenlong wrote:
> On Sat, Jun 22, 2024 at 08:31:26AM +0800, Xin Li wrote:
>> On 6/21/2024 6:12 AM, Hou Wenlong wrote:
>>> One issue is that FRED can be disabled in trap_init(), but
>>> sysvec_install() can be called before trap_init(), thus the system
>>> interrupt handler is not installed into the IDT if FRED is disabled
>>> later. Initially, I attempted to parse the cmdline and decide whether to
>>> enable or disable FRED after parse_early_param(). However, I ultimately
>>> chose to always install the system handler into the IDT in
>>> sysvec_install(), which is simple and should be sufficient.
>>
>> Which module with a system vector gets initialized before trap_init() is
>> called?
>>
> Sorry, I didn't mention the case here. I see sysvec_install() is used
> only in the guest part (KVM, HYPERV) currently. For example, the KVM
> guest will register the HYPERVISOR_CALLBACK_VECTOR APF handler in
> kvm_guest_init(), which is called before trap_init(). So if only the FRED
> handler is registered and FRED is disabled in trap_init() later, then the
> IDT handler of the APF handler is missing.

This is a bug!  Your fix looks good to me.

>>> Another problem is that the page fault handler (exc_page_fault()) is
>>> installed into the IDT before FRED is enabled. Consequently, if a #PF is
>>> triggered in this gap, the handler would receive the wrong CR2 from the
>>> stack if FRED feature is present. To address this, I added a page fault
>>> entry stub for FRED similar to the debug entry. However, I'm uncertain
>>> whether this is enough reason to add a new entry. Perhaps a static key
>>> may suffice to indicate whether FRED setup is completed and the handler
>>> can use it.
>>
>> How could a #PF get triggered during that gap?
>>
>> Initialization time funnies are really unpleasant.
>>
> I'm not sure if there will be a #PF during that gap; I just received the
> wrong fault address when I made a mistake in that gap and a #PF
> occurred. Before idt_setup_early_pf(), the registered page fault handler
> is do_early_exception(), which uses native_read_cr2(). However, after
> that, the page fault handler had been changed to exc_page_fault(), so if
> something bad happened and an unexpected #PF occurred, the fault address
> in the error output will be wrong, although the CR2 in __show_regs() is
> correct. I'm not sure if this matters or not since the kernel will panic
> at that time.

So this doesn't sound a real problem, right?

We could simply do:

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e6c469b323cc..e500777ed2b4 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1495,7 +1495,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
            irqentry_state_t state;
            unsigned long address;

-       address = cpu_feature_enabled(X86_FEATURE_FRED) ? 
fred_event_data(regs) : read_cr2();
+       address = native_read_cr4() & X86_CR4_FRED ? 
fred_event_data(regs) : read_cr2();

            prefetchw(&current->mm->mmap_lock);


But the page fault handler is an extreme hot path, it's not worth it.

Thanks!
        Xin
Re: [PATCH 0/2] x86/fred: Fix two problems during the FRED initialization
Posted by H. Peter Anvin 1 year, 5 months ago
On June 25, 2024 2:09:29 AM PDT, Xin Li <xin@zytor.com> wrote:
>On 6/23/2024 11:21 PM, Hou Wenlong wrote:
>> On Sat, Jun 22, 2024 at 08:31:26AM +0800, Xin Li wrote:
>>> On 6/21/2024 6:12 AM, Hou Wenlong wrote:
>>>> One issue is that FRED can be disabled in trap_init(), but
>>>> sysvec_install() can be called before trap_init(), thus the system
>>>> interrupt handler is not installed into the IDT if FRED is disabled
>>>> later. Initially, I attempted to parse the cmdline and decide whether to
>>>> enable or disable FRED after parse_early_param(). However, I ultimately
>>>> chose to always install the system handler into the IDT in
>>>> sysvec_install(), which is simple and should be sufficient.
>>> 
>>> Which module with a system vector gets initialized before trap_init() is
>>> called?
>>> 
>> Sorry, I didn't mention the case here. I see sysvec_install() is used
>> only in the guest part (KVM, HYPERV) currently. For example, the KVM
>> guest will register the HYPERVISOR_CALLBACK_VECTOR APF handler in
>> kvm_guest_init(), which is called before trap_init(). So if only the FRED
>> handler is registered and FRED is disabled in trap_init() later, then the
>> IDT handler of the APF handler is missing.
>
>This is a bug!  Your fix looks good to me.
>
>>>> Another problem is that the page fault handler (exc_page_fault()) is
>>>> installed into the IDT before FRED is enabled. Consequently, if a #PF is
>>>> triggered in this gap, the handler would receive the wrong CR2 from the
>>>> stack if FRED feature is present. To address this, I added a page fault
>>>> entry stub for FRED similar to the debug entry. However, I'm uncertain
>>>> whether this is enough reason to add a new entry. Perhaps a static key
>>>> may suffice to indicate whether FRED setup is completed and the handler
>>>> can use it.
>>> 
>>> How could a #PF get triggered during that gap?
>>> 
>>> Initialization time funnies are really unpleasant.
>>> 
>> I'm not sure if there will be a #PF during that gap; I just received the
>> wrong fault address when I made a mistake in that gap and a #PF
>> occurred. Before idt_setup_early_pf(), the registered page fault handler
>> is do_early_exception(), which uses native_read_cr2(). However, after
>> that, the page fault handler had been changed to exc_page_fault(), so if
>> something bad happened and an unexpected #PF occurred, the fault address
>> in the error output will be wrong, although the CR2 in __show_regs() is
>> correct. I'm not sure if this matters or not since the kernel will panic
>> at that time.
>
>So this doesn't sound a real problem, right?
>
>We could simply do:
>
>diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>index e6c469b323cc..e500777ed2b4 100644
>--- a/arch/x86/mm/fault.c
>+++ b/arch/x86/mm/fault.c
>@@ -1495,7 +1495,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
>           irqentry_state_t state;
>           unsigned long address;
>
>-       address = cpu_feature_enabled(X86_FEATURE_FRED) ? fred_event_data(regs) : read_cr2();
>+       address = native_read_cr4() & X86_CR4_FRED ? fred_event_data(regs) : read_cr2();
>
>           prefetchw(&current->mm->mmap_lock);
>
>
>But the page fault handler is an extreme hot path, it's not worth it.
>
>Thanks!
>       Xin
>
>
>
>

Reading CR4 is insane from a performance perspective. Also, there is pretty much never a reason to, since CR4 is programmed by the OS.

But this is basically insane. We should enable FRED from the point we cut over from the early exception vector. That is: 

Early IDT → Final IDT
or
Early IDT → FRED

But not

Early IDT → Final IDT → FRED

Eventually we should enable FRED for early exceptions too (which ought to be quite trivial, but makes the whole CLI enable/disable a bit of a mess.)
Re: [PATCH 0/2] x86/fred: Fix two problems during the FRED initialization
Posted by Xin Li 1 year, 5 months ago
On 6/25/2024 10:24 AM, H. Peter Anvin wrote:
> On June 25, 2024 2:09:29 AM PDT, Xin Li <xin@zytor.com> wrote:
>> On 6/23/2024 11:21 PM, Hou Wenlong wrote:
>>> I'm not sure if there will be a #PF during that gap; I just received the
>>> wrong fault address when I made a mistake in that gap and a #PF
>>> occurred. Before idt_setup_early_pf(), the registered page fault handler
>>> is do_early_exception(), which uses native_read_cr2(). However, after
>>> that, the page fault handler had been changed to exc_page_fault(), so if
>>> something bad happened and an unexpected #PF occurred, the fault address
>>> in the error output will be wrong, although the CR2 in __show_regs() is
>>> correct. I'm not sure if this matters or not since the kernel will panic
>>> at that time.
>>
>> So this doesn't sound a real problem, right?
>>
>> We could simply do:
>>
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index e6c469b323cc..e500777ed2b4 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -1495,7 +1495,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
>>            irqentry_state_t state;
>>            unsigned long address;
>>
>> -       address = cpu_feature_enabled(X86_FEATURE_FRED) ? fred_event_data(regs) : read_cr2();
>> +       address = native_read_cr4() & X86_CR4_FRED ? fred_event_data(regs) : read_cr2();
>>
>>            prefetchw(&current->mm->mmap_lock);
>>
>>
>> But the page fault handler is an extreme hot path, it's not worth it.
>>
>> Thanks!
>>        Xin


> Reading CR4 is insane from a performance perspective. Also, there is pretty much never a reason to, since CR4 is programmed by the OS.

Agreed!

> 
> But this is basically insane. We should enable FRED from the point we cut over from the early exception vector. That is:
> 
> Early IDT → Final IDT
> or
> Early IDT → FRED
> 
> But not
> 
> Early IDT → Final IDT → FRED
> 
> Eventually we should enable FRED for early exceptions too (which ought to be quite trivial, but makes the whole CLI enable/disable a bit of a mess.)
> 

I think you and tglx are talking the same thing :)

Thanks!
     Xin