[edk2] [RFC v5 0/8] Stack trace support in X64 exception handling

Paulo Alcantara posted 8 patches 6 years, 3 months ago
Failed in applying to current master (apply log)
UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c        | 537 ++++++++++++++++++--
UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h        |  59 ++-
UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c | 483 +++++++++++++++++-
UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c  | 426 +++++++++++++++-
4 files changed, 1435 insertions(+), 70 deletions(-)
[edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
Posted by Paulo Alcantara 6 years, 3 months ago
Hi,

This series adds stack trace support during IA32 and X64 CPU exceptions.

Informations like back trace, stack contents and image module names
(that were part of the call stack) will be dumped out.

The current limitation is that it relies on available frame pointers
(GCC only) in order to successfully unwind the stack.

Jiewen,

Thank you very much for your time on this. I've applied the changes you
suggested, as well as tested it on IA32 PAE paging mode - it worked as
expected.

Other than that, I also tested the stack trace in SMM code by manually
calling CpuBreakPoint() and then it broke with another exception
(page fault). I didn't have much time to look into that, but what I've
observed is that the page fault ocurred during the search of PE/COFF
image base address (in PeCoffSearchImageBase). The function attempts to
search for the image base from "Address" through 0, so any of those
dereferenced addresses triggers the page fault.

Do you know how we could fix that issue? Perhaps introducing a
AddressValidationLib (as Brian suggested previously) and use it within
PeCoffSearchImageBase()?

I'd also like to thank Brian & Jeff for all the support!

Thanks
Paulo

Repo:   https://github.com/pcacjr/edk2.git
Branch: stacktrace_v5

Cc: Rick Bramley <richard.bramley@hp.com>
Cc: Kimon Berlin <kimon.berlin@hp.com>
Cc: Diego Medaglia <diego.meaglia@hp.com>
Cc: Andrew Fish <afish@apple.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Brian Johnson <brian.johnson@hpe.com>
Cc: Jeff Fan <vanjeff_919@hotmail.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Paulo Alcantara <paulo@hp.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
---

v1 -> v2:
  * Add IA32 arch support (GCC toolchain only)
  * Replace hard-coded stack alignment value (16) with
    CPU_STACK_ALIGNMENT.
  * Check for proper stack and frame pointer alignments.
  * Fix initialization of UnwoundStacksCount to 1.
  * Move GetPdbFileName() to common code since it will be used by both
    IA32 and X64 implementations.

v2 -> v3:
  * Fixed wrong assumption about "RIP < ImageBase" to start searching
    for another PE/COFF image. That is, RIP may point to lower and
    higher addresses for any other PE/COFF images. Both IA32 & X64.
    (Thanks Andrew & Jiewen)
  * Fixed typo: unwond -> unwound. Both IA32 & X64. (Thanks Brian)

v3 -> v4:
  * Validate all frame/stack pointer addresses before dereferencing them
    as requested by Brian & Jiewen.
  * Correctly print out IP addresses during the stack traces (by Jeff)

v4 -> v5:
  * Fixed address calculations and improved code as suggested by Jiewen.
  * Fixed parameter validation as suggested by Brian.
  * Tested stack stack with IA32 PAE paging mode.

Paulo Alcantara (8):
  UefiCpuPkg/CpuExceptionHandlerLib/X64: Add stack trace support
  UefiCpuPkg/CpuExceptionHandlerLib: Export GetPdbFileName()
  UefiCpuPkg/CpuExceptionHandlerLib/Ia32: Add stack trace support
  UefiCpuPkg/CpuExceptionHandlerLib: Add helper to validate memory
    addresses
  UefiCpuPkg/CpuExceptionHandlerLib: Ensure valid frame/stack pointers
  UefiCpuPkg/CpuExceptionHandlerLib: Correctly print IP addresses
  UefiCpuPkg/CpuExceptionHandlerLib: Validate memory address ranges
  UefiCpuPkg/CpuExceptionHandlerLib: Add early check in
    DumpStackContents

 UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c        | 537 ++++++++++++++++++--
 UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h        |  59 ++-
 UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c | 483 +++++++++++++++++-
 UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c  | 426 +++++++++++++++-
 4 files changed, 1435 insertions(+), 70 deletions(-)

-- 
2.14.3

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
Posted by Yao, Jiewen 6 years, 3 months ago
Thanks Paulo.

The series looks really good. I give it thumb up. :-)

The 8 patches keep updating 4 files, so I squash them when I review. The comment below is for the final code, not for a specific patch.

1) CpuExceptionCommon.c: IsLinearAddressRangeValid().
  //
  // Check for valid input parameters
  //
  if (LinearAddressStart == 0 || LinearAddressEnd == 0 ||
      LinearAddressStart > LinearAddressEnd) {
    return FALSE;
  }

I think LinearAddressStart is a valid case. In BIOS, we do update IVT in 0 address when CSM is enabled.
I do not think we need exclude it here. If we enabled ZeroPointer protection, it can be excluded in page table parsing.

2) I found below logic appears in 2 functions - DumpImageModuleNames() and DumpStacktrace(). Is that possible to merge them?
We can calculate in DumpImageAndCpuContent() and use Eip as parameter. Just in case there is 3rd function need use EIP, it won't miss the calculation.
(It is a bug fix we did recently.)

  //
  // Set current EIP address
  //
  if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
      ((SystemContext.SystemContextIa32->ExceptionData & IA32_PF_EC_ID) != 0)) {
    //
    // The EIP in SystemContext could not be used
    // if it is page fault with I/D set.
    //
    Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
  } else {
    Eip = SystemContext.SystemContextIa32->Eip;
  }

3) I am a little surprised on PeCoffSearchImageBase() issue.

We have 4 PeCoffSearchImageBase() call in each arch. DumpImageModuleNames() calls twice and DumpStacktrace() calls twice.
Do you know which specific one triggers the zero address #PF issue?

  C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(547):  ImageBase = PeCoffSearchImageBase (Eip);
  C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Eip);
  C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(682):  ImageBase = PeCoffSearchImageBase (Eip);
  C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(741):    ImageBase = PeCoffSearchImageBase (Eip);
  C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(540):  ImageBase = PeCoffSearchImageBase (Rip);
  C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Rip);
  C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(710):  ImageBase = PeCoffSearchImageBase (Rip);
  C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(779):    ImageBase = PeCoffSearchImageBase (Rip);

The EIP from SystemContext seems good. I assume there is not a problem here.

  if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
      ((SystemContext.SystemContextIa32->ExceptionData & IA32_PF_EC_ID) != 0)) {
    //
    // The EIP in SystemContext could not be used
    // if it is page fault with I/D set.
    //
    Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
  } else {
    Eip = SystemContext.SystemContextIa32->Eip;
  }

  //
  // Get initial PE/COFF image base address from current EIP
  //
  ImageBase = PeCoffSearchImageBase (Eip);

But the EIP from stack frame is unknown and risky. Especially for the code in mode-switch, such as PEI->DXE, UEFI->CSM, AP wakeup->AP C code, SMM entrypoint->SMM C code.
Should we add a check for EIP here?

    //
    // Set EIP with return address from current stack frame
    //
    Eip = *(UINT32 *)((UINTN)Ebp + 4);

    //
    // If EIP is zero, then stop unwinding the stack
    //
    if (Eip == 0) {
      break;
    }

    //
    // Search for the respective PE/COFF image based on EIP
    //
    ImageBase = PeCoffSearchImageBase (Eip);

If you can help us do some more investigation on the root-cause and narrow down the issue, I will appreciate that.

We can decide how to fix once it is root-caused.

Maybe we just do a simple IsLogicalAddressValid () check before we call PeCoffSearchImageBase(). :-)


Thank you
Yao Jiewen

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Paulo
> Alcantara
> Sent: Monday, January 15, 2018 8:23 AM
> To: edk2-devel@lists.01.org
> Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
> <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
> <afish@apple.com>; Yao, Jiewen <jiewen.yao@intel.com>; Diego Medaglia
> <diego.meaglia@hp.com>; Laszlo Ersek <lersek@redhat.com>
> Subject: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
> 
> Hi,
> 
> This series adds stack trace support during IA32 and X64 CPU exceptions.
> 
> Informations like back trace, stack contents and image module names
> (that were part of the call stack) will be dumped out.
> 
> The current limitation is that it relies on available frame pointers
> (GCC only) in order to successfully unwind the stack.
> 
> Jiewen,
> 
> Thank you very much for your time on this. I've applied the changes you
> suggested, as well as tested it on IA32 PAE paging mode - it worked as
> expected.
> 
> Other than that, I also tested the stack trace in SMM code by manually
> calling CpuBreakPoint() and then it broke with another exception
> (page fault). I didn't have much time to look into that, but what I've
> observed is that the page fault ocurred during the search of PE/COFF
> image base address (in PeCoffSearchImageBase). The function attempts to
> search for the image base from "Address" through 0, so any of those
> dereferenced addresses triggers the page fault.
> 
> Do you know how we could fix that issue? Perhaps introducing a
> AddressValidationLib (as Brian suggested previously) and use it within
> PeCoffSearchImageBase()?
> 
> I'd also like to thank Brian & Jeff for all the support!
> 
> Thanks
> Paulo
> 
> Repo:   https://github.com/pcacjr/edk2.git
> Branch: stacktrace_v5
> 
> Cc: Rick Bramley <richard.bramley@hp.com>
> Cc: Kimon Berlin <kimon.berlin@hp.com>
> Cc: Diego Medaglia <diego.meaglia@hp.com>
> Cc: Andrew Fish <afish@apple.com>
> Cc: Eric Dong <eric.dong@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Brian Johnson <brian.johnson@hpe.com>
> Cc: Jeff Fan <vanjeff_919@hotmail.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Cc: Paulo Alcantara <paulo@hp.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
> ---
> 
> v1 -> v2:
>   * Add IA32 arch support (GCC toolchain only)
>   * Replace hard-coded stack alignment value (16) with
>     CPU_STACK_ALIGNMENT.
>   * Check for proper stack and frame pointer alignments.
>   * Fix initialization of UnwoundStacksCount to 1.
>   * Move GetPdbFileName() to common code since it will be used by both
>     IA32 and X64 implementations.
> 
> v2 -> v3:
>   * Fixed wrong assumption about "RIP < ImageBase" to start searching
>     for another PE/COFF image. That is, RIP may point to lower and
>     higher addresses for any other PE/COFF images. Both IA32 & X64.
>     (Thanks Andrew & Jiewen)
>   * Fixed typo: unwond -> unwound. Both IA32 & X64. (Thanks Brian)
> 
> v3 -> v4:
>   * Validate all frame/stack pointer addresses before dereferencing them
>     as requested by Brian & Jiewen.
>   * Correctly print out IP addresses during the stack traces (by Jeff)
> 
> v4 -> v5:
>   * Fixed address calculations and improved code as suggested by Jiewen.
>   * Fixed parameter validation as suggested by Brian.
>   * Tested stack stack with IA32 PAE paging mode.
> 
> Paulo Alcantara (8):
>   UefiCpuPkg/CpuExceptionHandlerLib/X64: Add stack trace support
>   UefiCpuPkg/CpuExceptionHandlerLib: Export GetPdbFileName()
>   UefiCpuPkg/CpuExceptionHandlerLib/Ia32: Add stack trace support
>   UefiCpuPkg/CpuExceptionHandlerLib: Add helper to validate memory
>     addresses
>   UefiCpuPkg/CpuExceptionHandlerLib: Ensure valid frame/stack pointers
>   UefiCpuPkg/CpuExceptionHandlerLib: Correctly print IP addresses
>   UefiCpuPkg/CpuExceptionHandlerLib: Validate memory address ranges
>   UefiCpuPkg/CpuExceptionHandlerLib: Add early check in
>     DumpStackContents
> 
>  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c
> | 537 ++++++++++++++++++--
>  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h
> |  59 ++-
>  UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c |
> 483 +++++++++++++++++-
>  UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c  |
> 426 +++++++++++++++-
>  4 files changed, 1435 insertions(+), 70 deletions(-)
> 
> --
> 2.14.3
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
Posted by Yao, Jiewen 6 years, 3 months ago
Hi Paulo
I have some more thought on #3 for your consideration.

Given the situation that we are doing heap guard feature, a *valid* EIP address might not be enough to guarantee the address is inside of an *executable* page.

I am thinking if we need check the non-executable bit (BIT63) as well for DumpImageModuleNames(). (No need for DumpStacktrace()).
We can add IsLogicalAddressExecutable() on top of IsLogicalAddressValid().


Also I do not object the idea to add check inside of PeCoffSearchImageBase().
I am thinking in another way - we can let *caller* to input a validation function for the PeCoffLib, instead of let PeCoffLib depend on another library.

For example:
PeCoffSearchImageBaseEx (Address, ValidateAddressFunc)

Thank you
Yao Jiewen


> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Yao,
> Jiewen
> Sent: Wednesday, January 17, 2018 8:57 PM
> To: Paulo Alcantara <paulo@paulo.ac>; edk2-devel@lists.01.org
> Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
> <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
> <afish@apple.com>; Diego Medaglia <diego.meaglia@hp.com>; Laszlo Ersek
> <lersek@redhat.com>
> Subject: Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
> 
> Thanks Paulo.
> 
> The series looks really good. I give it thumb up. :-)
> 
> The 8 patches keep updating 4 files, so I squash them when I review. The
> comment below is for the final code, not for a specific patch.
> 
> 1) CpuExceptionCommon.c: IsLinearAddressRangeValid().
>   //
>   // Check for valid input parameters
>   //
>   if (LinearAddressStart == 0 || LinearAddressEnd == 0 ||
>       LinearAddressStart > LinearAddressEnd) {
>     return FALSE;
>   }
> 
> I think LinearAddressStart is a valid case. In BIOS, we do update IVT in 0 address
> when CSM is enabled.
> I do not think we need exclude it here. If we enabled ZeroPointer protection, it
> can be excluded in page table parsing.
> 
> 2) I found below logic appears in 2 functions - DumpImageModuleNames() and
> DumpStacktrace(). Is that possible to merge them?
> We can calculate in DumpImageAndCpuContent() and use Eip as parameter. Just
> in case there is 3rd function need use EIP, it won't miss the calculation.
> (It is a bug fix we did recently.)
> 
>   //
>   // Set current EIP address
>   //
>   if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
>       ((SystemContext.SystemContextIa32->ExceptionData &
> IA32_PF_EC_ID) != 0)) {
>     //
>     // The EIP in SystemContext could not be used
>     // if it is page fault with I/D set.
>     //
>     Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
>   } else {
>     Eip = SystemContext.SystemContextIa32->Eip;
>   }
> 
> 3) I am a little surprised on PeCoffSearchImageBase() issue.
> 
> We have 4 PeCoffSearchImageBase() call in each arch.
> DumpImageModuleNames() calls twice and DumpStacktrace() calls twice.
> Do you know which specific one triggers the zero address #PF issue?
> 
> 
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
> ExceptionHandler.c(547):  ImageBase = PeCoffSearchImageBase (Eip);
> 
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
> ExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Eip);
> 
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
> ExceptionHandler.c(682):  ImageBase = PeCoffSearchImageBase (Eip);
> 
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
> ExceptionHandler.c(741):    ImageBase = PeCoffSearchImageBase (Eip);
> 
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
> ExceptionHandler.c(540):  ImageBase = PeCoffSearchImageBase (Rip);
> 
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
> ExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Rip);
> 
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
> ExceptionHandler.c(710):  ImageBase = PeCoffSearchImageBase (Rip);
> 
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
> ExceptionHandler.c(779):    ImageBase = PeCoffSearchImageBase (Rip);
> 
> The EIP from SystemContext seems good. I assume there is not a problem here.
> 
>   if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
>       ((SystemContext.SystemContextIa32->ExceptionData &
> IA32_PF_EC_ID) != 0)) {
>     //
>     // The EIP in SystemContext could not be used
>     // if it is page fault with I/D set.
>     //
>     Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
>   } else {
>     Eip = SystemContext.SystemContextIa32->Eip;
>   }
> 
>   //
>   // Get initial PE/COFF image base address from current EIP
>   //
>   ImageBase = PeCoffSearchImageBase (Eip);
> 
> But the EIP from stack frame is unknown and risky. Especially for the code in
> mode-switch, such as PEI->DXE, UEFI->CSM, AP wakeup->AP C code, SMM
> entrypoint->SMM C code.
> Should we add a check for EIP here?
> 
>     //
>     // Set EIP with return address from current stack frame
>     //
>     Eip = *(UINT32 *)((UINTN)Ebp + 4);
> 
>     //
>     // If EIP is zero, then stop unwinding the stack
>     //
>     if (Eip == 0) {
>       break;
>     }
> 
>     //
>     // Search for the respective PE/COFF image based on EIP
>     //
>     ImageBase = PeCoffSearchImageBase (Eip);
> 
> If you can help us do some more investigation on the root-cause and narrow
> down the issue, I will appreciate that.
> 
> We can decide how to fix once it is root-caused.
> 
> Maybe we just do a simple IsLogicalAddressValid () check before we call
> PeCoffSearchImageBase(). :-)
> 
> 
> Thank you
> Yao Jiewen
> 
> > -----Original Message-----
> > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Paulo
> > Alcantara
> > Sent: Monday, January 15, 2018 8:23 AM
> > To: edk2-devel@lists.01.org
> > Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
> > <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
> > <afish@apple.com>; Yao, Jiewen <jiewen.yao@intel.com>; Diego Medaglia
> > <diego.meaglia@hp.com>; Laszlo Ersek <lersek@redhat.com>
> > Subject: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
> >
> > Hi,
> >
> > This series adds stack trace support during IA32 and X64 CPU exceptions.
> >
> > Informations like back trace, stack contents and image module names
> > (that were part of the call stack) will be dumped out.
> >
> > The current limitation is that it relies on available frame pointers
> > (GCC only) in order to successfully unwind the stack.
> >
> > Jiewen,
> >
> > Thank you very much for your time on this. I've applied the changes you
> > suggested, as well as tested it on IA32 PAE paging mode - it worked as
> > expected.
> >
> > Other than that, I also tested the stack trace in SMM code by manually
> > calling CpuBreakPoint() and then it broke with another exception
> > (page fault). I didn't have much time to look into that, but what I've
> > observed is that the page fault ocurred during the search of PE/COFF
> > image base address (in PeCoffSearchImageBase). The function attempts to
> > search for the image base from "Address" through 0, so any of those
> > dereferenced addresses triggers the page fault.
> >
> > Do you know how we could fix that issue? Perhaps introducing a
> > AddressValidationLib (as Brian suggested previously) and use it within
> > PeCoffSearchImageBase()?
> >
> > I'd also like to thank Brian & Jeff for all the support!
> >
> > Thanks
> > Paulo
> >
> > Repo:   https://github.com/pcacjr/edk2.git
> > Branch: stacktrace_v5
> >
> > Cc: Rick Bramley <richard.bramley@hp.com>
> > Cc: Kimon Berlin <kimon.berlin@hp.com>
> > Cc: Diego Medaglia <diego.meaglia@hp.com>
> > Cc: Andrew Fish <afish@apple.com>
> > Cc: Eric Dong <eric.dong@intel.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Cc: Brian Johnson <brian.johnson@hpe.com>
> > Cc: Jeff Fan <vanjeff_919@hotmail.com>
> > Cc: Jiewen Yao <jiewen.yao@intel.com>
> > Cc: Paulo Alcantara <paulo@hp.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
> > ---
> >
> > v1 -> v2:
> >   * Add IA32 arch support (GCC toolchain only)
> >   * Replace hard-coded stack alignment value (16) with
> >     CPU_STACK_ALIGNMENT.
> >   * Check for proper stack and frame pointer alignments.
> >   * Fix initialization of UnwoundStacksCount to 1.
> >   * Move GetPdbFileName() to common code since it will be used by both
> >     IA32 and X64 implementations.
> >
> > v2 -> v3:
> >   * Fixed wrong assumption about "RIP < ImageBase" to start searching
> >     for another PE/COFF image. That is, RIP may point to lower and
> >     higher addresses for any other PE/COFF images. Both IA32 & X64.
> >     (Thanks Andrew & Jiewen)
> >   * Fixed typo: unwond -> unwound. Both IA32 & X64. (Thanks Brian)
> >
> > v3 -> v4:
> >   * Validate all frame/stack pointer addresses before dereferencing them
> >     as requested by Brian & Jiewen.
> >   * Correctly print out IP addresses during the stack traces (by Jeff)
> >
> > v4 -> v5:
> >   * Fixed address calculations and improved code as suggested by Jiewen.
> >   * Fixed parameter validation as suggested by Brian.
> >   * Tested stack stack with IA32 PAE paging mode.
> >
> > Paulo Alcantara (8):
> >   UefiCpuPkg/CpuExceptionHandlerLib/X64: Add stack trace support
> >   UefiCpuPkg/CpuExceptionHandlerLib: Export GetPdbFileName()
> >   UefiCpuPkg/CpuExceptionHandlerLib/Ia32: Add stack trace support
> >   UefiCpuPkg/CpuExceptionHandlerLib: Add helper to validate memory
> >     addresses
> >   UefiCpuPkg/CpuExceptionHandlerLib: Ensure valid frame/stack pointers
> >   UefiCpuPkg/CpuExceptionHandlerLib: Correctly print IP addresses
> >   UefiCpuPkg/CpuExceptionHandlerLib: Validate memory address ranges
> >   UefiCpuPkg/CpuExceptionHandlerLib: Add early check in
> >     DumpStackContents
> >
> >  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c
> > | 537 ++++++++++++++++++--
> >  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h
> > |  59 ++-
> >  UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c |
> > 483 +++++++++++++++++-
> >  UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c
> |
> > 426 +++++++++++++++-
> >  4 files changed, 1435 insertions(+), 70 deletions(-)
> >
> > --
> > 2.14.3
> >
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
Posted by Paulo Alcantara 6 years, 3 months ago
Hi Jiewen,

"Yao, Jiewen" <jiewen.yao@intel.com> writes:

> I have some more thought on #3 for your consideration.
>
> Given the situation that we are doing heap guard feature, a *valid*
> EIP address might not be enough to guarantee the address is inside of
> an *executable* page.

OK.

>
> I am thinking if we need check the non-executable bit (BIT63) as well for DumpImageModuleNames(). (No need for DumpStacktrace()).
> We can add IsLogicalAddressExecutable() on top of
> IsLogicalAddressValid().

OK. I can do that.

> Also I do not object the idea to add check inside of PeCoffSearchImageBase().
> I am thinking in another way - we can let *caller* to input a validation function for the PeCoffLib, instead of let PeCoffLib depend on another library.
>
> For example:
> PeCoffSearchImageBaseEx (Address, ValidateAddressFunc)

Looks good to me! Whether or not we're making it into a external library
in the future, we should at least get it working and well tested for the
stack trace support.

Thanks!
Paulo

>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Yao,
>> Jiewen
>> Sent: Wednesday, January 17, 2018 8:57 PM
>> To: Paulo Alcantara <paulo@paulo.ac>; edk2-devel@lists.01.org
>> Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
>> <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
>> <afish@apple.com>; Diego Medaglia <diego.meaglia@hp.com>; Laszlo Ersek
>> <lersek@redhat.com>
>> Subject: Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
>> 
>> Thanks Paulo.
>> 
>> The series looks really good. I give it thumb up. :-)
>> 
>> The 8 patches keep updating 4 files, so I squash them when I review. The
>> comment below is for the final code, not for a specific patch.
>> 
>> 1) CpuExceptionCommon.c: IsLinearAddressRangeValid().
>>   //
>>   // Check for valid input parameters
>>   //
>>   if (LinearAddressStart == 0 || LinearAddressEnd == 0 ||
>>       LinearAddressStart > LinearAddressEnd) {
>>     return FALSE;
>>   }
>> 
>> I think LinearAddressStart is a valid case. In BIOS, we do update IVT in 0 address
>> when CSM is enabled.
>> I do not think we need exclude it here. If we enabled ZeroPointer protection, it
>> can be excluded in page table parsing.
>> 
>> 2) I found below logic appears in 2 functions - DumpImageModuleNames() and
>> DumpStacktrace(). Is that possible to merge them?
>> We can calculate in DumpImageAndCpuContent() and use Eip as parameter. Just
>> in case there is 3rd function need use EIP, it won't miss the calculation.
>> (It is a bug fix we did recently.)
>> 
>>   //
>>   // Set current EIP address
>>   //
>>   if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
>>       ((SystemContext.SystemContextIa32->ExceptionData &
>> IA32_PF_EC_ID) != 0)) {
>>     //
>>     // The EIP in SystemContext could not be used
>>     // if it is page fault with I/D set.
>>     //
>>     Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
>>   } else {
>>     Eip = SystemContext.SystemContextIa32->Eip;
>>   }
>> 
>> 3) I am a little surprised on PeCoffSearchImageBase() issue.
>> 
>> We have 4 PeCoffSearchImageBase() call in each arch.
>> DumpImageModuleNames() calls twice and DumpStacktrace() calls twice.
>> Do you know which specific one triggers the zero address #PF issue?
>> 
>> 
>> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
>> ExceptionHandler.c(547):  ImageBase = PeCoffSearchImageBase (Eip);
>> 
>> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
>> ExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Eip);
>> 
>> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
>> ExceptionHandler.c(682):  ImageBase = PeCoffSearchImageBase (Eip);
>> 
>> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
>> ExceptionHandler.c(741):    ImageBase = PeCoffSearchImageBase (Eip);
>> 
>> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
>> ExceptionHandler.c(540):  ImageBase = PeCoffSearchImageBase (Rip);
>> 
>> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
>> ExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Rip);
>> 
>> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
>> ExceptionHandler.c(710):  ImageBase = PeCoffSearchImageBase (Rip);
>> 
>> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
>> ExceptionHandler.c(779):    ImageBase = PeCoffSearchImageBase (Rip);
>> 
>> The EIP from SystemContext seems good. I assume there is not a problem here.
>> 
>>   if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
>>       ((SystemContext.SystemContextIa32->ExceptionData &
>> IA32_PF_EC_ID) != 0)) {
>>     //
>>     // The EIP in SystemContext could not be used
>>     // if it is page fault with I/D set.
>>     //
>>     Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
>>   } else {
>>     Eip = SystemContext.SystemContextIa32->Eip;
>>   }
>> 
>>   //
>>   // Get initial PE/COFF image base address from current EIP
>>   //
>>   ImageBase = PeCoffSearchImageBase (Eip);
>> 
>> But the EIP from stack frame is unknown and risky. Especially for the code in
>> mode-switch, such as PEI->DXE, UEFI->CSM, AP wakeup->AP C code, SMM
>> entrypoint->SMM C code.
>> Should we add a check for EIP here?
>> 
>>     //
>>     // Set EIP with return address from current stack frame
>>     //
>>     Eip = *(UINT32 *)((UINTN)Ebp + 4);
>> 
>>     //
>>     // If EIP is zero, then stop unwinding the stack
>>     //
>>     if (Eip == 0) {
>>       break;
>>     }
>> 
>>     //
>>     // Search for the respective PE/COFF image based on EIP
>>     //
>>     ImageBase = PeCoffSearchImageBase (Eip);
>> 
>> If you can help us do some more investigation on the root-cause and narrow
>> down the issue, I will appreciate that.
>> 
>> We can decide how to fix once it is root-caused.
>> 
>> Maybe we just do a simple IsLogicalAddressValid () check before we call
>> PeCoffSearchImageBase(). :-)
>> 
>> 
>> Thank you
>> Yao Jiewen
>> 
>> > -----Original Message-----
>> > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
>> Paulo
>> > Alcantara
>> > Sent: Monday, January 15, 2018 8:23 AM
>> > To: edk2-devel@lists.01.org
>> > Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
>> > <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
>> > <afish@apple.com>; Yao, Jiewen <jiewen.yao@intel.com>; Diego Medaglia
>> > <diego.meaglia@hp.com>; Laszlo Ersek <lersek@redhat.com>
>> > Subject: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
>> >
>> > Hi,
>> >
>> > This series adds stack trace support during IA32 and X64 CPU exceptions.
>> >
>> > Informations like back trace, stack contents and image module names
>> > (that were part of the call stack) will be dumped out.
>> >
>> > The current limitation is that it relies on available frame pointers
>> > (GCC only) in order to successfully unwind the stack.
>> >
>> > Jiewen,
>> >
>> > Thank you very much for your time on this. I've applied the changes you
>> > suggested, as well as tested it on IA32 PAE paging mode - it worked as
>> > expected.
>> >
>> > Other than that, I also tested the stack trace in SMM code by manually
>> > calling CpuBreakPoint() and then it broke with another exception
>> > (page fault). I didn't have much time to look into that, but what I've
>> > observed is that the page fault ocurred during the search of PE/COFF
>> > image base address (in PeCoffSearchImageBase). The function attempts to
>> > search for the image base from "Address" through 0, so any of those
>> > dereferenced addresses triggers the page fault.
>> >
>> > Do you know how we could fix that issue? Perhaps introducing a
>> > AddressValidationLib (as Brian suggested previously) and use it within
>> > PeCoffSearchImageBase()?
>> >
>> > I'd also like to thank Brian & Jeff for all the support!
>> >
>> > Thanks
>> > Paulo
>> >
>> > Repo:   https://github.com/pcacjr/edk2.git
>> > Branch: stacktrace_v5
>> >
>> > Cc: Rick Bramley <richard.bramley@hp.com>
>> > Cc: Kimon Berlin <kimon.berlin@hp.com>
>> > Cc: Diego Medaglia <diego.meaglia@hp.com>
>> > Cc: Andrew Fish <afish@apple.com>
>> > Cc: Eric Dong <eric.dong@intel.com>
>> > Cc: Laszlo Ersek <lersek@redhat.com>
>> > Cc: Brian Johnson <brian.johnson@hpe.com>
>> > Cc: Jeff Fan <vanjeff_919@hotmail.com>
>> > Cc: Jiewen Yao <jiewen.yao@intel.com>
>> > Cc: Paulo Alcantara <paulo@hp.com>
>> > Contributed-under: TianoCore Contribution Agreement 1.1
>> > Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
>> > ---
>> >
>> > v1 -> v2:
>> >   * Add IA32 arch support (GCC toolchain only)
>> >   * Replace hard-coded stack alignment value (16) with
>> >     CPU_STACK_ALIGNMENT.
>> >   * Check for proper stack and frame pointer alignments.
>> >   * Fix initialization of UnwoundStacksCount to 1.
>> >   * Move GetPdbFileName() to common code since it will be used by both
>> >     IA32 and X64 implementations.
>> >
>> > v2 -> v3:
>> >   * Fixed wrong assumption about "RIP < ImageBase" to start searching
>> >     for another PE/COFF image. That is, RIP may point to lower and
>> >     higher addresses for any other PE/COFF images. Both IA32 & X64.
>> >     (Thanks Andrew & Jiewen)
>> >   * Fixed typo: unwond -> unwound. Both IA32 & X64. (Thanks Brian)
>> >
>> > v3 -> v4:
>> >   * Validate all frame/stack pointer addresses before dereferencing them
>> >     as requested by Brian & Jiewen.
>> >   * Correctly print out IP addresses during the stack traces (by Jeff)
>> >
>> > v4 -> v5:
>> >   * Fixed address calculations and improved code as suggested by Jiewen.
>> >   * Fixed parameter validation as suggested by Brian.
>> >   * Tested stack stack with IA32 PAE paging mode.
>> >
>> > Paulo Alcantara (8):
>> >   UefiCpuPkg/CpuExceptionHandlerLib/X64: Add stack trace support
>> >   UefiCpuPkg/CpuExceptionHandlerLib: Export GetPdbFileName()
>> >   UefiCpuPkg/CpuExceptionHandlerLib/Ia32: Add stack trace support
>> >   UefiCpuPkg/CpuExceptionHandlerLib: Add helper to validate memory
>> >     addresses
>> >   UefiCpuPkg/CpuExceptionHandlerLib: Ensure valid frame/stack pointers
>> >   UefiCpuPkg/CpuExceptionHandlerLib: Correctly print IP addresses
>> >   UefiCpuPkg/CpuExceptionHandlerLib: Validate memory address ranges
>> >   UefiCpuPkg/CpuExceptionHandlerLib: Add early check in
>> >     DumpStackContents
>> >
>> >  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c
>> > | 537 ++++++++++++++++++--
>> >  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h
>> > |  59 ++-
>> >  UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c |
>> > 483 +++++++++++++++++-
>> >  UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c
>> |
>> > 426 +++++++++++++++-
>> >  4 files changed, 1435 insertions(+), 70 deletions(-)
>> >
>> > --
>> > 2.14.3
>> >
>> > _______________________________________________
>> > edk2-devel mailing list
>> > edk2-devel@lists.01.org
>> > https://lists.01.org/mailman/listinfo/edk2-devel
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
Posted by Paulo Alcantara 6 years, 3 months ago
"Yao, Jiewen" <jiewen.yao@intel.com> writes:

Hi Jiewen,

Thank you very much for teh review! My comments below:

> The 8 patches keep updating 4 files, so I squash them when I review. The comment below is for the final code, not for a specific patch.
>
> 1) CpuExceptionCommon.c: IsLinearAddressRangeValid().
>   //
>   // Check for valid input parameters
>   //
>   if (LinearAddressStart == 0 || LinearAddressEnd == 0 ||
>       LinearAddressStart > LinearAddressEnd) {
>     return FALSE;
>   }
>
> I think LinearAddressStart is a valid case. In BIOS, we do update IVT in 0 address when CSM is enabled.
> I do not think we need exclude it here. If we enabled ZeroPointer
> protection, it can be excluded in page table parsing.

OK - didn't know about it. I'll remove that check.

> 2) I found below logic appears in 2 functions - DumpImageModuleNames() and DumpStacktrace(). Is that possible to merge them?
> We can calculate in DumpImageAndCpuContent() and use Eip as parameter. Just in case there is 3rd function need use EIP, it won't miss the calculation.
> (It is a bug fix we did recently.)
>
>   //
>   // Set current EIP address
>   //
>   if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
>       ((SystemContext.SystemContextIa32->ExceptionData & IA32_PF_EC_ID) != 0)) {
>     //
>     // The EIP in SystemContext could not be used
>     // if it is page fault with I/D set.
>     //
>     Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
>   } else {
>     Eip = SystemContext.SystemContextIa32->Eip;
>   }

Yes - I think it's possible and a good improvement. I'll do it.

> 3) I am a little surprised on PeCoffSearchImageBase() issue.
>
> We have 4 PeCoffSearchImageBase() call in each arch. DumpImageModuleNames() calls twice and DumpStacktrace() calls twice.
> Do you know which specific one triggers the zero address #PF issue?
>
>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(547):  ImageBase = PeCoffSearchImageBase (Eip);
>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Eip);
>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(682):  ImageBase = PeCoffSearchImageBase (Eip);
>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(741):    ImageBase = PeCoffSearchImageBase (Eip);
>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(540):  ImageBase = PeCoffSearchImageBase (Rip);
>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Rip);
>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(710):  ImageBase = PeCoffSearchImageBase (Rip);
>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(779):    ImageBase = PeCoffSearchImageBase (Rip);
>

When I saw the #PF when testing stack trace in SMM, I was running out of
time and I just saved the log file with the trace. I'm attaching the
log for you, but I'm still going to look into that issue when time
permits.

I haven't looked on how the SMI entry is set up, but don't you think
that we should do something similiar like in DXE and PEI phases with
"push $0" and help the debugger or tracer know when stop unwinding the
stack -- in DEBUG mode, at least? Of course, if that's really possible.

> The EIP from SystemContext seems good. I assume there is not a problem here.
>
>   if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
>       ((SystemContext.SystemContextIa32->ExceptionData & IA32_PF_EC_ID) != 0)) {
>     //
>     // The EIP in SystemContext could not be used
>     // if it is page fault with I/D set.
>     //
>     Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
>   } else {
>     Eip = SystemContext.SystemContextIa32->Eip;
>   }

Possibly. I'll add some debug and see what the Eip looks like at this
point.

>   //
>   // Get initial PE/COFF image base address from current EIP
>   //
>   ImageBase = PeCoffSearchImageBase (Eip);
>
> But the EIP from stack frame is unknown and risky. Especially for the code in mode-switch, such as PEI->DXE, UEFI->CSM, AP wakeup->AP C code, SMM entrypoint->SMM C code.
> Should we add a check for EIP here?
>
>     //
>     // Set EIP with return address from current stack frame
>     //
>     Eip = *(UINT32 *)((UINTN)Ebp + 4);
>
>     //
>     // If EIP is zero, then stop unwinding the stack
>     //
>     if (Eip == 0) {
>       break;
>     }
>
>     //
>     // Search for the respective PE/COFF image based on EIP
>     //
>     ImageBase = PeCoffSearchImageBase (Eip);

I think so. There is no guarantee that the return address (Eip) will be
valid even if we're handling a valid frame pointer. It might get
corrupted at some point. Thanks for pointing it out! I'll validate
them.

> If you can help us do some more investigation on the root-cause and narrow down the issue, I will appreciate that.
>
> We can decide how to fix once it is root-caused.

Sure. I will.

Thanks for the comments! I learnt a lot of with them :-)

Paulo

>
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Paulo
>> Alcantara
>> Sent: Monday, January 15, 2018 8:23 AM
>> To: edk2-devel@lists.01.org
>> Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
>> <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
>> <afish@apple.com>; Yao, Jiewen <jiewen.yao@intel.com>; Diego Medaglia
>> <diego.meaglia@hp.com>; Laszlo Ersek <lersek@redhat.com>
>> Subject: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
>> 
>> Hi,
>> 
>> This series adds stack trace support during IA32 and X64 CPU exceptions.
>> 
>> Informations like back trace, stack contents and image module names
>> (that were part of the call stack) will be dumped out.
>> 
>> The current limitation is that it relies on available frame pointers
>> (GCC only) in order to successfully unwind the stack.
>> 
>> Jiewen,
>> 
>> Thank you very much for your time on this. I've applied the changes you
>> suggested, as well as tested it on IA32 PAE paging mode - it worked as
>> expected.
>> 
>> Other than that, I also tested the stack trace in SMM code by manually
>> calling CpuBreakPoint() and then it broke with another exception
>> (page fault). I didn't have much time to look into that, but what I've
>> observed is that the page fault ocurred during the search of PE/COFF
>> image base address (in PeCoffSearchImageBase). The function attempts to
>> search for the image base from "Address" through 0, so any of those
>> dereferenced addresses triggers the page fault.
>> 
>> Do you know how we could fix that issue? Perhaps introducing a
>> AddressValidationLib (as Brian suggested previously) and use it within
>> PeCoffSearchImageBase()?
>> 
>> I'd also like to thank Brian & Jeff for all the support!
>> 
>> Thanks
>> Paulo
>> 
>> Repo:   https://github.com/pcacjr/edk2.git
>> Branch: stacktrace_v5
>> 
>> Cc: Rick Bramley <richard.bramley@hp.com>
>> Cc: Kimon Berlin <kimon.berlin@hp.com>
>> Cc: Diego Medaglia <diego.meaglia@hp.com>
>> Cc: Andrew Fish <afish@apple.com>
>> Cc: Eric Dong <eric.dong@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Brian Johnson <brian.johnson@hpe.com>
>> Cc: Jeff Fan <vanjeff_919@hotmail.com>
>> Cc: Jiewen Yao <jiewen.yao@intel.com>
>> Cc: Paulo Alcantara <paulo@hp.com>
>> Contributed-under: TianoCore Contribution Agreement 1.1
>> Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
>> ---
>> 
>> v1 -> v2:
>>   * Add IA32 arch support (GCC toolchain only)
>>   * Replace hard-coded stack alignment value (16) with
>>     CPU_STACK_ALIGNMENT.
>>   * Check for proper stack and frame pointer alignments.
>>   * Fix initialization of UnwoundStacksCount to 1.
>>   * Move GetPdbFileName() to common code since it will be used by both
>>     IA32 and X64 implementations.
>> 
>> v2 -> v3:
>>   * Fixed wrong assumption about "RIP < ImageBase" to start searching
>>     for another PE/COFF image. That is, RIP may point to lower and
>>     higher addresses for any other PE/COFF images. Both IA32 & X64.
>>     (Thanks Andrew & Jiewen)
>>   * Fixed typo: unwond -> unwound. Both IA32 & X64. (Thanks Brian)
>> 
>> v3 -> v4:
>>   * Validate all frame/stack pointer addresses before dereferencing them
>>     as requested by Brian & Jiewen.
>>   * Correctly print out IP addresses during the stack traces (by Jeff)
>> 
>> v4 -> v5:
>>   * Fixed address calculations and improved code as suggested by Jiewen.
>>   * Fixed parameter validation as suggested by Brian.
>>   * Tested stack stack with IA32 PAE paging mode.
>> 
>> Paulo Alcantara (8):
>>   UefiCpuPkg/CpuExceptionHandlerLib/X64: Add stack trace support
>>   UefiCpuPkg/CpuExceptionHandlerLib: Export GetPdbFileName()
>>   UefiCpuPkg/CpuExceptionHandlerLib/Ia32: Add stack trace support
>>   UefiCpuPkg/CpuExceptionHandlerLib: Add helper to validate memory
>>     addresses
>>   UefiCpuPkg/CpuExceptionHandlerLib: Ensure valid frame/stack pointers
>>   UefiCpuPkg/CpuExceptionHandlerLib: Correctly print IP addresses
>>   UefiCpuPkg/CpuExceptionHandlerLib: Validate memory address ranges
>>   UefiCpuPkg/CpuExceptionHandlerLib: Add early check in
>>     DumpStackContents
>> 
>>  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c
>> | 537 ++++++++++++++++++--
>>  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h
>> |  59 ++-
>>  UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c |
>> 483 +++++++++++++++++-
>>  UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c  |
>> 426 +++++++++++++++-
>>  4 files changed, 1435 insertions(+), 70 deletions(-)
>> 
>> --
>> 2.14.3
>> 
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
Posted by Paulo Alcantara 6 years, 3 months ago
Paulo Alcantara <paulo@paulo.ac> writes:

>> 3) I am a little surprised on PeCoffSearchImageBase() issue.
>>
>> We have 4 PeCoffSearchImageBase() call in each arch. DumpImageModuleNames() calls twice and DumpStacktrace() calls twice.
>> Do you know which specific one triggers the zero address #PF issue?
>>
>>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(547):  ImageBase = PeCoffSearchImageBase (Eip);
>>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Eip);
>>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(682):  ImageBase = PeCoffSearchImageBase (Eip);
>>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(741):    ImageBase = PeCoffSearchImageBase (Eip);
>>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(540):  ImageBase = PeCoffSearchImageBase (Rip);
>>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Rip);
>>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(710):  ImageBase = PeCoffSearchImageBase (Rip);
>>   C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(779):    ImageBase = PeCoffSearchImageBase (Rip);
>>
>
> When I saw the #PF when testing stack trace in SMM, I was running out of
> time and I just saved the log file with the trace. I'm attaching the
> log for you, but I'm still going to look into that issue when time
> permits.

Forgot to attach the log file. Done. :-)
!!!! X64 Exception Type - 03(#BP - Breakpoint)  CPU Apic ID - 00000000 !!!!
RIP  - 000000007FF48151, CS  - 0000000000000038, RFLAGS - 0000000000000046
RAX  - 0000000000000000, RCX - 000000007FEBB020, RDX - 0000000000040000
RBX  - 0000000000000000, RSP - 000000007FF7B760, RBP - 000000007FF7B760
RSI  - 000000007FF44018, RDI - 000000007FEBB018
R8   - 00000000000000FF, R9  - 0000000000048400, R10 - 00000000000000C9
R11  - 0000000000000069, R12 - 0000000000000000, R13 - 0000000000000000
R14  - 0000000000000000, R15 - 0000000000000000
DS   - 0000000000000020, ES  - 0000000000000020, FS  - 0000000000000020
GS   - 0000000000000020, SS  - 0000000000000020
CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000007FF6C000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007FF6B000 000000000000004F, LDTR - 0000000000000000
IDTR - 000000007FF75000 00000000000001FF,   TR - 0000000000000040
FXSAVE_STATE - 000000007FF7B3C0

Call trace:
0 0x000000007FF48151 @ 0x000000007FF46000+0x2151 (0x000000007FF7B760) in VariableSmm.dll
1 0x000000007FF54E35 @ 0x000000007FF46000+0xEE35 (0x000000007FF7B7D0) in VariableSmm.dll
2 0x000000007FF5659E @ 0x000000007FF46000+0x1059E (0x000000007FF7B800) in VariableSmm.dll
3 0x000000007FF4BCFE @ 0x000000007FF46000+0x5CFE (0x000000007FF7B840) in VariableSmm.dll
4 0x000000007FFE9BFA @ 0x000000007FFDB000+0xEBFA (0x000000007FF7B8A0) in PiSmmCore.dll
5 0x000000007FFEAAC4 @ 0x000000007FFDB000+0xFAC4 (0x000000007FF7B9C0) in PiSmmCore.dll
6 0x000000007FFEADD1 @ 0x000000007FFDB000+0xFDD1 (0x000000007FF7BA20) in PiSmmCore.dll
7 0x000000007FFE43F8 @ 0x000000007FFDB000+0x93F8 (0x000000007FF7BA90) in PiSmmCore.dll
8 0x000000007FFA8B9A @ 0x000000007FF9C000+0xCB9A (0x000000007FF7BD50) in PiSmmCpuDxeSmm.dll
9 0x000000007FFA9E92 @ 0x000000007FF9C000+0xDE92 (0x000000007FF7BDC0) in PiSmmCpuDxeSmm.dll
!!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000  I:0 R:0 U:0 W:0 P:0 PK:0 S:0
RIP  - 000000007FFA1BBE, CS  - 0000000000000038, RFLAGS - 0000000000010006
RAX  - 000000007FF89FFC, RCX - 000000007FF8E149, RDX - FFFFFFFFFFFFFFFF
RBX  - 0000000000000000, RSP - 000000007FF7B1A0, RBP - 000000007FF7B1E0
RSI  - 000000007FFA9E92, RDI - 000000007FF9C000
R8   - 0000000000000001, R9  - 0000000000000000, R10 - 0000000000000000
R11  - 0000000000000069, R12 - 0000000000000000, R13 - 0000000000000000
R14  - 0000000000000000, R15 - 0000000000000000
DS   - 0000000000000020, ES  - 0000000000000020, FS  - 0000000000000020
GS   - 0000000000000020, SS  - 0000000000000020
CR0  - 0000000080010033, CR2 - 000000007FF89FFC, CR3 - 000000007FF6C000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007FF6B000 000000000000004F, LDTR - 0000000000000000
IDTR - 000000007FF75000 00000000000001FF,   TR - 0000000000000040
FXSAVE_STATE - 000000007FF76C60

Paulo

>>> -----Original Message-----
>>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Paulo
>>> Alcantara
>>> Sent: Monday, January 15, 2018 8:23 AM
>>> To: edk2-devel@lists.01.org
>>> Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
>>> <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
>>> <afish@apple.com>; Yao, Jiewen <jiewen.yao@intel.com>; Diego Medaglia
>>> <diego.meaglia@hp.com>; Laszlo Ersek <lersek@redhat.com>
>>> Subject: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
>>> 
>>> Hi,
>>> 
>>> This series adds stack trace support during IA32 and X64 CPU exceptions.
>>> 
>>> Informations like back trace, stack contents and image module names
>>> (that were part of the call stack) will be dumped out.
>>> 
>>> The current limitation is that it relies on available frame pointers
>>> (GCC only) in order to successfully unwind the stack.
>>> 
>>> Jiewen,
>>> 
>>> Thank you very much for your time on this. I've applied the changes you
>>> suggested, as well as tested it on IA32 PAE paging mode - it worked as
>>> expected.
>>> 
>>> Other than that, I also tested the stack trace in SMM code by manually
>>> calling CpuBreakPoint() and then it broke with another exception
>>> (page fault). I didn't have much time to look into that, but what I've
>>> observed is that the page fault ocurred during the search of PE/COFF
>>> image base address (in PeCoffSearchImageBase). The function attempts to
>>> search for the image base from "Address" through 0, so any of those
>>> dereferenced addresses triggers the page fault.
>>> 
>>> Do you know how we could fix that issue? Perhaps introducing a
>>> AddressValidationLib (as Brian suggested previously) and use it within
>>> PeCoffSearchImageBase()?
>>> 
>>> I'd also like to thank Brian & Jeff for all the support!
>>> 
>>> Thanks
>>> Paulo
>>> 
>>> Repo:   https://github.com/pcacjr/edk2.git
>>> Branch: stacktrace_v5
>>> 
>>> Cc: Rick Bramley <richard.bramley@hp.com>
>>> Cc: Kimon Berlin <kimon.berlin@hp.com>
>>> Cc: Diego Medaglia <diego.meaglia@hp.com>
>>> Cc: Andrew Fish <afish@apple.com>
>>> Cc: Eric Dong <eric.dong@intel.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Cc: Brian Johnson <brian.johnson@hpe.com>
>>> Cc: Jeff Fan <vanjeff_919@hotmail.com>
>>> Cc: Jiewen Yao <jiewen.yao@intel.com>
>>> Cc: Paulo Alcantara <paulo@hp.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>> Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
>>> ---
>>> 
>>> v1 -> v2:
>>>   * Add IA32 arch support (GCC toolchain only)
>>>   * Replace hard-coded stack alignment value (16) with
>>>     CPU_STACK_ALIGNMENT.
>>>   * Check for proper stack and frame pointer alignments.
>>>   * Fix initialization of UnwoundStacksCount to 1.
>>>   * Move GetPdbFileName() to common code since it will be used by both
>>>     IA32 and X64 implementations.
>>> 
>>> v2 -> v3:
>>>   * Fixed wrong assumption about "RIP < ImageBase" to start searching
>>>     for another PE/COFF image. That is, RIP may point to lower and
>>>     higher addresses for any other PE/COFF images. Both IA32 & X64.
>>>     (Thanks Andrew & Jiewen)
>>>   * Fixed typo: unwond -> unwound. Both IA32 & X64. (Thanks Brian)
>>> 
>>> v3 -> v4:
>>>   * Validate all frame/stack pointer addresses before dereferencing them
>>>     as requested by Brian & Jiewen.
>>>   * Correctly print out IP addresses during the stack traces (by Jeff)
>>> 
>>> v4 -> v5:
>>>   * Fixed address calculations and improved code as suggested by Jiewen.
>>>   * Fixed parameter validation as suggested by Brian.
>>>   * Tested stack stack with IA32 PAE paging mode.
>>> 
>>> Paulo Alcantara (8):
>>>   UefiCpuPkg/CpuExceptionHandlerLib/X64: Add stack trace support
>>>   UefiCpuPkg/CpuExceptionHandlerLib: Export GetPdbFileName()
>>>   UefiCpuPkg/CpuExceptionHandlerLib/Ia32: Add stack trace support
>>>   UefiCpuPkg/CpuExceptionHandlerLib: Add helper to validate memory
>>>     addresses
>>>   UefiCpuPkg/CpuExceptionHandlerLib: Ensure valid frame/stack pointers
>>>   UefiCpuPkg/CpuExceptionHandlerLib: Correctly print IP addresses
>>>   UefiCpuPkg/CpuExceptionHandlerLib: Validate memory address ranges
>>>   UefiCpuPkg/CpuExceptionHandlerLib: Add early check in
>>>     DumpStackContents
>>> 
>>>  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c
>>> | 537 ++++++++++++++++++--
>>>  UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h
>>> |  59 ++-
>>>  UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c |
>>> 483 +++++++++++++++++-
>>>  UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c  |
>>> 426 +++++++++++++++-
>>>  4 files changed, 1435 insertions(+), 70 deletions(-)
>>> 
>>> --
>>> 2.14.3
>>> 
>>> _______________________________________________
>>> edk2-devel mailing list
>>> edk2-devel@lists.01.org
>>> https://lists.01.org/mailman/listinfo/edk2-devel
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
Posted by Paulo Alcantara 6 years, 2 months ago
Hi Jiewen,

On 1/17/2018 10:57 AM, Yao, Jiewen wrote:
> Thanks Paulo.
> 
> The series looks really good. I give it thumb up. :-)
> 
> The 8 patches keep updating 4 files, so I squash them when I review. The comment below is for the final code, not for a specific patch.
> 
> 1) CpuExceptionCommon.c: IsLinearAddressRangeValid().
>    //
>    // Check for valid input parameters
>    //
>    if (LinearAddressStart == 0 || LinearAddressEnd == 0 ||
>        LinearAddressStart > LinearAddressEnd) {
>      return FALSE;
>    }
> 
> I think LinearAddressStart is a valid case. In BIOS, we do update IVT in 0 address when CSM is enabled.
> I do not think we need exclude it here. If we enabled ZeroPointer protection, it can be excluded in page table parsing.
> 
> 2) I found below logic appears in 2 functions - DumpImageModuleNames() and DumpStacktrace(). Is that possible to merge them?
> We can calculate in DumpImageAndCpuContent() and use Eip as parameter. Just in case there is 3rd function need use EIP, it won't miss the calculation.
> (It is a bug fix we did recently.)
> 
>    //
>    // Set current EIP address
>    //
>    if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
>        ((SystemContext.SystemContextIa32->ExceptionData & IA32_PF_EC_ID) != 0)) {
>      //
>      // The EIP in SystemContext could not be used
>      // if it is page fault with I/D set.
>      //
>      Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
>    } else {
>      Eip = SystemContext.SystemContextIa32->Eip;
>    }
> 
> 3) I am a little surprised on PeCoffSearchImageBase() issue.
> 
> We have 4 PeCoffSearchImageBase() call in each arch. DumpImageModuleNames() calls twice and DumpStacktrace() calls twice.
> Do you know which specific one triggers the zero address #PF issue?
> 
>    C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(547):  ImageBase = PeCoffSearchImageBase (Eip);
>    C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Eip);
>    C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(682):  ImageBase = PeCoffSearchImageBase (Eip);
>    C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\ArchExceptionHandler.c(741):    ImageBase = PeCoffSearchImageBase (Eip);
>    C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(540):  ImageBase = PeCoffSearchImageBase (Rip);
>    C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Rip);
>    C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(710):  ImageBase = PeCoffSearchImageBase (Rip);
>    C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\ArchExceptionHandler.c(779):    ImageBase = PeCoffSearchImageBase (Rip);
> 
> The EIP from SystemContext seems good. I assume there is not a problem here.
> 
>    if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
>        ((SystemContext.SystemContextIa32->ExceptionData & IA32_PF_EC_ID) != 0)) {
>      //
>      // The EIP in SystemContext could not be used
>      // if it is page fault with I/D set.
>      //
>      Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
>    } else {
>      Eip = SystemContext.SystemContextIa32->Eip;
>    }
> 
>    //
>    // Get initial PE/COFF image base address from current EIP
>    //
>    ImageBase = PeCoffSearchImageBase (Eip);
> 
> But the EIP from stack frame is unknown and risky. Especially for the code in mode-switch, such as PEI->DXE, UEFI->CSM, AP wakeup->AP C code, SMM entrypoint->SMM C code.
> Should we add a check for EIP here?
> 
>      //
>      // Set EIP with return address from current stack frame
>      //
>      Eip = *(UINT32 *)((UINTN)Ebp + 4);
> 
>      //
>      // If EIP is zero, then stop unwinding the stack
>      //
>      if (Eip == 0) {
>        break;
>      }
> 
>      //
>      // Search for the respective PE/COFF image based on EIP
>      //
>      ImageBase = PeCoffSearchImageBase (Eip);
> 
> If you can help us do some more investigation on the root-cause and narrow down the issue, I will appreciate that.
> 
> We can decide how to fix once it is root-caused.
> 
> Maybe we just do a simple IsLogicalAddressValid () check before we call PeCoffSearchImageBase(). :-)

I was able to work a little bit on stack trace support yesterday, and 
here's what I came up with by following your suggestions:

1. Centralized the calculation of initial EIP in DumpImageAndCpuContent()

2. LinearAddressStart 0 is now a valid case

3. Validate all return addresses (EIP/RIP) from every frame pointer.

Additionally, I run a few quick tests by generating exceptions from DXE 
and SMM and here's my result:

OVMF X64: DXE & SMM worked.
OVMF X64 IA32: DXE & SMM worked.
OVMF IA32: DXE worked, but SMM not.

SMM was failing in both X64 and IA32 - however when I started validating 
the return addresses from the frame pointers and, when we find an 
invalid return address (e.g. last valid frame), we stop the trace. With 
that, SMM is now working in X64.

Since the page fault is only occurring in IA32 with no paging enabled 
(default case), I suspect that when we don't have paging, we are unable 
to successfully validate the memory address when it's i.e. outside SMRAM 
- that is, we don't know when to stop unwinding the stack.

Any ideas on what might be causing that?

Other than that, I'll try to do some more tests this weekend and come 
back with the results. (That's why I didn't send out the v6 yet)

Repo:   https://git.paulo.ac/pub/edk2.git
Branch: stacktrace_v6

Thanks
Paulo

> 
> 
> Thank you
> Yao Jiewen
> 
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Paulo
>> Alcantara
>> Sent: Monday, January 15, 2018 8:23 AM
>> To: edk2-devel@lists.01.org
>> Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
>> <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
>> <afish@apple.com>; Yao, Jiewen <jiewen.yao@intel.com>; Diego Medaglia
>> <diego.meaglia@hp.com>; Laszlo Ersek <lersek@redhat.com>
>> Subject: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
>>
>> Hi,
>>
>> This series adds stack trace support during IA32 and X64 CPU exceptions.
>>
>> Informations like back trace, stack contents and image module names
>> (that were part of the call stack) will be dumped out.
>>
>> The current limitation is that it relies on available frame pointers
>> (GCC only) in order to successfully unwind the stack.
>>
>> Jiewen,
>>
>> Thank you very much for your time on this. I've applied the changes you
>> suggested, as well as tested it on IA32 PAE paging mode - it worked as
>> expected.
>>
>> Other than that, I also tested the stack trace in SMM code by manually
>> calling CpuBreakPoint() and then it broke with another exception
>> (page fault). I didn't have much time to look into that, but what I've
>> observed is that the page fault ocurred during the search of PE/COFF
>> image base address (in PeCoffSearchImageBase). The function attempts to
>> search for the image base from "Address" through 0, so any of those
>> dereferenced addresses triggers the page fault.
>>
>> Do you know how we could fix that issue? Perhaps introducing a
>> AddressValidationLib (as Brian suggested previously) and use it within
>> PeCoffSearchImageBase()?
>>
>> I'd also like to thank Brian & Jeff for all the support!
>>
>> Thanks
>> Paulo
>>
>> Repo:   https://github.com/pcacjr/edk2.git
>> Branch: stacktrace_v5
>>
>> Cc: Rick Bramley <richard.bramley@hp.com>
>> Cc: Kimon Berlin <kimon.berlin@hp.com>
>> Cc: Diego Medaglia <diego.meaglia@hp.com>
>> Cc: Andrew Fish <afish@apple.com>
>> Cc: Eric Dong <eric.dong@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Brian Johnson <brian.johnson@hpe.com>
>> Cc: Jeff Fan <vanjeff_919@hotmail.com>
>> Cc: Jiewen Yao <jiewen.yao@intel.com>
>> Cc: Paulo Alcantara <paulo@hp.com>
>> Contributed-under: TianoCore Contribution Agreement 1.1
>> Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
>> ---
>>
>> v1 -> v2:
>>    * Add IA32 arch support (GCC toolchain only)
>>    * Replace hard-coded stack alignment value (16) with
>>      CPU_STACK_ALIGNMENT.
>>    * Check for proper stack and frame pointer alignments.
>>    * Fix initialization of UnwoundStacksCount to 1.
>>    * Move GetPdbFileName() to common code since it will be used by both
>>      IA32 and X64 implementations.
>>
>> v2 -> v3:
>>    * Fixed wrong assumption about "RIP < ImageBase" to start searching
>>      for another PE/COFF image. That is, RIP may point to lower and
>>      higher addresses for any other PE/COFF images. Both IA32 & X64.
>>      (Thanks Andrew & Jiewen)
>>    * Fixed typo: unwond -> unwound. Both IA32 & X64. (Thanks Brian)
>>
>> v3 -> v4:
>>    * Validate all frame/stack pointer addresses before dereferencing them
>>      as requested by Brian & Jiewen.
>>    * Correctly print out IP addresses during the stack traces (by Jeff)
>>
>> v4 -> v5:
>>    * Fixed address calculations and improved code as suggested by Jiewen.
>>    * Fixed parameter validation as suggested by Brian.
>>    * Tested stack stack with IA32 PAE paging mode.
>>
>> Paulo Alcantara (8):
>>    UefiCpuPkg/CpuExceptionHandlerLib/X64: Add stack trace support
>>    UefiCpuPkg/CpuExceptionHandlerLib: Export GetPdbFileName()
>>    UefiCpuPkg/CpuExceptionHandlerLib/Ia32: Add stack trace support
>>    UefiCpuPkg/CpuExceptionHandlerLib: Add helper to validate memory
>>      addresses
>>    UefiCpuPkg/CpuExceptionHandlerLib: Ensure valid frame/stack pointers
>>    UefiCpuPkg/CpuExceptionHandlerLib: Correctly print IP addresses
>>    UefiCpuPkg/CpuExceptionHandlerLib: Validate memory address ranges
>>    UefiCpuPkg/CpuExceptionHandlerLib: Add early check in
>>      DumpStackContents
>>
>>   UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c
>> | 537 ++++++++++++++++++--
>>   UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h
>> |  59 ++-
>>   UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c |
>> 483 +++++++++++++++++-
>>   UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c  |
>> 426 +++++++++++++++-
>>   4 files changed, 1435 insertions(+), 70 deletions(-)
>>
>> --
>> 2.14.3
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
Posted by Yao, Jiewen 6 years, 2 months ago
Thanks Paulo.
=======================
OVMF IA32: DXE worked, but SMM not.

Since the page fault is only occurring in IA32 with no paging enabled 
(default case), I suspect that when we don't have paging, we are unable 
to successfully validate the memory address when it's i.e. outside SMRAM 
- that is, we don't know when to stop unwinding the stack.
=======================

For IA32 SMM, I am a little confused.
We unconditionally setup page table for SMM, no matter it is IA32 or X64.

If you find a SMM driver running in a page-disable environment, it means, the SMM CORE load the driver in SMRAM, but the SMM CPU driver has not rebased the CPU yet.
SMM driver is still using the PageTable/GDT/IDT setup by DXE CPU driver, not SMM CPU driver.

Would you please double confirm what you have observed?

You can just check the boot log file to see if PiSmmCpu driver has run or not.



Thank you
Yao Jiewen


> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Paulo
> Alcantara
> Sent: Monday, January 29, 2018 9:38 PM
> To: Yao, Jiewen <jiewen.yao@intel.com>; edk2-devel@lists.01.org
> Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
> <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
> <afish@apple.com>; Diego Medaglia <diego.meaglia@hp.com>; Laszlo Ersek
> <lersek@redhat.com>
> Subject: Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
> 
> Hi Jiewen,
> 
> On 1/17/2018 10:57 AM, Yao, Jiewen wrote:
> > Thanks Paulo.
> >
> > The series looks really good. I give it thumb up. :-)
> >
> > The 8 patches keep updating 4 files, so I squash them when I review. The
> comment below is for the final code, not for a specific patch.
> >
> > 1) CpuExceptionCommon.c: IsLinearAddressRangeValid().
> >    //
> >    // Check for valid input parameters
> >    //
> >    if (LinearAddressStart == 0 || LinearAddressEnd == 0 ||
> >        LinearAddressStart > LinearAddressEnd) {
> >      return FALSE;
> >    }
> >
> > I think LinearAddressStart is a valid case. In BIOS, we do update IVT in 0
> address when CSM is enabled.
> > I do not think we need exclude it here. If we enabled ZeroPointer protection, it
> can be excluded in page table parsing.
> >
> > 2) I found below logic appears in 2 functions - DumpImageModuleNames() and
> DumpStacktrace(). Is that possible to merge them?
> > We can calculate in DumpImageAndCpuContent() and use Eip as parameter.
> Just in case there is 3rd function need use EIP, it won't miss the calculation.
> > (It is a bug fix we did recently.)
> >
> >    //
> >    // Set current EIP address
> >    //
> >    if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
> >        ((SystemContext.SystemContextIa32->ExceptionData &
> IA32_PF_EC_ID) != 0)) {
> >      //
> >      // The EIP in SystemContext could not be used
> >      // if it is page fault with I/D set.
> >      //
> >      Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
> >    } else {
> >      Eip = SystemContext.SystemContextIa32->Eip;
> >    }
> >
> > 3) I am a little surprised on PeCoffSearchImageBase() issue.
> >
> > We have 4 PeCoffSearchImageBase() call in each arch.
> DumpImageModuleNames() calls twice and DumpStacktrace() calls twice.
> > Do you know which specific one triggers the zero address #PF issue?
> >
> >
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
> ExceptionHandler.c(547):  ImageBase = PeCoffSearchImageBase (Eip);
> >
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
> ExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Eip);
> >
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
> ExceptionHandler.c(682):  ImageBase = PeCoffSearchImageBase (Eip);
> >
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\Ia32\Arch
> ExceptionHandler.c(741):    ImageBase = PeCoffSearchImageBase (Eip);
> >
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
> ExceptionHandler.c(540):  ImageBase = PeCoffSearchImageBase (Rip);
> >
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
> ExceptionHandler.c(613):    ImageBase = PeCoffSearchImageBase (Rip);
> >
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
> ExceptionHandler.c(710):  ImageBase = PeCoffSearchImageBase (Rip);
> >
> C:\home\EdkIIGit\edk2\UefiCpuPkg\Library\CpuExceptionHandlerLib\X64\Arch
> ExceptionHandler.c(779):    ImageBase = PeCoffSearchImageBase (Rip);
> >
> > The EIP from SystemContext seems good. I assume there is not a problem here.
> >
> >    if ((ExceptionType == EXCEPT_IA32_PAGE_FAULT) &&
> >        ((SystemContext.SystemContextIa32->ExceptionData &
> IA32_PF_EC_ID) != 0)) {
> >      //
> >      // The EIP in SystemContext could not be used
> >      // if it is page fault with I/D set.
> >      //
> >      Eip = *(UINT32 *)(UINTN)SystemContext.SystemContextIa32->Esp;
> >    } else {
> >      Eip = SystemContext.SystemContextIa32->Eip;
> >    }
> >
> >    //
> >    // Get initial PE/COFF image base address from current EIP
> >    //
> >    ImageBase = PeCoffSearchImageBase (Eip);
> >
> > But the EIP from stack frame is unknown and risky. Especially for the code in
> mode-switch, such as PEI->DXE, UEFI->CSM, AP wakeup->AP C code, SMM
> entrypoint->SMM C code.
> > Should we add a check for EIP here?
> >
> >      //
> >      // Set EIP with return address from current stack frame
> >      //
> >      Eip = *(UINT32 *)((UINTN)Ebp + 4);
> >
> >      //
> >      // If EIP is zero, then stop unwinding the stack
> >      //
> >      if (Eip == 0) {
> >        break;
> >      }
> >
> >      //
> >      // Search for the respective PE/COFF image based on EIP
> >      //
> >      ImageBase = PeCoffSearchImageBase (Eip);
> >
> > If you can help us do some more investigation on the root-cause and narrow
> down the issue, I will appreciate that.
> >
> > We can decide how to fix once it is root-caused.
> >
> > Maybe we just do a simple IsLogicalAddressValid () check before we call
> PeCoffSearchImageBase(). :-)
> 
> I was able to work a little bit on stack trace support yesterday, and
> here's what I came up with by following your suggestions:
> 
> 1. Centralized the calculation of initial EIP in DumpImageAndCpuContent()
> 
> 2. LinearAddressStart 0 is now a valid case
> 
> 3. Validate all return addresses (EIP/RIP) from every frame pointer.
> 
> Additionally, I run a few quick tests by generating exceptions from DXE
> and SMM and here's my result:
> 
> OVMF X64: DXE & SMM worked.
> OVMF X64 IA32: DXE & SMM worked.
> OVMF IA32: DXE worked, but SMM not.
> 
> SMM was failing in both X64 and IA32 - however when I started validating
> the return addresses from the frame pointers and, when we find an
> invalid return address (e.g. last valid frame), we stop the trace. With
> that, SMM is now working in X64.
> 
> Since the page fault is only occurring in IA32 with no paging enabled
> (default case), I suspect that when we don't have paging, we are unable
> to successfully validate the memory address when it's i.e. outside SMRAM
> - that is, we don't know when to stop unwinding the stack.
> 
> Any ideas on what might be causing that?
> 
> Other than that, I'll try to do some more tests this weekend and come
> back with the results. (That's why I didn't send out the v6 yet)
> 
> Repo:   https://git.paulo.ac/pub/edk2.git
> Branch: stacktrace_v6
> 
> Thanks
> Paulo
> 
> >
> >
> > Thank you
> > Yao Jiewen
> >
> >> -----Original Message-----
> >> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Paulo
> >> Alcantara
> >> Sent: Monday, January 15, 2018 8:23 AM
> >> To: edk2-devel@lists.01.org
> >> Cc: Rick Bramley <richard.bramley@hp.com>; Dong, Eric
> >> <eric.dong@intel.com>; Kimon Berlin <kimon.berlin@hp.com>; Andrew Fish
> >> <afish@apple.com>; Yao, Jiewen <jiewen.yao@intel.com>; Diego Medaglia
> >> <diego.meaglia@hp.com>; Laszlo Ersek <lersek@redhat.com>
> >> Subject: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
> >>
> >> Hi,
> >>
> >> This series adds stack trace support during IA32 and X64 CPU exceptions.
> >>
> >> Informations like back trace, stack contents and image module names
> >> (that were part of the call stack) will be dumped out.
> >>
> >> The current limitation is that it relies on available frame pointers
> >> (GCC only) in order to successfully unwind the stack.
> >>
> >> Jiewen,
> >>
> >> Thank you very much for your time on this. I've applied the changes you
> >> suggested, as well as tested it on IA32 PAE paging mode - it worked as
> >> expected.
> >>
> >> Other than that, I also tested the stack trace in SMM code by manually
> >> calling CpuBreakPoint() and then it broke with another exception
> >> (page fault). I didn't have much time to look into that, but what I've
> >> observed is that the page fault ocurred during the search of PE/COFF
> >> image base address (in PeCoffSearchImageBase). The function attempts to
> >> search for the image base from "Address" through 0, so any of those
> >> dereferenced addresses triggers the page fault.
> >>
> >> Do you know how we could fix that issue? Perhaps introducing a
> >> AddressValidationLib (as Brian suggested previously) and use it within
> >> PeCoffSearchImageBase()?
> >>
> >> I'd also like to thank Brian & Jeff for all the support!
> >>
> >> Thanks
> >> Paulo
> >>
> >> Repo:   https://github.com/pcacjr/edk2.git
> >> Branch: stacktrace_v5
> >>
> >> Cc: Rick Bramley <richard.bramley@hp.com>
> >> Cc: Kimon Berlin <kimon.berlin@hp.com>
> >> Cc: Diego Medaglia <diego.meaglia@hp.com>
> >> Cc: Andrew Fish <afish@apple.com>
> >> Cc: Eric Dong <eric.dong@intel.com>
> >> Cc: Laszlo Ersek <lersek@redhat.com>
> >> Cc: Brian Johnson <brian.johnson@hpe.com>
> >> Cc: Jeff Fan <vanjeff_919@hotmail.com>
> >> Cc: Jiewen Yao <jiewen.yao@intel.com>
> >> Cc: Paulo Alcantara <paulo@hp.com>
> >> Contributed-under: TianoCore Contribution Agreement 1.1
> >> Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
> >> ---
> >>
> >> v1 -> v2:
> >>    * Add IA32 arch support (GCC toolchain only)
> >>    * Replace hard-coded stack alignment value (16) with
> >>      CPU_STACK_ALIGNMENT.
> >>    * Check for proper stack and frame pointer alignments.
> >>    * Fix initialization of UnwoundStacksCount to 1.
> >>    * Move GetPdbFileName() to common code since it will be used by both
> >>      IA32 and X64 implementations.
> >>
> >> v2 -> v3:
> >>    * Fixed wrong assumption about "RIP < ImageBase" to start searching
> >>      for another PE/COFF image. That is, RIP may point to lower and
> >>      higher addresses for any other PE/COFF images. Both IA32 & X64.
> >>      (Thanks Andrew & Jiewen)
> >>    * Fixed typo: unwond -> unwound. Both IA32 & X64. (Thanks Brian)
> >>
> >> v3 -> v4:
> >>    * Validate all frame/stack pointer addresses before dereferencing them
> >>      as requested by Brian & Jiewen.
> >>    * Correctly print out IP addresses during the stack traces (by Jeff)
> >>
> >> v4 -> v5:
> >>    * Fixed address calculations and improved code as suggested by Jiewen.
> >>    * Fixed parameter validation as suggested by Brian.
> >>    * Tested stack stack with IA32 PAE paging mode.
> >>
> >> Paulo Alcantara (8):
> >>    UefiCpuPkg/CpuExceptionHandlerLib/X64: Add stack trace support
> >>    UefiCpuPkg/CpuExceptionHandlerLib: Export GetPdbFileName()
> >>    UefiCpuPkg/CpuExceptionHandlerLib/Ia32: Add stack trace support
> >>    UefiCpuPkg/CpuExceptionHandlerLib: Add helper to validate memory
> >>      addresses
> >>    UefiCpuPkg/CpuExceptionHandlerLib: Ensure valid frame/stack pointers
> >>    UefiCpuPkg/CpuExceptionHandlerLib: Correctly print IP addresses
> >>    UefiCpuPkg/CpuExceptionHandlerLib: Validate memory address ranges
> >>    UefiCpuPkg/CpuExceptionHandlerLib: Add early check in
> >>      DumpStackContents
> >>
> >>   UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.c
> >> | 537 ++++++++++++++++++--
> >>   UefiCpuPkg/Library/CpuExceptionHandlerLib/CpuExceptionCommon.h
> >> |  59 ++-
> >>
> UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ArchExceptionHandler.c |
> >> 483 +++++++++++++++++-
> >>   UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ArchExceptionHandler.c
> |
> >> 426 +++++++++++++++-
> >>   4 files changed, 1435 insertions(+), 70 deletions(-)
> >>
> >> --
> >> 2.14.3
> >>
> >> _______________________________________________
> >> edk2-devel mailing list
> >> edk2-devel@lists.01.org
> >> https://lists.01.org/mailman/listinfo/edk2-devel
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v5 0/8] Stack trace support in X64 exception handling
Posted by Paulo Alcantara 6 years, 2 months ago
"Yao, Jiewen" <jiewen.yao@intel.com> writes:

Hi Jiewen,

> =======================
> OVMF IA32: DXE worked, but SMM not.
>
> Since the page fault is only occurring in IA32 with no paging enabled 
> (default case), I suspect that when we don't have paging, we are unable 
> to successfully validate the memory address when it's i.e. outside SMRAM 
> - that is, we don't know when to stop unwinding the stack.
> =======================
>
> For IA32 SMM, I am a little confused.
> We unconditionally setup page table for SMM, no matter it is IA32 or X64.
>
> If you find a SMM driver running in a page-disable environment, it means, the SMM CORE load the driver in SMRAM, but the SMM CPU driver has not rebased the CPU yet.
> SMM driver is still using the PageTable/GDT/IDT setup by DXE CPU
> driver, not SMM CPU driver.

OK - thanks for clarifying that.

> Would you please double confirm what you have observed?
>
> You can just check the boot log file to see if PiSmmCpu driver has run
> or not.

Sure. I will do it and then get back to you once I got the results.

Thanks
Paulo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel