[PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Igor Druzhinin posted 1 patch 21 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/1590097438-28829-1-git-send-email-igor.druzhinin@citrix.com
Maintainers: Jan Beulich <jbeulich@suse.com>, Wei Liu <wl@xen.org>, "Roger Pau Monné" <roger.pau@citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>
xen/arch/x86/hvm/svm/svm.c | 4 ++++
1 file changed, 4 insertions(+)

[PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Igor Druzhinin 21 weeks ago
If a recalculation NPT fault hasn't been handled explicitly in
hvm_hap_nested_page_fault() then it's potentially safe to retry -
US bit has been re-instated in PTE and any real fault would be correctly
re-raised next time.

This covers a specific case of migration with vGPU assigned on AMD:
global log-dirty is enabled and causes immediate recalculation NPT
fault in MMIO area upon access. This type of fault isn't described
explicitly in hvm_hap_nested_page_fault (this isn't called on
EPT misconfig exit on Intel) which results in domain crash.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 xen/arch/x86/hvm/svm/svm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 46a1aac..f0d0bd3 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
         /* inject #VMEXIT(NPF) into guest. */
         nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
         return;
+    case 0:
+        /* If a recalculation page fault hasn't been handled - just retry. */
+        if ( pfec & PFEC_user_mode )
+            return;
     }
 
     /* Everything else is an error. */
-- 
2.7.4


Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Roger Pau Monné 21 weeks ago
On Thu, May 21, 2020 at 10:43:58PM +0100, Igor Druzhinin wrote:
> If a recalculation NPT fault hasn't been handled explicitly in
> hvm_hap_nested_page_fault() then it's potentially safe to retry -
> US bit has been re-instated in PTE and any real fault would be correctly
> re-raised next time.
> 
> This covers a specific case of migration with vGPU assigned on AMD:
> global log-dirty is enabled and causes immediate recalculation NPT
> fault in MMIO area upon access. This type of fault isn't described
> explicitly in hvm_hap_nested_page_fault (this isn't called on
> EPT misconfig exit on Intel) which results in domain crash.

Couldn't direct MMIO regions be handled like other types of memory for
the purposes of logdiry mode?

I assume there's already a path here used for other memory types when
logdirty is turned on, and hence would seem better to just make direct
MMIO regions also use that path?

> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 46a1aac..f0d0bd3 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>          /* inject #VMEXIT(NPF) into guest. */
>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>          return;
> +    case 0:
> +        /* If a recalculation page fault hasn't been handled - just retry. */
> +        if ( pfec & PFEC_user_mode )
> +            return;

I'm slightly worried that this diverges from the EPT implementation
now, in the sense that returning 0 from hvm_hap_nested_page_fault will
no longer trigger a guest crash.

Thanks, Roger.

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Igor Druzhinin 21 weeks ago
On 22/05/2020 11:08, Roger Pau Monné wrote:
> On Thu, May 21, 2020 at 10:43:58PM +0100, Igor Druzhinin wrote:
>> If a recalculation NPT fault hasn't been handled explicitly in
>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>> US bit has been re-instated in PTE and any real fault would be correctly
>> re-raised next time.
>>
>> This covers a specific case of migration with vGPU assigned on AMD:
>> global log-dirty is enabled and causes immediate recalculation NPT
>> fault in MMIO area upon access. This type of fault isn't described
>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>> EPT misconfig exit on Intel) which results in domain crash.
> 
> Couldn't direct MMIO regions be handled like other types of memory for
> the purposes of logdiry mode?
> 
> I assume there's already a path here used for other memory types when
> logdirty is turned on, and hence would seem better to just make direct
> MMIO regions also use that path?

The proble of handling only MMIO case is that the issue still stays.
It will be hit with some other memory type since it's not MMIO specific.
The issue is that if global recalculation is called, the next hit to
this type will cause a transient fault which will not be handled
correctly after a due fixup by neither of our handlers.

>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> ---
>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>> index 46a1aac..f0d0bd3 100644
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>          /* inject #VMEXIT(NPF) into guest. */
>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>          return;
>> +    case 0:
>> +        /* If a recalculation page fault hasn't been handled - just retry. */
>> +        if ( pfec & PFEC_user_mode )
>> +            return;
> 
> I'm slightly worried that this diverges from the EPT implementation
> now, in the sense that returning 0 from hvm_hap_nested_page_fault will
> no longer trigger a guest crash.

My second alternative from my follow up email addresses this. I also
didn't like this aspect.

Igor

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Roger Pau Monné 21 weeks ago
On Fri, May 22, 2020 at 11:14:24AM +0100, Igor Druzhinin wrote:
> On 22/05/2020 11:08, Roger Pau Monné wrote:
> > On Thu, May 21, 2020 at 10:43:58PM +0100, Igor Druzhinin wrote:
> >> If a recalculation NPT fault hasn't been handled explicitly in
> >> hvm_hap_nested_page_fault() then it's potentially safe to retry -
> >> US bit has been re-instated in PTE and any real fault would be correctly
> >> re-raised next time.
> >>
> >> This covers a specific case of migration with vGPU assigned on AMD:
> >> global log-dirty is enabled and causes immediate recalculation NPT
> >> fault in MMIO area upon access. This type of fault isn't described
> >> explicitly in hvm_hap_nested_page_fault (this isn't called on
> >> EPT misconfig exit on Intel) which results in domain crash.
> > 
> > Couldn't direct MMIO regions be handled like other types of memory for
> > the purposes of logdiry mode?
> > 
> > I assume there's already a path here used for other memory types when
> > logdirty is turned on, and hence would seem better to just make direct
> > MMIO regions also use that path?
> 
> The proble of handling only MMIO case is that the issue still stays.
> It will be hit with some other memory type since it's not MMIO specific.
> The issue is that if global recalculation is called, the next hit to
> this type will cause a transient fault which will not be handled
> correctly after a due fixup by neither of our handlers.

I admit I should go look at the code, but for example RAM p2m types
don't require this fix, so I assume there's some different path taken
in that case that avoids all this?

Ie: when global logdirty is enabled you will start to get nested page
faults for every access, yet only direct MMIO types require this fix?

Thanks, Roger

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Igor Druzhinin 21 weeks ago
On 22/05/2020 11:23, Roger Pau Monné wrote:
> On Fri, May 22, 2020 at 11:14:24AM +0100, Igor Druzhinin wrote:
>> On 22/05/2020 11:08, Roger Pau Monné wrote:
>>> On Thu, May 21, 2020 at 10:43:58PM +0100, Igor Druzhinin wrote:
>>>> If a recalculation NPT fault hasn't been handled explicitly in
>>>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>>>> US bit has been re-instated in PTE and any real fault would be correctly
>>>> re-raised next time.
>>>>
>>>> This covers a specific case of migration with vGPU assigned on AMD:
>>>> global log-dirty is enabled and causes immediate recalculation NPT
>>>> fault in MMIO area upon access. This type of fault isn't described
>>>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>>>> EPT misconfig exit on Intel) which results in domain crash.
>>>
>>> Couldn't direct MMIO regions be handled like other types of memory for
>>> the purposes of logdiry mode?
>>>
>>> I assume there's already a path here used for other memory types when
>>> logdirty is turned on, and hence would seem better to just make direct
>>> MMIO regions also use that path?
>>
>> The proble of handling only MMIO case is that the issue still stays.
>> It will be hit with some other memory type since it's not MMIO specific.
>> The issue is that if global recalculation is called, the next hit to
>> this type will cause a transient fault which will not be handled
>> correctly after a due fixup by neither of our handlers.
> 
> I admit I should go look at the code, but for example RAM p2m types
> don't require this fix, so I assume there's some different path taken
> in that case that avoids all this?
> 
> Ie: when global logdirty is enabled you will start to get nested page
> faults for every access, yet only direct MMIO types require this fix?

It's not "only MMIO" - it's just MMIO area is hit in my particular case.
I'd prefer this fix to address the general issue otherwise for SVM
we would have to write handlers in hvm_hap_nested_page_fault() for
every case as soon as we hit it.

Igor

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Roger Pau Monné 21 weeks ago
On Fri, May 22, 2020 at 11:27:38AM +0100, Igor Druzhinin wrote:
> On 22/05/2020 11:23, Roger Pau Monné wrote:
> > On Fri, May 22, 2020 at 11:14:24AM +0100, Igor Druzhinin wrote:
> >> On 22/05/2020 11:08, Roger Pau Monné wrote:
> >>> On Thu, May 21, 2020 at 10:43:58PM +0100, Igor Druzhinin wrote:
> >>>> If a recalculation NPT fault hasn't been handled explicitly in
> >>>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
> >>>> US bit has been re-instated in PTE and any real fault would be correctly
> >>>> re-raised next time.
> >>>>
> >>>> This covers a specific case of migration with vGPU assigned on AMD:
> >>>> global log-dirty is enabled and causes immediate recalculation NPT
> >>>> fault in MMIO area upon access. This type of fault isn't described
> >>>> explicitly in hvm_hap_nested_page_fault (this isn't called on
> >>>> EPT misconfig exit on Intel) which results in domain crash.
> >>>
> >>> Couldn't direct MMIO regions be handled like other types of memory for
> >>> the purposes of logdiry mode?
> >>>
> >>> I assume there's already a path here used for other memory types when
> >>> logdirty is turned on, and hence would seem better to just make direct
> >>> MMIO regions also use that path?
> >>
> >> The proble of handling only MMIO case is that the issue still stays.
> >> It will be hit with some other memory type since it's not MMIO specific.
> >> The issue is that if global recalculation is called, the next hit to
> >> this type will cause a transient fault which will not be handled
> >> correctly after a due fixup by neither of our handlers.
> > 
> > I admit I should go look at the code, but for example RAM p2m types
> > don't require this fix, so I assume there's some different path taken
> > in that case that avoids all this?
> > 
> > Ie: when global logdirty is enabled you will start to get nested page
> > faults for every access, yet only direct MMIO types require this fix?
> 
> It's not "only MMIO" - it's just MMIO area is hit in my particular case.
> I'd prefer this fix to address the general issue otherwise for SVM
> we would have to write handlers in hvm_hap_nested_page_fault() for
> every case as soon as we hit it.

Hm, I'm not sure I agree. p2m memory types are limited, and IMO we
want to have strict control about how they are handled.
hvm_hap_nested_page_fault is already full of special casing for each
memory type for that reason.

That being said, I also don't like the fact that logdity is handled
differently between EPT and NPT, as on EPT it's handled as a
misconfig while on NPT it's handled as a violation.

Thanks, Roger.

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Jan Beulich 21 weeks ago
On 22.05.2020 13:11, Roger Pau Monné wrote:
> That being said, I also don't like the fact that logdity is handled
> differently between EPT and NPT, as on EPT it's handled as a
> misconfig while on NPT it's handled as a violation.

Because, well, there is no concept of misconfig in NPT.

Jan

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Andrew Cooper 21 weeks ago
On 22/05/2020 14:04, Jan Beulich wrote:
> On 22.05.2020 13:11, Roger Pau Monné wrote:
>> That being said, I also don't like the fact that logdity is handled
>> differently between EPT and NPT, as on EPT it's handled as a
>> misconfig while on NPT it's handled as a violation.
> Because, well, there is no concept of misconfig in NPT.

Indeed.  Intel chose to split EPT errors into two - MISCONFIG for
structural errors (not present, or reserved bits set) and VIOLATION for
permissions errors.

AMD reused the same silicon pagewalker design, so have a single
NPT_FAULT vmexit which behaves much more like a regular pagefault,
encoding structural vs permission errors in the error code.

~Andrew

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Roger Pau Monné 21 weeks ago
On Fri, May 22, 2020 at 02:11:15PM +0100, Andrew Cooper wrote:
> On 22/05/2020 14:04, Jan Beulich wrote:
> > On 22.05.2020 13:11, Roger Pau Monné wrote:
> >> That being said, I also don't like the fact that logdity is handled
> >> differently between EPT and NPT, as on EPT it's handled as a
> >> misconfig while on NPT it's handled as a violation.
> > Because, well, there is no concept of misconfig in NPT.
> 
> Indeed.  Intel chose to split EPT errors into two - MISCONFIG for
> structural errors (not present, or reserved bits set) and VIOLATION for
> permissions errors.
> 
> AMD reused the same silicon pagewalker design, so have a single
> NPT_FAULT vmexit which behaves much more like a regular pagefault,
> encoding structural vs permission errors in the error code.

Maybe I should clarify, I understand that NPT doesn't have such
differentiation regarding nested page table faults vs EPT, but I feel
like it would be clearer if part of the code could be shared, ie:
unify EPT resolve_misconfig and NPT do_recalc into a single function
for example that uses the necessary p2m-> helpers for the differing
implementations. I think we should be able to tell apart when a NPT
page fault is a recalc one by looking at the bits in the EXITINFO1
error field?

Anyway, this was just a rant, and it's tangential to the issue at
hand, sorry for distracting.

Roger.

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Andrew Cooper 21 weeks ago
On 22/05/2020 14:32, Roger Pau Monné wrote:
> On Fri, May 22, 2020 at 02:11:15PM +0100, Andrew Cooper wrote:
>> On 22/05/2020 14:04, Jan Beulich wrote:
>>> On 22.05.2020 13:11, Roger Pau Monné wrote:
>>>> That being said, I also don't like the fact that logdity is handled
>>>> differently between EPT and NPT, as on EPT it's handled as a
>>>> misconfig while on NPT it's handled as a violation.
>>> Because, well, there is no concept of misconfig in NPT.
>> Indeed.  Intel chose to split EPT errors into two - MISCONFIG for
>> structural errors (not present, or reserved bits set) and VIOLATION for
>> permissions errors.
>>
>> AMD reused the same silicon pagewalker design, so have a single
>> NPT_FAULT vmexit which behaves much more like a regular pagefault,
>> encoding structural vs permission errors in the error code.
> Maybe I should clarify, I understand that NPT doesn't have such
> differentiation regarding nested page table faults vs EPT, but I feel
> like it would be clearer if part of the code could be shared, ie:
> unify EPT resolve_misconfig and NPT do_recalc into a single function
> for example that uses the necessary p2m-> helpers for the differing
> implementations. I think we should be able to tell apart when a NPT
> page fault is a recalc one by looking at the bits in the EXITINFO1
> error field?

But they use fundamentally different mechanisms.

EPT uses an invalid caching type, while NPT sets the User bit (and even
this is going to have to change when we want to support GMET for Windows
VBS in the long term).

You could abstract a few things out into common logic, but none of the
bit positions match (not even the recalc software bit), and the result
would be more complicated than its current form.

~Andrew

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Andrew Cooper 21 weeks ago
On 21/05/2020 22:43, Igor Druzhinin wrote:
> If a recalculation NPT fault hasn't been handled explicitly in
> hvm_hap_nested_page_fault() then it's potentially safe to retry -
> US bit has been re-instated in PTE and any real fault would be correctly
> re-raised next time.
>
> This covers a specific case of migration with vGPU assigned on AMD:
> global log-dirty is enabled and causes immediate recalculation NPT
> fault in MMIO area upon access. This type of fault isn't described
> explicitly in hvm_hap_nested_page_fault (this isn't called on
> EPT misconfig exit on Intel) which results in domain crash.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 46a1aac..f0d0bd3 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>          /* inject #VMEXIT(NPF) into guest. */
>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>          return;
> +    case 0:
> +        /* If a recalculation page fault hasn't been handled - just retry. */
> +        if ( pfec & PFEC_user_mode )
> +            return;

This smells like it is a recipe for livelocks.

Everything should have been handled properly by the call to
p2m_pt_handle_deferred_changes() which precedes svm_do_nested_pgfault().

It is legitimate for the MMIO mapping to end up being transiently
recalculated, but the fact that p2m_pt_handle_deferred_changes() doesn't
fix it up suggests that the bug is there.

Do you have the complete NPT walk to the bad mapping? Do we have
_PAGE_USER in the leaf mapping, or is this perhaps a spurious fault?

~Andrew

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Igor Druzhinin 21 weeks ago
On 22/05/2020 10:45, Andrew Cooper wrote:
> On 21/05/2020 22:43, Igor Druzhinin wrote:
>> If a recalculation NPT fault hasn't been handled explicitly in
>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>> US bit has been re-instated in PTE and any real fault would be correctly
>> re-raised next time.
>>
>> This covers a specific case of migration with vGPU assigned on AMD:
>> global log-dirty is enabled and causes immediate recalculation NPT
>> fault in MMIO area upon access. This type of fault isn't described
>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>> EPT misconfig exit on Intel) which results in domain crash.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> ---
>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>> index 46a1aac..f0d0bd3 100644
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>          /* inject #VMEXIT(NPF) into guest. */
>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>          return;
>> +    case 0:
>> +        /* If a recalculation page fault hasn't been handled - just retry. */
>> +        if ( pfec & PFEC_user_mode )
>> +            return;
> 
> This smells like it is a recipe for livelocks.
> 
> Everything should have been handled properly by the call to
> p2m_pt_handle_deferred_changes() which precedes svm_do_nested_pgfault().
> 
> It is legitimate for the MMIO mapping to end up being transiently
> recalculated, but the fact that p2m_pt_handle_deferred_changes() doesn't
> fix it up suggests that the bug is there.
> 
> Do you have the complete NPT walk to the bad mapping? Do we have
> _PAGE_USER in the leaf mapping, or is this perhaps a spurious fault?

It does fix it up. The problem is that currently in SVM we enter
svm_do_nested_pgfault immediately after p2m_pt_handle_deferred_changes
is finished finished. 

Yes, we don't have _PAGE_USER initially and, yes, it's fixed up
correctly in p2m_pt_handle_deferred_changes but svm_do_nested_pgfault
doesn't know about it.

Please read my second email about alternatives that suggest to resolve
the issue you're worrying about.

Igor

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Andrew Cooper 21 weeks ago
On 22/05/2020 11:05, Igor Druzhinin wrote:
> On 22/05/2020 10:45, Andrew Cooper wrote:
>> On 21/05/2020 22:43, Igor Druzhinin wrote:
>>> If a recalculation NPT fault hasn't been handled explicitly in
>>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>>> US bit has been re-instated in PTE and any real fault would be correctly
>>> re-raised next time.
>>>
>>> This covers a specific case of migration with vGPU assigned on AMD:
>>> global log-dirty is enabled and causes immediate recalculation NPT
>>> fault in MMIO area upon access. This type of fault isn't described
>>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>>> EPT misconfig exit on Intel) which results in domain crash.
>>>
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>> ---
>>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>>> index 46a1aac..f0d0bd3 100644
>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>>          /* inject #VMEXIT(NPF) into guest. */
>>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>>          return;
>>> +    case 0:
>>> +        /* If a recalculation page fault hasn't been handled - just retry. */
>>> +        if ( pfec & PFEC_user_mode )
>>> +            return;
>> This smells like it is a recipe for livelocks.
>>
>> Everything should have been handled properly by the call to
>> p2m_pt_handle_deferred_changes() which precedes svm_do_nested_pgfault().
>>
>> It is legitimate for the MMIO mapping to end up being transiently
>> recalculated, but the fact that p2m_pt_handle_deferred_changes() doesn't
>> fix it up suggests that the bug is there.
>>
>> Do you have the complete NPT walk to the bad mapping? Do we have
>> _PAGE_USER in the leaf mapping, or is this perhaps a spurious fault?
> It does fix it up. The problem is that currently in SVM we enter
> svm_do_nested_pgfault immediately after p2m_pt_handle_deferred_changes
> is finished finished.

Oh - so we do.  I'd read the entry condition for svm_do_nested_pgfault()
incorrectly.

Jan - why did you chose to do it this way?  If
p2m_pt_handle_deferred_changes() has made a modification, there is
surely nothing relevant to do in svm_do_nested_pgfault().

~Andrew

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Jan Beulich 21 weeks ago
On 22.05.2020 12:19, Andrew Cooper wrote:
> On 22/05/2020 11:05, Igor Druzhinin wrote:
>> On 22/05/2020 10:45, Andrew Cooper wrote:
>>> On 21/05/2020 22:43, Igor Druzhinin wrote:
>>>> If a recalculation NPT fault hasn't been handled explicitly in
>>>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>>>> US bit has been re-instated in PTE and any real fault would be correctly
>>>> re-raised next time.
>>>>
>>>> This covers a specific case of migration with vGPU assigned on AMD:
>>>> global log-dirty is enabled and causes immediate recalculation NPT
>>>> fault in MMIO area upon access. This type of fault isn't described
>>>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>>>> EPT misconfig exit on Intel) which results in domain crash.
>>>>
>>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>>> ---
>>>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>>>> index 46a1aac..f0d0bd3 100644
>>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>>>          /* inject #VMEXIT(NPF) into guest. */
>>>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>>>          return;
>>>> +    case 0:
>>>> +        /* If a recalculation page fault hasn't been handled - just retry. */
>>>> +        if ( pfec & PFEC_user_mode )
>>>> +            return;
>>> This smells like it is a recipe for livelocks.
>>>
>>> Everything should have been handled properly by the call to
>>> p2m_pt_handle_deferred_changes() which precedes svm_do_nested_pgfault().
>>>
>>> It is legitimate for the MMIO mapping to end up being transiently
>>> recalculated, but the fact that p2m_pt_handle_deferred_changes() doesn't
>>> fix it up suggests that the bug is there.
>>>
>>> Do you have the complete NPT walk to the bad mapping? Do we have
>>> _PAGE_USER in the leaf mapping, or is this perhaps a spurious fault?
>> It does fix it up. The problem is that currently in SVM we enter
>> svm_do_nested_pgfault immediately after p2m_pt_handle_deferred_changes
>> is finished finished.
> 
> Oh - so we do.  I'd read the entry condition for svm_do_nested_pgfault()
> incorrectly.
> 
> Jan - why did you chose to do it this way?  If
> p2m_pt_handle_deferred_changes() has made a modification, there is
> surely nothing relevant to do in svm_do_nested_pgfault().

Why? There can very well be multiple reasons for a single exit
to occur, and hence it did seem to make sense to allow processing
of such possible further reasons right away, instead of re-
entering the guest just to deal with another VM exit. What I can
accept is that the error code may not correctly represent the
"remaining" fault reason anymore at this point.

Jan

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Igor Druzhinin 21 weeks ago
On 22/05/2020 11:19, Andrew Cooper wrote:
> On 22/05/2020 11:05, Igor Druzhinin wrote:
>> On 22/05/2020 10:45, Andrew Cooper wrote:
>>> On 21/05/2020 22:43, Igor Druzhinin wrote:
>>>> If a recalculation NPT fault hasn't been handled explicitly in
>>>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>>>> US bit has been re-instated in PTE and any real fault would be correctly
>>>> re-raised next time.
>>>>
>>>> This covers a specific case of migration with vGPU assigned on AMD:
>>>> global log-dirty is enabled and causes immediate recalculation NPT
>>>> fault in MMIO area upon access. This type of fault isn't described
>>>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>>>> EPT misconfig exit on Intel) which results in domain crash.
>>>>
>>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>>> ---
>>>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>>>> index 46a1aac..f0d0bd3 100644
>>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>>>          /* inject #VMEXIT(NPF) into guest. */
>>>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>>>          return;
>>>> +    case 0:
>>>> +        /* If a recalculation page fault hasn't been handled - just retry. */
>>>> +        if ( pfec & PFEC_user_mode )
>>>> +            return;
>>> This smells like it is a recipe for livelocks.
>>>
>>> Everything should have been handled properly by the call to
>>> p2m_pt_handle_deferred_changes() which precedes svm_do_nested_pgfault().
>>>
>>> It is legitimate for the MMIO mapping to end up being transiently
>>> recalculated, but the fact that p2m_pt_handle_deferred_changes() doesn't
>>> fix it up suggests that the bug is there.
>>>
>>> Do you have the complete NPT walk to the bad mapping? Do we have
>>> _PAGE_USER in the leaf mapping, or is this perhaps a spurious fault?
>> It does fix it up. The problem is that currently in SVM we enter
>> svm_do_nested_pgfault immediately after p2m_pt_handle_deferred_changes
>> is finished finished.
> 
> Oh - so we do.  I'd read the entry condition for svm_do_nested_pgfault()
> incorrectly.
> 
> Jan - why did you chose to do it this way?  If
> p2m_pt_handle_deferred_changes() has made a modification, there is
> surely nothing relevant to do in svm_do_nested_pgfault().

In Jan's defense that saves one additional VMEXIT in rare cases
if the fault had other implications (write to RO page in log-dirty)
in addition to recalculation.

Igor
other 

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

Posted by Igor Druzhinin 21 weeks ago
On 21/05/2020 22:43, Igor Druzhinin wrote:
> If a recalculation NPT fault hasn't been handled explicitly in
> hvm_hap_nested_page_fault() then it's potentially safe to retry -
> US bit has been re-instated in PTE and any real fault would be correctly
> re-raised next time.
> 
> This covers a specific case of migration with vGPU assigned on AMD:
> global log-dirty is enabled and causes immediate recalculation NPT
> fault in MMIO area upon access. This type of fault isn't described
> explicitly in hvm_hap_nested_page_fault (this isn't called on
> EPT misconfig exit on Intel) which results in domain crash.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---

Alternatively, I can re-raise the fault immediately after recalculation is
done which is less efficient (will take one more VMEXIT) but safer IMO -
hvm_hap_nested_page_fault might potentially leave VM in inconsistent state
in case of a real failure and cause second page fault to conceal it.

Another alternative is to inject fall_through bool into hvm_hap_nested_page_fault
to give it the idea of expected behavior in that case and avoid guessing in SVM
code. I think that's an improvement over suggestion in v1 and a candidate for v2. 

Igor