This is a debug behaviour to identify buggy kernels. Crashing the domain is
the most unhelpful thing to do, because it discards the relevant context.
Instead, inject #GP[0] like other permission errors in x86. In particular,
this lets the kernel provide a backtrace that's actually helpful to a
developer trying to figure out what's going wrong.
As a bugfix, this always injects #GP[0] to current, not l1e_owner. It is not
l1e_owner's fault if dom0 using superpowers triggers an implicit unmap.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Juergen Gross <jgross@suse.com>
This is a prerequisite to investigating
https://github.com/QubesOS/qubes-issues/issues/7631 which is looking like an
error in Linux's gntdev driver.
---
xen/arch/x86/mm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5b81d5fbdbb2..b3393385ffb6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1232,7 +1232,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
gdprintk(XENLOG_WARNING,
"Attempt to implicitly unmap a granted PTE %" PRIpte "\n",
l1e_get_intpte(l1e));
- domain_crash(l1e_owner);
+ pv_inject_hw_exception(TRAP_gp_fault, 0);
}
#endif
--
2.11.0
On 25.07.2022 19:50, Andrew Cooper wrote: > This is a debug behaviour to identify buggy kernels. Crashing the domain is > the most unhelpful thing to do, because it discards the relevant context. > > Instead, inject #GP[0] like other permission errors in x86. In particular, > this lets the kernel provide a backtrace that's actually helpful to a > developer trying to figure out what's going wrong. > > As a bugfix, this always injects #GP[0] to current, not l1e_owner. It is not > l1e_owner's fault if dom0 using superpowers triggers an implicit unmap. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Albeit preferably with ... > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -1232,7 +1232,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner) > gdprintk(XENLOG_WARNING, > "Attempt to implicitly unmap a granted PTE %" PRIpte "\n", > l1e_get_intpte(l1e)); > - domain_crash(l1e_owner); > + pv_inject_hw_exception(TRAP_gp_fault, 0); > } > #endif ... the gdprintk() adjusted to also log l1e_owner. Jan
On 26/07/2022 07:29, Jan Beulich wrote:
> On 25.07.2022 19:50, Andrew Cooper wrote:
>> This is a debug behaviour to identify buggy kernels. Crashing the domain is
>> the most unhelpful thing to do, because it discards the relevant context.
>>
>> Instead, inject #GP[0] like other permission errors in x86. In particular,
>> this lets the kernel provide a backtrace that's actually helpful to a
>> developer trying to figure out what's going wrong.
>>
>> As a bugfix, this always injects #GP[0] to current, not l1e_owner. It is not
>> l1e_owner's fault if dom0 using superpowers triggers an implicit unmap.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
>
> Albeit preferably with ...
>
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -1232,7 +1232,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
>> gdprintk(XENLOG_WARNING,
>> "Attempt to implicitly unmap a granted PTE %" PRIpte "\n",
>> l1e_get_intpte(l1e));
>> - domain_crash(l1e_owner);
>> + pv_inject_hw_exception(TRAP_gp_fault, 0);
>> }
>> #endif
> ... the gdprintk() adjusted to also log l1e_owner.
Ok, how about this incremental?
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b3393385ffb6..74054fb5f4ee 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1229,9 +1229,9 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct
domain *l1e_owner)
if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
!l1e_owner->is_shutting_down && !l1e_owner->is_dying )
{
- gdprintk(XENLOG_WARNING,
- "Attempt to implicitly unmap a granted PTE %" PRIpte "\n",
- l1e_get_intpte(l1e));
+ gprintk(XENLOG_WARNING,
+ "Attempt to implicitly %pd's gntmap PTE %" PRIpte "\n",
+ l1e_owner, l1e_get_intpte(l1e));
pv_inject_hw_exception(TRAP_gp_fault, 0);
}
#endif
The printk() needs to not be omitted in release builds which happen to
have this logic compiled in.
~Andrew
On 26.07.2022 13:51, Andrew Cooper wrote:
> On 26/07/2022 07:29, Jan Beulich wrote:
>> On 25.07.2022 19:50, Andrew Cooper wrote:
>>> This is a debug behaviour to identify buggy kernels. Crashing the domain is
>>> the most unhelpful thing to do, because it discards the relevant context.
>>>
>>> Instead, inject #GP[0] like other permission errors in x86. In particular,
>>> this lets the kernel provide a backtrace that's actually helpful to a
>>> developer trying to figure out what's going wrong.
>>>
>>> As a bugfix, this always injects #GP[0] to current, not l1e_owner. It is not
>>> l1e_owner's fault if dom0 using superpowers triggers an implicit unmap.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Acked-by: Jan Beulich <jbeulich@suse.com>
>>
>> Albeit preferably with ...
>>
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -1232,7 +1232,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
>>> gdprintk(XENLOG_WARNING,
>>> "Attempt to implicitly unmap a granted PTE %" PRIpte "\n",
>>> l1e_get_intpte(l1e));
>>> - domain_crash(l1e_owner);
>>> + pv_inject_hw_exception(TRAP_gp_fault, 0);
>>> }
>>> #endif
>> ... the gdprintk() adjusted to also log l1e_owner.
>
> Ok, how about this incremental?
>
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index b3393385ffb6..74054fb5f4ee 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1229,9 +1229,9 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct
> domain *l1e_owner)
> if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
> !l1e_owner->is_shutting_down && !l1e_owner->is_dying )
> {
> - gdprintk(XENLOG_WARNING,
> - "Attempt to implicitly unmap a granted PTE %" PRIpte "\n",
> - l1e_get_intpte(l1e));
> + gprintk(XENLOG_WARNING,
> + "Attempt to implicitly %pd's gntmap PTE %" PRIpte "\n",
> + l1e_owner, l1e_get_intpte(l1e));
DYM to drop "unmap"? With it restored (or anything similar to that
effect), fine with me.
Jan
© 2016 - 2026 Red Hat, Inc.