[PATCH] x86: Always have CR4.PKE set in HVM context

Andrew Cooper posted 1 patch 2 years, 11 months ago
Test gitlab-ci passed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20210429221223.28348-1-andrew.cooper3@citrix.com
xen/arch/x86/mm/guest_walk.c    |  2 +-
xen/arch/x86/pv/domain.c        | 16 +++++++++++++++-
xen/arch/x86/setup.c            |  3 +++
xen/include/asm-x86/processor.h | 10 +---------
4 files changed, 20 insertions(+), 11 deletions(-)
[PATCH] x86: Always have CR4.PKE set in HVM context
Posted by Andrew Cooper 2 years, 11 months ago
The sole user of read_pkru() is the emulated pagewalk, and guarded behind
guest_pku_enabled() which restricts the path to HVM (hap, even) context only.

The commentary in read_pkru() concerning _PAGE_GNTTAB overlapping with
_PAGE_PKEY_BITS is only applicable to PV guests.

The context switch path, via write_ptbase() unconditionally writes CR4 on any
context switch.

Therefore, we can guarantee to separate CR4.PKE between PV and HVM context at
no extra cost.  Set PKE in mmu_cr4_features on boot, so it becomes set in HVM
context, and clear it in pv_make_cr4().

Rename read_pkru() to rdpkru() now that it is a simple wrapper around the
instruction.  This saves two CR4 writes on every pagewalk, which typically
occur more than one per emulation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

It also occurs to me that for HVM/Idle => HVM/Idle context switches, we never
need to change CR4.  I think this is substantially clearer following XSA-293 /
c/s b2dd00574a4f ("x86/pv: Rewrite guest %cr4 handling from scratch") which
introduced pv_make_cr4().

Therefore, it is probably work having a hvm fastpath path in write_ptbase()
which only touches CR3.
---
 xen/arch/x86/mm/guest_walk.c    |  2 +-
 xen/arch/x86/pv/domain.c        | 16 +++++++++++++++-
 xen/arch/x86/setup.c            |  3 +++
 xen/include/asm-x86/processor.h | 10 +---------
 4 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 1c601314f3..30d83cf1e0 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -416,7 +416,7 @@ guest_walk_tables(const struct vcpu *v, struct p2m_domain *p2m,
          guest_pku_enabled(v) )
     {
         unsigned int pkey = guest_l1e_get_pkey(gw->l1e);
-        unsigned int pkru = read_pkru();
+        unsigned int pkru = rdpkru();
 
         if ( read_pkru_ad(pkru, pkey) ||
              ((walk & PFEC_write_access) && read_pkru_wd(pkru, pkey) &&
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index f1cb92585e..731a262a2b 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -182,7 +182,21 @@ unsigned long pv_make_cr4(const struct vcpu *v)
 {
     const struct domain *d = v->domain;
     unsigned long cr4 = mmu_cr4_features &
-        ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD);
+        ~(X86_CR4_PCIDE | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_PKE);
+
+    /*
+     * We want CR4.PKE set in HVM context when avaialble, but don't support it
+     * in PV context at all.
+     *
+     * _PAGE_PKEY_BITS where previously software available PTE bits.  In
+     * principle, we could let an aware PV guest enable PKE.
+     *
+     * However, Xen uses _PAGE_GNTTAB in debug builds which overlaps with
+     * _PAGE_PKEY_BITS, and the ownership of (and eligibility to move)
+     * software PTE bits is not considered in the PV ABI at all.  For now,
+     * punt the problem to whichever unluckly person finds a compelling
+     * usecase for PKRU in PV guests.
+     */
 
     /*
      * PCIDE or PGE depends on the PCID/XPTI settings, but must not both be
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index f2dff2ae6a..8105dc36bb 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1790,6 +1790,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( boot_cpu_has(X86_FEATURE_FSGSBASE) )
         set_in_cr4(X86_CR4_FSGSBASE);
 
+    if ( boot_cpu_has(X86_FEATURE_PKU) )
+        set_in_cr4(X86_CR4_PKE);
+
     if ( opt_invpcid && cpu_has_invpcid )
         use_invpcid = true;
 
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index d5f467d245..d8d0dc8034 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -367,20 +367,12 @@ static always_inline void set_in_cr4 (unsigned long mask)
     write_cr4(read_cr4() | mask);
 }
 
-static inline unsigned int read_pkru(void)
+static inline unsigned int rdpkru(void)
 {
     unsigned int pkru;
-    unsigned long cr4 = read_cr4();
 
-    /*
-     * _PAGE_PKEY_BITS have a conflict with _PAGE_GNTTAB used by PV guests,
-     * so that X86_CR4_PKE  is disabled on hypervisor. To use RDPKRU, CR4.PKE
-     * gets temporarily enabled.
-     */
-    write_cr4(cr4 | X86_CR4_PKE);
     asm volatile (".byte 0x0f,0x01,0xee"
         : "=a" (pkru) : "c" (0) : "dx");
-    write_cr4(cr4);
 
     return pkru;
 }
-- 
2.11.0


Re: [PATCH] x86: Always have CR4.PKE set in HVM context
Posted by Jan Beulich 2 years, 11 months ago
On 30.04.2021 00:12, Andrew Cooper wrote:
> The sole user of read_pkru() is the emulated pagewalk, and guarded behind
> guest_pku_enabled() which restricts the path to HVM (hap, even) context only.
> 
> The commentary in read_pkru() concerning _PAGE_GNTTAB overlapping with
> _PAGE_PKEY_BITS is only applicable to PV guests.
> 
> The context switch path, via write_ptbase() unconditionally writes CR4 on any
> context switch.
> 
> Therefore, we can guarantee to separate CR4.PKE between PV and HVM context at
> no extra cost.  Set PKE in mmu_cr4_features on boot, so it becomes set in HVM
> context, and clear it in pv_make_cr4().
> 
> Rename read_pkru() to rdpkru() now that it is a simple wrapper around the
> instruction.  This saves two CR4 writes on every pagewalk, which typically
> occur more than one per emulation.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

The changes looks perfectly fine to me, but I still feel urged to make
Reviewed-by: Jan Beulich <jbeulich@suse.com>
conditional upon my "x86emul: support RDPKRU/WRPKRU" (submitted exactly
half a year ago) going in first, unless there were to be review comments
making extensive rework necessary. In the absence of such review
feedback, I consider it pretty inappropriate for me to do rebasing work
(no matter that this would be largely mechanical afaics) here for a
patch which has been pending and effectively ignored for so long. (The
main thing that immediately struck me as odd was "The sole user of
read_pkru() is ...".)

I'm sorry, Jan

Re: [PATCH] x86: Always have CR4.PKE set in HVM context
Posted by Andrew Cooper 2 years, 11 months ago
On 30/04/2021 10:08, Jan Beulich wrote:
> On 30.04.2021 00:12, Andrew Cooper wrote:
>> The sole user of read_pkru() is the emulated pagewalk, and guarded behind
>> guest_pku_enabled() which restricts the path to HVM (hap, even) context only.
>>
>> The commentary in read_pkru() concerning _PAGE_GNTTAB overlapping with
>> _PAGE_PKEY_BITS is only applicable to PV guests.
>>
>> The context switch path, via write_ptbase() unconditionally writes CR4 on any
>> context switch.
>>
>> Therefore, we can guarantee to separate CR4.PKE between PV and HVM context at
>> no extra cost.  Set PKE in mmu_cr4_features on boot, so it becomes set in HVM
>> context, and clear it in pv_make_cr4().
>>
>> Rename read_pkru() to rdpkru() now that it is a simple wrapper around the
>> instruction.  This saves two CR4 writes on every pagewalk, which typically
>> occur more than one per emulation.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> The changes looks perfectly fine to me, but I still feel urged to make
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> conditional upon my "x86emul: support RDPKRU/WRPKRU" (submitted exactly
> half a year ago) going in first, unless there were to be review comments
> making extensive rework necessary. In the absence of such review
> feedback, I consider it pretty inappropriate for me to do rebasing work
> (no matter that this would be largely mechanical afaics) here for a
> patch which has been pending and effectively ignored for so long. (The
> main thing that immediately struck me as odd was "The sole user of
> read_pkru() is ...".)

So the observation about "sole user" occurred to me too, right after I
sent this.  Then I thought "surely Jan has spotted this and done a patch
for it".

Presumably you're talking about "x86emul: support RDPKRU/WRPKRU" from
Oct 30th ?  I found that via the archives, but I literally don't have a
copy in my inbox.


If I do the rebase, are you happy for this patch to stay as it is (so
the complicated change concerning context switching doesn't get more
complicated), and so we're not knowingly adding new constructs which
need immediate changes?

~Andrew


Re: [PATCH] x86: Always have CR4.PKE set in HVM context
Posted by Jan Beulich 2 years, 11 months ago
On 30.04.2021 12:21, Andrew Cooper wrote:
> On 30/04/2021 10:08, Jan Beulich wrote:
>> On 30.04.2021 00:12, Andrew Cooper wrote:
>>> The sole user of read_pkru() is the emulated pagewalk, and guarded behind
>>> guest_pku_enabled() which restricts the path to HVM (hap, even) context only.
>>>
>>> The commentary in read_pkru() concerning _PAGE_GNTTAB overlapping with
>>> _PAGE_PKEY_BITS is only applicable to PV guests.
>>>
>>> The context switch path, via write_ptbase() unconditionally writes CR4 on any
>>> context switch.
>>>
>>> Therefore, we can guarantee to separate CR4.PKE between PV and HVM context at
>>> no extra cost.  Set PKE in mmu_cr4_features on boot, so it becomes set in HVM
>>> context, and clear it in pv_make_cr4().
>>>
>>> Rename read_pkru() to rdpkru() now that it is a simple wrapper around the
>>> instruction.  This saves two CR4 writes on every pagewalk, which typically
>>> occur more than one per emulation.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> The changes looks perfectly fine to me, but I still feel urged to make
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> conditional upon my "x86emul: support RDPKRU/WRPKRU" (submitted exactly
>> half a year ago) going in first, unless there were to be review comments
>> making extensive rework necessary. In the absence of such review
>> feedback, I consider it pretty inappropriate for me to do rebasing work
>> (no matter that this would be largely mechanical afaics) here for a
>> patch which has been pending and effectively ignored for so long. (The
>> main thing that immediately struck me as odd was "The sole user of
>> read_pkru() is ...".)
> 
> So the observation about "sole user" occurred to me too, right after I
> sent this.  Then I thought "surely Jan has spotted this and done a patch
> for it".
> 
> Presumably you're talking about "x86emul: support RDPKRU/WRPKRU" from
> Oct 30th ?  I found that via the archives, but I literally don't have a
> copy in my inbox.

Odd. Was that then the time of your severe email issues, and were your
IT folks not even able to make sure pending email got delivered
afterwards (or at least enumerate what emails got discarded)? If I had
got a reply saying the mail couldn't be delivered, I surely would have
resent it.

Just to be sure - you seem to have got a copy of "x86emul: de-duplicate
scatters to the same linear address", as I seem to recall you responding
there, albeit not with an ack or anything I could actually act upon (and
this might have been in irc). That was sent just a few days later, and
suffers a similar stall. And while in a ping I did say I would commit it
without ack, I'm really hesitant to do so there. I've put it on next
week's community meeting's agenda, just in case.

> If I do the rebase, are you happy for this patch to stay as it is (so
> the complicated change concerning context switching doesn't get more
> complicated), and so we're not knowingly adding new constructs which
> need immediate changes?

Well, the answer is not just "yes", but in reality I wouldn't mind
doing the rebasing myself, if only I knew it wasn't for the purpose of
waiting another half year for an ack (or otherwise).

Jan

Re: [PATCH] x86: Always have CR4.PKE set in HVM context
Posted by Andrew Cooper 2 years, 11 months ago
On 30/04/2021 11:42, Jan Beulich wrote:
> On 30.04.2021 12:21, Andrew Cooper wrote:
>> On 30/04/2021 10:08, Jan Beulich wrote:
>>> On 30.04.2021 00:12, Andrew Cooper wrote:
>>>> The sole user of read_pkru() is the emulated pagewalk, and guarded behind
>>>> guest_pku_enabled() which restricts the path to HVM (hap, even) context only.
>>>>
>>>> The commentary in read_pkru() concerning _PAGE_GNTTAB overlapping with
>>>> _PAGE_PKEY_BITS is only applicable to PV guests.
>>>>
>>>> The context switch path, via write_ptbase() unconditionally writes CR4 on any
>>>> context switch.
>>>>
>>>> Therefore, we can guarantee to separate CR4.PKE between PV and HVM context at
>>>> no extra cost.  Set PKE in mmu_cr4_features on boot, so it becomes set in HVM
>>>> context, and clear it in pv_make_cr4().
>>>>
>>>> Rename read_pkru() to rdpkru() now that it is a simple wrapper around the
>>>> instruction.  This saves two CR4 writes on every pagewalk, which typically
>>>> occur more than one per emulation.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> The changes looks perfectly fine to me, but I still feel urged to make
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> conditional upon my "x86emul: support RDPKRU/WRPKRU" (submitted exactly
>>> half a year ago) going in first, unless there were to be review comments
>>> making extensive rework necessary. In the absence of such review
>>> feedback, I consider it pretty inappropriate for me to do rebasing work
>>> (no matter that this would be largely mechanical afaics) here for a
>>> patch which has been pending and effectively ignored for so long. (The
>>> main thing that immediately struck me as odd was "The sole user of
>>> read_pkru() is ...".)
>> So the observation about "sole user" occurred to me too, right after I
>> sent this.  Then I thought "surely Jan has spotted this and done a patch
>> for it".
>>
>> Presumably you're talking about "x86emul: support RDPKRU/WRPKRU" from
>> Oct 30th ?  I found that via the archives, but I literally don't have a
>> copy in my inbox.
> Odd. Was that then the time of your severe email issues, and were your
> IT folks not even able to make sure pending email got delivered
> afterwards (or at least enumerate what emails got discarded)? If I had
> got a reply saying the mail couldn't be delivered, I surely would have
> resent it.
>
> Just to be sure - you seem to have got a copy of "x86emul: de-duplicate
> scatters to the same linear address", as I seem to recall you responding
> there, albeit not with an ack or anything I could actually act upon (and
> this might have been in irc). That was sent just a few days later, and
> suffers a similar stall. And while in a ping I did say I would commit it
> without ack, I'm really hesitant to do so there. I've put it on next
> week's community meeting's agenda, just in case.

I do recall that patch.

>
>> If I do the rebase, are you happy for this patch to stay as it is (so
>> the complicated change concerning context switching doesn't get more
>> complicated), and so we're not knowingly adding new constructs which
>> need immediate changes?
> Well, the answer is not just "yes", but in reality I wouldn't mind
> doing the rebasing myself, if only I knew it wasn't for the purpose of
> waiting another half year for an ack (or otherwise).

The patch itself looks entirely fine.  Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>

The only observation I've got is that the other instructions in Grp7
probably want a blanket conversion from generate_exception_if(vex.pfx,
EXC_UD); to use the unimplemented_insn path instead, but that's clearly
further work.

I'll commit this patch, and the rebase delta on yours ought to just be
the naming of the helpers.

~Andrew


Re: [PATCH] x86: Always have CR4.PKE set in HVM context
Posted by Jan Beulich 2 years, 11 months ago
On 30.04.2021 16:37, Andrew Cooper wrote:
> On 30/04/2021 11:42, Jan Beulich wrote:
>> On 30.04.2021 12:21, Andrew Cooper wrote:
>>> If I do the rebase, are you happy for this patch to stay as it is (so
>>> the complicated change concerning context switching doesn't get more
>>> complicated), and so we're not knowingly adding new constructs which
>>> need immediate changes?
>> Well, the answer is not just "yes", but in reality I wouldn't mind
>> doing the rebasing myself, if only I knew it wasn't for the purpose of
>> waiting another half year for an ack (or otherwise).
> 
> The patch itself looks entirely fine.  Reviewed-by: Andrew Cooper
> <andrew.cooper3@citrix.com>

Thanks.

> The only observation I've got is that the other instructions in Grp7
> probably want a blanket conversion from generate_exception_if(vex.pfx,
> EXC_UD); to use the unimplemented_insn path instead, but that's clearly
> further work.

Since this is highly inconsistent at present, we first need to put
down a scheme, as I don't think we want _all_ #UD raising converted
to "goto unimplemented_insn". I've been thinking about where to draw
the line every time I've been adding new insns half-way recently,
but I didn't come to any good conclusion yet.

> I'll commit this patch, and the rebase delta on yours ought to just be
> the naming of the helpers.

Plus the dropping of the CR4 writes in write_pkru().

Jan

Re: [PATCH] x86: Always have CR4.PKE set in HVM context
Posted by Jan Beulich 2 years, 11 months ago
On 30.04.2021 00:12, Andrew Cooper wrote:
> The sole user of read_pkru() is the emulated pagewalk, and guarded behind
> guest_pku_enabled() which restricts the path to HVM (hap, even) context only.
> 
> The commentary in read_pkru() concerning _PAGE_GNTTAB overlapping with
> _PAGE_PKEY_BITS is only applicable to PV guests.
> 
> The context switch path, via write_ptbase() unconditionally writes CR4 on any
> context switch.
> 
> Therefore, we can guarantee to separate CR4.PKE between PV and HVM context at
> no extra cost.  Set PKE in mmu_cr4_features on boot, so it becomes set in HVM
> context, and clear it in pv_make_cr4().
> 
> Rename read_pkru() to rdpkru() now that it is a simple wrapper around the
> instruction.  This saves two CR4 writes on every pagewalk, which typically
> occur more than one per emulation.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> 
> It also occurs to me that for HVM/Idle => HVM/Idle context switches, we never
> need to change CR4.  I think this is substantially clearer following XSA-293 /
> c/s b2dd00574a4f ("x86/pv: Rewrite guest %cr4 handling from scratch") which
> introduced pv_make_cr4().

Never needing to change CR4 doesn't uniformly mean writes can be avoided.
Part of the purpose of the writes is to flush the TLB. Per-domain as well
as shadow mappings may be in need of such if global mappings are used
anywhere.

Jan

Re: [PATCH] x86: Always have CR4.PKE set in HVM context
Posted by Andrew Cooper 2 years, 11 months ago
On 03/05/2021 13:42, Jan Beulich wrote:
> On 30.04.2021 00:12, Andrew Cooper wrote:
>> The sole user of read_pkru() is the emulated pagewalk, and guarded behind
>> guest_pku_enabled() which restricts the path to HVM (hap, even) context only.
>>
>> The commentary in read_pkru() concerning _PAGE_GNTTAB overlapping with
>> _PAGE_PKEY_BITS is only applicable to PV guests.
>>
>> The context switch path, via write_ptbase() unconditionally writes CR4 on any
>> context switch.
>>
>> Therefore, we can guarantee to separate CR4.PKE between PV and HVM context at
>> no extra cost.  Set PKE in mmu_cr4_features on boot, so it becomes set in HVM
>> context, and clear it in pv_make_cr4().
>>
>> Rename read_pkru() to rdpkru() now that it is a simple wrapper around the
>> instruction.  This saves two CR4 writes on every pagewalk, which typically
>> occur more than one per emulation.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wl@xen.org>
>>
>> It also occurs to me that for HVM/Idle => HVM/Idle context switches, we never
>> need to change CR4.  I think this is substantially clearer following XSA-293 /
>> c/s b2dd00574a4f ("x86/pv: Rewrite guest %cr4 handling from scratch") which
>> introduced pv_make_cr4().
> Never needing to change CR4 doesn't uniformly mean writes can be avoided.
> Part of the purpose of the writes is to flush the TLB. Per-domain as well
> as shadow mappings may be in need of such if global mappings are used
> anywhere.

Per domain are not global.  Shadows from HVM guests are a) surely not
global given their changeability, and b) in a non-root TLB tag.

Details like this do need checking, but it shouldn't be hard to improve
on what we've currently got.

~Andrew