[PATCH v7 0/8] KVM: PKS Virtualization support
Posted by Lei Wang 4 years ago
This patch series is based on top of v10 PKS core support kernel patchset:
https://lore.kernel.org/lkml/20220419170649.1022246-1-ira.weiny@intel.com/

---

Protection Keys for Supervisor Pages (PKS) is a feature that extends the
Protection Keys architecture to support thread-specific permission
restrictions on supervisor pages.

PKS works similarly to the existing PKU feature, which protects user
pages. Both perform an additional check after the normal paging
permission checks are done. Accesses or writes can be disabled via an
MSR update, without TLB flushes, when permissions change. If this
additional check is violated, a #PF occurs and the PFEC.PK bit is set.
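
As a rough illustration, this is a minimal sketch of the check (not
code from this series; the helper name is made up and the CR0.WP=0
special case for writes is ignored):

#include <stdbool.h>
#include <stdint.h>

/*
 * pkr is the rights register (PKRU for user pages, IA32_PKRS for
 * supervisor pages); key is the protection key taken from the leaf
 * paging-structure entry that maps the accessed page.
 */
static bool pk_check_fails(uint32_t pkr, unsigned int key, bool write)
{
	bool ad = pkr & (1u << (2 * key));	/* ADi: all data accesses disabled */
	bool wd = pkr & (1u << (2 * key + 1));	/* WDi: writes disabled */

	/* A violation raises #PF with PFEC.PK set. */
	return ad || (write && wd);
}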

PKS introduces the IA32_PKRS MSR to manage supervisor protection key
rights. The MSR contains 16 pairs of ADi and WDi bits, and each pair
governs the group of pages tagged with the corresponding key, which is
stored in the leaf paging-structure entries (bits 62:59). Currently,
IA32_PKRS is not supported by the XSAVES architecture.
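
Concretely, the layout can be sketched as follows (the MSR address and
bit positions are from the SDM; the helpers are illustrative, not
existing kernel API):

#include <stdint.h>

#define MSR_IA32_PKRS	0x000006e1	/* IA32_PKRS address, per the SDM */

/* Protection key of a page: bits 62:59 of the leaf paging-structure entry. */
static unsigned int pte_pkey(uint64_t pte)
{
	return (pte >> 59) & 0xf;
}

/* The {WDi, ADi} pair for that key within the low 32 bits of PKRS. */
static uint32_t pkrs_rights(uint32_t pkrs, unsigned int key)
{
	return (pkrs >> (2 * key)) & 0x3;	/* bit 0: ADi, bit 1: WDi */
}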

This patchset adds virtualization of PKS in KVM. It implements PKS
CPUID enumeration, VM-entry/VM-exit configuration, MSR exposure, nested
support, etc. Currently, PKS is not yet supported for shadow paging.
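
For reference, guest-side enablement would look roughly like this (a
sketch: CPUID.(EAX=7,ECX=0):ECX[31] and CR4 bit 24 are the SDM-defined
enumeration and enable bits; cpuid_ecx(), cr4_set_bits() and wrmsrl()
stand in for the usual kernel helpers):

#define X86_CR4_PKS	(1UL << 24)	/* CR4.PKS enable bit, per the SDM */

static void guest_enable_pks(void)
{
	if (!(cpuid_ecx(7) & (1U << 31)))	/* CPUID.(7,0):ECX.PKS */
		return;

	cr4_set_bits(X86_CR4_PKS);
	wrmsrl(MSR_IA32_PKRS, 0);	/* all 16 keys: access and write allowed */
}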

Detailed information about PKS can be found in the latest Intel 64 and
IA-32 Architectures Software Developer's Manual.

---

Changelogs:

v6->v7
- Add documentation to note that cache tracking for PKRS is a nice-to-have,
  and that we needn't hesitate to rip it out in the future if there's a strong
  reason to drop the caching. (Sean)
- Blindly reading PKRU/PKRS is wrong, fixed. (Sean)
- Add a non-inline helper kvm_mmu_pkr_bits() to read PKR bits. (Sean)
- Delete the comment for exposing the PKS because the pattern is common and the
  behavior is self-explanatory. (Sean)
- Add a helper vmx_set_host_pkrs() for setting host pkrs and rewrite the
  related code for conciseness. (Sean)
- Align an indentation in arch/x86/kvm/vmx/nested.c. (Sean)
- Read the current PKRS if from_vmentry == false under the nested condition.
  (Sean)
- v6: https://lore.kernel.org/lkml/20220221080840.7369-1-chenyi.qiang@intel.com/

v5->v6
- PKRS is preserved on INIT. Add the PKRS reset operation in kvm_vcpu_reset.
  (Sean)
- Track pkrs as u32. Add code to WARN on bits 63:32 being set in the VMCS field.
  (Sean)
- Adjust the MSR intercept and entry/exit control in VMCS according to
  guest CPUID. This resolves the issue where userspace re-enables the feature.
  (Sean)
- Split the VMX restriction on PKS support (entry/exit load controls) out of
  common x86, and put the TDP restriction together with PKU in common x86.
  (Sean)
- Thanks to Sean for revising the comments in mmu.c related to
  update_pkr_bitmap, which makes the pkr bitmask cache usage clearer.
- v5: https://lore.kernel.org/lkml/20210811101126.8973-1-chenyi.qiang@intel.com/

v4->v5
- Make the setting of the MSR intercept/VMCS control bits not dependent on
  guest CR4.PKS, and set them if PKS is exposed to the guest. (Suggested by Sean)
- Add pkrs to the standard register caching mechanism to help update
  vcpu->arch.pkrs on demand, and add related helper functions. (Suggested by Sean)
- Do the real pkrs update of the VMCS field in vmx_vcpu_reset() and
  vmx_sync_vmcs_host_state(). (Sean)
- Add a new mmu_role cr4_pks instead of smushing PKU and PKS together.
  (Sean & Paolo)
- v4: https://lore.kernel.org/lkml/20210205083706.14146-1-chenyi.qiang@intel.com/

v3->v4
- Make the MSR intercept and load-controls setting depend on the CR4.PKS value
- shadow the guest pkrs and make it usable in PKS emulation
- add the cr4_pke and cr4_pks checks in the pkr_mask update
- squash PATCH 2 and PATCH 5 to make the dependencies clearer
- v3: https://lore.kernel.org/lkml/20201105081805.5674-1-chenyi.qiang@intel.com/

v2->v3
- No functional changes since the last submission
- rebase on the latest PKS kernel support:
  https://lore.kernel.org/lkml/20201102205320.1458656-1-ira.weiny@intel.com/
- add MSR_IA32_PKRS to the vmx_possible_passthrough_msrs[]
- RFC v2: https://lore.kernel.org/lkml/20201014021157.18022-1-chenyi.qiang@intel.com/

v1->v2
- rebase on the latest PKS kernel support:
  https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3
- add a kvm-unit-tests case for PKS
- add the check in kvm_init_msr_list for PKRS
- place the X86_CR4_PKS in mmu_role_bits in kvm_set_cr4
- add support for exposing VM_{ENTRY, EXIT}_LOAD_IA32_PKRS in the nested
  VMX MSRs
- RFC v1: https://lore.kernel.org/lkml/20200807084841.7112-1-chenyi.qiang@intel.com/

---

Chenyi Qiang (7):
  KVM: VMX: Introduce PKS VMCS fields
  KVM: VMX: Add proper cache tracking for PKRS
  KVM: X86: Expose IA32_PKRS MSR
  KVM: MMU: Rename the pkru to pkr
  KVM: MMU: Add support for PKS emulation
  KVM: VMX: Expose PKS to guest
  KVM: VMX: Enable PKS for nested VM

Lei Wang (1):
  KVM: MMU: Add helper function to get pkr bits

 arch/x86/include/asm/kvm_host.h |  17 +++--
 arch/x86/include/asm/vmx.h      |   6 ++
 arch/x86/kvm/cpuid.c            |  13 +++-
 arch/x86/kvm/kvm_cache_regs.h   |   7 ++
 arch/x86/kvm/mmu.h              |  29 +++----
 arch/x86/kvm/mmu/mmu.c          | 130 +++++++++++++++++++++++---------
 arch/x86/kvm/vmx/capabilities.h |   6 ++
 arch/x86/kvm/vmx/nested.c       |  36 ++++++++-
 arch/x86/kvm/vmx/vmcs.h         |   1 +
 arch/x86/kvm/vmx/vmcs12.c       |   2 +
 arch/x86/kvm/vmx/vmcs12.h       |   4 +
 arch/x86/kvm/vmx/vmx.c          |  85 +++++++++++++++++++--
 arch/x86/kvm/vmx/vmx.h          |  14 +++-
 arch/x86/kvm/x86.c              |   9 ++-
 arch/x86/kvm/x86.h              |   8 ++
 arch/x86/mm/pkeys.c             |   6 ++
 include/linux/pks.h             |   7 ++
 17 files changed, 301 insertions(+), 79 deletions(-)

-- 
2.25.1
Re: [PATCH v7 0/8] KVM: PKS Virtualization support
Posted by Wang, Lei 4 years ago
Kindly ping for comments.

On 4/24/2022 6:15 PM, Lei Wang wrote:
> This patch series is based on top of v10 PKS core support kernel patchset:
> https://lore.kernel.org/lkml/20220419170649.1022246-1-ira.weiny@intel.com/
>
The current status of PKS virtualization
Posted by Ruihan Li 6 months ago
Hi,

I'm sorry to bother you by replying to an email from years ago. I would like
to learn about the current status of PKS virtualization.

In short, I tried to rebase this patch series on the latest kernel. The result
was a working kernel that supports PKS virtualization, which would be useful
for my purposes. Would PKS virtualization be accepted even if the kernel itself
does not use PKS?

Here's a longer explanation: I noticed that this patch series is built on top
of basic PKS support. Meanwhile, it appears that the basic PKS support "was
dropped after the main use case was rejected (pmem stray write protection)"
[1]. I suppose that's why this patch series was never merged into the kernel?

 [1]: https://lore.kernel.org/lkml/3b3c941f1fb69d67706457a30cecc96bfde57353.camel@intel.com/

For my purposes, I don't need the Linux kernel to use PKS. I do want the kernel
to support PKS virtualization so that I can run another OS that requires PKS
support with the help of KVM. Fundamentally, I don't think this patch series
has to be built on top of basic PKS support. But I am unsure whether there is a
policy or convention that states virtualization support can only be added after
basic support.

One problem is that if the Linux kernel does not use PKS, we will be unable to
test PKS virtualization with a guest Linux kernel. However, given that we have
the KVM unit test infrastructure, I believe we can find a way to properly test
PKS virtualization for correctness?

I'd like to hear from you to know whether I understand things correctly. Thank
you in advance for any feedback.

Thanks,
Ruihan Li
Re: The current status of PKS virtualization
Posted by Paolo Bonzini 6 months ago
On Mon, Nov 10, 2025 at 17:32, Ruihan Li <lrh2000@pku.edu.cn> wrote:
>
> Hi,
>
> I'm sorry to bother you by replying to an email from years ago. I would like
> to learn about the current status of PKS virtualization.
>
> In short, I tried to rebase this patch series on the latest kernel. The result
> was a working kernel that supports PKS virtualization, which would be useful
> for my purposes. Would PKS virtualization be accepted even if the kernel itself
> does not use PKS?


Yes, I think it should.

Virtualized PKS does not depend on host PKS, because it uses an MSR
rather than XSAVE areas (which are harder to add to KVM without host
support).

> Fundamentally, I don't think this patch series
> has to be built on top of basic PKS support. But I am unsure whether there is a
> policy or convention that states virtualization support can only be added after
> basic support.

No, there is none. In fact, the only dependency of the original series
on host PKS was for functions to read/write the host PKRS MSR. Without
host PKS support it could be loaded with all-ones, or technically it
could even be left with the guest value. Since the host clears
CR4.PKS, the actual value won't matter.
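
To sketch that option (illustrative only; HOST_IA32_PKRS is the VMCS
field added by this series, and X86_FEATURE_PKS would come from the
base PKS series, so neither exists in the kernel today):

static void vmx_set_host_pkrs_sketch(void)
{
	u64 host_pkrs = ~0ull >> 32;	/* all-ones; harmless since host CR4.PKS=0 */

	if (cpu_feature_enabled(X86_FEATURE_PKS))
		rdmsrl(MSR_IA32_PKRS, host_pkrs);	/* use the real host value */

	vmcs_write64(HOST_IA32_PKRS, host_pkrs);
}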

> One problem is that if the Linux kernel does not use PKS, we will be unable to
> test PKS virtualization with a guest Linux kernel. However, given that we have
> the KVM unit test infrastructure, I believe we can find a way to properly test
> PKS virtualization for correctness?

I agree. Thanks!

Paolo

Re: The current status of PKS virtualization
Posted by Ruihan Li 6 months ago
On Mon, Nov 10, 2025 at 09:44:36PM +0100, Paolo Bonzini wrote:
> No, there is none. In fact, the only dependency of the original series
> on host PKS was for functions to read/write the host PKRS MSR. Without
> host PKS support it could be loaded with all-ones, or technically it
> could even be left with the guest value. Since the host clears
> CR4.PKS, the actual value won't matter.

Thanks a lot for your quick and detailed reply! That's good news for me.
Then I plan to spend some time tidying up my rebased version to see if I
can get PKS virtualization upstreamed.

As a side note, I can no longer contact the original author of this
patch series. At least, my previous email was returned by Intel's
server. However, since more than three years have passed, I assume it's
okay for me to post a new version after I have the code ready.

Thanks,
Ruihan Li
Re: The current status of PKS virtualization
Posted by Chenyi Qiang 6 months ago

On 11/11/2025 9:14 AM, Ruihan Li wrote:
> On Mon, Nov 10, 2025 at 09:44:36PM +0100, Paolo Bonzini wrote:
>> No, there is none. In fact, the only dependency of the original series
>> on host PKS was for functions to read/write the host PKRS MSR. Without
>> host PKS support it could be loaded with all-ones, or technically it
>> could even be left with the guest value. Since the host clears
>> CR4.PKS, the actual value won't matter.
> 
> Thanks a lot for your quick and detailed reply! That's good news for me.
> Then I plan to spend some time tidying up my rebased version to see if I
> can get PKS virtualization upstreamed.
> 
> As a side note, I can no longer contact the original author of this
> patch series. At least, my previous email was returned by Intel's
> server. However, since more than three years have passed, I assume it's
> okay for me to post a new version after I have the code ready.

Lei has left Intel, so the mail address is unreachable.

And as you found, we dropped the PKS KVM upstreaming along with the base PKS support
due to the lack of a valid use case in Linux. Feel free to continue the upstream work.
But I'm not sure whether your use case is compelling enough to be accepted by the maintainers.

Re: The current status of PKS virtualization
Posted by Ruihan Li 6 months ago
On Tue, Nov 11, 2025 at 01:40:08PM +0800, Chenyi Qiang wrote:
> Lei has left Intel, so the mail address is unreachable.
> 
> And as you found, we dropped the PKS KVM upstreaming along with the base PKS support
> due to the lack of a valid use case in Linux. Feel free to continue the upstream work.

Thanks for the reply and for the information!

By the way, I'm just curious (feel free to ignore this question): Is
there an on-list discussion that rejects the originally proposed PKS use
cases?

I found that pmem stray write protection was rejected [1], but there is
no reason given nor any reference provided. After searching the list, I
found the latest patch series that attempts to add pmem stray protection
[2]. However, I didn't find any discussion rejecting the use case. Maybe
the discussion happened off the list? Or did I miss something?

 [1]: https://lore.kernel.org/lkml/3b3c941f1fb69d67706457a30cecc96bfde57353.camel@intel.com/
 [2]: https://lore.kernel.org/lkml/20220419170649.1022246-1-ira.weiny@intel.com/

> But I'm not sure whether your use case is compelling enough to be accepted by the maintainers.

Yeah, I have the same concern. So, I asked about the current status
first. So far, Paolo's reply seems positive, so maybe I can try it out
if I can find enough time.

Thanks,
Ruihan Li
Re: The current status of PKS virtualization
Posted by Chenyi Qiang 5 months, 4 weeks ago

On 11/11/2025 10:24 PM, Ruihan Li wrote:
> On Tue, Nov 11, 2025 at 01:40:08PM +0800, Chenyi Qiang wrote:
>> Lei has left Intel, so the mail address is unreachable.
>>
>> And as you found, we dropped the PKS KVM upstreaming along with the base PKS support
>> due to the lack of a valid use case in Linux. Feel free to continue the upstream work.
> 
> Thanks for the reply and for the information!
> 
> By the way, I'm just curious (feel free to ignore this question): Is
> there an on-list discussion that rejects the originally proposed PKS use
> cases?
> 
> I found that pmem stray write protection was rejected [1], but there is
> no reason given nor any reference provided. After searching the list, I
> found the latest patch series that attempts to add pmem stray protection
> [2]. However, I didn't find any discussion rejecting the use case. Maybe
> the discussion happened off the list? Or did I miss something?

The discussion happened off the list. The customer demand for this feature
declined to such an extent that it was not worth working on any longer.