[PATCH v2 0/3] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure

tabba@google.com posted 3 patches 1 week, 2 days ago
arch/arm64/kvm/mmu.c | 39 +++++++++++++++++++++++++++++++++------
1 file changed, 33 insertions(+), 6 deletions(-)
Posted by tabba@google.com 1 week, 2 days ago
Hi folks,

The first two patches fix bugs I found while testing Sashiko locally
with fixes to review-prompts. The third grew out of the v1 discussion.

share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c maintain
a host-side RB-tree mirroring the set of pages shared with EL2. The
hypercalls they wrap can fail (page-state mismatch, EL2 refcount still
held), and neither the per-pfn helpers nor the multi-page wrappers
cleaned up correctly on failure:

- share_pfn_hyp() left its tracking node in the tree on failure,
  leaking the allocation and presenting a phantom share to a later
  unshare (patch 1).

- unshare_pfn_hyp() erased its tracking node before the hypercall, so
  on failure the host lost its record while EL2 still owned the share
  (patch 2).

- kvm_share_hyp() returned on the first per-page failure, stranding the
  pages already shared by that call: the caller treats the whole range
  as failed and never unshares them (patch 3).
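
To make the first two failure modes concrete, here is a minimal userspace
model of the ordering the per-pfn helpers need. It is an illustration
only, not the code in the patches: the array stands in for the RB-tree,
and model_share_pfn(), model_unshare_pfn(), hyp_share(), hyp_unshare()
and the fail_* knobs are hypothetical names invented for this sketch.

```c
/* Userspace model only; not the code in arch/arm64/kvm/mmu.c. */
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_PFN 64

/* Stand-in for the host-side RB-tree: one refcounted node per pfn. */
struct share_node {
	unsigned long pfn;
	int count;
};
static struct share_node *tracked[MAX_PFN];

/* Stand-in for EL2 state, plus knobs to inject hypercall failures. */
static bool el2_shared[MAX_PFN];
static bool fail_share, fail_unshare;

static int hyp_share(unsigned long pfn)
{
	if (fail_share)
		return -EPERM;		/* e.g. page-state mismatch */
	el2_shared[pfn] = true;
	return 0;
}

static int hyp_unshare(unsigned long pfn)
{
	if (fail_unshare)
		return -EBUSY;		/* e.g. EL2 refcount still held */
	el2_shared[pfn] = false;
	return 0;
}

static int model_share_pfn(unsigned long pfn)
{
	struct share_node *n;
	int ret;

	if (pfn >= MAX_PFN)
		return -EINVAL;
	n = tracked[pfn];
	if (n) {			/* already shared: take a reference */
		n->count++;
		return 0;
	}
	n = malloc(sizeof(*n));
	if (!n)
		return -ENOMEM;
	n->pfn = pfn;
	n->count = 1;
	tracked[pfn] = n;

	ret = hyp_share(pfn);
	if (ret) {			/* undo tracking: no phantom share, no leak */
		tracked[pfn] = NULL;
		free(n);
	}
	return ret;
}

static int model_unshare_pfn(unsigned long pfn)
{
	struct share_node *n;
	int ret;

	if (pfn >= MAX_PFN)
		return -EINVAL;
	n = tracked[pfn];
	if (!n)
		return -ENOENT;
	if (n->count > 1) {		/* other users remain */
		n->count--;
		return 0;
	}
	ret = hyp_unshare(pfn);
	if (ret)
		return ret;		/* keep the node: EL2 still owns the share */
	tracked[pfn] = NULL;
	free(n);
	return 0;
}
```

The point of the sketch is the ordering: tracking is dropped on a failed
share, and is only dropped on unshare once the hypercall has succeeded.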

As Vincent and Marc noted on v1, none of this compromises isolation. A
page that cannot be unshared is simply leaked: it stays shared with the
hypervisor and is no longer reusable for pKVM. So kvm_share_hyp() now
rolls back on failure, and the unshare WARN_ON()s are left non-fatal
and documented rather than promoted to BUG_ON(). The system keeps
running, and only later pKVM reuse of a leaked page would fail. We do
not expect any of these paths to trigger in practice.
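
The rollback idea can be sketched in the same spirit. Again this is a
userspace model under assumptions, not the patch itself: share_one(),
unshare_one(), share_range(), fail_at and leaked are hypothetical names,
and the counter merely stands in for the non-fatal WARN_ON().

```c
/* Userspace model only; not the code in arch/arm64/kvm/mmu.c. */
#include <errno.h>
#include <stdbool.h>

#define NPAGES 16

static bool shared[NPAGES];
static long fail_at = -1;	/* pfn whose share fails; -1 means none */
static int leaked;		/* pages that could not be unshared */

static int share_one(unsigned long pfn)
{
	if ((long)pfn == fail_at)
		return -EPERM;
	shared[pfn] = true;
	return 0;
}

static int unshare_one(unsigned long pfn)
{
	shared[pfn] = false;
	return 0;
}

/* Share [start, end); on failure, unshare what this call already shared. */
static int share_range(unsigned long start, unsigned long end)
{
	unsigned long pfn;
	int ret = 0;

	if (start > end || end > NPAGES)
		return -EINVAL;

	for (pfn = start; pfn < end; pfn++) {
		ret = share_one(pfn);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* pfn is the page that failed; walk back over the ones before it. */
	while (pfn > start) {
		pfn--;
		if (unshare_one(pfn))
			leaked++;	/* non-fatal: the page stays shared */
	}
	return ret;
}
```

With this shape the caller can treat a failed range as entirely
unshared, and a failed rollback step leaks a page rather than stopping
the system.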

Severity is low and this can wait for 7.2. Patch 3 builds on patch 2;
otherwise the patches are independent.

Changes since v1:
 - New patch 3: roll back partial shares in kvm_share_hyp(); document
   the deliberate leak-on-WARN in kvm_unshare_hyp() (Vincent, Marc).
 - Patches 1 and 2 functionally unchanged (patch 2 gains the call-site
   comment).
 - v1: https://lore.kernel.org/all/20260529074341.2271950-1-tabba@google.com/

Cheers,
/fuad

Fuad Tabba (3):
  KVM: arm64: Free hyp-share tracking node when share hypercall fails
  KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure
  KVM: arm64: Roll back partial shares on kvm_share_hyp() failure

 arch/arm64/kvm/mmu.c | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

-- 
2.54.0.929.g9b7fa37559-goog
Re: [PATCH v2 0/3] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
Posted by Vincent Donnefort 5 days ago
On Fri, May 29, 2026 at 01:17:52PM +0100, tabba@google.com wrote:
> Hi folks,
> 
> The first two started as bugs I found testing Sashiko locally with
> fixes to review-prompts. The third grew out of the v1 discussion.
> 
> share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c maintain
> a host-side RB-tree mirroring the set of pages shared with EL2. The
> hypercalls they wrap can fail (page-state mismatch, EL2 refcount still
> held), and neither the per-pfn helpers nor the multi-page wrappers
> cleaned up correctly on failure:
> 
> - share_pfn_hyp() left its tracking node in the tree on failure,
>   leaking the allocation and presenting a phantom share to a later
>   unshare (patch 1).
> 
> - unshare_pfn_hyp() erased its tracking node before the hypercall, so
>   on failure the host lost its record while EL2 still owned the share
>   (patch 2).
> 
> - kvm_share_hyp() returned on the first per-page failure, stranding the
>   pages already shared by that call: the caller treats the whole range
>   as failed and never unshares them (patch 3).
> 
> As Vincent and Marc noted on v1, none of this compromises isolation. A
> page that cannot be unshared is simply leaked: it stays shared with the
> hypervisor and is no longer reusable for pKVM. So kvm_share_hyp() now
> rolls back on failure, and the unshare WARN_ON()s are left non-fatal
> and documented rather than promoted to BUG_ON(). The system keeps
> running, and only later pKVM reuse of a leaked page would fail. We do
> not expect any of these paths to trigger in practice.
> 
> Severity is low and this can wait for 7.2. Patch 3 builds on patch 2,
> otherwise they are independent.
> 
> Changes since v1:
>  - New patch 3: roll back partial shares in kvm_share_hyp(); document
>    the deliberate leak-on-WARN in kvm_unshare_hyp() (Vincent, Marc).
>  - Patches 1 and 2 functionally unchanged (patch 2 gains the call-site
>    comment).
>  - v1: https://lore.kernel.org/all/20260529074341.2271950-1-tabba@google.com/
> 
> Cheers,
> /fuad

For the whole series:

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>

> 
> Fuad Tabba (3):
>   KVM: arm64: Free hyp-share tracking node when share hypercall fails
>   KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure
>   KVM: arm64: Roll back partial shares on kvm_share_hyp() failure
> 
>  arch/arm64/kvm/mmu.c | 39 +++++++++++++++++++++++++++++++++------
>  1 file changed, 33 insertions(+), 6 deletions(-)
> 
> -- 
> 2.54.0.929.g9b7fa37559-goog
>