[Qemu-devel] [PATCH] KVM: MMU: fast cleanup D bit based on fast write protect
Posted by Zhuangyanying 5 years, 3 months ago
From: Zhuang Yanying <ann.zhuangyanying@huawei.com>

Recently I tested live migration with large-memory guests and found that vCPUs may hang for a long time while migration is starting, e.g. 9s for a 2048G guest (linux-4.20.1 + qemu-3.1.0).
The reason is that memory_global_dirty_log_start() takes too long while the vCPU waits for the BQL; the page-by-page D-bit cleanup is the main time consumer.
I think the idea of "KVM: MMU: fast write protect" by xiaoguangrong, especially the function kvm_mmu_write_protect_all_pages(), is very helpful here.
With a small modification on top of his patch, this problem can be solved: the time drops from 9s to 0.5s.
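
For context, the path this replaces scales with guest memory size: when dirty logging is enabled, every last-level SPTE of the slot is visited under mmu_lock and its D bit is cleared one by one. A simplified sketch of that upstream path (not the exact code; the AD-disabled case is omitted):

/* Simplified sketch of the current slow path (not the exact upstream code). */
static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
{
	u64 *sptep;
	struct rmap_iterator iter;
	bool flush = false;

	/* Clear the D bit of every SPTE reachable from this rmap head. */
	for_each_rmap_spte(rmap_head, &iter, sptep)
		flush |= spte_clear_dirty(sptep);

	return flush;
}

void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
				   struct kvm_memory_slot *memslot)
{
	bool flush;

	spin_lock(&kvm->mmu_lock);
	/* Visits every 4K leaf SPTE of the slot: O(guest memory size). */
	flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
	spin_unlock(&kvm->mmu_lock);

	if (flush)
		kvm_flush_remote_tlbs(kvm);
}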

At the beginning of live migration, write protection is applied only to the top-level SPTEs. A guest write then triggers an EPT violation, and write protection is pushed down via for_each_shadow_entry() on the direct map.
Finally the Dirty bit of the target page (in the level-1 page table) is cleared and dirty page tracking starts. Of course, the page containing the GPA is marked dirty in mmu_set_spte().
Xen has a similar implementation, except that it uses the EMT instead of write protection.
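
To spell out the intent of the mmu.c hunk below: at the last level the behaviour now depends on the page size. A minimal sketch, with a helper name made up purely for illustration (only the if/else mirrors the actual change):

/* Illustrative helper, not part of the patch; the name is made up. */
static bool track_last_level_spte(struct kvm_mmu_page *sp, u64 *sptep)
{
	if (sp->role.level == PT_PAGE_TABLE_LEVEL)
		/*
		 * 4K leaf: clearing the D bit is enough, the next guest
		 * write sets it again and the page is reported as dirty,
		 * so no per-page write fault is needed.
		 */
		return spte_clear_dirty(sptep);

	/*
	 * Large-page leaf: keep write protection so the first write
	 * faults and the mapping can be split down to 4K before dirty
	 * tracking starts on it.
	 */
	return spte_write_protect(sptep, false);
}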

What do you think about this solution?
---
 mmu.c | 5 ++++-
 vmx.c | 3 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mmu.c b/mmu.c
index b079d74..f49d316 100755
--- a/mmu.c
+++ b/mmu.c
@@ -3210,7 +3210,10 @@ static bool mmu_load_shadow_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 			break;
 
                 if (is_last_spte(spte, sp->role.level)) {
-			flush |= spte_write_protect(sptep, false);
+			if (sp->role.level == PT_PAGE_TABLE_LEVEL)
+				flush |= spte_clear_dirty(sptep);
+			else
+				flush |= spte_write_protect(sptep, false);
 			continue;
                 }
 
diff --git a/vmx.c b/vmx.c
index 95784bc..7ec717f 100755
--- a/vmx.c
+++ b/vmx.c
@@ -14421,8 +14421,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
 				     struct kvm_memory_slot *slot)
 {
-	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
-	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
+	kvm_mmu_write_protect_all_pages(kvm, true);
 }
 
 static void vmx_slot_disable_log_dirty(struct kvm *kvm,
-- 
1.8.3.1



Re: [Qemu-devel] [PATCH] KVM: MMU: fast cleanup D bit based on fast write protect
Posted by Paolo Bonzini 5 years, 3 months ago
On 12/01/19 09:20, Zhuangyanying wrote:
> From: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> 
> Recently I tested live-migration with large-memory guests, find vcpu may hang for a long time while starting migration, such as 9s for 2048G(linux-4.20.1+qemu-3.1.0).
> The reason is memory_global_dirty_log_start() taking too long, and the vcpu is waiting for BQL. The page-by-page D bit clearup is the main time consumption.
> I think that the idea of "KVM: MMU: fast write protect" by xiaoguangrong, especially the function kvm_mmu_write_protect_all_pages(), is very helpful.
> After a little modifcation, on his patch, can solve this problem, 9s to 0.5s.
> 
> At the begining of live migration, write protection is only applied to the top-level SPTE. Then the write from vm trigger the EPT violation, with for_each_shadow_entry write protection is performed at dirct_map.
> Finally the Dirty bit of the target page(at level 1 page table) is cleared, and the dirty page tracking is started. Of coure, the page where GPA is located is marked dirty when mmu_set_spte.
> A similar implementation on xen, just emt instead of write protection.
> 
> What do you think about this solution?

What tree does this patch apply to?

Paolo

> ---
>  mmu.c | 5 ++++-
>  vmx.c | 3 +--
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mmu.c b/mmu.c
> index b079d74..f49d316 100755
> --- a/mmu.c
> +++ b/mmu.c
> @@ -3210,7 +3210,10 @@ static bool mmu_load_shadow_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  			break;
>  
>                  if (is_last_spte(spte, sp->role.level)) {
> -			flush |= spte_write_protect(sptep, false);
> +			if (sp->role.level == PT_PAGE_TABLE_LEVEL)
> +				flush |= spte_clear_dirty(sptep);
> +			else
> +				flush |= spte_write_protect(sptep, false);
>  			continue;
>                  }
>  
> diff --git a/vmx.c b/vmx.c
> index 95784bc..7ec717f 100755
> --- a/vmx.c
> +++ b/vmx.c
> @@ -14421,8 +14421,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  static void vmx_slot_enable_log_dirty(struct kvm *kvm,
>  				     struct kvm_memory_slot *slot)
>  {
> -	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> -	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
> +	kvm_mmu_write_protect_all_pages(kvm, true);
>  }
>  
>  static void vmx_slot_disable_log_dirty(struct kvm *kvm,
>