From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Laurent Vivier , Paolo Bonzini , "Dr . David Alan Gilbert" , peterx@redhat.com, Juan Quintela
Date: Mon, 20 May 2019 11:08:36 +0800
Message-Id: <20190520030839.6795-13-peterx@redhat.com>
In-Reply-To: <20190520030839.6795-1-peterx@redhat.com>
References: <20190520030839.6795-1-peterx@redhat.com>
Subject: [Qemu-devel] [PATCH v2 12/15] kvm: Support KVM_CLEAR_DIRTY_LOG

First, detect the interface using KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 and
record the result. If enabling the new feature fails, fall back to the
old sync behavior.

Provide the log_clear() hook for the memory listeners of both KVM address
spaces (normal system memory and SMM) and deliver the clear message to
the kernel.

Signed-off-by: Peter Xu
---
 accel/kvm/kvm-all.c    | 180 +++++++++++++++++++++++++++++++++++++++++
 accel/kvm/trace-events |   1 +
 2 files changed, 181 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a19535bf6a..062bf8b5b0 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -91,6 +91,7 @@ struct KVMState
     int many_ioeventfds;
     int intx_set_mask;
     bool sync_mmu;
+    bool manual_dirty_log_protect;
     /* The man page (and posix) say ioctl numbers are signed int, but
      * they're not. Linux, glibc and *BSD all treat ioctl numbers as
      * unsigned, and treating them as signed here can break things */
@@ -536,6 +537,157 @@ out:
     return ret;
 }
 
+/* Alignment requirement for KVM_CLEAR_DIRTY_LOG - 64 pages */
+#define KVM_CLEAR_LOG_SHIFT  6
+#define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
+#define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
+
+/**
+ * kvm_physical_log_clear - Clear the kernel's dirty bitmap for a range
+ *
+ * NOTE: this will be a no-op if we haven't enabled manual dirty log
+ * protection in the host kernel, because in that case this operation
+ * will be done within log_sync().
+ *
+ * @kml: the kvm memory listener
+ * @section: the memory range whose dirty bitmap should be cleared
+ */
+static int kvm_physical_log_clear(KVMMemoryListener *kml,
+                                  MemoryRegionSection *section)
+{
+    KVMState *s = kvm_state;
+    struct kvm_clear_dirty_log d;
+    uint64_t start, end, bmap_start, start_delta, bmap_npages, size;
+    unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
+    KVMSlot *mem = NULL;
+    int ret, i;
+
+    if (!s->manual_dirty_log_protect) {
+        /* No need to do explicit clear */
+        return 0;
+    }
+
+    start = section->offset_within_address_space;
+    size = int128_get64(section->size);
+
+    if (!size) {
+        /* Nothing more we can do... */
+        return 0;
+    }
+
+    kvm_slots_lock(kml);
+
+    /* Find any possible slot that covers the section */
+    for (i = 0; i < s->nr_slots; i++) {
+        mem = &kml->slots[i];
+        if (mem->start_addr <= start &&
+            start + size <= mem->start_addr + mem->memory_size) {
+            break;
+        }
+    }
+
+    /*
+     * We should always have found a memslot by this point, otherwise
+     * something is wrong in the upper layer
+     */
+    assert(mem && i != s->nr_slots);
+
+    /*
+     * We need to extend either the start or the size or both to
+     * satisfy the KVM interface requirement. First, align the start
+     * down to 64 host pages
+     */
+    bmap_start = (start - mem->start_addr) & KVM_CLEAR_LOG_MASK;
+    start_delta = start - mem->start_addr - bmap_start;
+    bmap_start /= psize;
+
+    /*
+     * The kernel interface also restricts the size; either:
+     *
+     * (1) the size is 64 host pages aligned (just like the start), or
+     * (2) the size fills up until the end of the KVM memslot.
+     */
+    bmap_npages = DIV_ROUND_UP(size + start_delta, KVM_CLEAR_LOG_ALIGN)
+        << KVM_CLEAR_LOG_SHIFT;
+    end = mem->memory_size / psize;
+    if (bmap_npages > end - bmap_start) {
+        bmap_npages = end - bmap_start;
+    }
+    start_delta /= psize;
+
+    /*
+     * Prepare the bitmap to clear dirty bits. Here we must guarantee
+     * that we won't clear any unknown dirty bits; otherwise we might
+     * accidentally clear set bits that have not yet been synced from
+     * the kernel into QEMU's bitmap, and we would lose track of the
+     * guest modifications to those pages (which can directly lead
+     * to guest data loss or panic after migration).
+     *
+     * Layout of the KVMSlot.dirty_bmap:
+     *
+     *                   |<-------- bmap_npages -----------..>|
+     *                                                     [1]
+     *                     start_delta         size
+     *  |----------------|-------------|------------------|------------|
+     *  ^                ^             ^                               ^
+     *  |                |             |                               |
+     * start          bmap_start     (start)                         end
+     * of memslot                                             of memslot
+     *
+     * [1] bmap_npages can be aligned to either 64 pages or the end of slot
+     */
+
+    assert(bmap_start % BITS_PER_LONG == 0);
+    if (start_delta) {
+        /* Slow path - we need to manipulate a temp bitmap */
+        bmap_clear = bitmap_new(bmap_npages);
+        bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
+                                    bmap_start, start_delta + size / psize);
+        /*
+         * We need to clear the padding at the start, because it was not
+         * requested by the caller; we only extended the bitmap to meet
+         * the 64-page alignment
+         */
+        bitmap_clear(bmap_clear, 0, start_delta);
+        d.dirty_bitmap = bmap_clear;
+    } else {
+        /* Fast path - start address aligns well with BITS_PER_LONG */
+        d.dirty_bitmap = mem->dirty_bmap + BIT_WORD(bmap_start);
+    }
+
+    d.first_page = bmap_start;
+    /* It should never overflow. If it happens, say something */
+    assert(bmap_npages <= UINT32_MAX);
+    d.num_pages = bmap_npages;
+    d.slot = mem->slot | (kml->as_id << 16);
+
+    if (kvm_vm_ioctl(s, KVM_CLEAR_DIRTY_LOG, &d) == -1) {
+        ret = -errno;
+        error_report("%s: KVM_CLEAR_DIRTY_LOG failed, slot=%d, "
+                     "start=0x%"PRIx64", size=0x%"PRIx32", errno=%d",
+                     __func__, d.slot, (uint64_t)d.first_page,
+                     (uint32_t)d.num_pages, ret);
+    } else {
+        ret = 0;
+        trace_kvm_clear_dirty_log(d.slot, d.first_page, d.num_pages);
+    }
+
+    /*
+     * After updating the remote dirty bitmap, update the cached
+     * bitmap for the memslot as well, so that if another user clears
+     * the same region we know not to clear it again on the remote;
+     * otherwise that would also cause data loss.
+     */
+    bitmap_clear(mem->dirty_bmap, bmap_start + start_delta,
+                 size / psize);
+    /* This handles the NULL case well */
+    g_free(bmap_clear);
+
+    kvm_slots_unlock(kml);
+
+    return ret;
+}
+
 static void kvm_coalesce_mmio_region(MemoryListener *listener,
                                      MemoryRegionSection *secion,
                                      hwaddr start, hwaddr size)
@@ -888,6 +1040,22 @@ static void kvm_log_sync(MemoryListener *listener,
     }
 }
 
+static void kvm_log_clear(MemoryListener *listener,
+                          MemoryRegionSection *section)
+{
+    KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+    int r;
+
+    r = kvm_physical_log_clear(kml, section);
+    if (r < 0) {
+        error_report_once("%s: kvm log clear failed: mr=%s "
+                          "offset=%"HWADDR_PRIx" size=%"PRIx64, __func__,
+                          section->mr->name, section->offset_within_region,
+                          int128_get64(section->size));
+        abort();
+    }
+}
+
 static void kvm_mem_ioeventfd_add(MemoryListener *listener,
                                   MemoryRegionSection *section,
                                   bool match_data, uint64_t data,
@@ -975,6 +1143,7 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
     kml->listener.log_start = kvm_log_start;
     kml->listener.log_stop = kvm_log_stop;
     kml->listener.log_sync = kvm_log_sync;
+    kml->listener.log_clear = kvm_log_clear;
     kml->listener.priority = 10;
 
     memory_listener_register(&kml->listener, as);
@@ -1699,6 +1868,17 @@ static int kvm_init(MachineState *ms)
     s->coalesced_pio = s->coalesced_mmio &&
                        kvm_check_extension(s, KVM_CAP_COALESCED_PIO);
 
+    s->manual_dirty_log_protect =
+        kvm_check_extension(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
+    if (s->manual_dirty_log_protect) {
+        ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0, 1);
+        if (ret) {
+            warn_report("Trying to enable KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 "
+                        "but failed. Falling back to the legacy mode.");
+            s->manual_dirty_log_protect = false;
+        }
+    }
+
 #ifdef KVM_CAP_VCPU_EVENTS
     s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
 #endif
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 33c5b1b3af..4fb6e59d19 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,4 +15,5 @@ kvm_irqchip_release_virq(int virq) "virq %d"
 kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
 kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
 kvm_set_user_memory(uint32_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
+kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
 
-- 
2.17.1