From: Kirti Wankhede
Subject: [PATCH v13 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.
Date: Thu, 12 Mar 2020 23:23:24 +0530
Message-ID: <1584035607-23166-5-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1584035607-23166-1-git-send-email-kwankhede@nvidia.com>
References: <1584035607-23166-1-git-send-email-kwankhede@nvidia.com>
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com,
 yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com,
 ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com,
 shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com,
 mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede,
 eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com,
 changpeng.liu@intel.com, Ken.Xue@amd.com

VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start pinned and unpinned pages tracking while migration is active.
- Stop pinned and unpinned dirty pages tracking. This is also used to
  stop dirty pages tracking if migration failed or was cancelled.
- Get dirty pages bitmap. This ioctl returns a bitmap of dirty pages; it is
  the user space application's responsibility to copy the content of dirty
  pages from source to destination during migration.

To prevent a DoS attack, memory for the bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering the smallest supported page
size. The bitmap is allocated when dirty logging is enabled for those
vfio_dmas whose vpfn list is not empty or whose whole range is mapped, in
the case of a pass-through device.

The bitmap is populated for already pinned pages when it is allocated for a
vfio_dma, using the smallest supported page size. The bitmap is then updated
from the pinning and unpinning functions. When the user application queries
the bitmap, check whether the requested page size is the same as the page
size used to populate the bitmap. If it is equal, copy the bitmap; if not,
return an error.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 drivers/vfio/vfio_iommu_type1.c | 243 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 237 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d386461e5d11..435e84269a28 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -70,6 +70,7 @@ struct vfio_iommu {
 	unsigned int		dma_avail;
 	bool			v2;
 	bool			nesting;
+	bool			dirty_page_tracking;
 };
 
 struct vfio_domain {
@@ -90,6 +91,7 @@ struct vfio_dma {
 	bool			lock_cap;	/* capable(CAP_IPC_LOCK) */
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
+	unsigned long		*bitmap;
 };
 
 struct vfio_group {
@@ -125,6 +127,7 @@ struct vfio_regions {
 					(!list_empty(&iommu->domain_list))
 
 static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
 /*
  * This code handles mapping and unmapping of user data buffers
@@ -174,6 +177,76 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
 	rb_erase(&old->node, &iommu->dma_list);
 }
 
+static inline unsigned long dirty_bitmap_bytes(unsigned int npages)
+{
+	if (!npages)
+		return 0;
+
+	return ALIGN(npages, BITS_PER_LONG) / sizeof(unsigned long);
+}
+
+static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, unsigned long pgsize)
+{
+	if (!RB_EMPTY_ROOT(&dma->pfn_list) || dma->iommu_mapped) {
+		unsigned long npages = dma->size / pgsize;
+
+		dma->bitmap = kvzalloc(dirty_bitmap_bytes(npages), GFP_KERNEL);
+		if (!dma->bitmap)
+			return -ENOMEM;
+	}
+	return 0;
+}
+
+static int vfio_dma_all_bitmap_alloc(struct vfio_iommu *iommu,
+				     unsigned long pgsize)
+{
+	struct rb_node *n = rb_first(&iommu->dma_list);
+	int ret;
+
+	for (; n; n = rb_next(n)) {
+		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+		struct rb_node *n;
+
+		ret = vfio_dma_bitmap_alloc(dma, pgsize);
+		if (ret) {
+			struct rb_node *p = rb_prev(n);
+
+			for (; p; p = rb_prev(p)) {
+				struct vfio_dma *dma = rb_entry(n,
+							struct vfio_dma, node);
+
+				kfree(dma->bitmap);
+				dma->bitmap = NULL;
+			}
+			return ret;
+		}
+
+		if (!dma->bitmap)
+			continue;
+
+		for (n = rb_first(&dma->pfn_list); n; n = rb_next(n)) {
+			struct vfio_pfn *vpfn = rb_entry(n, struct vfio_pfn,
+							 node);
+
+			bitmap_set(dma->bitmap,
+				   (vpfn->iova - dma->iova) / pgsize, 1);
+		}
+	}
+	return 0;
+}
+
+static void vfio_dma_all_bitmap_free(struct vfio_iommu *iommu)
+{
+	struct rb_node *n = rb_first(&iommu->dma_list);
+
+	for (; n; n = rb_next(n)) {
+		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+		kfree(dma->bitmap);
+		dma->bitmap = NULL;
+	}
+}
+
 /*
  * Helper Functions for host iova-pfn list
  */
@@ -254,12 +327,16 @@ static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct vfio_dma *dma,
 	return vpfn;
 }
 
-static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
+static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn,
+				  bool do_tracking, unsigned long pgsize)
 {
 	int ret = 0;
 
 	vpfn->ref_count--;
 	if (!vpfn->ref_count) {
+		if (do_tracking && dma->bitmap)
+			bitmap_set(dma->bitmap,
+				   (vpfn->iova - dma->iova) / pgsize, 1);
 		ret = put_pfn(vpfn->pfn, dma->prot);
 		vfio_remove_from_pfn_list(dma, vpfn);
 	}
@@ -484,7 +561,8 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 }
 
 static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
-				    bool do_accounting)
+				    bool do_accounting, bool do_tracking,
+				    unsigned long pgsize)
 {
 	int unlocked;
 	struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova);
@@ -492,7 +570,7 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
 	if (!vpfn)
 		return 0;
 
-	unlocked = vfio_iova_put_vfio_pfn(dma, vpfn);
+	unlocked = vfio_iova_put_vfio_pfn(dma, vpfn, do_tracking, pgsize);
 
 	if (do_accounting)
 		vfio_lock_acct(dma, -unlocked, true);
@@ -563,9 +641,26 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
 		ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
 		if (ret) {
-			vfio_unpin_page_external(dma, iova, do_accounting);
+			vfio_unpin_page_external(dma, iova, do_accounting,
+						 false, 0);
 			goto pin_unwind;
 		}
+
+		if (iommu->dirty_page_tracking) {
+			unsigned long pgshift =
+					 __ffs(vfio_pgsize_bitmap(iommu));
+
+			if (!dma->bitmap) {
+				ret = vfio_dma_bitmap_alloc(dma, 1 << pgshift);
+				if (ret) {
+					vfio_unpin_page_external(dma, iova,
+						 do_accounting, false, 0);
+					goto pin_unwind;
+				}
+			}
+			bitmap_set(dma->bitmap,
+				   (vpfn->iova - dma->iova) >> pgshift, 1);
+		}
 	}
 
 	ret = i;
@@ -578,7 +673,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
 		iova = user_pfn[j] << PAGE_SHIFT;
 		dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
-		vfio_unpin_page_external(dma, iova, do_accounting);
+		vfio_unpin_page_external(dma, iova, do_accounting, false, 0);
 		phys_pfn[j] = 0;
 	}
 pin_done:
@@ -612,7 +707,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
 		dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
 		if (!dma)
 			goto unpin_exit;
-		vfio_unpin_page_external(dma, iova, do_accounting);
+		vfio_unpin_page_external(dma, iova, do_accounting,
+					 iommu->dirty_page_tracking, PAGE_SIZE);
 	}
 
 unpin_exit:
@@ -800,6 +896,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_unmap_unpin(iommu, dma, true);
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
+	kfree(dma->bitmap);
 	kfree(dma);
 	iommu->dma_avail++;
 }
@@ -830,6 +927,54 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
 	return bitmap;
 }
 
+static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
+				  size_t size, uint64_t pgsize,
+				  unsigned char __user *bitmap)
+{
+	struct vfio_dma *dma;
+	unsigned long pgshift = __ffs(pgsize);
+	unsigned int npages, bitmap_size;
+
+	dma = vfio_find_dma(iommu, iova, 1);
+
+	if (!dma)
+		return -EINVAL;
+
+	if (dma->iova != iova || dma->size != size)
+		return -EINVAL;
+
+	npages = dma->size >> pgshift;
+	bitmap_size = dirty_bitmap_bytes(npages);
+
+	/* mark all pages dirty if all pages are pinned and mapped. */
+	if (dma->iommu_mapped)
+		bitmap_set(dma->bitmap, 0, npages);
+
+	if (dma->bitmap) {
+		if (copy_to_user((void __user *)bitmap, dma->bitmap,
+				 bitmap_size))
+			return -EFAULT;
+
+		memset(dma->bitmap, 0, bitmap_size);
+	}
+	return 0;
+}
+
+static int verify_bitmap_size(unsigned long npages, unsigned long bitmap_size)
+{
+	long bsize;
+
+	if (!bitmap_size || bitmap_size > SIZE_MAX)
+		return -EINVAL;
+
+	bsize = dirty_bitmap_bytes(npages);
+
+	if (bitmap_size < bsize)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 			     struct vfio_iommu_type1_dma_unmap *unmap)
 {
@@ -2277,6 +2422,92 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+	} else if (cmd == VFIO_IOMMU_DIRTY_PAGES) {
+		struct vfio_iommu_type1_dirty_bitmap dirty;
+		uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
+				VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
+				VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
+		int ret;
+
+		if (!iommu->v2)
+			return -EACCES;
+
+		minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
+				    flags);
+
+		if (copy_from_user(&dirty, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (dirty.argsz < minsz || dirty.flags & ~mask)
+			return -EINVAL;
+
+		/* only one flag should be set at a time */
+		if (__ffs(dirty.flags) != __fls(dirty.flags))
+			return -EINVAL;
+
+		if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) {
+			unsigned long iommu_pgsize =
+					1 << __ffs(vfio_pgsize_bitmap(iommu));
+
+			mutex_lock(&iommu->lock);
+			ret = vfio_dma_all_bitmap_alloc(iommu, iommu_pgsize);
+			if (!ret)
+				iommu->dirty_page_tracking = true;
+			mutex_unlock(&iommu->lock);
+			return ret;
+		} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP) {
+			mutex_lock(&iommu->lock);
+			if (iommu->dirty_page_tracking) {
+				iommu->dirty_page_tracking = false;
+				vfio_dma_all_bitmap_free(iommu);
+			}
+			mutex_unlock(&iommu->lock);
+			return 0;
+		} else if (dirty.flags &
+				 VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
+			struct vfio_iommu_type1_dirty_bitmap_get range;
+			unsigned long pgshift;
+			size_t data_size = dirty.argsz - minsz;
+			uint64_t iommu_pgsize =
+					 1 << __ffs(vfio_pgsize_bitmap(iommu));
+
+			if (!data_size || data_size < sizeof(range))
+				return -EINVAL;
+
+			if (copy_from_user(&range, (void __user *)(arg + minsz),
+					   sizeof(range)))
+				return -EFAULT;
+
+			// allow only min supported pgsize
+			if (range.pgsize != iommu_pgsize)
+				return -EINVAL;
+			if (range.iova & (iommu_pgsize - 1))
+				return -EINVAL;
+			if (!range.size || range.size & (iommu_pgsize - 1))
+				return -EINVAL;
+			if (range.iova + range.size < range.iova)
+				return -EINVAL;
+			if (!access_ok((void __user *)range.bitmap,
+				       range.bitmap_size))
+				return -EINVAL;
+
+			pgshift = __ffs(range.pgsize);
+			ret = verify_bitmap_size(range.size >> pgshift,
+						 range.bitmap_size);
+			if (ret)
+				return ret;
+
+			mutex_lock(&iommu->lock);
+			if (iommu->dirty_page_tracking)
+				ret = vfio_iova_dirty_bitmap(iommu, range.iova,
+						 range.size, range.pgsize,
+					 (unsigned char __user *)range.bitmap);
+			else
+				ret = -EINVAL;
+			mutex_unlock(&iommu->lock);
+
+			return ret;
+		}
+	}
 	}
 
 	return -ENOTTY;
-- 
2.7.0
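
For readers following the bitmap arithmetic in this patch, here is a minimal
userspace sketch, not kernel code, of three ideas it relies on: sizing a
bitmap for a given number of pages, accepting a flags word only when exactly
one operation bit is set, and setting the bit for the page that contains an
IOVA. All names below (bitmap_bytes, exactly_one_flag, mark_dirty,
test_dirty) are hypothetical and chosen for illustration. The sketch rounds
the bitmap up to whole 64-bit words and treats a zero flags word as invalid;
both are assumptions, not statements about the kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these are kernel API. */

/* Bytes needed for a bitmap of npages bits, rounded up to 64-bit words. */
static size_t bitmap_bytes(uint64_t npages)
{
	uint64_t words = npages / 64 + (npages % 64 != 0);

	return (size_t)(words * sizeof(uint64_t));
}

/* True when exactly one bit is set; zero is rejected (assumption). */
static bool exactly_one_flag(uint32_t flags)
{
	return flags != 0 && (flags & (flags - 1)) == 0;
}

/* Set the bit for the page containing iova; base is the mapping start. */
static void mark_dirty(uint64_t *bitmap, uint64_t base, uint64_t iova,
		       uint64_t pgsize)
{
	uint64_t bit;

	/* pgsize must be a non-zero power of two, iova must not precede base */
	assert(pgsize != 0 && (pgsize & (pgsize - 1)) == 0);
	assert(iova >= base);

	bit = (iova - base) / pgsize;
	bitmap[bit / 64] |= UINT64_C(1) << (bit % 64);
}

/* Report whether the given page bit is set. */
static bool test_dirty(const uint64_t *bitmap, uint64_t bit)
{
	return (bitmap[bit / 64] >> (bit % 64)) & 1u;
}
```

With a 4 KiB page size, an IOVA that is 65 pages past the mapping base maps to
bit 65, that is bit 1 of the second 64-bit word, so a range covering it needs
at least 16 bytes of bitmap under the rounding assumed here.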