From: Wei Wang
To: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
 virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
 linux-mm@kvack.org, mst@redhat.com, david@redhat.com,
 cornelia.huck@de.ibm.com, akpm@linux-foundation.org,
 mgorman@techsingularity.net, aarcange@redhat.com, amit.shah@redhat.com,
 pbonzini@redhat.com, wei.w.wang@intel.com, liliang.opensource@gmail.com
Cc: yang.zhang.wz@gmail.com, virtio-dev@lists.oasis-open.org,
 quan.xu@aliyun.com
Date: Wed, 12 Jul 2017 20:40:14 +0800
Subject: [Qemu-devel] [PATCH v12 1/8] virtio-balloon: deflate via a page list
Message-Id: <1499863221-16206-2-git-send-email-wei.w.wang@intel.com>
In-Reply-To: <1499863221-16206-1-git-send-email-wei.w.wang@intel.com>

From: Liang Li

This patch saves the deflated pages to a list instead of the PFN array.
Accordingly, the balloon_pfn_to_page() function is removed.

Signed-off-by: Liang Li
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Wei Wang
---
 drivers/virtio/virtio_balloon.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 22caf80..7f38ae6 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -104,12 +104,6 @@ static u32 page_to_balloon_pfn(struct page *page)
 	return pfn * VIRTIO_BALLOON_PAGES_PER_PAGE;
 }
 
-static struct page *balloon_pfn_to_page(u32 pfn)
-{
-	BUG_ON(pfn % VIRTIO_BALLOON_PAGES_PER_PAGE);
-	return pfn_to_page(pfn / VIRTIO_BALLOON_PAGES_PER_PAGE);
-}
-
 static void balloon_ack(struct virtqueue *vq)
 {
 	struct virtio_balloon *vb = vq->vdev->priv;
@@ -182,18 +176,16 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 	return num_allocated_pages;
 }
 
-static void release_pages_balloon(struct virtio_balloon *vb)
+static void release_pages_balloon(struct virtio_balloon *vb,
+				 struct list_head *pages)
 {
-	unsigned int i;
-	struct page *page;
+	struct page *page, *next;
 
-	/* Find pfns pointing at start of each page, get pages and free them. */
-	for (i = 0; i < vb->num_pfns; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		page = balloon_pfn_to_page(virtio32_to_cpu(vb->vdev,
-							   vb->pfns[i]));
+	list_for_each_entry_safe(page, next, pages, lru) {
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 			adjust_managed_page_count(page, 1);
+		list_del(&page->lru);
 		put_page(page); /* balloon reference */
 	}
 }
@@ -203,6 +195,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	unsigned num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+	LIST_HEAD(pages);
 
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
@@ -216,6 +209,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 		if (!page)
 			break;
 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		list_add(&page->lru, &pages);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
 
@@ -227,7 +221,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 */
 	if (vb->num_pfns != 0)
 		tell_host(vb, vb->deflate_vq);
-	release_pages_balloon(vb);
+	release_pages_balloon(vb, &pages);
 	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
 }
-- 
2.7.4
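The patch above replaces PFN-to-page lookups with an intrusive list threaded through each page's lru field. As a rough illustration only, the userspace sketch below mimics that pattern. struct list_node, struct fake_page, list_push(), list_unlink() and release_all() are hypothetical stand-ins, not the kernel's list API.

The sketch shows the property that list_for_each_entry_safe() provides in release_pages_balloon(): the next entry is saved before the current one is unlinked and freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical page descriptor; @lru plays the role of page->lru. */
struct fake_page {
	unsigned long pfn;
	struct list_node lru;
};

/* An empty list is a head that points at itself (cf. LIST_HEAD()). */
static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Insert @n right after @head, mirroring the semantics of list_add(). */
static void list_push(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink @n from its list, in the spirit of list_del(). */
static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/*
 * Free every page on @pages and return how many were freed. @next is read
 * before @pos is unlinked and freed, so the walk never touches freed memory.
 */
static unsigned int release_all(struct list_node *pages)
{
	struct list_node *pos, *next;
	unsigned int freed = 0;

	for (pos = pages->next; pos != pages; pos = next) {
		next = pos->next;
		list_unlink(pos);
		free(container_of(pos, struct fake_page, lru));
		freed++;
	}
	/* The head is left as an empty list. */
	assert(pages->next == pages && pages->prev == pages);
	return freed;
}
```

Collecting pages on a list while dequeuing them, as leak_balloon() now does, means the release step no longer depends on the PFN array's encoding.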
From: Wei Wang
Date: Wed, 12 Jul 2017 20:40:15 +0800
Subject: [Qemu-devel] [PATCH v12 2/8] virtio-balloon: coding format cleanup
Message-Id: <1499863221-16206-3-git-send-email-wei.w.wang@intel.com>

Clean up the comment format.

Signed-off-by: Wei Wang
---
 drivers/virtio/virtio_balloon.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7f38ae6..f0b3a0b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -132,8 +132,10 @@ static void set_page_pfns(struct virtio_balloon *vb,
 {
 	unsigned int i;
 
-	/* Set balloon pfns pointing at this page.
-	 * Note that the first pfn points at start of the page. */
+	/*
+	 * Set balloon pfns pointing at this page.
+	 * Note that the first pfn points at start of the page.
+	 */
 	for (i = 0; i < VIRTIO_BALLOON_PAGES_PER_PAGE; i++)
 		pfns[i] = cpu_to_virtio32(vb->vdev,
 					  page_to_balloon_pfn(page) + i);
-- 
2.7.4
From: Wei Wang
Date: Wed, 12 Jul 2017 20:40:16 +0800
Subject: [Qemu-devel] [PATCH v12 3/8] Introduce xbitmap
Message-Id: <1499863221-16206-4-git-send-email-wei.w.wang@intel.com>

From: Matthew Wilcox

The eXtensible Bitmap is a sparse bitmap representation which is
efficient for set bits which tend to cluster. It supports up to
'unsigned long' worth of bits, and this commit adds the bare bones --
xb_set_bit(), xb_clear_bit() and xb_test_bit().

Signed-off-by: Matthew Wilcox
Signed-off-by: Wei Wang
---
 include/linux/radix-tree.h |   2 +
 include/linux/xbitmap.h    |  49 ++++++++++++++++
 lib/radix-tree.c           | 138 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 187 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/xbitmap.h

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 3e57350..428ccc9 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -317,6 +317,8 @@ void radix_tree_iter_delete(struct radix_tree_root *,
 			struct radix_tree_iter *iter, void __rcu **slot);
 void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
 void *radix_tree_delete(struct radix_tree_root *, unsigned long);
+bool __radix_tree_delete(struct radix_tree_root *root,
+			 struct radix_tree_node *node, void __rcu **slot);
 void radix_tree_clear_tags(struct radix_tree_root *, struct radix_tree_node *,
 			   void __rcu **slot);
 unsigned int radix_tree_gang_lookup(const struct radix_tree_root *,
diff --git a/include/linux/xbitmap.h b/include/linux/xbitmap.h
new file mode 100644
index 0000000..0b93a46
--- /dev/null
+++ b/include/linux/xbitmap.h
@@ -0,0 +1,49 @@
+/*
+ * eXtensible Bitmaps
+ * Copyright (c) 2017 Microsoft Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * eXtensible Bitmaps provide an unlimited-size sparse bitmap facility.
+ * All bits are initially zero.
+ */
+
+#include
+
+struct xb {
+	struct radix_tree_root xbrt;
+};
+
+#define XB_INIT {							\
+	.xbrt = RADIX_TREE_INIT(IDR_RT_MARKER | GFP_NOWAIT),		\
+}
+#define DEFINE_XB(name)		struct xb name = XB_INIT
+
+static inline void xb_init(struct xb *xb)
+{
+	INIT_RADIX_TREE(&xb->xbrt, IDR_RT_MARKER | GFP_NOWAIT);
+}
+
+int xb_set_bit(struct xb *xb, unsigned long bit);
+bool xb_test_bit(const struct xb *xb, unsigned long bit);
+int xb_clear_bit(struct xb *xb, unsigned long bit);
+
+static inline bool xb_empty(const struct xb *xb)
+{
+	return radix_tree_empty(&xb->xbrt);
+}
+
+void xb_preload(gfp_t gfp);
+
+static inline void xb_preload_end(void)
+{
+	preempt_enable();
+}
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 898e879..d624914 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 
 
 /* Number of nodes in fully populated tree of given height */
@@ -78,6 +79,14 @@ static struct kmem_cache *radix_tree_node_cachep;
 #define IDA_PRELOAD_SIZE	(IDA_MAX_PATH * 2 - 1)
 
 /*
+ * The XB can go up to unsigned long, but also uses a bitmap.
+ */
+#define XB_INDEX_BITS		(BITS_PER_LONG - ilog2(IDA_BITMAP_BITS))
+#define XB_MAX_PATH		(DIV_ROUND_UP(XB_INDEX_BITS, \
+					      RADIX_TREE_MAP_SHIFT))
+#define XB_PRELOAD_SIZE		(XB_MAX_PATH * 2 - 1)
+
+/*
  * Per-cpu pool of preloaded nodes
  */
 struct radix_tree_preload {
@@ -840,6 +849,8 @@ int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
 							offset, 0, 0);
 			if (!child)
 				return -ENOMEM;
+			if (is_idr(root))
+				all_tag_set(child, IDR_FREE);
 			rcu_assign_pointer(*slot, node_to_entry(child));
 			if (node)
 				node->count++;
@@ -1986,8 +1997,8 @@ void __radix_tree_delete_node(struct radix_tree_root *root,
 	delete_node(root, node, update_node, private);
 }
 
-static bool __radix_tree_delete(struct radix_tree_root *root,
-				struct radix_tree_node *node, void __rcu **slot)
+bool __radix_tree_delete(struct radix_tree_root *root,
+			 struct radix_tree_node *node, void __rcu **slot)
 {
 	void *old = rcu_dereference_raw(*slot);
 	int exceptional = radix_tree_exceptional_entry(old) ? -1 : 0;
@@ -2137,6 +2148,129 @@ int ida_pre_get(struct ida *ida, gfp_t gfp)
 }
 EXPORT_SYMBOL(ida_pre_get);
 
+void xb_preload(gfp_t gfp)
+{
+	__radix_tree_preload(gfp, XB_PRELOAD_SIZE);
+	if (!this_cpu_read(ida_bitmap)) {
+		struct ida_bitmap *bitmap = kmalloc(sizeof(*bitmap), gfp);
+
+		if (!bitmap)
+			return;
+		bitmap = this_cpu_cmpxchg(ida_bitmap, NULL, bitmap);
+		kfree(bitmap);
+	}
+}
+EXPORT_SYMBOL(xb_preload);
+
+int xb_set_bit(struct xb *xb, unsigned long bit)
+{
+	int err;
+	unsigned long index = bit / IDA_BITMAP_BITS;
+	struct radix_tree_root *root = &xb->xbrt;
+	struct radix_tree_node *node;
+	void **slot;
+	struct ida_bitmap *bitmap;
+	unsigned long ebit;
+
+	bit %= IDA_BITMAP_BITS;
+	ebit = bit + 2;
+
+	err = __radix_tree_create(root, index, 0, &node, &slot);
+	if (err)
+		return err;
+	bitmap = rcu_dereference_raw(*slot);
+	if (radix_tree_exception(bitmap)) {
+		unsigned long tmp = (unsigned long)bitmap;
+
+		if (ebit < BITS_PER_LONG) {
+			tmp |= 1UL << ebit;
+			rcu_assign_pointer(*slot, (void *)tmp);
+			return 0;
+		}
+		bitmap = this_cpu_xchg(ida_bitmap, NULL);
+		if (!bitmap)
+			return -EAGAIN;
+		memset(bitmap, 0, sizeof(*bitmap));
+		bitmap->bitmap[0] = tmp >> RADIX_TREE_EXCEPTIONAL_SHIFT;
+		rcu_assign_pointer(*slot, bitmap);
+	}
+
+	if (!bitmap) {
+		if (ebit < BITS_PER_LONG) {
+			bitmap = (void *)((1UL << ebit) |
+					RADIX_TREE_EXCEPTIONAL_ENTRY);
+			__radix_tree_replace(root, node, slot, bitmap, NULL,
+						NULL);
+			return 0;
+		}
+		bitmap = this_cpu_xchg(ida_bitmap, NULL);
+		if (!bitmap)
+			return -EAGAIN;
+		memset(bitmap, 0, sizeof(*bitmap));
+		__radix_tree_replace(root, node, slot, bitmap, NULL, NULL);
+	}
+
+	__set_bit(bit, bitmap->bitmap);
+	return 0;
+}
+
+int xb_clear_bit(struct xb *xb, unsigned long bit)
+{
+	unsigned long index = bit / IDA_BITMAP_BITS;
+	struct radix_tree_root *root = &xb->xbrt;
+	struct radix_tree_node *node;
+	void **slot;
+	struct ida_bitmap *bitmap;
+	unsigned long ebit;
+
+	bit %= IDA_BITMAP_BITS;
+	ebit = bit + 2;
+
+	bitmap = __radix_tree_lookup(root, index, &node, &slot);
+	if (radix_tree_exception(bitmap)) {
+		unsigned long tmp = (unsigned long)bitmap;
+
+		if (ebit >= BITS_PER_LONG)
+			return 0;
+		tmp &= ~(1UL << ebit);
+		if (tmp == RADIX_TREE_EXCEPTIONAL_ENTRY)
+			__radix_tree_delete(root, node, slot);
+		else
+			rcu_assign_pointer(*slot, (void *)tmp);
+		return 0;
+	}
+
+	if (!bitmap)
+		return 0;
+
+	__clear_bit(bit, bitmap->bitmap);
+	if (bitmap_empty(bitmap->bitmap, IDA_BITMAP_BITS)) {
+		kfree(bitmap);
+		__radix_tree_delete(root, node, slot);
+	}
+
+	return 0;
+}
+
+bool xb_test_bit(const struct xb *xb, unsigned long bit)
+{
+	unsigned long index = bit / IDA_BITMAP_BITS;
+	const struct radix_tree_root *root = &xb->xbrt;
+	struct ida_bitmap *bitmap = radix_tree_lookup(root, index);
+
+	bit %= IDA_BITMAP_BITS;
+
+	if (!bitmap)
+		return false;
+	if (radix_tree_exception(bitmap)) {
+		bit += RADIX_TREE_EXCEPTIONAL_SHIFT;
+		if (bit > BITS_PER_LONG)
+			return false;
+		return (unsigned long)bitmap & (1UL << bit);
+	}
+	return test_bit(bit, bitmap->bitmap);
+}
+
 void __rcu **idr_get_free(struct radix_tree_root *root,
 			      struct radix_tree_iter *iter, gfp_t gfp, int end)
 {
-- 
2.7.4
From: Wei Wang
Date: Wed, 12 Jul 2017 20:40:17 +0800
Subject: [Qemu-devel] [PATCH v12 4/8] xbitmap: add xb_find_next_bit() and xb_zero()
Message-Id: <1499863221-16206-5-git-send-email-wei.w.wang@intel.com>

xb_find_next_bit() is added to find the next "1" or "0" bit in a given
range. xb_zero() is added to zero a given range of bits. Both treat the
range from start to end as inclusive.
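A minimal userspace sketch of the xbitmap idea may help here. It assumes a sorted singly linked list of fixed-size chunks in place of the radix tree. Every name in it (struct sparse_bitmap, SB_CHUNK_BITS, the sb_* functions) is hypothetical.

Like xb_set_bit() and xb_clear_bit(), it splits a bit number into a chunk index and an offset. It allocates a chunk on the first set and frees the chunk once it is empty. sb_find_next() and sb_zero() follow the inclusive-range and end + 1 conventions described above. The sketch does not model the kernel version's trick of packing a small bitmap directly into the radix-tree slot as an exceptional entry.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical chunk size, playing the role of IDA_BITMAP_BITS. */
#define SB_CHUNK_BITS 1024UL
#define SB_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define SB_CHUNK_LONGS (SB_CHUNK_BITS / SB_LONG_BITS)

/* One chunk covers bits [index * SB_CHUNK_BITS, (index + 1) * SB_CHUNK_BITS). */
struct sb_chunk {
	unsigned long index;
	unsigned long words[SB_CHUNK_LONGS];
	struct sb_chunk *next;		/* chunks are kept sorted by index */
};

/* All bits start out zero; a NULL head means no chunk is allocated. */
struct sparse_bitmap {
	struct sb_chunk *head;
};

/* Return the link holding the chunk for @index, or where it would go. */
static struct sb_chunk **sb_link(struct sparse_bitmap *sb, unsigned long index)
{
	struct sb_chunk **link = &sb->head;

	while (*link && (*link)->index < index)
		link = &(*link)->next;
	return link;
}

/* Returns 0, or -1 if a new chunk could not be allocated. */
static int sb_set(struct sparse_bitmap *sb, unsigned long bit)
{
	unsigned long index = bit / SB_CHUNK_BITS, off = bit % SB_CHUNK_BITS;
	struct sb_chunk **link = sb_link(sb, index);

	if (!*link || (*link)->index != index) {
		struct sb_chunk *c = calloc(1, sizeof(*c));

		if (!c)
			return -1;
		c->index = index;
		c->next = *link;
		*link = c;
	}
	(*link)->words[off / SB_LONG_BITS] |= 1UL << (off % SB_LONG_BITS);
	return 0;
}

static bool sb_test(struct sparse_bitmap *sb, unsigned long bit)
{
	unsigned long index = bit / SB_CHUNK_BITS, off = bit % SB_CHUNK_BITS;
	struct sb_chunk *c = *sb_link(sb, index);

	if (!c || c->index != index)
		return false;
	return c->words[off / SB_LONG_BITS] & (1UL << (off % SB_LONG_BITS));
}

/* Clear @bit and free its chunk once no bit in the chunk remains set. */
static void sb_clear(struct sparse_bitmap *sb, unsigned long bit)
{
	unsigned long index = bit / SB_CHUNK_BITS, off = bit % SB_CHUNK_BITS;
	struct sb_chunk **link = sb_link(sb, index);
	struct sb_chunk *c = *link;
	size_t i;

	if (!c || c->index != index)
		return;
	c->words[off / SB_LONG_BITS] &= ~(1UL << (off % SB_LONG_BITS));
	for (i = 0; i < SB_CHUNK_LONGS; i++)
		if (c->words[i])
			return;
	*link = c->next;
	free(c);
}

/* Scans [start, end]; returns end + 1 if no bit equal to @set is found. */
static unsigned long sb_find_next(struct sparse_bitmap *sb, unsigned long start,
				  unsigned long end, bool set)
{
	unsigned long i;

	assert(end < ULONG_MAX);	/* end + 1 must not wrap around */
	for (i = start; i <= end; i++)
		if (sb_test(sb, i) == set)
			break;
	return i;
}

/* Clears every bit in the inclusive range [start, end]. */
static void sb_zero(struct sparse_bitmap *sb, unsigned long start,
		    unsigned long end)
{
	unsigned long i;

	assert(end < ULONG_MAX);	/* otherwise i <= end never fails */
	for (i = start; i <= end; i++)
		sb_clear(sb, i);
}
```

Like the patch's versions, sb_find_next() and sb_zero() visit one bit at a time. Their cost therefore grows with the length of the range rather than with the number of set bits.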
Signed-off-by: Wei Wang
---
 include/linux/xbitmap.h |  4 ++++
 lib/radix-tree.c        | 26 ++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/xbitmap.h b/include/linux/xbitmap.h
index 0b93a46..88c2045 100644
--- a/include/linux/xbitmap.h
+++ b/include/linux/xbitmap.h
@@ -36,6 +36,10 @@ int xb_set_bit(struct xb *xb, unsigned long bit);
 bool xb_test_bit(const struct xb *xb, unsigned long bit);
 int xb_clear_bit(struct xb *xb, unsigned long bit);
 
+void xb_zero(struct xb *xb, unsigned long start, unsigned long end);
+unsigned long xb_find_next_bit(struct xb *xb, unsigned long start,
+			       unsigned long end, bool set);
+
 static inline bool xb_empty(const struct xb *xb)
 {
 	return radix_tree_empty(&xb->xbrt);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index d624914..c45b910 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -2271,6 +2271,32 @@ bool xb_test_bit(const struct xb *xb, unsigned long bit)
 	return test_bit(bit, bitmap->bitmap);
 }
 
+void xb_zero(struct xb *xb, unsigned long start, unsigned long end)
+{
+	unsigned long i;
+
+	for (i = start; i <= end; i++)
+		xb_clear_bit(xb, i);
+}
+
+/*
+ * Find the next one (@set = 1) or zero (@set = 0) bit within the bit range
+ * from @start to @end in @xb. If no such bit is found in the given range,
+ * bit end + 1 will be returned.
+ */
+unsigned long xb_find_next_bit(struct xb *xb, unsigned long start,
+			       unsigned long end, bool set)
+{
+	unsigned long i;
+
+	for (i = start; i <= end; i++) {
+		if (xb_test_bit(xb, i) == set)
+			break;
+	}
+
+	return i;
+}
+
 void __rcu **idr_get_free(struct radix_tree_root *root,
 			      struct radix_tree_iter *iter, gfp_t gfp, int end)
 {
-- 
2.7.4
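The end + 1 return value lets a caller walk maximal runs of set bits. That is how tell_host_sgs() in patch 5/8 turns the page xbitmap into sg entries. The sketch below shows that walk over a plain bool array. find_next(), struct run and collect_runs() are hypothetical names, not kernel API.

The sketch loops while start <= end. tell_host_sgs() uses the strict test start < end instead, so a run that begins exactly at the last bit of the range would not be emitted. That includes the case where a single page is ballooned, so pfn_min == pfn_max.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* A maximal run of set bits: @count bits starting at bit @first. */
struct run {
	unsigned long first;
	unsigned long count;
};

/* Hypothetical stand-in for xb_find_next_bit() over a plain bool array. */
static unsigned long find_next(const bool *bits, unsigned long start,
			       unsigned long end, bool set)
{
	unsigned long i;

	for (i = start; i <= end; i++)
		if (bits[i] == set)
			break;
	return i;		/* end + 1 when nothing in [start, end] matched */
}

/*
 * Split the set bits of [start, end] into maximal runs, storing at most
 * @max_runs of them in @out. In a balloon driver each run could become one
 * sg entry (address first << PAGE_SHIFT, length count pages).
 */
static size_t collect_runs(const bool *bits, unsigned long start,
			   unsigned long end, struct run *out, size_t max_runs)
{
	size_t n = 0;

	assert(end < ULONG_MAX - 1);	/* keeps end + 1 and last + 1 from wrapping */
	while (start <= end && n < max_runs) {
		unsigned long first = find_next(bits, start, end, true);
		unsigned long last;

		if (first == end + 1)
			break;		/* no set bit left in the range */
		/* First clear bit after the run, or end + 1 if it reaches @end. */
		last = find_next(bits, first + 1, end, false);
		out[n].first = first;
		out[n].count = last - first;
		n++;
		start = last + 1;	/* bit @last is known to be clear */
	}
	return n;
}
```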
From: Wei Wang
Date: Wed, 12 Jul 2017 20:40:18 +0800
Subject: [Qemu-devel] [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG
Message-Id: <1499863221-16206-6-git-send-email-wei.w.wang@intel.com>

Add a new feature, VIRTIO_BALLOON_F_SG, which enables transferring
chunks of ballooned (i.e. inflated/deflated) pages to the host using
scatter-gather lists.

The previous virtio-balloon implementation is not very efficient,
because the balloon pages are transferred to the host one by one. Here
is the percentage breakdown of the time spent on each step of the
balloon inflating process (inflating 7GB of an 8GB idle guest):
1) allocating pages (6.5%) 2) sending PFNs to host (68.3%) 3) address translation (6.1%) 4) madvise (19%) It takes about 4126ms for the inflating process to complete. The above profiling shows that the bottlenecks are stage 2) and stage 4). This patch optimizes step 2) by transferring pages to the host in sgs. An sg describes a chunk of guest physically continuous pages. With this mechanism, step 4) can also be optimized by doing address translation and madvise() in chunks rather than page by page. With this new feature, the above ballooning process takes ~491ms resulting in an improvement of ~88%. TODO: optimize stage 1) by allocating/freeing a chunk of pages instead of a single page each time. Signed-off-by: Wei Wang Signed-off-by: Liang Li Suggested-by: Michael S. Tsirkin --- drivers/virtio/virtio_balloon.c | 141 ++++++++++++++++++++++--- drivers/virtio/virtio_ring.c | 199 ++++++++++++++++++++++++++++++++= +--- include/linux/virtio.h | 20 ++++ include/uapi/linux/virtio_balloon.h | 1 + 4 files changed, 329 insertions(+), 32 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloo= n.c index f0b3a0b..aa4e7ec 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -32,6 +32,7 @@ #include #include #include +#include =20 /* * Balloon device works in 4K page units. So each page is pointed to by @@ -79,6 +80,9 @@ struct virtio_balloon { /* Synchronize access/update to this struct virtio_balloon elements */ struct mutex balloon_lock; =20 + /* The xbitmap used to record ballooned pages */ + struct xb page_xb; + /* The array of pfns we tell the Host about. */ unsigned int num_pfns; __virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX]; @@ -141,13 +145,71 @@ static void set_page_pfns(struct virtio_balloon *vb, page_to_balloon_pfn(page) + i); } =20 +/* + * Send balloon pages in sgs to host. + * The balloon pages are recorded in the page xbitmap. Each bit in the bit= map + * corresponds to a page of PAGE_SIZE. 
The page xbitmap is searched for + * continuous "1" bits, which correspond to continuous pages, to chunk into + * sgs. + * + * @page_xb_start and @page_xb_end form the range of bits in the xbitmap t= hat + * need to be serached. + */ +static void tell_host_sgs(struct virtio_balloon *vb, + struct virtqueue *vq, + unsigned long page_xb_start, + unsigned long page_xb_end) +{ + unsigned int head_id =3D VIRTQUEUE_DESC_ID_INIT, + prev_id =3D VIRTQUEUE_DESC_ID_INIT; + unsigned long sg_pfn_start, sg_pfn_end; + uint64_t sg_addr; + uint32_t sg_size; + + sg_pfn_start =3D page_xb_start; + while (sg_pfn_start < page_xb_end) { + sg_pfn_start =3D xb_find_next_bit(&vb->page_xb, sg_pfn_start, + page_xb_end, 1); + if (sg_pfn_start =3D=3D page_xb_end + 1) + break; + sg_pfn_end =3D xb_find_next_bit(&vb->page_xb, sg_pfn_start + 1, + page_xb_end, 0); + sg_addr =3D sg_pfn_start << PAGE_SHIFT; + sg_size =3D (sg_pfn_end - sg_pfn_start) * PAGE_SIZE; + virtqueue_add_chain_desc(vq, sg_addr, sg_size, &head_id, + &prev_id, 0); + xb_zero(&vb->page_xb, sg_pfn_start, sg_pfn_end); + sg_pfn_start =3D sg_pfn_end + 1; + } + + if (head_id !=3D VIRTQUEUE_DESC_ID_INIT) { + virtqueue_add_chain(vq, head_id, 0, NULL, vb, NULL); + virtqueue_kick_async(vq, vb->acked); + } +} + +/* Update pfn_max and pfn_min according to the pfn of @page */ +static inline void update_pfn_range(struct virtio_balloon *vb, + struct page *page, + unsigned long *pfn_min, + unsigned long *pfn_max) +{ + unsigned long pfn =3D page_to_pfn(page); + + *pfn_min =3D min(pfn, *pfn_min); + *pfn_max =3D max(pfn, *pfn_max); +} + static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) { struct balloon_dev_info *vb_dev_info =3D &vb->vb_dev_info; unsigned num_allocated_pages; + bool use_sg =3D virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG); + unsigned long pfn_max =3D 0, pfn_min =3D ULONG_MAX; =20 /* We can only do one array worth at a time. 
*/
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	if (!use_sg)
+		num = min(num, ARRAY_SIZE(vb->pfns));
 
 	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -162,7 +224,12 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 			msleep(200);
 			break;
 		}
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		if (use_sg) {
+			update_pfn_range(vb, page, &pfn_min, &pfn_max);
+			xb_set_bit(&vb->page_xb, page_to_pfn(page));
+		} else {
+			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		}
 		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
@@ -171,8 +238,12 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 
 	num_allocated_pages = vb->num_pfns;
 	/* Did we get any? */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->inflate_vq);
+	if (vb->num_pfns != 0) {
+		if (use_sg)
+			tell_host_sgs(vb, vb->inflate_vq, pfn_min, pfn_max);
+		else
+			tell_host(vb, vb->inflate_vq);
+	}
 	mutex_unlock(&vb->balloon_lock);
 
 	return num_allocated_pages;
@@ -198,9 +269,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
 	LIST_HEAD(pages);
+	bool use_sg = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG);
+	unsigned long pfn_max = 0, pfn_min = ULONG_MAX;
 
-	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	/* Traditionally, we can only do one array worth at a time.
*/
+	if (!use_sg)
+		num = min(num, ARRAY_SIZE(vb->pfns));
 
 	mutex_lock(&vb->balloon_lock);
 	/* We can't release more pages than taken */
@@ -210,7 +284,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 		page = balloon_page_dequeue(vb_dev_info);
 		if (!page)
 			break;
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		if (use_sg) {
+			update_pfn_range(vb, page, &pfn_min, &pfn_max);
+			xb_set_bit(&vb->page_xb, page_to_pfn(page));
+		} else {
+			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		}
 		list_add(&page->lru, &pages);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
@@ -221,8 +300,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 	 * is true, we *have* to do it in this order
 	 */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->deflate_vq);
+	if (vb->num_pfns != 0) {
+		if (use_sg)
+			tell_host_sgs(vb, vb->deflate_vq, pfn_min, pfn_max);
+		else
+			tell_host(vb, vb->deflate_vq);
+	}
 	release_pages_balloon(vb, &pages);
 	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
@@ -441,6 +524,18 @@ static int init_vqs(struct virtio_balloon *vb)
 }
 
 #ifdef CONFIG_BALLOON_COMPACTION
+
+static void tell_host_one_page(struct virtio_balloon *vb, struct virtqueue *vq,
+			       struct page *page)
+{
+	unsigned int id = VIRTQUEUE_DESC_ID_INIT;
+	u64 addr = page_to_pfn(page) << VIRTIO_BALLOON_PFN_SHIFT;
+
+	virtqueue_add_chain_desc(vq, addr, PAGE_SIZE, &id, &id, 0);
+	virtqueue_add_chain(vq, id, 0, NULL, (void *)addr, NULL);
+	virtqueue_kick_async(vq, vb->acked);
+}
+
 /*
  * virtballoon_migratepage - perform the balloon page migration on behalf of
  * a compation thread.
(called under page lock)
@@ -464,6 +559,7 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 {
 	struct virtio_balloon *vb = container_of(vb_dev_info,
 			struct virtio_balloon, vb_dev_info);
+	bool use_sg = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG);
 	unsigned long flags;
 
 	/*
@@ -485,16 +581,22 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 	vb_dev_info->isolated_pages--;
 	__count_vm_event(BALLOON_MIGRATE);
 	spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
-	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
-	set_page_pfns(vb, vb->pfns, newpage);
-	tell_host(vb, vb->inflate_vq);
-
+	if (use_sg) {
+		tell_host_one_page(vb, vb->inflate_vq, newpage);
+	} else {
+		vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+		set_page_pfns(vb, vb->pfns, newpage);
+		tell_host(vb, vb->inflate_vq);
+	}
 	/* balloon's page migration 2nd step -- deflate "page" */
 	balloon_page_delete(page);
-	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
-	set_page_pfns(vb, vb->pfns, page);
-	tell_host(vb, vb->deflate_vq);
-
+	if (use_sg) {
+		tell_host_one_page(vb, vb->deflate_vq, page);
+	} else {
+		vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+		set_page_pfns(vb, vb->pfns, page);
+		tell_host(vb, vb->deflate_vq);
+	}
 	mutex_unlock(&vb->balloon_lock);
 
 	put_page(page); /* balloon reference */
@@ -553,6 +655,9 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vb;
 
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_SG))
+		xb_init(&vb->page_xb);
+
 	vb->nb.notifier_call = virtballoon_oom_notify;
 	vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
 	err = register_oom_notifier(&vb->nb);
@@ -618,6 +723,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 	cancel_work_sync(&vb->update_balloon_size_work);
 	cancel_work_sync(&vb->update_balloon_stats_work);
 
+	xb_empty(&vb->page_xb);
 	remove_common(vb);
 #ifdef CONFIG_BALLOON_COMPACTION
 	if (vb->vb_dev_info.inode)
@@ -669,6 +775,7 @@ static unsigned
int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
+	VIRTIO_BALLOON_F_SG,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5e1b548..b9d7e10 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -269,7 +269,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct scatterlist *sg;
 	struct vring_desc *desc;
-	unsigned int i, n, avail, descs_used, uninitialized_var(prev), err_idx;
+	unsigned int i, n, descs_used, uninitialized_var(prev), err_id;
 	int head;
 	bool indirect;
 
@@ -387,10 +387,68 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	else
 		vq->free_head = i;
 
-	/* Store token and indirect buffer state. */
+	END_USE(vq);
+
+	return virtqueue_add_chain(_vq, head, indirect, desc, data, ctx);
+
+unmap_release:
+	err_id = i;
+	i = head;
+
+	for (n = 0; n < total_sg; n++) {
+		if (i == err_id)
+			break;
+		vring_unmap_one(vq, &desc[i]);
+		i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
+	}
+
+	vq->vq.num_free += total_sg;
+
+	if (indirect)
+		kfree(desc);
+
+	END_USE(vq);
+	return -EIO;
+}
+
+/**
+ * virtqueue_add_chain - expose a chain of buffers to the other end
+ * @_vq: the struct virtqueue we're talking about.
+ * @head: desc id of the chain head.
+ * @indirect: set if the chain of descs are indirect descs.
+ * @indir_desc: the first indirect desc.
+ * @data: the token identifying the chain.
+ * @ctx: extra context for the token.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_chain(struct virtqueue *_vq,
+			unsigned int head,
+			bool indirect,
+			struct vring_desc *indir_desc,
+			void *data,
+			void *ctx)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int avail;
+
+	/* The desc chain is empty. */
+	if (head == VIRTQUEUE_DESC_ID_INIT)
+		return 0;
+
+	START_USE(vq);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return -EIO;
+	}
+
 	vq->desc_state[head].data = data;
 	if (indirect)
-		vq->desc_state[head].indir_desc = desc;
+		vq->desc_state[head].indir_desc = indir_desc;
 	if (ctx)
 		vq->desc_state[head].indir_desc = ctx;
 
@@ -415,26 +473,87 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 		virtqueue_kick(_vq);
 
 	return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_chain);
 
-unmap_release:
-	err_idx = i;
-	i = head;
+/**
+ * virtqueue_add_chain_desc - add a buffer to a chain using a vring desc
+ * @vq: the struct virtqueue we're talking about.
+ * @addr: address of the buffer to add.
+ * @len: length of the buffer.
+ * @head_id: desc id of the chain head.
+ * @prev_id: desc id of the previous buffer.
+ * @in: set if the buffer is for the device to write.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_chain_desc(struct virtqueue *_vq,
+			     uint64_t addr,
+			     uint32_t len,
+			     unsigned int *head_id,
+			     unsigned int *prev_id,
+			     bool in)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_desc *desc = vq->vring.desc;
+	uint16_t flags = in ?
VRING_DESC_F_WRITE : 0;
+	unsigned int i;
 
-	for (n = 0; n < total_sg; n++) {
-		if (i == err_idx)
-			break;
-		vring_unmap_one(vq, &desc[i]);
-		i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
+	/* Sanity check */
+	if (!_vq || !head_id || !prev_id)
+		return -EINVAL;
+retry:
+	START_USE(vq);
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return -EIO;
 	}
 
-	vq->vq.num_free += total_sg;
+	if (vq->vq.num_free < 1) {
+		/*
+		 * If there is no desc avail in the vq, so kick what is
+		 * already added, and re-start to build a new chain for
+		 * the passed sg.
+		 */
+		if (likely(*head_id != VIRTQUEUE_DESC_ID_INIT)) {
+			END_USE(vq);
+			virtqueue_add_chain(_vq, *head_id, 0, NULL, vq, NULL);
+			virtqueue_kick_sync(_vq);
+			*head_id = VIRTQUEUE_DESC_ID_INIT;
+			*prev_id = VIRTQUEUE_DESC_ID_INIT;
+			goto retry;
+		} else {
+			END_USE(vq);
+			return -ENOSPC;
+		}
+	}
 
-	if (indirect)
-		kfree(desc);
+	i = vq->free_head;
+	flags &= ~VRING_DESC_F_NEXT;
+	desc[i].flags = cpu_to_virtio16(_vq->vdev, flags);
+	desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+	desc[i].len = cpu_to_virtio32(_vq->vdev, len);
+
+	/* Add the desc to the end of the chain */
+	if (*prev_id != VIRTQUEUE_DESC_ID_INIT) {
+		desc[*prev_id].next = cpu_to_virtio16(_vq->vdev, i);
+		desc[*prev_id].flags |= cpu_to_virtio16(_vq->vdev,
+							VRING_DESC_F_NEXT);
+	}
+	*prev_id = i;
+	if (*head_id == VIRTQUEUE_DESC_ID_INIT)
+		*head_id = *prev_id;
 
+	vq->vq.num_free--;
+	vq->free_head = virtio16_to_cpu(_vq->vdev, desc[i].next);
 	END_USE(vq);
-	return -EIO;
+
+	return 0;
 }
+EXPORT_SYMBOL_GPL(virtqueue_add_chain_desc);
 
 /**
  * virtqueue_add_sgs - expose buffers to other end
@@ -627,6 +746,56 @@ bool virtqueue_kick(struct virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick);
 
+/**
+ * virtqueue_kick_sync - update after add_buf and busy wait till update is done
+ * @vq: the struct virtqueue
+ *
+ * After one or more virtqueue_add_* calls, invoke this to kick
+ * the other side.
Busy wait till the other side is done with the update.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * Returns false if kick failed, otherwise true.
+ */
+bool virtqueue_kick_sync(struct virtqueue *vq)
+{
+	u32 len;
+
+	if (likely(virtqueue_kick(vq))) {
+		while (!virtqueue_get_buf(vq, &len) &&
+		       !virtqueue_is_broken(vq))
+			cpu_relax();
+		return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(virtqueue_kick_sync);
+
+/**
+ * virtqueue_kick_async - update after add_buf and blocking till update is done
+ * @vq: the struct virtqueue
+ *
+ * After one or more virtqueue_add_* calls, invoke this to kick
+ * the other side. Blocking till the other side is done with the update.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * Returns false if kick failed, otherwise true.
+ */
+bool virtqueue_kick_async(struct virtqueue *vq, wait_queue_head_t wq)
+{
+	u32 len;
+
+	if (likely(virtqueue_kick(vq))) {
+		wait_event(wq, virtqueue_get_buf(vq, &len));
+		return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(virtqueue_kick_async);
+
 static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
 		       void **ctx)
 {
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 28b0e96..9f27101 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -57,8 +57,28 @@ int virtqueue_add_sgs(struct virtqueue *vq,
 		      void *data,
 		      gfp_t gfp);
 
+/* A desc with this init id is treated as an invalid desc */
+#define VIRTQUEUE_DESC_ID_INIT UINT_MAX
+int virtqueue_add_chain_desc(struct virtqueue *_vq,
+			     uint64_t addr,
+			     uint32_t len,
+			     unsigned int *head_id,
+			     unsigned int *prev_id,
+			     bool in);
+
+int virtqueue_add_chain(struct virtqueue *_vq,
+			unsigned int head,
+			bool indirect,
+			struct vring_desc *indirect_desc,
+			void *data,
+			void *ctx);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
+bool
virtqueue_kick_sync(struct virtqueue *vq);
+
+bool virtqueue_kick_async(struct virtqueue *vq, wait_queue_head_t wq);
+
 bool virtqueue_kick_prepare(struct virtqueue *vq);
 
 bool virtqueue_notify(struct virtqueue *vq);
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 343d7dd..37780a7 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -34,6 +34,7 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_SG		3 /* Use sg instead of PFN lists */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
-- 
2.7.4

From nobody Thu May 2 22:02:25 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1499863939874791.8188112832012; Wed, 12 Jul 2017 05:52:19 -0700 (PDT) Received: from localhost ([::1]:52579 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dVH7n-000190-G7 for importer@patchew.org; Wed, 12 Jul 2017 08:52:15 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41429) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dVH3T-0005ts-Rv for qemu-devel@nongnu.org; Wed, 12 Jul 2017 08:47:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dVH3S-0003UD-6Y for qemu-devel@nongnu.org; Wed, 12
Jul 2017 08:47:47 -0400 Received: from mga09.intel.com ([134.134.136.24]:53782) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dVH3R-0003Sz-Qg for qemu-devel@nongnu.org; Wed, 12 Jul 2017 08:47:46 -0400 Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Jul 2017 05:47:45 -0700 Received: from devel-ww.sh.intel.com ([10.239.48.97]) by orsmga005.jf.intel.com with ESMTP; 12 Jul 2017 05:47:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,349,1496127600"; d="scan'208";a="124248086" From: Wei Wang To: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com, david@redhat.com, cornelia.huck@de.ibm.com, akpm@linux-foundation.org, mgorman@techsingularity.net, aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com, wei.w.wang@intel.com, liliang.opensource@gmail.com Date: Wed, 12 Jul 2017 20:40:19 +0800 Message-Id: <1499863221-16206-7-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1499863221-16206-1-git-send-email-wei.w.wang@intel.com> References: <1499863221-16206-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-Received-From: 134.134.136.24 Subject: [Qemu-devel] [PATCH v12 6/8] mm: support reporting free page blocks X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: yang.zhang.wz@gmail.com, virtio-dev@lists.oasis-open.org, quan.xu@aliyun.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"

This patch adds support for reporting blocks of pages on the free list
specified by the caller.

As pages can leave the free list during this call or immediately
afterwards, they are not guaranteed to be free after the function
returns. The only guarantee this makes is that the page was on the free
list at some point in time after the function has been invoked.
Therefore, it is not safe for the caller to use any pages on the returned
block or to discard data that is put there after the function returns.
However, it is safe for the caller to discard data that was in one of
these pages before the function was invoked.

Signed-off-by: Wei Wang
Signed-off-by: Liang Li
---
 include/linux/mm.h |  5 +++
 mm/page_alloc.c    | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5..76cb433 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1835,6 +1835,11 @@ extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
 
+#if IS_ENABLED(CONFIG_VIRTIO_BALLOON)
+extern int report_unused_page_block(struct zone *zone, unsigned int order,
+				    unsigned int migratetype,
+				    struct page **page);
+#endif
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
  * into the buddy system.
The freed pages will be poisoned with pattern
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 64b7d82..8b3c9dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4753,6 +4753,102 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 	show_swap_cache_info();
 }
 
+#if IS_ENABLED(CONFIG_VIRTIO_BALLOON)
+
+/*
+ * Heuristically get a page block in the system that is unused.
+ * It is possible that pages from the page block are used immediately after
+ * report_unused_page_block() returns. It is the caller's responsibility
+ * to either detect or prevent the use of such pages.
+ *
+ * The free list to check: zone->free_area[order].free_list[migratetype].
+ *
+ * If the caller supplied page block (i.e. **page) is on the free list, offer
+ * the next page block on the list to the caller. Otherwise, offer the first
+ * page block on the list.
+ *
+ * Note: it is not safe for caller to use any pages on the returned
+ * block or to discard data that is put there after the function returns.
+ * However, it is safe for caller to discard data that was in one of these
+ * pages before the function was invoked.
+ *
+ * Return 0 when a page block is found on the caller specified free list.
+ */
+int report_unused_page_block(struct zone *zone, unsigned int order,
+			     unsigned int migratetype, struct page **page)
+{
+	struct zone *this_zone;
+	struct list_head *this_list;
+	int ret = 0;
+	unsigned long flags;
+
+	/* Sanity check */
+	if (zone == NULL || page == NULL || order >= MAX_ORDER ||
+	    migratetype >= MIGRATE_TYPES)
+		return -EINVAL;
+
+	/* Zone validity check */
+	for_each_populated_zone(this_zone) {
+		if (zone == this_zone)
+			break;
+	}
+
+	/* Got a non-existent zone from the caller?
*/
+	if (zone != this_zone)
+		return -EINVAL;
+
+	spin_lock_irqsave(&this_zone->lock, flags);
+
+	this_list = &zone->free_area[order].free_list[migratetype];
+	if (list_empty(this_list)) {
+		*page = NULL;
+		ret = 1;
+		goto out;
+	}
+
+	/* The caller is asking for the first free page block on the list */
+	if ((*page) == NULL) {
+		*page = list_first_entry(this_list, struct page, lru);
+		ret = 0;
+		goto out;
+	}
+
+	/*
+	 * The page block passed from the caller is not on this free list
+	 * anymore (e.g. a 1MB free page block has been split). In this case,
+	 * offer the first page block on the free list that the caller is
+	 * asking for.
+	 */
+	if (PageBuddy(*page) && order != page_order(*page)) {
+		*page = list_first_entry(this_list, struct page, lru);
+		ret = 0;
+		goto out;
+	}
+
+	/*
+	 * The page block passed from the caller has been the last page block
+	 * on the list.
+	 */
+	if ((*page)->lru.next == this_list) {
+		*page = NULL;
+		ret = 1;
+		goto out;
+	}
+
+	/*
+	 * Finally, fall into the regular case: the page block passed from the
+	 * caller is still on the free list. Offer the next one.
+	 */
+	*page = list_next_entry((*page), lru);
+	ret = 0;
+out:
+	spin_unlock_irqrestore(&this_zone->lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL(report_unused_page_block);
+
+#endif
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
2.7.4

From nobody Thu May 2 22:02:25 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1499864092669454.9650688443447; Wed, 12 Jul 2017 05:54:52 -0700 (PDT) Received: from localhost ([::1]:52590 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dVHAI-0003Qm-BQ for importer@patchew.org; Wed, 12 Jul 2017 08:54:50 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41464) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dVH3W-0005wF-7e for qemu-devel@nongnu.org; Wed, 12 Jul 2017 08:47:54 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dVH3V-0003Vy-JK for qemu-devel@nongnu.org; Wed, 12 Jul 2017 08:47:50 -0400 Received: from mga09.intel.com ([134.134.136.24]:53782) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dVH3V-0003Sz-9q for qemu-devel@nongnu.org; Wed, 12 Jul 2017 08:47:49 -0400 Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Jul 2017 05:47:48 -0700 Received: from devel-ww.sh.intel.com ([10.239.48.97]) by orsmga005.jf.intel.com with ESMTP; 12 Jul 2017 05:47:45 -0700
X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,349,1496127600"; d="scan'208";a="124248099" From: Wei Wang To: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com, david@redhat.com, cornelia.huck@de.ibm.com, akpm@linux-foundation.org, mgorman@techsingularity.net, aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com, wei.w.wang@intel.com, liliang.opensource@gmail.com Date: Wed, 12 Jul 2017 20:40:20 +0800 Message-Id: <1499863221-16206-8-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1499863221-16206-1-git-send-email-wei.w.wang@intel.com> References: <1499863221-16206-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 134.134.136.24 Subject: [Qemu-devel] [PATCH v12 7/8] mm: export symbol of next_zone and first_online_pgdat X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: yang.zhang.wz@gmail.com, virtio-dev@lists.oasis-open.org, quan.xu@aliyun.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This patch enables for_each_zone()/for_each_populated_zone() to be invoked by a kernel module. 
Signed-off-by: Wei Wang
---
 mm/mmzone.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mmzone.c b/mm/mmzone.c
index a51c0a6..08a2a3a 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -13,6 +13,7 @@ struct pglist_data *first_online_pgdat(void)
 {
 	return NODE_DATA(first_online_node);
 }
+EXPORT_SYMBOL_GPL(first_online_pgdat);
 
 struct pglist_data *next_online_pgdat(struct pglist_data *pgdat)
 {
@@ -41,6 +42,7 @@ struct zone *next_zone(struct zone *zone)
 	}
 	return zone;
 }
+EXPORT_SYMBOL_GPL(next_zone);
 
 static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes)
 {
-- 
2.7.4

From nobody Thu May 2 22:02:25 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1499864080353599.2389800286993; Wed, 12 Jul 2017 05:54:40 -0700 (PDT) Received: from localhost ([::1]:52588 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dVHA5-0003A6-JJ for importer@patchew.org; Wed, 12 Jul 2017 08:54:37 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41510) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dVH3c-00061Z-CX for qemu-devel@nongnu.org; Wed, 12 Jul 2017 08:47:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dVH3Z-0003XF-9q for qemu-devel@nongnu.org; Wed, 12 Jul 2017 08:47:56 -0400 Received: from mga09.intel.com ([134.134.136.24]:53782) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dVH3Y-0003Sz-QT for
qemu-devel@nongnu.org; Wed, 12 Jul 2017 08:47:53 -0400 Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Jul 2017 05:47:52 -0700 Received: from devel-ww.sh.intel.com ([10.239.48.97]) by orsmga005.jf.intel.com with ESMTP; 12 Jul 2017 05:47:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,349,1496127600"; d="scan'208";a="124248137" From: Wei Wang To: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com, david@redhat.com, cornelia.huck@de.ibm.com, akpm@linux-foundation.org, mgorman@techsingularity.net, aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com, wei.w.wang@intel.com, liliang.opensource@gmail.com Date: Wed, 12 Jul 2017 20:40:21 +0800 Message-Id: <1499863221-16206-9-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1499863221-16206-1-git-send-email-wei.w.wang@intel.com> References: <1499863221-16206-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 134.134.136.24 Subject: [Qemu-devel] [PATCH v12 8/8] virtio-balloon: VIRTIO_BALLOON_F_CMD_VQ X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: yang.zhang.wz@gmail.com, virtio-dev@lists.oasis-open.org, quan.xu@aliyun.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"

Add a new vq, cmdq, to handle requests between the device and driver.

This patch implements two commands sent from the device and handled in
the driver.
1) VIRTIO_BALLOON_CMDQ_REPORT_STATS: this command is used to report
the guest memory statistics to the host.
The stats_vq mechanism is not used when the cmdq mechanism is
enabled.
2) VIRTIO_BALLOON_CMDQ_REPORT_UNUSED_PAGES: this command is used to
report the guest unused pages to the host.

Since we now have a vq to handle multiple commands, we need to keep
only one vq operation at a time. Here, we change the existing
START_USE() and END_USE() to lock on each vq operation.

Signed-off-by: Wei Wang
Signed-off-by: Liang Li
---
 drivers/virtio/virtio_balloon.c     | 245 ++++++++++++++++++++++++++++++++++--
 drivers/virtio/virtio_ring.c        |  25 +++-
 include/linux/virtio.h              |   2 +
 include/uapi/linux/virtio_balloon.h |  10 ++
 4 files changed, 265 insertions(+), 17 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index aa4e7ec..ae91fbf 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -54,11 +54,12 @@ static struct vfsmount *balloon_mnt;
 
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *cmd_vq;
 
 	/* The balloon servicing is delegated to a freezable workqueue. */
 	struct work_struct update_balloon_stats_work;
 	struct work_struct update_balloon_size_work;
+	struct work_struct cmdq_handle_work;
 
 	/* Prevent updating balloon when it is being canceled.
*/
 	spinlock_t stop_update_lock;
@@ -90,6 +91,12 @@ struct virtio_balloon {
 	/* Memory statistics */
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 
+	/* Cmdq msg buffer for memory statistics */
+	struct virtio_balloon_cmdq_hdr cmdq_stats_hdr;
+
+	/* Cmdq msg buffer for reporting unused pages */
+	struct virtio_balloon_cmdq_hdr cmdq_unused_page_hdr;
+
 	/* To register callback in oom notifier call chain */
 	struct notifier_block nb;
 };
@@ -485,25 +492,214 @@ static void update_balloon_size_func(struct work_struct *work)
 		queue_work(system_freezable_wq, work);
 }
 
+static unsigned int cmdq_hdr_add(struct virtqueue *vq,
+				 struct virtio_balloon_cmdq_hdr *hdr,
+				 bool in)
+{
+	unsigned int id = VIRTQUEUE_DESC_ID_INIT;
+	uint64_t hdr_pa = (uint64_t)virt_to_phys((void *)hdr);
+
+	virtqueue_add_chain_desc(vq, hdr_pa, sizeof(*hdr), &id, &id, in);
+
+	/* Deliver the hdr for the host to send commands. */
+	if (in) {
+		hdr->flags = 0;
+		virtqueue_add_chain(vq, id, 0, NULL, hdr, NULL);
+		virtqueue_kick(vq);
+	}
+
+	return id;
+}
+
+static void cmdq_add_chain_desc(struct virtio_balloon *vb,
+				struct virtio_balloon_cmdq_hdr *hdr,
+				uint64_t addr,
+				uint32_t len,
+				unsigned int *head_id,
+				unsigned int *prev_id)
+{
+retry:
+	if (*head_id == VIRTQUEUE_DESC_ID_INIT) {
+		*head_id = cmdq_hdr_add(vb->cmd_vq, hdr, 0);
+		*prev_id = *head_id;
+	}
+
+	virtqueue_add_chain_desc(vb->cmd_vq, addr, len, head_id, prev_id, 0);
+	if (*head_id == *prev_id) {
+		/*
+		 * The VQ was full and kicked to release some descs. Now we
+		 * will re-start to build the chain by using the hdr as the
+		 * first desc, so we need to detach the desc that was just
+		 * added, and re-start to add the hdr.
+		 */
+		virtqueue_detach_buf(vb->cmd_vq, *head_id, NULL);
+		*head_id = VIRTQUEUE_DESC_ID_INIT;
+		*prev_id = VIRTQUEUE_DESC_ID_INIT;
+		goto retry;
+	}
+}
+
+static void cmdq_handle_stats(struct virtio_balloon *vb)
+{
+	unsigned int num_stats,
+		     head_id = VIRTQUEUE_DESC_ID_INIT,
+		     prev_id = VIRTQUEUE_DESC_ID_INIT;
+	uint64_t addr = (uint64_t)virt_to_phys((void *)vb->stats);
+	uint32_t len;
+
+	spin_lock(&vb->stop_update_lock);
+	if (!vb->stop_update) {
+		num_stats = update_balloon_stats(vb);
+		len = sizeof(struct virtio_balloon_stat) * num_stats;
+		cmdq_add_chain_desc(vb, &vb->cmdq_stats_hdr, addr, len,
+				    &head_id, &prev_id);
+		virtqueue_add_chain(vb->cmd_vq, head_id, 0, NULL, vb, NULL);
+		virtqueue_kick_sync(vb->cmd_vq);
+	}
+	spin_unlock(&vb->stop_update_lock);
+}
+
+static void cmdq_add_unused_page(struct virtio_balloon *vb,
+				 struct zone *zone,
+				 unsigned int order,
+				 unsigned int type,
+				 struct page *page,
+				 unsigned int *head_id,
+				 unsigned int *prev_id)
+{
+	uint64_t addr;
+	uint32_t len;
+
+	while (!report_unused_page_block(zone, order, type, &page)) {
+		addr = (u64)page_to_pfn(page) << VIRTIO_BALLOON_PFN_SHIFT;
+		len = (u64)(1 << order) << VIRTIO_BALLOON_PFN_SHIFT;
+		cmdq_add_chain_desc(vb, &vb->cmdq_unused_page_hdr, addr, len,
+				    head_id, prev_id);
+	}
+}
+
+static void cmdq_handle_unused_pages(struct virtio_balloon *vb)
+{
+	struct virtqueue *vq = vb->cmd_vq;
+	unsigned int order = 0, type = 0,
+		     head_id = VIRTQUEUE_DESC_ID_INIT,
+		     prev_id = VIRTQUEUE_DESC_ID_INIT;
+	struct zone *zone = NULL;
+	struct page *page = NULL;
+
+	for_each_populated_zone(zone)
+		for_each_migratetype_order(order, type)
+			cmdq_add_unused_page(vb, zone, order, type, page,
+					     &head_id, &prev_id);
+
+	/* Set the cmd completion flag.
*/ + vb->cmdq_unused_page_hdr.flags |=3D + cpu_to_le32(VIRTIO_BALLOON_CMDQ_F_COMPLETION); + virtqueue_add_chain(vq, head_id, 0, NULL, vb, NULL); + virtqueue_kick_sync(vb->cmd_vq); +} + +static void cmdq_handle(struct virtio_balloon *vb) +{ + struct virtio_balloon_cmdq_hdr *hdr; + unsigned int len; + + while ((hdr =3D (struct virtio_balloon_cmdq_hdr *) + virtqueue_get_buf(vb->cmd_vq, &len)) !=3D NULL) { + switch (__le32_to_cpu(hdr->cmd)) { + case VIRTIO_BALLOON_CMDQ_REPORT_STATS: + cmdq_handle_stats(vb); + break; + case VIRTIO_BALLOON_CMDQ_REPORT_UNUSED_PAGES: + cmdq_handle_unused_pages(vb); + break; + default: + dev_warn(&vb->vdev->dev, "%s: wrong cmd\n", __func__); + return; + } + /* + * Replenish all the command buffer to the device after a + * command is handled. This is for the convenience of the + * device to rewind the cmdq to get back all the command + * buffer after live migration. + */ + cmdq_hdr_add(vb->cmd_vq, &vb->cmdq_stats_hdr, 1); + cmdq_hdr_add(vb->cmd_vq, &vb->cmdq_unused_page_hdr, 1); + } +} + +static void cmdq_handle_work_func(struct work_struct *work) +{ + struct virtio_balloon *vb; + + vb =3D container_of(work, struct virtio_balloon, + cmdq_handle_work); + cmdq_handle(vb); +} + +static void cmdq_callback(struct virtqueue *vq) +{ + struct virtio_balloon *vb =3D vq->vdev->priv; + + queue_work(system_freezable_wq, &vb->cmdq_handle_work); +} + static int init_vqs(struct virtio_balloon *vb) { - struct virtqueue *vqs[3]; - vq_callback_t *callbacks[] =3D { balloon_ack, balloon_ack, stats_request = }; - static const char * const names[] =3D { "inflate", "deflate", "stats" }; - int err, nvqs; + struct virtqueue **vqs; + vq_callback_t **callbacks; + const char **names; + int err =3D -ENOMEM; + int nvqs; + + /* Inflateq and deflateq are used unconditionally */ + nvqs =3D 2; + + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CMD_VQ) || + virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) + nvqs++; + + /* Allocate space for find_vqs parameters */ + 
vqs =3D kcalloc(nvqs, sizeof(*vqs), GFP_KERNEL); + if (!vqs) + goto err_vq; + callbacks =3D kmalloc_array(nvqs, sizeof(*callbacks), GFP_KERNEL); + if (!callbacks) + goto err_callback; + names =3D kmalloc_array(nvqs, sizeof(*names), GFP_KERNEL); + if (!names) + goto err_names; + + callbacks[0] =3D balloon_ack; + names[0] =3D "inflate"; + callbacks[1] =3D balloon_ack; + names[1] =3D "deflate"; =20 /* - * We expect two virtqueues: inflate and deflate, and - * optionally stat. + * The stats_vq is used only when cmdq is not supported (or disabled) + * by the device. */ - nvqs =3D virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2; - err =3D virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL); - if (err) - return err; + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CMD_VQ)) { + callbacks[2] =3D cmdq_callback; + names[2] =3D "cmdq"; + } else if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { + callbacks[2] =3D stats_request; + names[2] =3D "stats"; + } =20 + err =3D vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, + names, NULL, NULL); + if (err) + goto err_find; vb->inflate_vq =3D vqs[0]; vb->deflate_vq =3D vqs[1]; - if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { + + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CMD_VQ)) { + vb->cmd_vq =3D vqs[2]; + /* Prime the cmdq with the header buffer. 
*/ + cmdq_hdr_add(vb->cmd_vq, &vb->cmdq_stats_hdr, 1); + cmdq_hdr_add(vb->cmd_vq, &vb->cmdq_unused_page_hdr, 1); + } else if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { struct scatterlist sg; unsigned int num_stats; vb->stats_vq =3D vqs[2]; @@ -520,6 +716,16 @@ static int init_vqs(struct virtio_balloon *vb) BUG(); virtqueue_kick(vb->stats_vq); } + +err_find: + kfree(names); +err_names: + kfree(callbacks); +err_callback: + kfree(vqs); +err_vq: + return err; + return 0; } =20 @@ -640,7 +846,18 @@ static int virtballoon_probe(struct virtio_device *vde= v) goto out; } =20 - INIT_WORK(&vb->update_balloon_stats_work, update_balloon_stats_func); + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_CMD_VQ)) { + vb->cmdq_stats_hdr.cmd =3D + cpu_to_le32(VIRTIO_BALLOON_CMDQ_REPORT_STATS); + vb->cmdq_stats_hdr.flags =3D 0; + vb->cmdq_unused_page_hdr.cmd =3D + cpu_to_le32(VIRTIO_BALLOON_CMDQ_REPORT_UNUSED_PAGES); + vb->cmdq_unused_page_hdr.flags =3D 0; + INIT_WORK(&vb->cmdq_handle_work, cmdq_handle_work_func); + } else if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { + INIT_WORK(&vb->update_balloon_stats_work, + update_balloon_stats_func); + } INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func); spin_lock_init(&vb->stop_update_lock); vb->stop_update =3D false; @@ -722,6 +939,7 @@ static void virtballoon_remove(struct virtio_device *vd= ev) spin_unlock_irq(&vb->stop_update_lock); cancel_work_sync(&vb->update_balloon_size_work); cancel_work_sync(&vb->update_balloon_stats_work); + cancel_work_sync(&vb->cmdq_handle_work); =20 xb_empty(&vb->page_xb); remove_common(vb); @@ -776,6 +994,7 @@ static unsigned int features[] =3D { VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_F_DEFLATE_ON_OOM, VIRTIO_BALLOON_F_SG, + VIRTIO_BALLOON_F_CMD_VQ, }; =20 static struct virtio_driver virtio_balloon_driver =3D { diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index b9d7e10..793de12 100644 --- a/drivers/virtio/virtio_ring.c +++ 
b/drivers/virtio/virtio_ring.c @@ -52,8 +52,13 @@ "%s:"fmt, (_vq)->vq.name, ##args); \ (_vq)->broken =3D true; \ } while (0) -#define START_USE(vq) -#define END_USE(vq) +#define START_USE(_vq) \ + do { \ + while ((_vq)->in_use) \ + cpu_relax(); \ + (_vq)->in_use =3D __LINE__; \ + } while (0) +#define END_USE(_vq) ((_vq)->in_use =3D 0) #endif =20 struct vring_desc_state { @@ -101,9 +106,9 @@ struct vring_virtqueue { size_t queue_size_in_bytes; dma_addr_t queue_dma_addr; =20 -#ifdef DEBUG /* They're supposed to lock for us. */ unsigned int in_use; +#ifdef DEBUG =20 /* Figure out if their kicks are too delayed. */ bool last_add_time_valid; @@ -845,6 +850,18 @@ static void detach_buf(struct vring_virtqueue *vq, uns= igned int head, } } =20 +void virtqueue_detach_buf(struct virtqueue *_vq, unsigned int head, void *= *ctx) +{ + struct vring_virtqueue *vq =3D to_vvq(_vq); + + START_USE(vq); + + detach_buf(vq, head, ctx); + + END_USE(vq); +} +EXPORT_SYMBOL_GPL(virtqueue_detach_buf); + static inline bool more_used(const struct vring_virtqueue *vq) { return vq->last_used_idx !=3D virtio16_to_cpu(vq->vq.vdev, vq->vring.used= ->idx); @@ -1158,8 +1175,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int = index, vq->avail_idx_shadow =3D 0; vq->num_added =3D 0; list_add_tail(&vq->vq.list, &vdev->vqs); + vq->in_use =3D 0; #ifdef DEBUG - vq->in_use =3D false; vq->last_add_time_valid =3D false; #endif =20 diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 9f27101..9df480b 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -88,6 +88,8 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned in= t *len); void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len, void **ctx); =20 +void virtqueue_detach_buf(struct virtqueue *_vq, unsigned int head, void *= *ctx); + void virtqueue_disable_cb(struct virtqueue *vq); =20 bool virtqueue_enable_cb(struct virtqueue *vq); diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux/virti= o_balloon.h index 37780a7..b38c370 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -35,6 +35,7 @@ #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ #define VIRTIO_BALLOON_F_SG 3 /* Use sg instead of PFN lists */ +#define VIRTIO_BALLOON_F_CMD_VQ 4 /* Command virtqueue */ =20 /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 @@ -83,4 +84,13 @@ struct virtio_balloon_stat { __virtio64 val; } __attribute__((packed)); =20 +struct virtio_balloon_cmdq_hdr { +#define VIRTIO_BALLOON_CMDQ_REPORT_STATS 0 +#define VIRTIO_BALLOON_CMDQ_REPORT_UNUSED_PAGES 1 + __le32 cmd; +/* Flag to indicate the completion of handling a command */ +#define VIRTIO_BALLOON_CMDQ_F_COMPLETION 1 + __le32 flags; +}; + #endif /* _LINUX_VIRTIO_BALLOON_H */ --=20 2.7.4
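The command header added to include/uapi/linux/virtio_balloon.h is two little-endian 32-bit words, which is why the driver wraps every store in cpu_to_le32() and decodes hdr->cmd with __le32_to_cpu(). A minimal user-space sketch of that wire layout follows, assuming a plain 8-byte buffer in place of struct virtio_balloon_cmdq_hdr; put_le32(), get_le32() and the cmdq_hdr_* helpers are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command codes and flag values taken from the uapi header in this patch. */
#define VIRTIO_BALLOON_CMDQ_REPORT_STATS        0
#define VIRTIO_BALLOON_CMDQ_REPORT_UNUSED_PAGES 1
#define VIRTIO_BALLOON_CMDQ_F_COMPLETION        1

/* Hypothetical: on-the-wire size of the header (__le32 cmd; __le32 flags). */
#define CMDQ_HDR_WIRE_SIZE 8

/* Hypothetical stand-in for cpu_to_le32(): store v least significant byte first. */
static void put_le32(uint8_t *out, uint32_t v)
{
	out[0] = (uint8_t)v;
	out[1] = (uint8_t)(v >> 8);
	out[2] = (uint8_t)(v >> 16);
	out[3] = (uint8_t)(v >> 24);
}

/* Hypothetical stand-in for __le32_to_cpu(). */
static uint32_t get_le32(const uint8_t *in)
{
	return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/* Lay out {cmd, flags} the way the driver initializes the header in probe. */
static void cmdq_hdr_encode(uint8_t buf[CMDQ_HDR_WIRE_SIZE],
			    uint32_t cmd, uint32_t flags)
{
	put_le32(buf, cmd);
	put_le32(buf + 4, flags);
}

/* Equivalent of "flags |= cpu_to_le32(VIRTIO_BALLOON_CMDQ_F_COMPLETION)". */
static void cmdq_hdr_set_completion(uint8_t buf[CMDQ_HDR_WIRE_SIZE])
{
	put_le32(buf + 4, get_le32(buf + 4) | VIRTIO_BALLOON_CMDQ_F_COMPLETION);
}

/*
 * Mirrors the switch in cmdq_handle(): returns 1 for a known command and
 * 0 otherwise (where the driver warns "wrong cmd" and stops processing).
 */
static int cmdq_hdr_cmd_known(const uint8_t buf[CMDQ_HDR_WIRE_SIZE],
			      uint32_t *cmd_out)
{
	uint32_t cmd = get_le32(buf);

	assert(cmd_out != NULL);
	*cmd_out = cmd;
	switch (cmd) {
	case VIRTIO_BALLOON_CMDQ_REPORT_STATS:
	case VIRTIO_BALLOON_CMDQ_REPORT_UNUSED_PAGES:
		return 1;
	default:
		return 0;
	}
}
```

Because the bytes are written explicitly, the same buffer decodes identically on big- and little-endian hosts, which is the property the __le32 fields guarantee for the device.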
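cmdq_add_unused_page() turns each free page block found by report_unused_page_block() into one (addr, len) descriptor: the address is the block's PFN shifted by VIRTIO_BALLOON_PFN_SHIFT, and the length is 2^order pages of that size. The arithmetic can be sketched in isolation as below, assuming 4 KiB guest pages so that a native PFN and a balloon PFN coincide; struct page_block_desc and page_block_to_desc() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_BALLOON_PFN_SHIFT 12

/* Hypothetical: the (addr, len) pair handed to cmdq_add_chain_desc(). */
struct page_block_desc {
	uint64_t addr;	/* guest-physical address of the first page */
	uint32_t len;	/* bytes covered by the 2^order pages */
};

/* Assumes 4 KiB pages, i.e. PAGE_SHIFT == VIRTIO_BALLOON_PFN_SHIFT. */
static struct page_block_desc page_block_to_desc(uint64_t pfn,
						 unsigned int order)
{
	struct page_block_desc d;

	/* len is a uint32_t, so the block must be smaller than 4 GiB. */
	assert(order + VIRTIO_BALLOON_PFN_SHIFT < 32);
	d.addr = pfn << VIRTIO_BALLOON_PFN_SHIFT;
	d.len = (uint32_t)(((uint64_t)1 << order) << VIRTIO_BALLOON_PFN_SHIFT);
	return d;
}
```

With 4 KiB pages an order-10 block covers 4 MiB, so every buddy order the loop visits still fits the uint32_t length that cmdq_add_chain_desc() takes.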
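The reworked init_vqs() sizes its find_vqs() tables from the negotiated features (cmdq takes precedence over the stats queue) and unwinds partial allocations with a goto ladder. The sketch below reproduces that selection logic in user space, assuming a plain bitmask for the feature set; has_feature(), plan_vqs() and the empty callbacks are hypothetical placeholders for virtio_has_feature() and the driver's real callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Feature bit numbers from include/uapi/linux/virtio_balloon.h. */
#define VIRTIO_BALLOON_F_STATS_VQ 1
#define VIRTIO_BALLOON_F_CMD_VQ   4

typedef void (*vq_cb_t)(void *vq);

/* Hypothetical no-op stand-ins for the driver's virtqueue callbacks. */
static void balloon_ack(void *vq) { (void)vq; }
static void stats_request(void *vq) { (void)vq; }
static void cmdq_callback(void *vq) { (void)vq; }

/* Hypothetical replacement for virtio_has_feature() on a bitmask. */
static bool has_feature(unsigned long features, unsigned int bit)
{
	return (features >> bit) & 1UL;
}

/*
 * Hypothetical: fill the callback/name tables like init_vqs() does.
 * On success the caller owns and must free *cbs_out and *names_out.
 */
static int plan_vqs(unsigned long features, int *nvqs_out,
		    vq_cb_t **cbs_out, const char ***names_out)
{
	vq_cb_t *callbacks;
	const char **names;
	int nvqs = 2;	/* inflateq and deflateq are always present */

	if (has_feature(features, VIRTIO_BALLOON_F_CMD_VQ) ||
	    has_feature(features, VIRTIO_BALLOON_F_STATS_VQ))
		nvqs++;

	callbacks = malloc((size_t)nvqs * sizeof(*callbacks));
	if (!callbacks)
		goto err_callback;
	names = malloc((size_t)nvqs * sizeof(*names));
	if (!names)
		goto err_names;

	callbacks[0] = balloon_ack;
	names[0] = "inflate";
	callbacks[1] = balloon_ack;
	names[1] = "deflate";
	/* The stats queue is used only when the cmdq is not negotiated. */
	if (has_feature(features, VIRTIO_BALLOON_F_CMD_VQ)) {
		callbacks[2] = cmdq_callback;
		names[2] = "cmdq";
	} else if (has_feature(features, VIRTIO_BALLOON_F_STATS_VQ)) {
		callbacks[2] = stats_request;
		names[2] = "stats";
	}

	*nvqs_out = nvqs;
	*cbs_out = callbacks;
	*names_out = names;
	return 0;

err_names:
	free(callbacks);
err_callback:
	return -ENOMEM;
}
```

In the patch the success path instead falls through the cleanup labels with err equal to 0, so the tables are freed either way and the trailing `return 0;` after `return err;` is never reached; the sketch returns before the labels because it hands the tables to its caller.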