From nobody Sun May 5 19:05:46 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1488520007634391.27906376439785; Thu, 2 Mar 2017 21:46:47 -0800 (PST) Received: from localhost ([::1]:56273 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg3C-0002gM-Cr for importer@patchew.org; Fri, 03 Mar 2017 00:46:46 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39159) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg0q-0001Rz-C0 for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:21 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cjg0n-0006R5-Cc for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:20 -0500 Received: from mga01.intel.com ([192.55.52.88]:50436) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cjg0m-0006Q0-W6 for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:17 -0500 Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Mar 2017 21:44:16 -0800 Received: from devel-ww.sh.intel.com ([10.239.48.105]) by orsmga005.jf.intel.com with ESMTP; 02 Mar 2017 21:44:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.35,235,1484035200"; d="scan'208";a="71114459" From: Wei Wang To: virtio-dev@lists.oasis-open.org, kvm@vger.kernel.org, qemu-devel@nongnu.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org Date: Fri, 3 Mar 2017 13:40:26 +0800 Message-Id: <1488519630-89058-2-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> References: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v7 kernel 1/5] virtio-balloon: rework deflate to add page to a list X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , "Michael S . Tsirkin" , Amit Shah , Liang Li , David Hildenbrand , Wei Wang , Dave Hansen , Liang Li , Cornelia Huck , Paolo Bonzini Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Liang Li When doing the inflating/deflating operation, the current virtio-balloon implementation uses an array to save 256 PFNS, then send these PFNS to host through virtio and process each PFN one by one. This way is not efficient when inflating/deflating a large mount of memory because too many times of the following operations: 1. Virtio data transmission 2. Page allocate/free 3. Address translation(GPA->HVA) 4. madvise The over head of these operations will consume a lot of CPU cycles and will take a long time to complete, it may impact the QoS of the guest as well as the host. The overhead will be reduced a lot if batch processing is used. E.g. If there are several pages whose address are physical contiguous in the guest, these bulk pages can be processed in one operation. The main idea for the optimization is to reduce the above operations as much as possible. And it can be achieved by using a {pfn|length} array instead of a PFN array. Comparing with PFN array, {pfn|length} array can present more pages and is fit for batch processing. This patch saves the deflated pages to a list instead of the PFN array, which will allow faster notifications using the {pfn|length} down the road. balloon_pfn_to_page() can be removed because it's useless. Signed-off-by: Liang Li Signed-off-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: Cornelia Huck Cc: Amit Shah Cc: Dave Hansen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Liang Li Cc: Wei Wang --- drivers/virtio/virtio_balloon.c | 22 ++++++++-------------- 1 file changed, 8 insertions(+), 14 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloo= n.c index 181793f..f59cb4f 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -103,12 +103,6 @@ static u32 page_to_balloon_pfn(struct page *page) return pfn * VIRTIO_BALLOON_PAGES_PER_PAGE; } =20 -static struct page *balloon_pfn_to_page(u32 pfn) -{ - BUG_ON(pfn % VIRTIO_BALLOON_PAGES_PER_PAGE); - return pfn_to_page(pfn / VIRTIO_BALLOON_PAGES_PER_PAGE); -} - static void balloon_ack(struct virtqueue *vq) { struct virtio_balloon *vb =3D vq->vdev->priv; @@ -181,18 +175,16 @@ static unsigned fill_balloon(struct virtio_balloon *v= b, size_t num) return num_allocated_pages; } =20 -static void release_pages_balloon(struct virtio_balloon *vb) +static void release_pages_balloon(struct virtio_balloon *vb, + struct list_head *pages) { - unsigned int i; - struct page *page; + struct page *page, *next; =20 - /* Find pfns pointing at start of each page, get pages and free them. */ - for (i =3D 0; i < vb->num_pfns; i +=3D VIRTIO_BALLOON_PAGES_PER_PAGE) { - page =3D balloon_pfn_to_page(virtio32_to_cpu(vb->vdev, - vb->pfns[i])); + list_for_each_entry_safe(page, next, pages, lru) { if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) adjust_managed_page_count(page, 1); + list_del(&page->lru); put_page(page); /* balloon reference */ } } @@ -202,6 +194,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb,= size_t num) unsigned num_freed_pages; struct page *page; struct balloon_dev_info *vb_dev_info =3D &vb->vb_dev_info; + LIST_HEAD(pages); =20 /* We can only do one array worth at a time. */ num =3D min(num, ARRAY_SIZE(vb->pfns)); @@ -215,6 +208,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb,= size_t num) if (!page) break; set_page_pfns(vb, vb->pfns + vb->num_pfns, page); + list_add(&page->lru, &pages); vb->num_pages -=3D VIRTIO_BALLOON_PAGES_PER_PAGE; } =20 @@ -226,7 +220,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb,= size_t num) */ if (vb->num_pfns !=3D 0) tell_host(vb, vb->deflate_vq); - release_pages_balloon(vb); + release_pages_balloon(vb, &pages); mutex_unlock(&vb->balloon_lock); return num_freed_pages; } --=20 2.7.4 From nobody Sun May 5 19:05:46 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1488519914046864.5441726947578; Thu, 2 Mar 2017 21:45:14 -0800 (PST) Received: from localhost ([::1]:56264 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg1g-0001Si-Ah for importer@patchew.org; Fri, 03 Mar 2017 00:45:12 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39178) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg0r-0001S1-D5 for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cjg0q-0006Rq-HS for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:21 -0500 Received: from mga01.intel.com ([192.55.52.88]:50436) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cjg0q-0006Q0-8R for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:20 -0500 Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Mar 2017 21:44:19 -0800 Received: from devel-ww.sh.intel.com ([10.239.48.105]) by orsmga005.jf.intel.com with ESMTP; 02 Mar 2017 21:44:16 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.35,235,1484035200"; d="scan'208";a="71114483" From: Wei Wang To: virtio-dev@lists.oasis-open.org, kvm@vger.kernel.org, qemu-devel@nongnu.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org Date: Fri, 3 Mar 2017 13:40:27 +0800 Message-Id: <1488519630-89058-3-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> References: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v7 kernel 2/5] virtio-balloon: VIRTIO_BALLOON_F_CHUNK_TRANSFER X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , "Michael S . Tsirkin" , Liang Li , Dave Hansen , Liang Li , Amit Shah , Wei Wang , Cornelia Huck , Paolo Bonzini , David Hildenbrand Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Liang Li Add a new feature bit, VIRTIO_BALLOON_F_CHUNK_TRANSFER. Please check the implementation patch commit for details about this feature. Signed-off-by: Liang Li Signed-off-by: Wei Wang Cc: Michael S. Tsirkin Cc: Paolo Bonzini Cc: Cornelia Huck Cc: Amit Shah Cc: Dave Hansen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Liang Li Cc: Wei Wang --- include/uapi/linux/virtio_balloon.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virti= o_balloon.h index 343d7dd..ed627b2 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -34,10 +34,14 @@ #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages = */ #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ +#define VIRTIO_BALLOON_F_CHUNK_TRANSFER 3 /* Transfer pages in chunks */ =20 /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 =20 +/* Shift to get a chunk size */ +#define VIRTIO_BALLOON_CHUNK_SIZE_SHIFT 12 + struct virtio_balloon_config { /* Number of pages host wants Guest to give up. */ __u32 num_pages; @@ -82,4 +86,12 @@ struct virtio_balloon_stat { __virtio64 val; } __attribute__((packed)); =20 +/* Response header structure */ +struct virtio_balloon_resp_hdr { + u8 cmd; + u8 flag; + __le16 id; /* cmd id */ + __le32 data_len; /* Payload len in bytes */ +}; + #endif /* _LINUX_VIRTIO_BALLOON_H */ --=20 2.7.4 From nobody Sun May 5 19:05:46 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1488520009038222.08363831829558; Thu, 2 Mar 2017 21:46:49 -0800 (PST) Received: from localhost ([::1]:56274 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg3D-0002gv-P8 for importer@patchew.org; Fri, 03 Mar 2017 00:46:47 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39195) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg0y-0001Vf-3x for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:33 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cjg0v-0006Sl-Kp for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:28 -0500 Received: from mga01.intel.com ([192.55.52.88]:50436) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cjg0t-0006Q0-L1 for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:25 -0500 Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Mar 2017 21:44:23 -0800 Received: from devel-ww.sh.intel.com ([10.239.48.105]) by orsmga005.jf.intel.com with ESMTP; 02 Mar 2017 21:44:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.35,235,1484035200"; d="scan'208";a="71114498" From: Wei Wang To: virtio-dev@lists.oasis-open.org, kvm@vger.kernel.org, qemu-devel@nongnu.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org Date: Fri, 3 Mar 2017 13:40:28 +0800 Message-Id: <1488519630-89058-4-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> References: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , "Michael S . Tsirkin" , Liang Li , Dave Hansen , Liang Li , Amit Shah , Wei Wang , Cornelia Huck , Paolo Bonzini , David Hildenbrand Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Liang Li The implementation of the current virtio-balloon is not very efficient, because the pages are transferred to the host one by one. Here is the breakdown of the time in percentage spent on each step of the balloon inflating process (inflating 7GB of an 8GB idle guest). 1) allocating pages (6.5%) 2) sending PFNs to host (68.3%) 3) address translation (6.1%) 4) madvise (19%) It takes about 4126ms for the inflating process to complete. The above profiling shows that the bottlenecks are stage 2) and stage 4). This patch optimizes step 2) by transfering pages to the host in chunks. A chunk consists of guest physically continuous pages, and it is offered to the host via a base PFN (i.e. the start PFN of those physically continuous pages) and the size (i.e. the total number of the pages). A normal chunk is formated as below: Suggested-by: Michael S. Tsirkin ----------------------------------------------- | Base (52 bit) | Size (12 bit)| ----------------------------------------------- For large size chunks, an extended chunk format is used: ----------------------------------------------- | Base (64 bit) | ----------------------------------------------- ----------------------------------------------- | Size (64 bit) | ----------------------------------------------- By doing so, step 4) can also be optimized by doing address translation and madvise() in chunks rather than page by page. This optimization requires the negotation of a new feature bit, VIRTIO_BALLOON_F_CHUNK_TRANSFER. With this new feature, the above ballooning process takes ~590ms resulting in an improvement of ~85%. TODO: optimize stage 1) by allocating/freeing a chunk of pages instead of a single page each time. Signed-off-by: Liang Li Signed-off-by: Wei Wang Suggested-by: Michael S. Tsirkin Cc: Michael S. Tsirkin Cc: Paolo Bonzini Cc: Cornelia Huck Cc: Amit Shah Cc: Dave Hansen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Liang Li Cc: Wei Wang --- drivers/virtio/virtio_balloon.c | 351 ++++++++++++++++++++++++++++++++++++= ---- 1 file changed, 323 insertions(+), 28 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloo= n.c index f59cb4f..4416370 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -42,6 +42,10 @@ #define OOM_VBALLOON_DEFAULT_PAGES 256 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80 =20 +#define PAGE_BMAP_SIZE (8 * PAGE_SIZE) +#define PFNS_PER_PAGE_BMAP (PAGE_BMAP_SIZE * BITS_PER_BYTE) +#define PAGE_BMAP_COUNT_MAX 32 + static int oom_pages =3D OOM_VBALLOON_DEFAULT_PAGES; module_param(oom_pages, int, S_IRUSR | S_IWUSR); MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); @@ -50,6 +54,16 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); static struct vfsmount *balloon_mnt; #endif =20 +struct balloon_page_chunk { + __le64 base : 52; + __le64 size : 12; +}; + +struct balloon_page_chunk_ext { + __le64 base; + __le64 size; +}; + struct virtio_balloon { struct virtio_device *vdev; struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; @@ -67,6 +81,20 @@ struct virtio_balloon { =20 /* Number of balloon pages we've told the Host we're not using. */ unsigned int num_pages; + /* Pointer to the response header. */ + struct virtio_balloon_resp_hdr resp_hdr; + /* Pointer to the start address of response data. */ + __le64 *resp_data; + /* Size of response data buffer. */ + unsigned int resp_buf_size; + /* Pointer offset of the response data. */ + unsigned int resp_pos; + /* Bitmap used to save the pfns info */ + unsigned long *page_bitmap[PAGE_BMAP_COUNT_MAX]; + /* Number of split page bitmaps */ + unsigned int nr_page_bmap; + /* Used to record the processed pfn range */ + unsigned long min_pfn, max_pfn, start_pfn, end_pfn; /* * The pages we've told the Host we're not using are enqueued * at vb_dev_info->pages list. @@ -110,20 +138,180 @@ static void balloon_ack(struct virtqueue *vq) wake_up(&vb->acked); } =20 -static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq) +static inline void init_bmap_pfn_range(struct virtio_balloon *vb) { - struct scatterlist sg; + vb->min_pfn =3D ULONG_MAX; + vb->max_pfn =3D 0; +} + +static inline void update_bmap_pfn_range(struct virtio_balloon *vb, + struct page *page) +{ + unsigned long balloon_pfn =3D page_to_balloon_pfn(page); + + vb->min_pfn =3D min(balloon_pfn, vb->min_pfn); + vb->max_pfn =3D max(balloon_pfn, vb->max_pfn); +} + +static void extend_page_bitmap(struct virtio_balloon *vb, + unsigned long nr_pfn) +{ + int i, bmap_count; + unsigned long bmap_len; + + bmap_len =3D ALIGN(nr_pfn, BITS_PER_LONG) / BITS_PER_BYTE; + bmap_len =3D ALIGN(bmap_len, PAGE_BMAP_SIZE); + bmap_count =3D min((int)(bmap_len / PAGE_BMAP_SIZE), + PAGE_BMAP_COUNT_MAX); + + for (i =3D 1; i < bmap_count; i++) { + vb->page_bitmap[i] =3D kmalloc(PAGE_BMAP_SIZE, GFP_KERNEL); + if (vb->page_bitmap[i]) + vb->nr_page_bmap++; + else + break; + } +} + +static void free_extended_page_bitmap(struct virtio_balloon *vb) +{ + int i, bmap_count =3D vb->nr_page_bmap; + + for (i =3D 1; i < bmap_count; i++) { + kfree(vb->page_bitmap[i]); + vb->page_bitmap[i] =3D NULL; + vb->nr_page_bmap--; + } +} + +static void kfree_page_bitmap(struct virtio_balloon *vb) +{ + int i; + + for (i =3D 0; i < vb->nr_page_bmap; i++) + kfree(vb->page_bitmap[i]); +} + +static void clear_page_bitmap(struct virtio_balloon *vb) +{ + int i; + + for (i =3D 0; i < vb->nr_page_bmap; i++) + memset(vb->page_bitmap[i], 0, PAGE_BMAP_SIZE); +} + +static void send_resp_data(struct virtio_balloon *vb, struct virtqueue *vq, + bool busy_wait) +{ + struct scatterlist sg[2]; + struct virtio_balloon_resp_hdr *hdr =3D &vb->resp_hdr; unsigned int len; =20 - sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns); + len =3D hdr->data_len =3D vb->resp_pos * sizeof(__le64); + sg_init_table(sg, 2); + sg_set_buf(&sg[0], hdr, sizeof(struct virtio_balloon_resp_hdr)); + sg_set_buf(&sg[1], vb->resp_data, len); + + if (virtqueue_add_outbuf(vq, sg, 2, vb, GFP_KERNEL) =3D=3D 0) { + virtqueue_kick(vq); + if (busy_wait) + while (!virtqueue_get_buf(vq, &len) + && !virtqueue_is_broken(vq)) + cpu_relax(); + else + wait_event(vb->acked, virtqueue_get_buf(vq, &len)); + vb->resp_pos =3D 0; + free_extended_page_bitmap(vb); + } +} =20 - /* We should always be able to add one buffer to an empty queue. */ - virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL); - virtqueue_kick(vq); +static void do_set_resp_bitmap(struct virtio_balloon *vb, unsigned long ba= se, + int size) +{ + /* Use the extented chunk format if the size is too large */ + if (size > (1 << VIRTIO_BALLOON_CHUNK_SIZE_SHIFT)) { + struct balloon_page_chunk_ext *chunk_ext =3D + (struct balloon_page_chunk_ext *)(vb->resp_data + vb->resp_pos); + chunk_ext->base =3D cpu_to_le64(base); + chunk_ext->size =3D cpu_to_le64(size); + vb->resp_pos +=3D sizeof(vb->resp_pos) / sizeof(*chunk_ext); + } else { + struct balloon_page_chunk *chunk =3D + (struct balloon_page_chunk *)(vb->resp_data + vb->resp_pos); + chunk->base =3D cpu_to_le64(base); + chunk->size =3D cpu_to_le16(size); + vb->resp_pos +=3D sizeof(vb->resp_pos) / sizeof(*chunk); + } +} =20 - /* When host has read buffer, this completes via balloon_ack */ - wait_event(vb->acked, virtqueue_get_buf(vq, &len)); +static void chunking_pages_from_bmap(struct virtio_balloon *vb, + struct virtqueue *vq, + unsigned long start_pfn, unsigned long *bitmap, + unsigned long len, bool busy_wait) +{ + unsigned long pos =3D 0, end =3D len * BITS_PER_BYTE; + + while (pos < end) { + unsigned long one =3D find_next_bit(bitmap, end, pos); + + if (one < end) { + unsigned long chunk_size, zero; + + zero =3D find_next_zero_bit(bitmap, end, one + 1); + if (zero >=3D end) + chunk_size =3D end - one; + else + chunk_size =3D zero - one; + if (chunk_size) { + if ((vb->resp_pos + 2) * sizeof(__le64) > + vb->resp_buf_size) + send_resp_data(vb, vq, busy_wait); + do_set_resp_bitmap(vb, start_pfn + one, chunk_size); + } + pos =3D one + chunk_size; + } else + break; + } +} =20 +static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq) +{ + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CHUNK_TRANSFER)) { + int nr_pfn, nr_used_bmap, i; + unsigned long start_pfn, bmap_len; + + start_pfn =3D vb->start_pfn; + nr_pfn =3D vb->end_pfn - start_pfn + 1; + nr_pfn =3D roundup(nr_pfn, BITS_PER_LONG); + nr_used_bmap =3D nr_pfn / PFNS_PER_PAGE_BMAP; + if (nr_pfn % PFNS_PER_PAGE_BMAP) + nr_used_bmap++; + bmap_len =3D nr_pfn / BITS_PER_BYTE; + + for (i =3D 0; i < nr_used_bmap; i++) { + unsigned int bmap_size =3D PAGE_BMAP_SIZE; + + if (i + 1 =3D=3D nr_used_bmap) + bmap_size =3D bmap_len - PAGE_BMAP_SIZE * i; + chunking_pages_from_bmap(vb, vq, start_pfn + i * PFNS_PER_PAGE_BMAP, + vb->page_bitmap[i], bmap_size, false); + } + if (vb->resp_pos > 0) + send_resp_data(vb, vq, false); + } else { + struct scatterlist sg; + unsigned int len; + + sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns); + + /* We should always be able to add one buffer to an + * empty queue + */ + virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL); + virtqueue_kick(vq); + /* When host has read buffer, this completes via balloon_ack */ + wait_event(vb->acked, virtqueue_get_buf(vq, &len)); + } } =20 static void set_page_pfns(struct virtio_balloon *vb, @@ -138,13 +326,58 @@ static void set_page_pfns(struct virtio_balloon *vb, page_to_balloon_pfn(page) + i); } =20 -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) +static void set_page_bitmap(struct virtio_balloon *vb, + struct list_head *pages, struct virtqueue *vq) +{ + unsigned long pfn, pfn_limit; + struct page *page; + bool found; + + vb->min_pfn =3D rounddown(vb->min_pfn, BITS_PER_LONG); + vb->max_pfn =3D roundup(vb->max_pfn, BITS_PER_LONG); + pfn_limit =3D PFNS_PER_PAGE_BMAP * vb->nr_page_bmap; + + if (vb->nr_page_bmap =3D=3D 1) + extend_page_bitmap(vb, vb->max_pfn - vb->min_pfn + 1); + for (pfn =3D vb->min_pfn; pfn < vb->max_pfn; pfn +=3D pfn_limit) { + unsigned long end_pfn; + + clear_page_bitmap(vb); + vb->start_pfn =3D pfn; + end_pfn =3D pfn; + found =3D false; + list_for_each_entry(page, pages, lru) { + unsigned long bmap_idx, bmap_pos, balloon_pfn; + + balloon_pfn =3D page_to_balloon_pfn(page); + if (balloon_pfn < pfn || balloon_pfn >=3D pfn + pfn_limit) + continue; + bmap_idx =3D (balloon_pfn - pfn) / PFNS_PER_PAGE_BMAP; + bmap_pos =3D (balloon_pfn - pfn) % PFNS_PER_PAGE_BMAP; + set_bit(bmap_pos, vb->page_bitmap[bmap_idx]); + if (balloon_pfn > end_pfn) + end_pfn =3D balloon_pfn; + found =3D true; + } + if (found) { + vb->end_pfn =3D end_pfn; + tell_host(vb, vq); + } + } +} + +static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num) { struct balloon_dev_info *vb_dev_info =3D &vb->vb_dev_info; - unsigned num_allocated_pages; + unsigned int num_allocated_pages; + bool chunking =3D virtio_has_feature(vb->vdev, + VIRTIO_BALLOON_F_CHUNK_TRANSFER); =20 - /* We can only do one array worth at a time. */ - num =3D min(num, ARRAY_SIZE(vb->pfns)); + if (chunking) + init_bmap_pfn_range(vb); + else + /* We can only do one array worth at a time. */ + num =3D min(num, ARRAY_SIZE(vb->pfns)); =20 mutex_lock(&vb->balloon_lock); for (vb->num_pfns =3D 0; vb->num_pfns < num; @@ -159,7 +392,10 @@ static unsigned fill_balloon(struct virtio_balloon *vb= , size_t num) msleep(200); break; } - set_page_pfns(vb, vb->pfns + vb->num_pfns, page); + if (chunking) + update_bmap_pfn_range(vb, page); + else + set_page_pfns(vb, vb->pfns + vb->num_pfns, page); vb->num_pages +=3D VIRTIO_BALLOON_PAGES_PER_PAGE; if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) @@ -168,8 +404,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb= , size_t num) =20 num_allocated_pages =3D vb->num_pfns; /* Did we get any? */ - if (vb->num_pfns !=3D 0) - tell_host(vb, vb->inflate_vq); + if (vb->num_pfns !=3D 0) { + if (chunking) + set_page_bitmap(vb, &vb_dev_info->pages, + vb->inflate_vq); + else + tell_host(vb, vb->inflate_vq); + } mutex_unlock(&vb->balloon_lock); =20 return num_allocated_pages; @@ -189,15 +430,20 @@ static void release_pages_balloon(struct virtio_ballo= on *vb, } } =20 -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num) +static unsigned int leak_balloon(struct virtio_balloon *vb, size_t num) { - unsigned num_freed_pages; + unsigned int num_freed_pages; struct page *page; struct balloon_dev_info *vb_dev_info =3D &vb->vb_dev_info; LIST_HEAD(pages); + bool chunking =3D virtio_has_feature(vb->vdev, + VIRTIO_BALLOON_F_CHUNK_TRANSFER); =20 - /* We can only do one array worth at a time. */ - num =3D min(num, ARRAY_SIZE(vb->pfns)); + if (chunking) + init_bmap_pfn_range(vb); + else + /* We can only do one array worth at a time. */ + num =3D min(num, ARRAY_SIZE(vb->pfns)); =20 mutex_lock(&vb->balloon_lock); /* We can't release more pages than taken */ @@ -207,7 +453,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb= , size_t num) page =3D balloon_page_dequeue(vb_dev_info); if (!page) break; - set_page_pfns(vb, vb->pfns + vb->num_pfns, page); + if (chunking) + update_bmap_pfn_range(vb, page); + else + set_page_pfns(vb, vb->pfns + vb->num_pfns, page); list_add(&page->lru, &pages); vb->num_pages -=3D VIRTIO_BALLOON_PAGES_PER_PAGE; } @@ -218,8 +467,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb= , size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - if (vb->num_pfns !=3D 0) - tell_host(vb, vb->deflate_vq); + if (vb->num_pfns !=3D 0) { + if (chunking) + set_page_bitmap(vb, &pages, vb->deflate_vq); + else + tell_host(vb, vb->deflate_vq); + } release_pages_balloon(vb, &pages); mutex_unlock(&vb->balloon_lock); return num_freed_pages; @@ -431,6 +684,18 @@ static int init_vqs(struct virtio_balloon *vb) } =20 #ifdef CONFIG_BALLOON_COMPACTION +static void tell_host_one_page(struct virtio_balloon *vb, + struct virtqueue *vq, struct page *page) +{ + __le64 *chunk; + + chunk =3D vb->resp_data + vb->resp_pos; + *chunk =3D cpu_to_le64((page_to_pfn(page) << + VIRTIO_BALLOON_CHUNK_SIZE_SHIFT) | 1); + vb->resp_pos++; + send_resp_data(vb, vq, false); +} + /* * virtballoon_migratepage - perform the balloon page migration on behalf = of * a compation thread. (called under page lock) @@ -455,6 +720,8 @@ static int virtballoon_migratepage(struct balloon_dev_i= nfo *vb_dev_info, struct virtio_balloon *vb =3D container_of(vb_dev_info, struct virtio_balloon, vb_dev_info); unsigned long flags; + bool chunking =3D virtio_has_feature(vb->vdev, + VIRTIO_BALLOON_F_CHUNK_TRANSFER); =20 /* * In order to avoid lock contention while migrating pages concurrently @@ -475,15 +742,23 @@ static int virtballoon_migratepage(struct balloon_dev= _info *vb_dev_info, vb_dev_info->isolated_pages--; __count_vm_event(BALLOON_MIGRATE); spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags); - vb->num_pfns =3D VIRTIO_BALLOON_PAGES_PER_PAGE; - set_page_pfns(vb, vb->pfns, newpage); - tell_host(vb, vb->inflate_vq); + if (chunking) + tell_host_one_page(vb, vb->inflate_vq, newpage); + else { + vb->num_pfns =3D VIRTIO_BALLOON_PAGES_PER_PAGE; + set_page_pfns(vb, vb->pfns, newpage); + tell_host(vb, vb->inflate_vq); + } =20 /* balloon's page migration 2nd step -- deflate "page" */ balloon_page_delete(page); - vb->num_pfns =3D VIRTIO_BALLOON_PAGES_PER_PAGE; - set_page_pfns(vb, vb->pfns, page); - tell_host(vb, vb->deflate_vq); + if (chunking) + tell_host_one_page(vb, vb->deflate_vq, page); + else { + vb->num_pfns =3D VIRTIO_BALLOON_PAGES_PER_PAGE; + set_page_pfns(vb, vb->pfns, page); + tell_host(vb, vb->deflate_vq); + } =20 mutex_unlock(&vb->balloon_lock); =20 @@ -533,6 +808,21 @@ static int virtballoon_probe(struct virtio_device *vde= v) spin_lock_init(&vb->stop_update_lock); vb->stop_update =3D false; vb->num_pages =3D 0; + + vb->page_bitmap[0] =3D kmalloc(PAGE_BMAP_SIZE, GFP_KERNEL); + if (!vb->page_bitmap[0]) { + __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_CHUNK_TRANSFER); + } else { + vb->nr_page_bmap =3D 1; + vb->resp_data =3D kmalloc(PAGE_BMAP_SIZE, GFP_KERNEL); + if (!vb->resp_data) { + __virtio_clear_bit(vdev, + VIRTIO_BALLOON_F_CHUNK_TRANSFER); + kfree(vb->page_bitmap[0]); + } + } + vb->resp_pos =3D 0; + vb->resp_buf_size =3D PAGE_BMAP_SIZE; mutex_init(&vb->balloon_lock); init_waitqueue_head(&vb->acked); vb->vdev =3D vdev; @@ -578,6 +868,8 @@ static int virtballoon_probe(struct virtio_device *vdev) out_del_vqs: vdev->config->del_vqs(vdev); out_free_vb: + kfree(vb->resp_data); + kfree_page_bitmap(vb); kfree(vb); out: return err; @@ -611,6 +903,8 @@ static void virtballoon_remove(struct virtio_device *vd= ev) remove_common(vb); if (vb->vb_dev_info.inode) iput(vb->vb_dev_info.inode); + kfree_page_bitmap(vb); + kfree(vb->resp_data); kfree(vb); } =20 @@ -649,6 +943,7 @@ static unsigned int features[] =3D { VIRTIO_BALLOON_F_MUST_TELL_HOST, VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_F_DEFLATE_ON_OOM, + VIRTIO_BALLOON_F_CHUNK_TRANSFER, }; =20 static struct virtio_driver virtio_balloon_driver =3D { --=20 2.7.4 From nobody Sun May 5 19:05:46 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1488520085668703.308858824943; Thu, 2 Mar 2017 21:48:05 -0800 (PST) Received: from localhost ([::1]:56280 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg4S-0003hC-EH for importer@patchew.org; Fri, 03 Mar 2017 00:48:04 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39206) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg11-0001Xs-Jd for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:32 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cjg0x-0006Te-Iw for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:31 -0500 Received: from mga01.intel.com ([192.55.52.88]:50436) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cjg0x-0006Q0-AD for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:27 -0500 Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Mar 2017 21:44:26 -0800 Received: from devel-ww.sh.intel.com ([10.239.48.105]) by orsmga005.jf.intel.com with ESMTP; 02 Mar 2017 21:44:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.35,235,1484035200"; d="scan'208";a="71114517" From: Wei Wang To: virtio-dev@lists.oasis-open.org, kvm@vger.kernel.org, qemu-devel@nongnu.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org Date: Fri, 3 Mar 2017 13:40:29 +0800 Message-Id: <1488519630-89058-5-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> References: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v7 kernel 4/5] virtio-balloon: define flags and head for host request vq X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , "Michael S . Tsirkin" , Liang Li , Dave Hansen , Liang Li , Amit Shah , Wei Wang , Cornelia Huck , Paolo Bonzini , Andrew Morton , David Hildenbrand , Mel Gorman Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Liang Li Define the flags and head struct for a new host request virtual queue. Guest can get requests from host and then responds to them on this new virtual queue. Host can make use of this virtqueue to request the guest to do some operations, e.g. drop page cache, synchronize file system, etc. The hypervisor can get some of guest's runtime information through this virtual queue too, e.g. the guest's unused page information, which can be used for live migration optimization. Signed-off-by: Liang Li Signed-off-by: Wei Wang Cc: Andrew Morton Cc: Mel Gorman Cc: Michael S. Tsirkin Cc: Paolo Bonzini Cc: Cornelia Huck Cc: Amit Shah Cc: Dave Hansen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Liang Li Cc: Wei Wang --- include/uapi/linux/virtio_balloon.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virti= o_balloon.h index ed627b2..630b0ef 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -35,6 +35,7 @@ #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ #define VIRTIO_BALLOON_F_CHUNK_TRANSFER 3 /* Transfer pages in chunks */ +#define VIRTIO_BALLOON_F_HOST_REQ_VQ 4 /* Host request virtqueue */ =20 /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 @@ -94,4 +95,25 @@ struct virtio_balloon_resp_hdr { __le32 data_len; /* Payload len in bytes */ }; =20 +enum virtio_balloon_req_id { + /* Get unused page information */ + BALLOON_GET_UNUSED_PAGES, +}; + +enum virtio_balloon_flag { + /* Have more data for a request */ + BALLOON_FLAG_CONT, + /* No more data for a request */ + BALLOON_FLAG_DONE, +}; + +struct virtio_balloon_req_hdr { + /* Used to distinguish different requests */ + __le16 cmd; + /* Reserved */ + __le16 reserved[3]; + /* Request parameter */ + __le64 param; +}; + #endif /* _LINUX_VIRTIO_BALLOON_H */ --=20 2.7.4 From nobody Sun May 5 19:05:46 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 148852014606914.652441820843023; Thu, 2 Mar 2017 21:49:06 -0800 (PST) Received: from localhost ([::1]:56285 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg5Q-0004Lm-TD for importer@patchew.org; Fri, 03 Mar 2017 00:49:04 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39218) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjg13-0001YA-7A for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cjg11-0006U6-C9 for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:33 -0500 Received: from mga01.intel.com ([192.55.52.88]:50436) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cjg10-0006Q0-U7 for qemu-devel@nongnu.org; Fri, 03 Mar 2017 00:44:31 -0500 Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Mar 2017 21:44:30 -0800 Received: from devel-ww.sh.intel.com ([10.239.48.105]) by orsmga005.jf.intel.com with ESMTP; 02 Mar 2017 21:44:27 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.35,235,1484035200"; d="scan'208";a="71114530" From: Wei Wang To: virtio-dev@lists.oasis-open.org, kvm@vger.kernel.org, qemu-devel@nongnu.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org Date: Fri, 3 Mar 2017 13:40:30 +0800 Message-Id: <1488519630-89058-6-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> References: <1488519630-89058-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v7 kernel 5/5] This patch contains two parts: X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , "Michael S . Tsirkin" , Liang Li , Dave Hansen , Liang Li , Amit Shah , Wei Wang , Cornelia Huck , Paolo Bonzini , Andrew Morton , David Hildenbrand , Mel Gorman Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Liang Li One is to add a new API to mm go get the unused page information. The virtio balloon driver will use this new API added to get the unused page info and send it to hypervisor(QEMU) to speed up live migration. During sending the bitmap, some the pages may be modified and are used by the guest, this inaccuracy can be corrected by the dirty page logging mechanism. One is to add support the request for vm's unused page information, QEMU can make use of unused page information and the dirty page logging mechanism to skip the transportation of some of these unused pages, this is very helpful to reduce the network traffic and speed up the live migration process. Signed-off-by: Liang Li Signed-off-by: Wei Wang Cc: Andrew Morton Cc: Mel Gorman Cc: Michael S. Tsirkin Cc: Paolo Bonzini Cc: Cornelia Huck Cc: Amit Shah Cc: Dave Hansen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Liang Li Cc: Wei Wang --- drivers/virtio/virtio_balloon.c | 137 ++++++++++++++++++++++++++++++++++++= ++-- include/linux/mm.h | 3 + mm/page_alloc.c | 120 +++++++++++++++++++++++++++++++++++ 3 files changed, 255 insertions(+), 5 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloo= n.c index 4416370..9b6cf44f 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -66,7 +66,7 @@ struct balloon_page_chunk_ext { =20 struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *host_req_vq; =20 /* The balloon servicing is delegated to a freezable workqueue. */ struct work_struct update_balloon_stats_work; @@ -95,6 +95,8 @@ struct virtio_balloon { unsigned int nr_page_bmap; /* Used to record the processed pfn range */ unsigned long min_pfn, max_pfn, start_pfn, end_pfn; + /* Request header */ + struct virtio_balloon_req_hdr req_hdr; /* * The pages we've told the Host we're not using are enqueued * at vb_dev_info->pages list. @@ -549,6 +551,80 @@ static void stats_handle_request(struct virtio_balloon= *vb) virtqueue_kick(vq); } =20 +static void __send_unused_pages(struct virtio_balloon *vb, + unsigned long req_id, unsigned int pos, bool done) +{ + struct virtio_balloon_resp_hdr *hdr =3D &vb->resp_hdr; + struct virtqueue *vq =3D vb->host_req_vq; + + vb->resp_pos =3D pos; + hdr->cmd =3D BALLOON_GET_UNUSED_PAGES; + hdr->id =3D req_id; + if (!done) + hdr->flag =3D BALLOON_FLAG_CONT; + else + hdr->flag =3D BALLOON_FLAG_DONE; + + if (pos > 0 || done) + send_resp_data(vb, vq, true); + +} + +static void send_unused_pages(struct virtio_balloon *vb, + unsigned long req_id) +{ + struct scatterlist sg_in; + unsigned int pos =3D 0; + struct virtqueue *vq =3D vb->host_req_vq; + int ret, order; + struct zone *zone =3D NULL; + bool part_fill =3D false; + + mutex_lock(&vb->balloon_lock); + + for (order =3D MAX_ORDER - 1; order >=3D 0; order--) { + ret =3D mark_unused_pages(&zone, order, vb->resp_data, + vb->resp_buf_size / sizeof(__le64), + &pos, VIRTIO_BALLOON_CHUNK_SIZE_SHIFT, part_fill); + if (ret =3D=3D -ENOSPC) { + if (pos =3D=3D 0) { + void *new_resp_data; + + new_resp_data =3D kmalloc(2 * vb->resp_buf_size, + GFP_KERNEL); + if (new_resp_data) { + kfree(vb->resp_data); + vb->resp_data =3D new_resp_data; + vb->resp_buf_size *=3D 2; + } else { + part_fill =3D true; + dev_warn(&vb->vdev->dev, + "%s: part fill order: %d\n", + __func__, order); + } + } else { + __send_unused_pages(vb, req_id, pos, false); + pos =3D 0; + } + + if (!part_fill) { + order++; + continue; + } + } else + zone =3D NULL; + + if (order =3D=3D 0) + __send_unused_pages(vb, req_id, pos, true); + + } + + mutex_unlock(&vb->balloon_lock); + sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr)); + virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL); + virtqueue_kick(vq); +} + static void virtballoon_changed(struct virtio_device *vdev) { struct virtio_balloon *vb =3D vdev->priv; @@ -648,18 +724,51 @@ static void update_balloon_size_func(struct work_stru= ct *work) queue_work(system_freezable_wq, work); } =20 +static void handle_host_request(struct virtqueue *vq) +{ + struct virtio_balloon *vb =3D vq->vdev->priv; + struct virtio_balloon_req_hdr *ptr_hdr; + unsigned int len; + + ptr_hdr =3D virtqueue_get_buf(vb->host_req_vq, &len); + if (!ptr_hdr || len !=3D sizeof(vb->req_hdr)) + return; + + switch (ptr_hdr->cmd) { + case BALLOON_GET_UNUSED_PAGES: + send_unused_pages(vb, ptr_hdr->param); + break; + default: + dev_warn(&vb->vdev->dev, "%s: host request %d not supported \n", + __func__, ptr_hdr->cmd); + } +} + static int init_vqs(struct virtio_balloon *vb) { - struct virtqueue *vqs[3]; - vq_callback_t *callbacks[] =3D { balloon_ack, balloon_ack, stats_request = }; - static const char * const names[] =3D { "inflate", "deflate", "stats" }; + struct virtqueue *vqs[4]; + vq_callback_t *callbacks[] =3D { balloon_ack, balloon_ack, + stats_request, handle_host_request }; + static const char * const names[] =3D { "inflate", "deflate", + "stats", "host_request" }; int err, nvqs; =20 /* * We expect two virtqueues: inflate and deflate, and * optionally stat. */ - nvqs =3D virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2; + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HOST_REQ_VQ)) + nvqs =3D 4; + else if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) + nvqs =3D 3; + else + nvqs =3D 2; + + if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { + __virtio_clear_bit(vb->vdev, VIRTIO_BALLOON_F_CHUNK_TRANSFER); + __virtio_clear_bit(vb->vdev, VIRTIO_BALLOON_F_HOST_REQ_VQ); + } + err =3D vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names); if (err) return err; @@ -680,6 +789,20 @@ static int init_vqs(struct virtio_balloon *vb) BUG(); virtqueue_kick(vb->stats_vq); } + + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HOST_REQ_VQ)) { + struct scatterlist sg_in; + + vb->host_req_vq =3D vqs[3]; + sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr)); + if (virtqueue_add_inbuf(vb->host_req_vq, &sg_in, 1, + &vb->req_hdr, GFP_KERNEL) < 0) + __virtio_clear_bit(vb->vdev, + VIRTIO_BALLOON_F_HOST_REQ_VQ); + else + virtqueue_kick(vb->host_req_vq); + } + return 0; } =20 @@ -812,12 +935,15 @@ static int virtballoon_probe(struct virtio_device *vd= ev) vb->page_bitmap[0] =3D kmalloc(PAGE_BMAP_SIZE, GFP_KERNEL); if (!vb->page_bitmap[0]) { __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_CHUNK_TRANSFER); + __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_HOST_REQ_VQ); } else { vb->nr_page_bmap =3D 1; vb->resp_data =3D kmalloc(PAGE_BMAP_SIZE, GFP_KERNEL); if (!vb->resp_data) { __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_CHUNK_TRANSFER); + __virtio_clear_bit(vdev, + VIRTIO_BALLOON_F_HOST_REQ_VQ); kfree(vb->page_bitmap[0]); } } @@ -944,6 +1070,7 @@ static unsigned int features[] =3D { VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_F_DEFLATE_ON_OOM, VIRTIO_BALLOON_F_CHUNK_TRANSFER, + VIRTIO_BALLOON_F_HOST_REQ_VQ, }; =20 static struct virtio_driver virtio_balloon_driver =3D { diff --git a/include/linux/mm.h b/include/linux/mm.h index b84615b..c9ad89c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1764,6 +1764,9 @@ extern void free_area_init(unsigned long * zones_size= ); extern void free_area_init_node(int nid, unsigned long * zones_size, unsigned long zone_start_pfn, unsigned long *zholes_size); extern void free_initmem(void); +extern int mark_unused_pages(struct zone **start_zone, int order, + __le64 *pages, unsigned int size, unsigned int *pos, + u8 len_bits, bool part_fill); =20 /* * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f3e0c69..f0573b1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4498,6 +4498,126 @@ void show_free_areas(unsigned int filter) show_swap_cache_info(); } =20 +static int __mark_unused_pages(struct zone *zone, int order, + __le64 *pages, unsigned int size, unsigned int *pos, + u8 len_bits, bool part_fill) +{ + unsigned long pfn, flags; + int t, ret =3D 0; + struct list_head *curr; + __le64 *range; + + if (zone_is_empty(zone)) + return 0; + + spin_lock_irqsave(&zone->lock, flags); + + if (*pos + zone->free_area[order].nr_free > size && !part_fill) { + ret =3D -ENOSPC; + goto out; + } + for (t =3D 0; t < MIGRATE_TYPES; t++) { + list_for_each(curr, &zone->free_area[order].free_list[t]) { + pfn =3D page_to_pfn(list_entry(curr, struct page, lru)); + range =3D pages + *pos; + if (order < len_bits) { + if (*pos + 1 > size) { + ret =3D -ENOSPC; + goto out; + } + *range =3D cpu_to_le64((pfn << len_bits) + | 1 << order); + *pos +=3D 1; + } else { + if (*pos + 2 > size) { + ret =3D -ENOSPC; + goto out; + } + *range =3D cpu_to_le64((pfn << len_bits) | 0); + *(range + 1) =3D cpu_to_le64(1 << order); + *pos +=3D 2; + } + } + } + +out: + spin_unlock_irqrestore(&zone->lock, flags); + + return ret; +} + +/* + * During live migration, page is discardable unless it's content + * is needed by the system. + * mark_unused_pages provides an API to mark the unused pages, these + * unused pages can be discarded if there is no modification since + * the request. Some other mechanism, like the dirty page logging + * can be used to track the modification. + * + * This function scans the free page list to mark the unused pages + * with the specified order, and set the corresponding range element + * in the array 'pages' if unused pages are found for the specified + * order. + * + * @start_zone: zone to start the mark operation. + * @order: page order to mark. + * @pages: array to save the unused page info. + * @size: size of array pages. + * @pos: offset in the array to save the page info. + * @len_bits: bits for the length field of the range. + * @part_fill: indicate if partial fill is used. + * + * return -EINVAL if parameter is invalid + * return -ENOSPC when bitmap can't contain the pages + * return 0 when sccess + */ +int mark_unused_pages(struct zone **start_zone, int order, + __le64 *pages, unsigned int size, unsigned int *pos, + u8 len_bits, bool part_fill) +{ + struct zone *zone; + int ret =3D 0; + bool skip_check =3D false; + + /* make sure all the parameters are valid */ + if (pages =3D=3D NULL || pos =3D=3D NULL || *pos < 0 + || order >=3D MAX_ORDER || len_bits > 64) + return -EINVAL; + if (*start_zone !=3D NULL) { + bool found =3D false; + + for_each_populated_zone(zone) { + if (zone !=3D *start_zone) + continue; + found =3D true; + break; + } + if (!found) + return -EINVAL; + } else + skip_check =3D true; + + for_each_populated_zone(zone) { + /* Start from *start_zone if it's not NULL */ + if (!skip_check) { + if (*start_zone !=3D zone) + continue; + else + skip_check =3D true; + } + ret =3D __mark_unused_pages(zone, order, pages, size, + pos, len_bits, part_fill); + if (ret < 0) { + /* record the failed zone */ + *start_zone =3D zone; + break; + } + } + + return ret; +} +EXPORT_SYMBOL(mark_unused_pages); + static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref) { zoneref->zone =3D zone; --=20 2.7.4