From nobody Sat Apr 27 23:05:30 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1502941209013731.973389391334; Wed, 16 Aug 2017 20:40:09 -0700 (PDT) Received: from localhost ([::1]:52550 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBfD-0000p2-PA for importer@patchew.org; Wed, 16 Aug 2017 23:40:07 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48333) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBdV-0007yX-5g for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:23 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1diBdT-0003l0-7Y for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:21 -0400 Received: from mga01.intel.com ([192.55.52.88]:1602) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1diBdS-0003iP-Q9 for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:19 -0400 Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2017 20:38:18 -0700 Received: from devel-ww.sh.intel.com ([10.239.48.97]) by orsmga004.jf.intel.com with ESMTP; 16 Aug 2017 20:38:14 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.41,386,1498546800"; d="scan'208";a="119892689" From: Wei Wang To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org, akpm@linux-foundation.org, mawilcox@microsoft.com Date: Thu, 17 Aug 2017 11:26:52 +0800 Message-Id: <1502940416-42944-2-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> References: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v14 1/5] lib/xbitmap: Introduce xbitmap X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: aarcange@redhat.com, yang.zhang.wz@gmail.com, david@redhat.com, liliang.opensource@gmail.com, willy@infradead.org, amit.shah@redhat.com, wei.w.wang@intel.com, quan.xu@aliyun.com, cornelia.huck@de.ibm.com, pbonzini@redhat.com, mgorman@techsingularity.net Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Matthew Wilcox The eXtensible Bitmap is a sparse bitmap representation which is efficient for set bits which tend to cluster. It supports up to 'unsigned long' worth of bits, and this commit adds the bare bones -- xb_set_bit(), xb_clear_bit() and xb_test_bit(). Signed-off-by: Matthew Wilcox Signed-off-by: Wei Wang Cc: Andrew Morton Cc: Michal Hocko Cc: Michael S. Tsirkin --- include/linux/radix-tree.h | 3 + include/linux/xbitmap.h | 61 ++++++++++++++++ lib/Makefile | 2 +- lib/radix-tree.c | 22 +++++- lib/xbitmap.c | 176 +++++++++++++++++++++++++++++++++++++++++= ++++ 5 files changed, 260 insertions(+), 4 deletions(-) create mode 100644 include/linux/xbitmap.h create mode 100644 lib/xbitmap.c diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index 3e57350..e1203b1 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -309,6 +309,8 @@ void radix_tree_iter_replace(struct radix_tree_root *, const struct radix_tree_iter *, void __rcu **slot, void *entry); void radix_tree_replace_slot(struct radix_tree_root *, void __rcu **slot, void *entry); +bool __radix_tree_delete(struct radix_tree_root *root, + struct radix_tree_node *node, void __rcu **slot); void __radix_tree_delete_node(struct radix_tree_root *, struct radix_tree_node *, radix_tree_update_node_t update_node, @@ -325,6 +327,7 @@ unsigned int radix_tree_gang_lookup(const struct radix_= tree_root *, unsigned int radix_tree_gang_lookup_slot(const struct radix_tree_root *, void __rcu ***results, unsigned long *indices, unsigned long first_index, unsigned int max_items); +int __radix_tree_preload(gfp_t gfp_mask, unsigned int nr); int radix_tree_preload(gfp_t gfp_mask); int radix_tree_maybe_preload(gfp_t gfp_mask); int radix_tree_maybe_preload_order(gfp_t gfp_mask, int order); diff --git a/include/linux/xbitmap.h b/include/linux/xbitmap.h new file mode 100644 index 0000000..5edbf84 --- /dev/null +++ b/include/linux/xbitmap.h @@ -0,0 +1,61 @@ +/* + * eXtensible Bitmaps + * Copyright (c) 2017 Microsoft Corporation + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * eXtensible Bitmaps provide an unlimited-size sparse bitmap facility. + * All bits are initially zero. + */ + +#ifndef __XBITMAP_H__ +#define __XBITMAP_H__ + +#include + +struct xb { + struct radix_tree_root xbrt; +}; + +#define XB_INIT { \ + .xbrt =3D RADIX_TREE_INIT(IDR_RT_MARKER | GFP_NOWAIT), \ +} +#define DEFINE_XB(name) struct xb name =3D XB_INIT + +static inline void xb_init(struct xb *xb) +{ + INIT_RADIX_TREE(&xb->xbrt, IDR_RT_MARKER | GFP_NOWAIT); +} + +int xb_set_bit(struct xb *xb, unsigned long bit); +bool xb_test_bit(const struct xb *xb, unsigned long bit); +void xb_clear_bit(struct xb *xb, unsigned long bit); + +/* Check if the xb tree is empty */ +static inline bool xb_is_empty(const struct xb *xb) +{ + return radix_tree_empty(&xb->xbrt); +} + +void xb_preload(gfp_t gfp); + +/** + * xb_preload_end - end preload section started with xb_preload() + * + * Each xb_preload() should be matched with an invocation of this + * function. See xb_preload() for details. + */ +static inline void xb_preload_end(void) +{ + preempt_enable(); +} + +#endif diff --git a/lib/Makefile b/lib/Makefile index 40c1837..ea50496 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -18,7 +18,7 @@ KCOV_INSTRUMENT_dynamic_debug.o :=3D n =20 lib-y :=3D ctype.o string.o vsprintf.o cmdline.o \ rbtree.o radix-tree.o dump_stack.o timerqueue.o\ - idr.o int_sqrt.o extable.o \ + idr.o xbitmap.o int_sqrt.o extable.o \ sha1.o chacha20.o irq_regs.o argv_split.o \ flex_proportions.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o kobject_uevent.o \ diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 898e879..ee72e2c 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -463,7 +463,7 @@ radix_tree_node_free(struct radix_tree_node *node) * To make use of this facility, the radix tree must be initialised without * __GFP_DIRECT_RECLAIM being passed to INIT_RADIX_TREE(). */ -static int __radix_tree_preload(gfp_t gfp_mask, unsigned nr) +int __radix_tree_preload(gfp_t gfp_mask, unsigned int nr) { struct radix_tree_preload *rtp; struct radix_tree_node *node; @@ -496,6 +496,7 @@ static int __radix_tree_preload(gfp_t gfp_mask, unsigne= d nr) out: return ret; } +EXPORT_SYMBOL(__radix_tree_preload); =20 /* * Load up this CPU's radix_tree_node buffer with sufficient objects to @@ -840,6 +841,8 @@ int __radix_tree_create(struct radix_tree_root *root, u= nsigned long index, offset, 0, 0); if (!child) return -ENOMEM; + if (is_idr(root)) + all_tag_set(child, IDR_FREE); rcu_assign_pointer(*slot, node_to_entry(child)); if (node) node->count++; @@ -1986,8 +1989,20 @@ void __radix_tree_delete_node(struct radix_tree_root= *root, delete_node(root, node, update_node, private); } =20 -static bool __radix_tree_delete(struct radix_tree_root *root, - struct radix_tree_node *node, void __rcu **slot) +/** + * __radix_tree_delete - delete a slot from a radix tree + * @root: radix tree root + * @node: node containing the slot + * @slot: pointer to the slot to delete + * + * Clear @slot from @node of the radix tree. This may cause the current no= de to + * be freed. This function may be called without any locking if there are = no + * other threads which can access this tree. + * + * Return: the node or NULL if the node is freed. + */ +bool __radix_tree_delete(struct radix_tree_root *root, + struct radix_tree_node *node, void __rcu **slot) { void *old =3D rcu_dereference_raw(*slot); int exceptional =3D radix_tree_exceptional_entry(old) ? -1 : 0; @@ -2003,6 +2018,7 @@ static bool __radix_tree_delete(struct radix_tree_roo= t *root, replace_slot(slot, NULL, node, -1, exceptional); return node && delete_node(root, node, NULL, NULL); } +EXPORT_SYMBOL(__radix_tree_delete); =20 /** * radix_tree_iter_delete - delete the entry at this iterator position diff --git a/lib/xbitmap.c b/lib/xbitmap.c new file mode 100644 index 0000000..cc766d9 --- /dev/null +++ b/lib/xbitmap.c @@ -0,0 +1,176 @@ +#include +#include + +/* + * The xbitmap implementation supports up to ULONG_MAX bits, and it is + * implemented based on ida bitmaps. So, given an unsigned long index, + * the high order XB_INDEX_BITS bits of the index is used to find the + * corresponding iteam (i.e. ida bitmap) from the radix tree, and the low + * order (i.e. ilog2(IDA_BITMAP_BITS)) bits of the index are indexed into + * the ida bitmap to find the bit. + */ +#define XB_INDEX_BITS (BITS_PER_LONG - ilog2(IDA_BITMAP_BITS)) +#define XB_MAX_PATH (DIV_ROUND_UP(XB_INDEX_BITS, \ + RADIX_TREE_MAP_SHIFT)) +#define XB_PRELOAD_SIZE (XB_MAX_PATH * 2 - 1) + +enum xb_ops { + XB_SET, + XB_CLEAR, + XB_TEST +}; + +static int xb_bit_ops(struct xb *xb, unsigned long bit, enum xb_ops ops) +{ + int ret =3D 0; + unsigned long index =3D bit / IDA_BITMAP_BITS; + struct radix_tree_root *root =3D &xb->xbrt; + struct radix_tree_node *node; + void **slot; + struct ida_bitmap *bitmap; + unsigned long ebit, tmp; + + bit %=3D IDA_BITMAP_BITS; + ebit =3D bit + RADIX_TREE_EXCEPTIONAL_SHIFT; + + switch (ops) { + case XB_SET: + ret =3D __radix_tree_create(root, index, 0, &node, &slot); + if (ret) + return ret; + bitmap =3D rcu_dereference_raw(*slot); + if (radix_tree_exception(bitmap)) { + tmp =3D (unsigned long)bitmap; + if (ebit < BITS_PER_LONG) { + tmp |=3D 1UL << ebit; + rcu_assign_pointer(*slot, (void *)tmp); + return 0; + } + bitmap =3D this_cpu_xchg(ida_bitmap, NULL); + if (!bitmap) + return -EAGAIN; + memset(bitmap, 0, sizeof(*bitmap)); + bitmap->bitmap[0] =3D + tmp >> RADIX_TREE_EXCEPTIONAL_SHIFT; + rcu_assign_pointer(*slot, bitmap); + } + if (!bitmap) { + if (ebit < BITS_PER_LONG) { + bitmap =3D (void *)((1UL << ebit) | + RADIX_TREE_EXCEPTIONAL_ENTRY); + __radix_tree_replace(root, node, slot, bitmap, + NULL, NULL); + return 0; + } + bitmap =3D this_cpu_xchg(ida_bitmap, NULL); + if (!bitmap) + return -EAGAIN; + memset(bitmap, 0, sizeof(*bitmap)); + __radix_tree_replace(root, node, slot, bitmap, NULL, + NULL); + } + __set_bit(bit, bitmap->bitmap); + break; + case XB_CLEAR: + bitmap =3D __radix_tree_lookup(root, index, &node, &slot); + if (radix_tree_exception(bitmap)) { + tmp =3D (unsigned long)bitmap; + if (ebit >=3D BITS_PER_LONG) + return 0; + tmp &=3D ~(1UL << ebit); + if (tmp =3D=3D RADIX_TREE_EXCEPTIONAL_ENTRY) + __radix_tree_delete(root, node, slot); + else + rcu_assign_pointer(*slot, (void *)tmp); + return 0; + } + if (!bitmap) + return 0; + __clear_bit(bit, bitmap->bitmap); + if (bitmap_empty(bitmap->bitmap, IDA_BITMAP_BITS)) { + kfree(bitmap); + __radix_tree_delete(root, node, slot); + } + break; + case XB_TEST: + bitmap =3D radix_tree_lookup(root, index); + if (!bitmap) + return 0; + if (radix_tree_exception(bitmap)) { + if (ebit > BITS_PER_LONG) + return 0; + return (unsigned long)bitmap & (1UL << bit); + } + ret =3D test_bit(bit, bitmap->bitmap); + break; + default: + return -EINVAL; + } + return ret; +} + +/** + * xb_set_bit - set a bit in the xbitmap + * @xb: the xbitmap tree used to record the bit + * @bit: index of the bit to set + * + * This function is used to set a bit in the xbitmap. If the bitmap that @= bit + * resides in is not there, it will be allocated. + * + * Returns: 0 on success. %-EAGAIN indicates that @bit was not set. The ca= ller + * may want to call the function again. + */ +int xb_set_bit(struct xb *xb, unsigned long bit) +{ + return xb_bit_ops(xb, bit, XB_SET); +} +EXPORT_SYMBOL(xb_set_bit); + +/** + * xb_clear_bit - clear a bit in the xbitmap + * @xb: the xbitmap tree used to record the bit + * @bit: index of the bit to set + * + * This function is used to clear a bit in the xbitmap. If all the bits of= the + * bitmap are 0, the bitmap will be freed. + */ +void xb_clear_bit(struct xb *xb, unsigned long bit) +{ + xb_bit_ops(xb, bit, XB_CLEAR); +} +EXPORT_SYMBOL(xb_clear_bit); + +/** + * xb_test_bit - test a bit in the xbitmap + * @xb: the xbitmap tree used to record the bit + * @bit: index of the bit to set + * + * This function is used to test a bit in the xbitmap. + * Returns: 1 if the bit is set, or 0 otherwise. + */ +bool xb_test_bit(const struct xb *xb, unsigned long bit) +{ + return (bool)xb_bit_ops(xb, bit, XB_TEST); +} +EXPORT_SYMBOL(xb_test_bit); + +/** + * xb_preload - preload for xb_set_bit() + * @gfp_mask: allocation mask to use for preloading + * + * Preallocate memory to use for the next call to xb_set_bit(). This funct= ion + * returns with preemption disabled. It will be enabled by xb_preload_end(= ). + */ +void xb_preload(gfp_t gfp) +{ + __radix_tree_preload(gfp, XB_PRELOAD_SIZE); + if (!this_cpu_read(ida_bitmap)) { + struct ida_bitmap *bitmap =3D kmalloc(sizeof(*bitmap), gfp); + + if (!bitmap) + return; + bitmap =3D this_cpu_cmpxchg(ida_bitmap, NULL, bitmap); + kfree(bitmap); + } +} +EXPORT_SYMBOL(xb_preload); --=20 2.7.4 From nobody Sat Apr 27 23:05:30 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1502941212083225.43543489412468; Wed, 16 Aug 2017 20:40:12 -0700 (PDT) Received: from localhost ([::1]:52552 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBfG-0000sE-28 for importer@patchew.org; Wed, 16 Aug 2017 23:40:10 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48371) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBdY-00080o-8k for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:25 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1diBdX-0003mr-Ai for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:24 -0400 Received: from mga01.intel.com ([192.55.52.88]:1602) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1diBdX-0003iP-20 for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:23 -0400 Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2017 20:38:22 -0700 Received: from devel-ww.sh.intel.com ([10.239.48.97]) by orsmga004.jf.intel.com with ESMTP; 16 Aug 2017 20:38:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.41,386,1498546800"; d="scan'208";a="119892703" From: Wei Wang To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org, akpm@linux-foundation.org, mawilcox@microsoft.com Date: Thu, 17 Aug 2017 11:26:53 +0800 Message-Id: <1502940416-42944-3-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> References: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v14 2/5] lib/xbitmap: add xb_find_next_bit() and xb_zero() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: aarcange@redhat.com, yang.zhang.wz@gmail.com, david@redhat.com, liliang.opensource@gmail.com, willy@infradead.org, amit.shah@redhat.com, wei.w.wang@intel.com, quan.xu@aliyun.com, cornelia.huck@de.ibm.com, pbonzini@redhat.com, mgorman@techsingularity.net Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" xb_find_next_bit() is used to find the next "1" or "0" bit in the given range. xb_zero() is used to zero the given range of bits. Signed-off-by: Wei Wang Cc: Andrew Morton Cc: Matthew Wilcox Cc: Michal Hocko Cc: Michael S. Tsirkin --- include/linux/xbitmap.h | 3 +++ lib/xbitmap.c | 39 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+) diff --git a/include/linux/xbitmap.h b/include/linux/xbitmap.h index 5edbf84..739d08c 100644 --- a/include/linux/xbitmap.h +++ b/include/linux/xbitmap.h @@ -38,6 +38,9 @@ static inline void xb_init(struct xb *xb) int xb_set_bit(struct xb *xb, unsigned long bit); bool xb_test_bit(const struct xb *xb, unsigned long bit); void xb_clear_bit(struct xb *xb, unsigned long bit); +void xb_zero(struct xb *xb, unsigned long start, unsigned long end); +unsigned long xb_find_next_bit(struct xb *xb, unsigned long start, + unsigned long end, bool set); =20 /* Check if the xb tree is empty */ static inline bool xb_is_empty(const struct xb *xb) diff --git a/lib/xbitmap.c b/lib/xbitmap.c index cc766d9..2267ac2 100644 --- a/lib/xbitmap.c +++ b/lib/xbitmap.c @@ -174,3 +174,42 @@ void xb_preload(gfp_t gfp) } } EXPORT_SYMBOL(xb_preload); + +/** + * xb_zero - zero a range of bits in the xbitmap + * @xb: the xbitmap that the bits reside in + * @start: the start of the range, inclusive + * @end: the end of the range, inclusive + */ +void xb_zero(struct xb *xb, unsigned long start, unsigned long end) +{ + unsigned long i; + + for (i =3D start; i <=3D end; i++) + xb_clear_bit(xb, i); +} +EXPORT_SYMBOL(xb_zero); + +/** + * xb_find_next_bit - find next 1 or 0 in the give range of bits + * @xb: the xbitmap that the bits reside in + * @start: the start of the range, inclusive + * @end: the end of the range, inclusive + * @set: the polarity (1 or 0) of the next bit to find + * + * Return the index of the found bit in the xbitmap. If the returned index + * exceeds @end, it indicates that no such bit is found in the given range. + */ +unsigned long xb_find_next_bit(struct xb *xb, unsigned long start, + unsigned long end, bool set) +{ + unsigned long i; + + for (i =3D start; i <=3D end; i++) { + if (xb_test_bit(xb, i) =3D=3D set) + break; + } + + return i; +} +EXPORT_SYMBOL(xb_find_next_bit); --=20 2.7.4 From nobody Sat Apr 27 23:05:30 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1502941338888549.5546913125503; Wed, 16 Aug 2017 20:42:18 -0700 (PDT) Received: from localhost ([::1]:52675 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBhJ-0003OO-Or for importer@patchew.org; Wed, 16 Aug 2017 23:42:17 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48436) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBdf-00088Q-Jx for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1diBdb-0003p7-IS for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:31 -0400 Received: from mga01.intel.com ([192.55.52.88]:1602) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1diBdb-0003iP-5i for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:27 -0400 Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2017 20:38:26 -0700 Received: from devel-ww.sh.intel.com ([10.239.48.97]) by orsmga004.jf.intel.com with ESMTP; 16 Aug 2017 20:38:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.41,386,1498546800"; d="scan'208";a="119892739" From: Wei Wang To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org, akpm@linux-foundation.org, mawilcox@microsoft.com Date: Thu, 17 Aug 2017 11:26:54 +0800 Message-Id: <1502940416-42944-4-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> References: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v14 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: aarcange@redhat.com, yang.zhang.wz@gmail.com, david@redhat.com, liliang.opensource@gmail.com, willy@infradead.org, amit.shah@redhat.com, wei.w.wang@intel.com, quan.xu@aliyun.com, cornelia.huck@de.ibm.com, pbonzini@redhat.com, mgorman@techsingularity.net Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Add a new feature, VIRTIO_BALLOON_F_SG, which enables the transfer of balloon (i.e. inflated/deflated) pages using scatter-gather lists to the host. The implementation of the previous virtio-balloon is not very efficient, because the balloon pages are transferred to the host one by one. Here is the breakdown of the time in percentage spent on each step of the balloon inflating process (inflating 7GB of an 8GB idle guest). 1) allocating pages (6.5%) 2) sending PFNs to host (68.3%) 3) address translation (6.1%) 4) madvise (19%) It takes about 4126ms for the inflating process to complete. The above profiling shows that the bottlenecks are stage 2) and stage 4). This patch optimizes step 2) by transferring pages to the host in sgs. An sg describes a chunk of guest physically continuous pages. With this mechanism, step 4) can also be optimized by doing address translation and madvise() in chunks rather than page by page. With this new feature, the above ballooning process takes ~541ms resulting in an improvement of ~87%. TODO: optimize stage 1) by allocating/freeing a chunk of pages instead of a single page each time. Signed-off-by: Wei Wang Signed-off-by: Liang Li Suggested-by: Michael S. Tsirkin --- drivers/virtio/virtio_balloon.c | 157 ++++++++++++++++++++++++++++++++= ---- include/uapi/linux/virtio_balloon.h | 1 + 2 files changed, 141 insertions(+), 17 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloo= n.c index f0b3a0b..72041b4 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -32,6 +32,7 @@ #include #include #include +#include =20 /* * Balloon device works in 4K page units. So each page is pointed to by @@ -79,6 +80,9 @@ struct virtio_balloon { /* Synchronize access/update to this struct virtio_balloon elements */ struct mutex balloon_lock; =20 + /* The xbitmap used to record ballooned pages */ + struct xb page_xb; + /* The array of pfns we tell the Host about. */ unsigned int num_pfns; __virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX]; @@ -141,13 +145,98 @@ static void set_page_pfns(struct virtio_balloon *vb, page_to_balloon_pfn(page) + i); } =20 +static int add_one_sg(struct virtqueue *vq, void *addr, uint32_t size) +{ + struct scatterlist sg; + + sg_init_one(&sg, addr, size); + return virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL); +} + +static void send_balloon_page_sg(struct virtio_balloon *vb, + struct virtqueue *vq, + void *addr, + uint32_t size) +{ + unsigned int len; + int ret; + + do { + ret =3D add_one_sg(vq, addr, size); + virtqueue_kick(vq); + wait_event(vb->acked, virtqueue_get_buf(vq, &len)); + /* + * It is uncommon to see the vq is full, because the sg is sent + * one by one and the device is able to handle it in time. But + * if that happens, we go back to retry after an entry gets + * released. + */ + } while (unlikely(ret =3D=3D -ENOSPC)); +} + +/* + * Send balloon pages in sgs to host. The balloon pages are recorded in the + * page xbitmap. Each bit in the bitmap corresponds to a page of PAGE_SIZE. + * The page xbitmap is searched for continuous "1" bits, which correspond + * to continuous pages, to chunk into sgs. + * + * @page_xb_start and @page_xb_end form the range of bits in the xbitmap t= hat + * need to be searched. + */ +static void tell_host_sgs(struct virtio_balloon *vb, + struct virtqueue *vq, + unsigned long page_xb_start, + unsigned long page_xb_end) +{ + unsigned long sg_pfn_start, sg_pfn_end; + void *sg_addr; + uint32_t sg_len, sg_max_len =3D round_down(UINT_MAX, PAGE_SIZE); + + sg_pfn_start =3D page_xb_start; + while (sg_pfn_start < page_xb_end) { + sg_pfn_start =3D xb_find_next_bit(&vb->page_xb, sg_pfn_start, + page_xb_end, 1); + if (sg_pfn_start =3D=3D page_xb_end + 1) + break; + sg_pfn_end =3D xb_find_next_bit(&vb->page_xb, sg_pfn_start + 1, + page_xb_end, 0); + sg_addr =3D (void *)pfn_to_kaddr(sg_pfn_start); + sg_len =3D (sg_pfn_end - sg_pfn_start) << PAGE_SHIFT; + while (sg_len > sg_max_len) { + send_balloon_page_sg(vb, vq, sg_addr, sg_max_len); + sg_addr +=3D sg_max_len; + sg_len -=3D sg_max_len; + } + send_balloon_page_sg(vb, vq, sg_addr, sg_len); + xb_zero(&vb->page_xb, sg_pfn_start, sg_pfn_end); + sg_pfn_start =3D sg_pfn_end + 1; + } +} + +static inline void xb_set_page(struct virtio_balloon *vb, + struct page *page, + unsigned long *pfn_min, + unsigned long *pfn_max) +{ + unsigned long pfn =3D page_to_pfn(page); + + *pfn_min =3D min(pfn, *pfn_min); + *pfn_max =3D max(pfn, *pfn_max); + xb_preload(GFP_KERNEL); + xb_set_bit(&vb->page_xb, pfn); + xb_preload_end(); +} + static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) { struct balloon_dev_info *vb_dev_info =3D &vb->vb_dev_info; unsigned num_allocated_pages; + bool use_sg =3D virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG); + unsigned long pfn_max =3D 0, pfn_min =3D ULONG_MAX; =20 /* We can only do one array worth at a time. */ - num =3D min(num, ARRAY_SIZE(vb->pfns)); + if (!use_sg) + num =3D min(num, ARRAY_SIZE(vb->pfns)); =20 mutex_lock(&vb->balloon_lock); for (vb->num_pfns =3D 0; vb->num_pfns < num; @@ -162,7 +251,12 @@ static unsigned fill_balloon(struct virtio_balloon *vb= , size_t num) msleep(200); break; } - set_page_pfns(vb, vb->pfns + vb->num_pfns, page); + + if (use_sg) + xb_set_page(vb, page, &pfn_min, &pfn_max); + else + set_page_pfns(vb, vb->pfns + vb->num_pfns, page); + vb->num_pages +=3D VIRTIO_BALLOON_PAGES_PER_PAGE; if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) @@ -171,8 +265,12 @@ static unsigned fill_balloon(struct virtio_balloon *vb= , size_t num) =20 num_allocated_pages =3D vb->num_pfns; /* Did we get any? */ - if (vb->num_pfns !=3D 0) - tell_host(vb, vb->inflate_vq); + if (vb->num_pfns) { + if (use_sg) + tell_host_sgs(vb, vb->inflate_vq, pfn_min, pfn_max); + else + tell_host(vb, vb->inflate_vq); + } mutex_unlock(&vb->balloon_lock); =20 return num_allocated_pages; @@ -198,9 +296,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb= , size_t num) struct page *page; struct balloon_dev_info *vb_dev_info =3D &vb->vb_dev_info; LIST_HEAD(pages); + bool use_sg =3D virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG); + unsigned long pfn_max =3D 0, pfn_min =3D ULONG_MAX; =20 - /* We can only do one array worth at a time. */ - num =3D min(num, ARRAY_SIZE(vb->pfns)); + /* Traditionally, we can only do one array worth at a time. */ + if (!use_sg) + num =3D min(num, ARRAY_SIZE(vb->pfns)); =20 mutex_lock(&vb->balloon_lock); /* We can't release more pages than taken */ @@ -210,7 +311,11 @@ static unsigned leak_balloon(struct virtio_balloon *vb= , size_t num) page =3D balloon_page_dequeue(vb_dev_info); if (!page) break; - set_page_pfns(vb, vb->pfns + vb->num_pfns, page); + if (use_sg) + xb_set_page(vb, page, &pfn_min, &pfn_max); + else + set_page_pfns(vb, vb->pfns + vb->num_pfns, page); + list_add(&page->lru, &pages); vb->num_pages -=3D VIRTIO_BALLOON_PAGES_PER_PAGE; } @@ -221,8 +326,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb= , size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - if (vb->num_pfns !=3D 0) - tell_host(vb, vb->deflate_vq); + if (vb->num_pfns) { + if (use_sg) + tell_host_sgs(vb, vb->deflate_vq, pfn_min, pfn_max); + else + tell_host(vb, vb->deflate_vq); + } release_pages_balloon(vb, &pages); mutex_unlock(&vb->balloon_lock); return num_freed_pages; @@ -441,6 +550,7 @@ static int init_vqs(struct virtio_balloon *vb) } =20 #ifdef CONFIG_BALLOON_COMPACTION + /* * virtballoon_migratepage - perform the balloon page migration on behalf = of * a compation thread. (called under page lock) @@ -464,6 +574,7 @@ static int virtballoon_migratepage(struct balloon_dev_i= nfo *vb_dev_info, { struct virtio_balloon *vb =3D container_of(vb_dev_info, struct virtio_balloon, vb_dev_info); + bool use_sg =3D virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG); unsigned long flags; =20 /* @@ -485,16 +596,24 @@ static int virtballoon_migratepage(struct balloon_dev= _info *vb_dev_info, vb_dev_info->isolated_pages--; __count_vm_event(BALLOON_MIGRATE); spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags); - vb->num_pfns =3D VIRTIO_BALLOON_PAGES_PER_PAGE; - set_page_pfns(vb, vb->pfns, newpage); - tell_host(vb, vb->inflate_vq); - + if (use_sg) { + send_balloon_page_sg(vb, vb->inflate_vq, page_address(newpage), + PAGE_SIZE); + } else { + vb->num_pfns =3D VIRTIO_BALLOON_PAGES_PER_PAGE; + set_page_pfns(vb, vb->pfns, newpage); + tell_host(vb, vb->inflate_vq); + } /* balloon's page migration 2nd step -- deflate "page" */ balloon_page_delete(page); - vb->num_pfns =3D VIRTIO_BALLOON_PAGES_PER_PAGE; - set_page_pfns(vb, vb->pfns, page); - tell_host(vb, vb->deflate_vq); - + if (use_sg) { + send_balloon_page_sg(vb, vb->deflate_vq, page_address(page), + PAGE_SIZE); + } else { + vb->num_pfns =3D VIRTIO_BALLOON_PAGES_PER_PAGE; + set_page_pfns(vb, vb->pfns, page); + tell_host(vb, vb->deflate_vq); + } mutex_unlock(&vb->balloon_lock); =20 put_page(page); /* balloon reference */ @@ -553,6 +672,9 @@ static int virtballoon_probe(struct virtio_device *vdev) if (err) goto out_free_vb; =20 + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_SG)) + xb_init(&vb->page_xb); + vb->nb.notifier_call =3D virtballoon_oom_notify; vb->nb.priority =3D VIRTBALLOON_OOM_NOTIFY_PRIORITY; err =3D register_oom_notifier(&vb->nb); @@ -669,6 +791,7 @@ static unsigned int features[] =3D { VIRTIO_BALLOON_F_MUST_TELL_HOST, VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_F_DEFLATE_ON_OOM, + VIRTIO_BALLOON_F_SG, }; =20 static struct virtio_driver virtio_balloon_driver =3D { diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virti= o_balloon.h index 343d7dd..37780a7 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -34,6 +34,7 @@ #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages = */ #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ +#define VIRTIO_BALLOON_F_SG 3 /* Use sg instead of PFN lists */ =20 /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 --=20 2.7.4 From nobody Sat Apr 27 23:05:30 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 150294133667010.289273092867347; Wed, 16 Aug 2017 20:42:16 -0700 (PDT) Received: from localhost ([::1]:52664 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBhH-0003KV-5q for importer@patchew.org; Wed, 16 Aug 2017 23:42:15 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48460) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBdh-0008Bs-Dj for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:34 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1diBdg-0003sa-27 for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:33 -0400 Received: from mga01.intel.com ([192.55.52.88]:1602) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1diBdf-0003iP-P6 for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:31 -0400 Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2017 20:38:31 -0700 Received: from devel-ww.sh.intel.com ([10.239.48.97]) by orsmga004.jf.intel.com with ESMTP; 16 Aug 2017 20:38:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.41,386,1498546800"; d="scan'208";a="119892758" From: Wei Wang To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org, akpm@linux-foundation.org, mawilcox@microsoft.com Date: Thu, 17 Aug 2017 11:26:55 +0800 Message-Id: <1502940416-42944-5-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> References: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v14 4/5] mm: support reporting free page blocks X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: aarcange@redhat.com, yang.zhang.wz@gmail.com, david@redhat.com, liliang.opensource@gmail.com, willy@infradead.org, amit.shah@redhat.com, wei.w.wang@intel.com, quan.xu@aliyun.com, cornelia.huck@de.ibm.com, pbonzini@redhat.com, mgorman@techsingularity.net Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This patch adds support to walk through the free page blocks in the system and report them via a callback function. Some page blocks may leave the free list after zone->lock is released, so it is the caller's responsibility to either detect or prevent the use of such pages. Signed-off-by: Wei Wang Signed-off-by: Liang Li Cc: Michal Hocko Cc: Michael S. Tsirkin --- include/linux/mm.h | 6 ++++++ mm/page_alloc.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 46b9ac5..cd29b9f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1835,6 +1835,12 @@ extern void free_area_init_node(int nid, unsigned lo= ng * zones_size, unsigned long zone_start_pfn, unsigned long *zholes_size); extern void free_initmem(void); =20 +extern void walk_free_mem_block(void *opaque1, + unsigned int min_order, + void (*visit)(void *opaque2, + unsigned long pfn, + unsigned long nr_pages)); + /* * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK) * into the buddy system. The freed pages will be poisoned with pattern diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6d00f74..a721a35 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4762,6 +4762,50 @@ void show_free_areas(unsigned int filter, nodemask_t= *nodemask) show_swap_cache_info(); } =20 +/** + * walk_free_mem_block - Walk through the free page blocks in the system + * @opaque1: the context passed from the caller + * @min_order: the minimum order of free lists to check + * @visit: the callback function given by the caller + * + * The function is used to walk through the free page blocks in the system, + * and each free page block is reported to the caller via the @visit callb= ack. + * Please note: + * 1) The function is used to report hints of free pages, so the caller sh= ould + * not use those reported pages after the callback returns. + * 2) The callback is invoked with the zone->lock being held, so it should= not + * block and should finish as soon as possible. + */ +void walk_free_mem_block(void *opaque1, + unsigned int min_order, + void (*visit)(void *opaque2, + unsigned long pfn, + unsigned long nr_pages)) +{ + struct zone *zone; + struct page *page; + struct list_head *list; + unsigned int order; + enum migratetype mt; + unsigned long pfn, flags; + + for_each_populated_zone(zone) { + for (order =3D MAX_ORDER - 1; + order < MAX_ORDER && order >=3D min_order; order--) { + for (mt =3D 0; mt < MIGRATE_TYPES; mt++) { + spin_lock_irqsave(&zone->lock, flags); + list =3D &zone->free_area[order].free_list[mt]; + list_for_each_entry(page, list, lru) { + pfn =3D page_to_pfn(page); + visit(opaque1, pfn, 1 << order); + } + spin_unlock_irqrestore(&zone->lock, flags); + } + } + } +} +EXPORT_SYMBOL_GPL(walk_free_mem_block); + static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref) { zoneref->zone =3D zone; --=20 2.7.4 From nobody Sat Apr 27 23:05:30 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1502941430669262.5293594595523; Wed, 16 Aug 2017 20:43:50 -0700 (PDT) Received: from localhost ([::1]:52825 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBin-0005HB-Ge for importer@patchew.org; Wed, 16 Aug 2017 23:43:49 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48513) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1diBdl-0008Gl-NY for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:41 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1diBdk-0003vk-B4 for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:37 -0400 Received: from mga01.intel.com ([192.55.52.88]:1602) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1diBdj-0003iP-VY for qemu-devel@nongnu.org; Wed, 16 Aug 2017 23:38:36 -0400 Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2017 20:38:35 -0700 Received: from devel-ww.sh.intel.com ([10.239.48.97]) by orsmga004.jf.intel.com with ESMTP; 16 Aug 2017 20:38:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.41,386,1498546800"; d="scan'208";a="119892765" From: Wei Wang To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org, akpm@linux-foundation.org, mawilcox@microsoft.com Date: Thu, 17 Aug 2017 11:26:56 +0800 Message-Id: <1502940416-42944-6-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> References: <1502940416-42944-1-git-send-email-wei.w.wang@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH v14 5/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: aarcange@redhat.com, yang.zhang.wz@gmail.com, david@redhat.com, liliang.opensource@gmail.com, willy@infradead.org, amit.shah@redhat.com, wei.w.wang@intel.com, quan.xu@aliyun.com, cornelia.huck@de.ibm.com, pbonzini@redhat.com, mgorman@techsingularity.net Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Add a new vq to report hints of guest free pages to the host. Signed-off-by: Wei Wang Signed-off-by: Liang Li --- drivers/virtio/virtio_balloon.c | 167 +++++++++++++++++++++++++++++++-= ---- include/uapi/linux/virtio_balloon.h | 1 + 2 files changed, 147 insertions(+), 21 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloo= n.c index 72041b4..e6755bc 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -54,11 +54,12 @@ static struct vfsmount *balloon_mnt; =20 struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq; =20 /* The balloon servicing is delegated to a freezable workqueue. */ struct work_struct update_balloon_stats_work; struct work_struct update_balloon_size_work; + struct work_struct report_free_page_work; =20 /* Prevent updating balloon when it is being canceled. */ spinlock_t stop_update_lock; @@ -90,6 +91,13 @@ struct virtio_balloon { /* Memory statistics */ struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR]; =20 + /* + * Used by the device and driver to signal each other. + * device->driver: start the free page report. + * driver->device: end the free page report. + */ + __virtio32 report_free_page_signal; + /* To register callback in oom notifier call chain */ struct notifier_block nb; }; @@ -174,6 +182,17 @@ static void send_balloon_page_sg(struct virtio_balloon= *vb, } while (unlikely(ret =3D=3D -ENOSPC)); } =20 +static void send_free_page_sg(struct virtqueue *vq, void *addr, uint32_t s= ize) +{ + unsigned int len; + + add_one_sg(vq, addr, size); + virtqueue_kick(vq); + /* Release entries if there are */ + while (virtqueue_get_buf(vq, &len)) + ; +} + /* * Send balloon pages in sgs to host. The balloon pages are recorded in the * page xbitmap. Each bit in the bitmap corresponds to a page of PAGE_SIZE. @@ -511,42 +530,143 @@ static void update_balloon_size_func(struct work_str= uct *work) queue_work(system_freezable_wq, work); } =20 +static void virtio_balloon_send_free_pages(void *opaque, unsigned long pfn, + unsigned long nr_pages) +{ + struct virtio_balloon *vb =3D (struct virtio_balloon *)opaque; + void *addr =3D (void *)pfn_to_kaddr(pfn); + uint32_t len =3D nr_pages << PAGE_SHIFT; + + send_free_page_sg(vb->free_page_vq, addr, len); +} + +static void report_free_page_completion(struct virtio_balloon *vb) +{ + struct virtqueue *vq =3D vb->free_page_vq; + struct scatterlist sg; + unsigned int len; + int ret; + + sg_init_one(&sg, &vb->report_free_page_signal, sizeof(__virtio32)); +retry: + ret =3D virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL); + virtqueue_kick(vq); + if (unlikely(ret =3D=3D -ENOSPC)) { + wait_event(vb->acked, virtqueue_get_buf(vq, &len)); + goto retry; + } +} + +static void report_free_page(struct work_struct *work) +{ + struct virtio_balloon *vb; + + vb =3D container_of(work, struct virtio_balloon, report_free_page_work); + walk_free_mem_block(vb, 0, &virtio_balloon_send_free_pages); + report_free_page_completion(vb); +} + +static void free_page_request(struct virtqueue *vq) +{ + struct virtio_balloon *vb =3D vq->vdev->priv; + + queue_work(system_freezable_wq, &vb->report_free_page_work); +} + static int init_vqs(struct virtio_balloon *vb) { - struct virtqueue *vqs[3]; - vq_callback_t *callbacks[] =3D { balloon_ack, balloon_ack, stats_request = }; - static const char * const names[] =3D { "inflate", "deflate", "stats" }; - int err, nvqs; + struct virtqueue **vqs; + vq_callback_t **callbacks; + const char **names; + struct scatterlist sg; + int i, nvqs, err =3D -ENOMEM; + + /* Inflateq and deflateq are used unconditionally */ + nvqs =3D 2; + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) + nvqs++; + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_VQ)) + nvqs++; + + /* Allocate space for find_vqs parameters */ + vqs =3D kcalloc(nvqs, sizeof(*vqs), GFP_KERNEL); + if (!vqs) + goto err_vq; + callbacks =3D kmalloc_array(nvqs, sizeof(*callbacks), GFP_KERNEL); + if (!callbacks) + goto err_callback; + names =3D kmalloc_array(nvqs, sizeof(*names), GFP_KERNEL); + if (!names) + goto err_names; + + callbacks[0] =3D balloon_ack; + names[0] =3D "inflate"; + callbacks[1] =3D balloon_ack; + names[1] =3D "deflate"; + + i =3D 2; + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { + callbacks[i] =3D stats_request; + names[i] =3D "stats"; + i++; + } =20 - /* - * We expect two virtqueues: inflate and deflate, and - * optionally stat. - */ - nvqs =3D virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2; - err =3D virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL); + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_VQ)) { + callbacks[i] =3D free_page_request; + names[i] =3D "free_page_vq"; + } + + err =3D vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names, + NULL, NULL); if (err) - return err; + goto err_find; =20 vb->inflate_vq =3D vqs[0]; vb->deflate_vq =3D vqs[1]; + i =3D 2; if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { - struct scatterlist sg; - unsigned int num_stats; - vb->stats_vq =3D vqs[2]; - + vb->stats_vq =3D vqs[i++]; /* * Prime this virtqueue with one buffer so the hypervisor can * use it to signal us later (it can't be broken yet!). */ - num_stats =3D update_balloon_stats(vb); - - sg_init_one(&sg, vb->stats, sizeof(vb->stats[0]) * num_stats); + sg_init_one(&sg, vb->stats, sizeof(vb->stats)); if (virtqueue_add_outbuf(vb->stats_vq, &sg, 1, vb, GFP_KERNEL) - < 0) - BUG(); + < 0) { + dev_warn(&vb->vdev->dev, "%s: add stat_vq failed\n", + __func__); + goto err_find; + } virtqueue_kick(vb->stats_vq); } + + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_VQ)) { + vb->free_page_vq =3D vqs[i]; + vb->report_free_page_signal =3D 0; + sg_init_one(&sg, &vb->report_free_page_signal, + sizeof(__virtio32)); + if (virtqueue_add_outbuf(vb->free_page_vq, &sg, 1, vb, + GFP_KERNEL) < 0) { + dev_warn(&vb->vdev->dev, "%s: add signal buf failed\n", + __func__); + goto err_find; + } + virtqueue_kick(vb->free_page_vq); + } + + kfree(names); + kfree(callbacks); + kfree(vqs); return 0; + +err_find: + kfree(names); +err_names: + kfree(callbacks); +err_callback: + kfree(vqs); +err_vq: + return err; } =20 #ifdef CONFIG_BALLOON_COMPACTION @@ -675,6 +795,9 @@ static int virtballoon_probe(struct virtio_device *vdev) if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_SG)) xb_init(&vb->page_xb); =20 + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_VQ)) + INIT_WORK(&vb->report_free_page_work, report_free_page); + vb->nb.notifier_call =3D virtballoon_oom_notify; vb->nb.priority =3D VIRTBALLOON_OOM_NOTIFY_PRIORITY; err =3D register_oom_notifier(&vb->nb); @@ -739,6 +862,7 @@ static void virtballoon_remove(struct virtio_device *vd= ev) spin_unlock_irq(&vb->stop_update_lock); cancel_work_sync(&vb->update_balloon_size_work); cancel_work_sync(&vb->update_balloon_stats_work); + cancel_work_sync(&vb->report_free_page_work); =20 remove_common(vb); #ifdef CONFIG_BALLOON_COMPACTION @@ -792,6 +916,7 @@ static unsigned int features[] =3D { VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_F_DEFLATE_ON_OOM, VIRTIO_BALLOON_F_SG, + VIRTIO_BALLOON_F_FREE_PAGE_VQ, }; =20 static struct virtio_driver virtio_balloon_driver =3D { diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virti= o_balloon.h index 37780a7..8214f84 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -35,6 +35,7 @@ #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ #define VIRTIO_BALLOON_F_SG 3 /* Use sg instead of PFN lists */ +#define VIRTIO_BALLOON_F_FREE_PAGE_VQ 4 /* Virtqueue to report free pages = */ =20 /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 --=20 2.7.4