[Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement

Wei Wang posted 7 patches 6 years, 4 months ago
[Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Wei Wang 6 years, 4 months ago
This patch series enhances the existing virtio-balloon with the following
new features:
1) fast ballooning: transfer ballooned pages between the guest and host in
chunks using sgs, instead of one array each time; and
2) free page block reporting: a new virtqueue to report guest free pages
to the host.

The second feature can be used to accelerate live migration of VMs. Here
are some details:

Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. A method
commonly used by hypervisors to track which parts of memory are written
is to write-protect all the guest memory.

The second feature enables an optimization of the 1st round of memory
transfer: the hypervisor can skip the transfer of guest free pages in the
1st round. There is no correctness concern if a page is used after it has
been reported as a free page hint, because the hypervisor tracks writes
and will transfer the page in a later round once it is written.

ChangeLog:
v19->v20:
1) patch 1: xbitmap
	- add __rcu to "void **slot";
	- remove the exceptional path.
2) patch 3: xbitmap
	- DeveloperNotes: add an item to comment that the current bit range
	  related APIs operating on extremely large ranges (e.g.
          [0, ULONG_MAX)) will take too long. This can be optimized in
	  the future.
	- remove the exceptional path;
	- remove xb_preload_and_set();
	- reimplement xb_clear_bit_range to make its usage close to
	  bitmap_clear;
	- rename xb_find_next_set_bit to xb_find_set, and re-implement it
	  in a style close to find_next_bit;
	- rename xb_find_next_zero_bit to xb_find_clear, and re-implement
	  it in a style close to find_next_zero_bit;
	- separate the implementation of xb_find_set and xb_find_clear for
	  the convenience of future updates.
3) patch 4: virtio-balloon
	- xb_set_page: change the way to call xb_ related APIs

v18->v19:
1) patch 3:
	- xb_clear_bit_range and xb_find_next_bit will deal with range [start,
	  end), where end is changed to be exclusive of the range.
	- add overflow checks at the end of xb_clear_bit_range and
	  xb_find_next_bit 
	- add overflow related test cases
2) patch 4:
	- change back to the previous add_one_sg method, which is based on the
	  scatterlist struct
        - tell_host_sgs: use "uint64_t len" to avoid overflow
	- batch_balloon_page_sg: a simpler function to implement the batching of
	  sgs
3) patch 6: batch_free_page_sg: batch sgs using the previous scatterlist struct
4) patch 7: add a config field, poison_val, to tell the host about the poison
	    value
v17->v18:
1) patch 1-2: new patches to solve some tools-related compilation issues
2) patch 3: revert to the original xbitmap implementation from Matthew
Wilcox with some minor changes (e.g. comments added to the exported
functions)
3) patch 4: summarize the changes we want to make to patch 3
4) patch 5: add the developer notes as a reminder for users to avoid
concurrent accesses to the ida bitmap
5) patch 6: a new vring API to allow users to directly pass in a physical
address to a vring desc
6) patch 7: ballooning time is reduced from ~490ms to ~440ms with the new
implementation
	- use the new API from patch 6 to send balloon pages
	- xb_preload with "GFP_NOWAIT | __GFP_NOWARN" flag;
	- handle the case when xb_set_page() fails to avoid memory leak;
	- put xb_set_page() under the balloon lock
7) patch 9: simpler implementation
	- start free page reporting by sending a new cmd id from the host
	- guest acks the start or stop via adding a cmd id to the free page vq
	- use vb->report_free_page, instead of vb->report_free_page_stop
	- use WRITE_ONCE/READ_ONCE to access vb->report_free_page
	- use the new API from patch 6 to send free pages to avoid the
	  unnecessary use of kaddr.
8) patch 10: new patch to solve the page poisoning issue reported by
Michael S. Tsirkin

v16->v17:
1) patch 1: please check the commit log there;
2) patch 3: included Michael S. Tsirkin patch to fix the potential
deadlock issue;
3) patch 4: use BUG_ON if virtqueue_add_ returns error, which is
expected never to happen;
4) patch 4: add leak_balloon_sg_oom, which is used in the oom case when
VIRTIO_BALLOON_F_SG is in use;
5) patch 6: use config registers, instead of a vq, as the command channel
between the host and guest;
6) patch 6: add the command sequence id support.

v15->v16:
1) mm: stop reporting the free pfn range if the callback returns false;
2) mm: move some implementation of walk_free_mem_block into a function to
make the code layout look better;
3) xbitmap: added some optimizations suggested by Matthew, please refer to
the ChangeLog in the xbitmap patch for details.
4) xbitmap: added a test suite
5) virtio-balloon: bail out with a warning when virtqueue_add_inbuf returns
an error
6) virtio-balloon: some small code re-arrangement, e.g. detaching the used
buf from the vq before adding a new buf

v14->v15:
1) mm: make the report callback return a bool value - returning 1 to stop
walking through the free page list.
2) virtio-balloon: batching sgs of balloon pages till the vq is full
3) virtio-balloon: create a new workqueue, rather than using the default
system_wq, to queue the free page reporting work item.
4) virtio-balloon: add a ctrl_vq to be a central control plane which will
handle all the future control related commands between the host and guest.
Add free page report as the first feature controlled under ctrl_vq, and
the free_page_vq is a data plane vq dedicated to the transmission of free
page blocks.

v13->v14:
1) xbitmap: move the code from lib/radix-tree.c to lib/xbitmap.c.
2) xbitmap: consolidate the implementation of xb_bit_set/clear/test into
one xb_bit_ops.
3) xbitmap: add documents for the exported APIs.
4) mm: rewrite the function to walk through free page blocks.
5) virtio-balloon: when reporting a free page block to the device, if the
vq is full (less likely to happen in practice), just skip reporting this
block, instead of busy-waiting till an entry gets released.
6) virtio-balloon: fail the probe function if adding the signal buf in
init_vqs fails.

v12->v13:
1) mm: use a callback function to handle the free page blocks from the
report function. This avoids exposing zone internals to a kernel
module.
2) virtio-balloon: send balloon pages or a free page block using a single
sg each time. This has the benefits of simpler implementation with no new
APIs.
3) virtio-balloon: the free_page_vq is used to report free pages only (no
multiple usages interleaving)
4) virtio-balloon: Balloon pages and free page blocks are sent via input
sgs, and the completion signal to the host is sent via an output sg.

v11->v12:
1) xbitmap: use the xbitmap from Matthew Wilcox to record ballooned pages.
2) virtio-ring: enable the driver to build up a desc chain using vring
desc.
3) virtio-ring: Add locking to the existing START_USE() and END_USE()
macro to lock/unlock the vq when a vq operation starts/ends.
4) virtio-ring: add virtqueue_kick_sync() and virtqueue_kick_async()
5) virtio-balloon: describe chunks of ballooned pages and free pages
blocks directly using one or more chains of desc from the vq.

v10->v11:
1) virtio_balloon: use vring_desc to describe a chunk;
2) virtio_ring: support to add an indirect desc table to virtqueue;
3) virtio_balloon: use cmdq to report guest memory statistics.

v9->v10:
1) mm: put report_unused_page_block() under CONFIG_VIRTIO_BALLOON;
2) virtio-balloon: add virtballoon_validate();
3) virtio-balloon: msg format change;
4) virtio-balloon: move miscq handling to a task on system_freezable_wq;
5) virtio-balloon: code cleanup.

v8->v9:
1) Split the two new features, VIRTIO_BALLOON_F_BALLOON_CHUNKS and
VIRTIO_BALLOON_F_MISC_VQ, which were mixed together in the previous
implementation;
2) Simpler function to get the free page block.

v7->v8:
1) Use only one chunk format, instead of two.
2) re-write the virtio-balloon implementation patch.
3) commit changes
4) patch re-org

Matthew Wilcox (1):
  xbitmap: Introduce xbitmap

Wei Wang (6):
  xbitmap: potential improvement
  xbitmap: add more operations
  virtio-balloon: VIRTIO_BALLOON_F_SG
  mm: support reporting free page blocks
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
  virtio-balloon: don't report free pages when page poisoning is enabled

 drivers/virtio/virtio_balloon.c          | 444 +++++++++++++++++++++++++++----
 include/linux/mm.h                       |   6 +
 include/linux/radix-tree.h               |   2 +
 include/linux/xbitmap.h                  |  55 ++++
 include/uapi/linux/virtio_balloon.h      |   7 +
 lib/Makefile                             |   2 +-
 lib/radix-tree.c                         |  40 ++-
 lib/xbitmap.c                            | 330 +++++++++++++++++++++++
 mm/page_alloc.c                          |  91 +++++++
 tools/include/linux/bitmap.h             |  34 +++
 tools/include/linux/kernel.h             |   2 +
 tools/testing/radix-tree/Makefile        |  12 +-
 tools/testing/radix-tree/linux/xbitmap.h |   1 +
 tools/testing/radix-tree/main.c          |   4 +
 tools/testing/radix-tree/test.h          |   1 +
 15 files changed, 976 insertions(+), 55 deletions(-)
 create mode 100644 include/linux/xbitmap.h
 create mode 100644 lib/xbitmap.c
 create mode 100644 tools/testing/radix-tree/linux/xbitmap.h

-- 
2.7.4


Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Tetsuo Handa 6 years, 4 months ago
Wei Wang wrote:
> ChangeLog:
> v19->v20:
> 1) patch 1: xbitmap
> 	- add __rcu to "void **slot";
> 	- remove the exceptional path.
> 2) patch 3: xbitmap
> 	- DeveloperNotes: add an item to comment that the current bit range
> 	  related APIs operating on extremely large ranges (e.g.
>           [0, ULONG_MAX)) will take too long. This can be optimized in
> 	  the future.
> 	- remove the exceptional path;
> 	- remove xb_preload_and_set();
> 	- reimplement xb_clear_bit_range to make its usage close to
> 	  bitmap_clear;
> 	- rename xb_find_next_set_bit to xb_find_set, and re-implement it
> 	  in a style close to find_next_bit;
> 	- rename xb_find_next_zero_bit to xb_find_clear, and re-implement
> 	  it in a style close to find_next_zero_bit;
> 	- separate the implementation of xb_find_set and xb_find_clear for
> 	  the convenience of future updates.

Removing exceptional path made this patch easier to read.
But what I meant is

  Can you eliminate exception path and fold all xbitmap patches into one, and
  post only one xbitmap patch without virtio-balloon changes? 

.

I still think we don't need xb_preload()/xb_preload_end().
I think xb_find_set() has a bug in !node path.

Also, please avoid unconditionally adding to builtin modules.
There are users who want to save even a few KB.

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Matthew Wilcox 6 years, 4 months ago
On Tue, Dec 19, 2017 at 11:05:11PM +0900, Tetsuo Handa wrote:
> Removing exceptional path made this patch easier to read.
> But what I meant is
> 
>   Can you eliminate exception path and fold all xbitmap patches into one, and
>   post only one xbitmap patch without virtio-balloon changes? 
> 
> .
> 
> I still think we don't need xb_preload()/xb_preload_end().
> I think xb_find_set() has a bug in !node path.

Don't think.  Write a test-case.  Please.  If it shows a bug, then great,
Wei has an example of what the bug is to fix.  If it doesn't show a bug,
then we can add it to the test suite anyway, to ensure that case continues
to work in the future.

> Also, please avoid unconditionally adding to builtin modules.
> There are users who want to save even a few KB.

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Tetsuo Handa 6 years, 4 months ago
Matthew Wilcox wrote:
> > I think xb_find_set() has a bug in !node path.
> 
> Don't think.  Write a test-case.  Please.  If it shows a bug, then great,

+unsigned long xb_find_set(struct xb *xb, unsigned long size,
+			  unsigned long offset)
+{
+	struct radix_tree_root *root = &xb->xbrt;
+	struct radix_tree_node *node;
+	void __rcu **slot;
+	struct ida_bitmap *bitmap;
+	unsigned long index = offset / IDA_BITMAP_BITS;
+	unsigned long index_end = size / IDA_BITMAP_BITS;
+	unsigned long bit = offset % IDA_BITMAP_BITS;
+
+	if (unlikely(offset >= size))
+		return size;
+
+	while (index <= index_end) {
+		unsigned long ret;
+		unsigned int nbits = size - index * IDA_BITMAP_BITS;
+
+		bitmap = __radix_tree_lookup(root, index, &node, &slot);
+		if (!node) {
+			index = (index | RADIX_TREE_MAP_MASK) + 1;

Why we don't need to reset "bit" to 0 here?
We will continue with wrong offset if "bit != 0", won't we?

+			continue;
+		}
+
+		if (bitmap) {
+			if (nbits > IDA_BITMAP_BITS)
+				nbits = IDA_BITMAP_BITS;
+
+			ret = find_next_bit(bitmap->bitmap, nbits, bit);
+			if (ret != nbits)
+				return ret + index * IDA_BITMAP_BITS;
+		}
+		bit = 0;
+		index++;
+	}
+
+	return size;
+}

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Michael S. Tsirkin 6 years, 4 months ago
On Tue, Dec 19, 2017 at 11:05:11PM +0900, Tetsuo Handa wrote:
> Wei Wang wrote:
> > ChangeLog:
> > v19->v20:
> > 1) patch 1: xbitmap
> > 	- add __rcu to "void **slot";
> > 	- remove the exceptional path.
> > 2) patch 3: xbitmap
> > 	- DeveloperNotes: add an item to comment that the current bit range
> > 	  related APIs operating on extremely large ranges (e.g.
> >           [0, ULONG_MAX)) will take too long. This can be optimized in
> > 	  the future.
> > 	- remove the exceptional path;
> > 	- remove xb_preload_and_set();
> > 	- reimplement xb_clear_bit_range to make its usage close to
> > 	  bitmap_clear;
> > 	- rename xb_find_next_set_bit to xb_find_set, and re-implement it
> > 	  in a style close to find_next_bit;
> > 	- rename xb_find_next_zero_bit to xb_find_clear, and re-implement
> > 	  it in a style close to find_next_zero_bit;
> > 	- separate the implementation of xb_find_set and xb_find_clear for
> > 	  the convenience of future updates.
> 
> Removing exceptional path made this patch easier to read.
> But what I meant is
> 
>   Can you eliminate exception path and fold all xbitmap patches into one, and
>   post only one xbitmap patch without virtio-balloon changes? 

And then people will complain that patch is too big
and hard to understand without any users.

As long as patches don't change the same lines of code,
it's fine to split up imho. In this case it's done
to attribute code better, seems like a reasonable thing to do
to me.

> .
> 
> I still think we don't need xb_preload()/xb_preload_end().
> I think xb_find_set() has a bug in !node path.
> 
> Also, please avoid unconditionally adding to builtin modules.
> There are users who want to save even a few KB.

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Wei Wang 6 years, 4 months ago
On 12/19/2017 10:05 PM, Tetsuo Handa wrote:
> Wei Wang wrote:
>> ChangeLog:
>> v19->v20:
>> 1) patch 1: xbitmap
>> 	- add __rcu to "void **slot";
>> 	- remove the exceptional path.
>> 2) patch 3: xbitmap
>> 	- DeveloperNotes: add an item to comment that the current bit range
>> 	  related APIs operating on extremely large ranges (e.g.
>>            [0, ULONG_MAX)) will take too long. This can be optimized in
>> 	  the future.
>> 	- remove the exceptional path;
>> 	- remove xb_preload_and_set();
>> 	- reimplement xb_clear_bit_range to make its usage close to
>> 	  bitmap_clear;
>> 	- rename xb_find_next_set_bit to xb_find_set, and re-implement it
>> 	  in a style close to find_next_bit;
>> 	- rename xb_find_next_zero_bit to xb_find_clear, and re-implement
>> 	  it in a style close to find_next_zero_bit;
>> 	- separate the implementation of xb_find_set and xb_find_clear for
>> 	  the convenience of future updates.
> Removing exceptional path made this patch easier to read.
> But what I meant is
>
>    Can you eliminate exception path and fold all xbitmap patches into one, and
>    post only one xbitmap patch without virtio-balloon changes?
>
> .
>
> I still think we don't need xb_preload()/xb_preload_end().

Why would you think preload is not needed?

The bitmap is allocated in xb_preload() via "bitmap =
this_cpu_cmpxchg(ida_bitmap, NULL, bitmap);"; that preallocated bitmap
is then consumed by xb_set_bit().


> I think xb_find_set() has a bug in !node path.

I think we can probably remove the "!node" path for now. It would be 
good to get the fundamental part in first, and leave optimization to 
come as separate patches with corresponding test cases in the future.

Best,
Wei

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Matthew Wilcox 6 years, 4 months ago
On Wed, Dec 20, 2017 at 06:34:36PM +0800, Wei Wang wrote:
> On 12/19/2017 10:05 PM, Tetsuo Handa wrote:
> > I think xb_find_set() has a bug in !node path.
> 
> I think we can probably remove the "!node" path for now. It would be good to
> get the fundamental part in first, and leave optimization to come as
> separate patches with corresponding test cases in the future.

You can't remove the !node path.  You'll see !node when the highest set
bit is less than 1024.  So do something like this:

	unsigned long bit;
	xb_preload(GFP_KERNEL);
	xb_set_bit(xb, 700);
	xb_preload_end();
	bit = xb_find_set(xb, ULONG_MAX, 0);
	assert(bit == 700);


Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Wang, Wei W 6 years, 4 months ago
On Wednesday, December 20, 2017 8:26 PM, Matthew Wilcox wrote:
> On Wed, Dec 20, 2017 at 06:34:36PM +0800, Wei Wang wrote:
> > On 12/19/2017 10:05 PM, Tetsuo Handa wrote:
> > > I think xb_find_set() has a bug in !node path.
> >
> > I think we can probably remove the "!node" path for now. It would be
> > good to get the fundamental part in first, and leave optimization to
> > come as separate patches with corresponding test cases in the future.
> 
> You can't remove the !node path.  You'll see !node when the highest set bit
> is less than 1024.  So do something like this:
> 
> 	unsigned long bit;
> 	xb_preload(GFP_KERNEL);
> 	xb_set_bit(xb, 700);
> 	xb_preload_end();
> 	bit = xb_find_set(xb, ULONG_MAX, 0);
> 	assert(bit == 700);

The above test will result in "!node" with bitmap != NULL, so it takes the regular "if (bitmap)" path, which finds 700.

A better test would be
...
xb_set_bit(xb, 700);
assert(xb_find_set(xb, ULONG_MAX, 800) == ULONG_MAX);
...

The first try takes the "if (bitmap)" path and doesn't find a set bit; every remaining iteration results in "!node && !bitmap", which implies there is no set bit left, so there is no need to keep trying.

So, I think we can change it to

if (!node && !bitmap)
	return size;


Best,
Wei



Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Matthew Wilcox 6 years, 4 months ago
On Wed, Dec 20, 2017 at 04:13:16PM +0000, Wang, Wei W wrote:
> On Wednesday, December 20, 2017 8:26 PM, Matthew Wilcox wrote:
> > 	unsigned long bit;
> > 	xb_preload(GFP_KERNEL);
> > 	xb_set_bit(xb, 700);
> > 	xb_preload_end();
> > 	bit = xb_find_set(xb, ULONG_MAX, 0);
> > 	assert(bit == 700);
> 
> This above test will result in "!node with bitmap !=NULL", and it goes to the regular "if (bitmap)" path, which finds 700.
> 
> A better test would be
> ...
> xb_set_bit(xb, 700);
> assert(xb_find_set(xb, ULONG_MAX, 800) == ULONG_MAX);
> ...

I decided to write a test case to show you what I meant, then I discovered
the test suite didn't build, then the test I wrote took forever to run, so
I rewrote xb_find_set() using the radix tree iterators.  So I have no idea
what bugs may be in your implementation, but at least this function passes
the current test suite.  Of course, there may be gaps in the test suite.
And since I changed the API to not have the ambiguous return value, I
also changed the test suite, and maybe I introduced a bug.

diff --git a/include/linux/xbitmap.h b/include/linux/xbitmap.h
index ede1029b8a27..96e7e3560a0e 100644
--- a/include/linux/xbitmap.h
+++ b/include/linux/xbitmap.h
@@ -37,8 +37,7 @@ bool xb_test_bit(const struct xb *xb, unsigned long bit);
 void xb_clear_bit(struct xb *xb, unsigned long bit);
 void xb_clear_bit_range(struct xb *xb, unsigned long start,
 			unsigned long nbits);
-unsigned long xb_find_set(struct xb *xb, unsigned long size,
-			  unsigned long offset);
+bool xb_find_set(struct xb *xb, unsigned long max, unsigned long *bit);
 unsigned long xb_find_zero(struct xb *xb, unsigned long size,
 			   unsigned long offset);
 
diff --git a/lib/xbitmap.c b/lib/xbitmap.c
index 0bd3027b082d..58c26c8dd595 100644
--- a/lib/xbitmap.c
+++ b/lib/xbitmap.c
@@ -150,48 +150,39 @@ EXPORT_SYMBOL(xb_test_bit);
 /**
  * xb_find_set - find the next set bit in a range of bits
  * @xb: the xbitmap to search from
- * @offset: the offset in the range to start searching
- * @size: the size of the range
+ * @max: the maximum position to search to
+ * @bit: the first bit to examine, and on exit, the found bit
  *
- * Returns: the found bit or, @size if no set bit is found.
+ * Returns: %true if a set bit was found.  @bit will be updated.
  */
-unsigned long xb_find_set(struct xb *xb, unsigned long size,
-			  unsigned long offset)
+bool xb_find_set(struct xb *xb, unsigned long max, unsigned long *bit)
 {
-	struct radix_tree_root *root = &xb->xbrt;
-	struct radix_tree_node *node;
+	struct radix_tree_iter iter;
 	void __rcu **slot;
 	struct ida_bitmap *bitmap;
-	unsigned long index = offset / IDA_BITMAP_BITS;
-	unsigned long index_end = size / IDA_BITMAP_BITS;
-	unsigned long bit = offset % IDA_BITMAP_BITS;
-
-	if (unlikely(offset >= size))
-		return size;
-
-	while (index <= index_end) {
-		unsigned long ret;
-		unsigned int nbits = size - index * IDA_BITMAP_BITS;
-
-		bitmap = __radix_tree_lookup(root, index, &node, &slot);
-		if (!node) {
-			index = (index | RADIX_TREE_MAP_MASK) + 1;
-			continue;
-		}
-
+	unsigned long index = *bit / IDA_BITMAP_BITS;
+	unsigned int first = *bit % IDA_BITMAP_BITS;
+	unsigned long index_end = max / IDA_BITMAP_BITS;
+
+	radix_tree_for_each_slot(slot, &xb->xbrt, &iter, index) {
+		if (iter.index > index_end)
+			break;
+		bitmap = radix_tree_deref_slot(slot);
 		if (bitmap) {
-			if (nbits > IDA_BITMAP_BITS)
-				nbits = IDA_BITMAP_BITS;
-
-			ret = find_next_bit(bitmap->bitmap, nbits, bit);
-			if (ret != nbits)
-				return ret + index * IDA_BITMAP_BITS;
+			unsigned int nbits = IDA_BITMAP_BITS;
+			if (iter.index == index_end)
+				nbits = max % IDA_BITMAP_BITS + 1;
+
+			first = find_next_bit(bitmap->bitmap, nbits, first);
+			if (first != nbits) {
+				*bit = first + iter.index * IDA_BITMAP_BITS;
+				return true;
+			}
 		}
-		bit = 0;
-		index++;
+		first = 0;
 	}
 
-	return size;
+	return false;
 }
 EXPORT_SYMBOL(xb_find_set);
 
@@ -246,19 +237,30 @@ static DEFINE_XB(xb1);
 
 void xbitmap_check_bit(unsigned long bit)
 {
+	unsigned long nbit = 0;
+
 	xb_preload(GFP_KERNEL);
 	assert(!xb_test_bit(&xb1, bit));
 	assert(xb_set_bit(&xb1, bit) == 0);
 	assert(xb_test_bit(&xb1, bit));
-	assert(xb_clear_bit(&xb1, bit) == 0);
+	assert(xb_find_set(&xb1, ULONG_MAX, &nbit) == true);
+	assert(nbit == bit);
+	assert(xb_find_set(&xb1, ULONG_MAX, &nbit) == true);
+	assert(nbit == bit);
+	nbit++;
+	assert(xb_find_set(&xb1, ULONG_MAX, &nbit) == false);
+	assert(nbit == bit + 1);
+	xb_clear_bit(&xb1, bit);
 	assert(xb_empty(&xb1));
-	assert(xb_clear_bit(&xb1, bit) == 0);
+	xb_clear_bit(&xb1, bit);
 	assert(xb_empty(&xb1));
 	xb_preload_end();
 }
 
 static void xbitmap_check_bit_range(void)
 {
+	unsigned long nbit;
+
 	/*
 	 * Regular tests
 	 * set bit 2000, 2001, 2040
@@ -273,14 +275,23 @@ static void xbitmap_check_bit_range(void)
 	assert(!xb_set_bit(&xb1, 2000));
 	assert(!xb_set_bit(&xb1, 2001));
 	assert(!xb_set_bit(&xb1, 2040));
-	assert(xb_find_set(&xb1, 2048, 0) == 2000);
-	assert(xb_find_set(&xb1, 2002, 2000) == 2000);
-	assert(xb_find_set(&xb1, 2041, 2002) == 2040);
-	assert(xb_find_set(&xb1, 2040, 2002) == 2040);
+	nbit = 0;
+	assert(xb_find_set(&xb1, 2048, &nbit) == true);
+	assert(nbit == 2000);
+	assert(xb_find_set(&xb1, 2002, &nbit) == true);
+	assert(nbit == 2000);
+	nbit = 2002;
+	assert(xb_find_set(&xb1, 2041, &nbit) == true);
+	assert(nbit == 2040);
+	nbit = 2002;
+	assert(xb_find_set(&xb1, 2040, &nbit) == true);
+	assert(nbit == 2040);
 	assert(xb_find_zero(&xb1, 2048, 2000) == 2002);
 	assert(xb_find_zero(&xb1, 2060, 2048) == 2048);
 	xb_clear_bit_range(&xb1, 0, 2048);
-	assert(xb_find_set(&xb1, 2048, 0) == 2048);
+	nbit = 0;
+	assert(xb_find_set(&xb1, 2048, &nbit) == false);
+	assert(nbit == 0);
 	xb_preload_end();
 
 	/*
@@ -295,15 +306,22 @@ static void xbitmap_check_bit_range(void)
 	xb_preload_end();
 	xb_preload(GFP_KERNEL);
 	assert(!xb_set_bit(&xb1, ULONG_MAX - 4));
-	assert(xb_find_set(&xb1, ULONG_MAX, ULONG_MAX - 4) == ULONG_MAX - 4);
-	assert(xb_find_set(&xb1, ULONG_MAX + 4, ULONG_MAX - 3) ==
-	       ULONG_MAX + 4);
+	nbit = ULONG_MAX - 4;
+	assert(xb_find_set(&xb1, ULONG_MAX, &nbit) == true);
+	assert(nbit == ULONG_MAX - 4);
+	nbit++;
+	assert(xb_find_set(&xb1, ULONG_MAX + 4, &nbit) == false);
+	assert(nbit == ULONG_MAX - 3);
 	assert(xb_find_zero(&xb1, ULONG_MAX + 4, ULONG_MAX - 4) ==
 	       ULONG_MAX + 4);
 	xb_clear_bit_range(&xb1, ULONG_MAX - 4, 4);
-	assert(xb_find_set(&xb1, ULONG_MAX, ULONG_MAX - 10) == ULONG_MAX);
+	nbit = ULONG_MAX - 10;
+	assert(xb_find_set(&xb1, ULONG_MAX, &nbit) == false);
+	assert(nbit == ULONG_MAX - 10);
 	xb_clear_bit_range(&xb1, 0, 2);
-	assert(xb_find_set(&xb1, 2, 0) == 2);
+	nbit = 0;
+	assert(xb_find_set(&xb1, 2, &nbit) == false);
+	assert(nbit == 0);
 	xb_preload_end();
 }
 
@@ -313,6 +331,10 @@ void xbitmap_checks(void)
 	xbitmap_check_bit(0);
 	xbitmap_check_bit(30);
 	xbitmap_check_bit(31);
+	xbitmap_check_bit(62);
+	xbitmap_check_bit(63);
+	xbitmap_check_bit(64);
+	xbitmap_check_bit(700);
 	xbitmap_check_bit(1023);
 	xbitmap_check_bit(1024);
 	xbitmap_check_bit(1025);
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 34ece7883629..adf36e34dd77 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -1,9 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 
 CFLAGS += -I. -I../../include -g -O2 -Wall -D_LGPL_SOURCE -fsanitize=address
-LDFLAGS += -fsanitize=address
-LDLIBS+= -lpthread -lurcu
-TARGETS = main idr-test multiorder
+LDFLAGS += -fsanitize=address $(LDLIBS)
+LDLIBS := -lpthread -lurcu
+TARGETS = main idr-test multiorder xbitmap
 CORE_OFILES := radix-tree.o idr.o linux.o test.o find_bit.o
 OFILES = main.o $(CORE_OFILES) regression1.o regression2.o regression3.o \
 	 tag_check.o multiorder.o idr-test.o iteration_check.o benchmark.o \
diff --git a/tools/testing/radix-tree/linux/kernel.h b/tools/testing/radix-tree/linux/kernel.h
index c3bc3f364f68..426f32f28547 100644
--- a/tools/testing/radix-tree/linux/kernel.h
+++ b/tools/testing/radix-tree/linux/kernel.h
@@ -17,6 +17,4 @@
 #define pr_debug printk
 #define pr_cont printk
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
-
 #endif /* _KERNEL_H */

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Wei Wang 6 years, 4 months ago
On 12/21/2017 01:10 AM, Matthew Wilcox wrote:
> On Wed, Dec 20, 2017 at 04:13:16PM +0000, Wang, Wei W wrote:
>> On Wednesday, December 20, 2017 8:26 PM, Matthew Wilcox wrote:
>>> 	unsigned long bit;
>>> 	xb_preload(GFP_KERNEL);
>>> 	xb_set_bit(xb, 700);
>>> 	xb_preload_end();
>>> 	bit = xb_find_set(xb, ULONG_MAX, 0);
>>> 	assert(bit == 700);
>> This above test will result in "!node with bitmap !=NULL", and it goes to the regular "if (bitmap)" path, which finds 700.
>>
>> A better test would be
>> ...
>> xb_set_bit(xb, 700);
>> assert(xb_find_set(xb, ULONG_MAX, 800) == ULONG_MAX);
>> ...
> I decided to write a test case to show you what I meant, then I discovered
> the test suite didn't build, then the test I wrote took forever to run, so
> I rewrote xb_find_set() using the radix tree iterators.  So I have no idea
> what bugs may be in your implementation, but at least this function passes
> the current test suite.  Of course, there may be gaps in the test suite.
> And since I changed the API to not have the ambiguous return value, I
> also changed the test suite, and maybe I introduced a bug.

Thanks for the effort. That's actually caused by the previous "!node"
path, which incorrectly advanced the index with "index = (index |
RADIX_TREE_MAP_MASK) + 1". With the change below, it runs well with the
test cases.

if (!node && !bitmap)
     return size;

Would you mind trying the v20 RESEND patch that was just shared? It
makes the above change and adds the test case you suggested.

One more question about the return value: why would it be ambiguous? I
think it is the same as find_next_bit(), which returns the found bit, or
size if not found.


Best,
Wei

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Matthew Wilcox 6 years, 4 months ago
On Thu, Dec 21, 2017 at 10:49:44AM +0800, Wei Wang wrote:
> On 12/21/2017 01:10 AM, Matthew Wilcox wrote:
> One more question is about the return value, why would it be ambiguous? I
> think it is the same as find_next_bit() which returns the found bit or size
> if not found.

Because find_next_bit doesn't reasonably support a bitmap which is
ULONG_MAX in size.  The point of XBitmap is to support a bitmap which
is ULONG_MAX in size, so every possible return value is a legitimate
"we found a bit here".  There's no value which can possibly be used for
"no bit was found".

Re: [Qemu-devel] [PATCH v20 0/7] Virtio-balloon Enhancement
Posted by Tetsuo Handa 6 years, 4 months ago
Wei Wang wrote:
> Thanks for the effort. That's actually caused by the previous "!node" 
> path, which incorrectly changed "index = (index | RADIX_TREE_MAP_MASK) + 
> 1". With the change below, it will run pretty well with the test cases.
> 
> if (!node && !bitmap)
>      return size;
> 
> Would you mind to have a try with the v20 RESEND patch that was just 
> shared?

No. Please explain what the "!node" situation indicates. Why did you try
"index = (index | RADIX_TREE_MAP_MASK) + 1; continue;" in the previous patch?

+unsigned long xb_find_set(struct xb *xb, unsigned long size,
+			  unsigned long offset)
+{
+	struct radix_tree_root *root = &xb->xbrt;
+	struct radix_tree_node *node;
+	void __rcu **slot;
+	struct ida_bitmap *bitmap;
+	unsigned long index = offset / IDA_BITMAP_BITS;
+	unsigned long index_end = size / IDA_BITMAP_BITS;
+	unsigned long bit = offset % IDA_BITMAP_BITS;
+
+	if (unlikely(offset >= size))
+		return size;
+
+	while (index <= index_end) {
+		unsigned long ret;
+		unsigned int nbits = size - index * IDA_BITMAP_BITS;
+
+		bitmap = __radix_tree_lookup(root, index, &node, &slot);
+
+		if (!node && !bitmap)
+			return size;
+
+		if (bitmap) {
+			if (nbits > IDA_BITMAP_BITS)
+				nbits = IDA_BITMAP_BITS;
+
+			ret = find_next_bit(bitmap->bitmap, nbits, bit);
+			if (ret != nbits)
+				return ret + index * IDA_BITMAP_BITS;
+		}
+		bit = 0;
+		index++;
+	}
+
+	return size;
+}