Like for the dirty bitmap, it is unnecessary to allocate the deferred-
pages bitmap when all that's ever going to happen is a single all-dirty
run.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
The clearing of the bitmap at the end of suspend_and_send_dirty() also
looks unnecessary - am I overlooking anything?
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -130,7 +130,7 @@ static int write_batch(struct xc_sr_cont
ctx->save.batch_pfns[i]);
/* Likely a ballooned page. */
- if ( mfns[i] == INVALID_MFN )
+ if ( mfns[i] == INVALID_MFN && ctx->save.deferred_pages )
{
set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
++ctx->save.nr_deferred_pages;
@@ -196,8 +196,12 @@ static int write_batch(struct xc_sr_cont
{
if ( rc == -1 && errno == EAGAIN )
{
- set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
- ++ctx->save.nr_deferred_pages;
+ if ( ctx->save.deferred_pages )
+ {
+ set_bit(ctx->save.batch_pfns[i],
+ ctx->save.deferred_pages);
+ ++ctx->save.nr_deferred_pages;
+ }
types[i] = XEN_DOMCTL_PFINFO_XTAB;
--nr_pages;
}
@@ -665,7 +669,8 @@ static int suspend_and_send_dirty(struct
else
xc_set_progress_prefix(xch, "Checkpointed save");
- bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
+ if ( ctx->save.deferred_pages )
+ bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
if ( !ctx->save.live && ctx->stream_type == XC_STREAM_COLO )
{
@@ -682,7 +687,8 @@ static int suspend_and_send_dirty(struct
if ( rc )
goto out;
- bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_size);
+ if ( ctx->save.deferred_pages )
+ bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_size);
ctx->save.nr_deferred_pages = 0;
out:
@@ -791,24 +797,31 @@ static int setup(struct xc_sr_context *c
{
xc_interface *xch = ctx->xch;
int rc;
- DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
- &ctx->save.dirty_bitmap_hbuf);
rc = ctx->save.ops.setup(ctx);
if ( rc )
goto err;
- dirty_bitmap = ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN
- ? xc_hypercall_buffer_alloc_pages(
- xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)))
- : (void *)-1L;
+ if ( ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN )
+ {
+ DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+ &ctx->save.dirty_bitmap_hbuf);
+
+ dirty_bitmap =
+ xc_hypercall_buffer_alloc_pages(
+ xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
+ ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
+
+ if ( !dirty_bitmap || !ctx->save.deferred_pages )
+ goto enomem;
+ }
ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
sizeof(*ctx->save.batch_pfns));
- ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
- if ( !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages )
+ if ( !ctx->save.batch_pfns )
{
+ enomem:
ERROR("Unable to allocate memory for dirty bitmaps, batch pfns and"
" deferred pages");
rc = -1;
On 25/06/2021 14:19, Jan Beulich wrote:
> Like for the dirty bitmap, it is unnecessary to allocate the deferred-
> pages bitmap when all that's ever going to happen is a single all-dirty
> run.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> The clearing of the bitmap at the end of suspend_and_send_dirty() also
> looks unnecessary - am I overlooking anything?
Yes. Remus and COLO. You don't want to accumulate successfully-sent
deferred pages over checkpoints; otherwise you'll eventually be sending
the entire VM every checkpoint.
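For reference, save() loops over the sending functions for checkpointed
streams - roughly like this (simplified; error handling, records and the
checkpoint callback elided):

    do
    {
        if ( ctx->save.live )
            rc = send_domain_memory_live(ctx);
        else if ( ctx->stream_type != XC_STREAM_PLAIN )
            rc = send_domain_memory_checkpointed(ctx);   /* Remus/COLO */
        else
            rc = send_domain_memory_nonlive(ctx);

        /* ... end_of_checkpoint, checkpoint record/callback ... */
    } while ( ctx->stream_type != XC_STREAM_PLAIN );

Each checkpointed pass goes through suspend_and_send_dirty(), so without
the bitmap_clear() at its end, any pfn ever deferred would stay set and
be re-sent on every later checkpoint.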
Answering out of patch order...
> @@ -791,24 +797,31 @@ static int setup(struct xc_sr_context *c
> {
> xc_interface *xch = ctx->xch;
> int rc;
> - DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> - &ctx->save.dirty_bitmap_hbuf);
>
> rc = ctx->save.ops.setup(ctx);
> if ( rc )
> goto err;
>
> - dirty_bitmap = ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN
> - ? xc_hypercall_buffer_alloc_pages(
> - xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)))
> - : (void *)-1L;
> + if ( ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN )
> + {
> + DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> + &ctx->save.dirty_bitmap_hbuf);
> +
> + dirty_bitmap =
> + xc_hypercall_buffer_alloc_pages(
> + xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
> + ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
> +
> + if ( !dirty_bitmap || !ctx->save.deferred_pages )
> + goto enomem;
> + }
So this is better than the previous patch. At least we've got a clean
NULL pointer now.
I could in principle get on board with the optimisation, except it's not
safe (see below).
> --- a/tools/libs/guest/xg_sr_save.c
> +++ b/tools/libs/guest/xg_sr_save.c
> @@ -130,7 +130,7 @@ static int write_batch(struct xc_sr_cont
> ctx->save.batch_pfns[i]);
>
> /* Likely a ballooned page. */
> - if ( mfns[i] == INVALID_MFN )
> + if ( mfns[i] == INVALID_MFN && ctx->save.deferred_pages )
> {
> set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
> ++ctx->save.nr_deferred_pages;
> @@ -196,8 +196,12 @@ static int write_batch(struct xc_sr_cont
> {
> if ( rc == -1 && errno == EAGAIN )
> {
> - set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
> - ++ctx->save.nr_deferred_pages;
> + if ( ctx->save.deferred_pages )
> + {
> + set_bit(ctx->save.batch_pfns[i],
> + ctx->save.deferred_pages);
> + ++ctx->save.nr_deferred_pages;
> + }
These two blocks are the only two which modify deferred_pages.
It occurs to me that this means deferred_pages is PV-only, because of
the stub implementations of x86_hvm_pfn_to_gfn() and
x86_hvm_normalise_page(). Furthermore, this is likely to be true for
any HVM-like domains even on other architectures.
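(Both hooks are trivial on the HVM side - from memory, roughly:

    static xen_pfn_t x86_hvm_pfn_to_gfn(const struct xc_sr_context *ctx,
                                        xen_pfn_t pfn)
    {
        /* HVM guests are identity mapped: pfn == gfn. */
        return pfn;
    }

    static int x86_hvm_normalise_page(struct xc_sr_context *ctx,
                                      xen_pfn_t type, void **page)
    {
        /* Nothing to normalise for HVM pages. */
        return 0;
    }

so neither the INVALID_MFN path nor the EAGAIN path can ever trigger for
HVM, and deferred_pages stays all-zero.)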
If these instead were hard errors when !deferred_pages, then that would
at least get the logic into an acceptable state.
However, the first hunk demonstrates that deferred_pages gets used even
in the non-live case. In particular, it is sensitive to errors in the
guest's handling of its own P2M. Also, I can't obviously spot anything
which will correctly fail migration if deferred pages survive the final
iteration.
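I'd have expected something along these lines after the final pass
(hypothetical - no such check exists today, as far as I can see):

    if ( ctx->save.nr_deferred_pages )
    {
        ERROR("%u deferred pages remain after the final iteration",
              ctx->save.nr_deferred_pages);
        rc = -1;
        goto out;
    }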
~Andrew
On 25.06.2021 20:08, Andrew Cooper wrote:
> On 25/06/2021 14:19, Jan Beulich wrote:
>> Like for the dirty bitmap, it is unnecessary to allocate the deferred-
>> pages bitmap when all that's ever going to happen is a single all-dirty
>> run.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> The clearing of the bitmap at the end of suspend_and_send_dirty() also
>> looks unnecessary - am I overlooking anything?
>
> Yes. Remus and COLO. You don't want to accumulate successfully-sent
> deferred pages over checkpoints; otherwise you'll eventually be sending
> the entire VM every checkpoint.
Oh, so what I've really missed is save() being a loop over these
functions.
> Answering out of patch order...
>> @@ -791,24 +797,31 @@ static int setup(struct xc_sr_context *c
>> {
>> xc_interface *xch = ctx->xch;
>> int rc;
>> - DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>> - &ctx->save.dirty_bitmap_hbuf);
>>
>> rc = ctx->save.ops.setup(ctx);
>> if ( rc )
>> goto err;
>>
>> - dirty_bitmap = ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN
>> - ? xc_hypercall_buffer_alloc_pages(
>> - xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)))
>> - : (void *)-1L;
>> + if ( ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN )
>> + {
>> + DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>> + &ctx->save.dirty_bitmap_hbuf);
>> +
>> + dirty_bitmap =
>> + xc_hypercall_buffer_alloc_pages(
>> + xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>> + ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
>> +
>> + if ( !dirty_bitmap || !ctx->save.deferred_pages )
>> + goto enomem;
>> + }
>
> So this is better than the previous patch. At least we've got a clean
> NULL pointer now.
>
> I could in principle get on board with the optimisation, except it's not
> safe (see below).
>
>> --- a/tools/libs/guest/xg_sr_save.c
>> +++ b/tools/libs/guest/xg_sr_save.c
>> @@ -130,7 +130,7 @@ static int write_batch(struct xc_sr_cont
>> ctx->save.batch_pfns[i]);
>>
>> /* Likely a ballooned page. */
>> - if ( mfns[i] == INVALID_MFN )
>> + if ( mfns[i] == INVALID_MFN && ctx->save.deferred_pages )
>> {
>> set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
>> ++ctx->save.nr_deferred_pages;
>> @@ -196,8 +196,12 @@ static int write_batch(struct xc_sr_cont
>> {
>> if ( rc == -1 && errno == EAGAIN )
>> {
>> - set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
>> - ++ctx->save.nr_deferred_pages;
>> + if ( ctx->save.deferred_pages )
>> + {
>> + set_bit(ctx->save.batch_pfns[i],
>> + ctx->save.deferred_pages);
>> + ++ctx->save.nr_deferred_pages;
>> + }
>
> These two blocks are the only two which modify deferred_pages.
>
> It occurs to me that this means deferred_pages is PV-only, because of
> the stub implementations of x86_hvm_pfn_to_gfn() and
> x86_hvm_normalise_page(). Furthermore, this is likely to be true for
> any HVM-like domains even on other architectures.
IOW, are you suggesting to also avoid the allocation for HVM live
migration, thus effectively making assumptions about the two hooks
being just stubs in that case, which can't ever fail?
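I.e. something like this in setup() (just a sketch - "guest_is_pv" is a
made-up predicate; I'm not sure what the proper check at this layer
would be):

    /* Hypothetical: only PV guests can ever defer pages. */
    if ( guest_is_pv(ctx) )
        ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);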
> If these instead were hard errors when !deferred_pages, then that would
> at least get the logic into an acceptable state.
But the goal here isn't to change the logic, just to avoid allocating
memory that's effectively never used. What you suggest could be a
separate patch, yes, but I'm afraid I'm not confident I understand why
you think this needs changing, so I'd prefer to leave such a change to
you. (If I were to guess at what you mean, I'd deduce that you think
->nr_deferred_pages may still need maintaining, with it being non-zero
at the end of the last step causing migration to fail. But there would
then still be no need for the bitmap itself in the cases where it no
longer gets allocated.)
> However, the first hunk demonstrates that deferred_pages gets used even
> in the non-live case. In particular, it is sensitive to errors in the
> guest's handling of its own P2M. Also, I can't obviously spot anything
> which will correctly fail migration if deferred pages survive the final
> iteration.
How does the first hunk demonstrate this? The question isn't when
the bitmap gets updated, but under what conditions it gets consumed.
If the only sending function ever called is suspend_and_send_dirty(),
then nothing would ever have had a chance to set any bit. And any
bits set in the course of suspend_and_send_dirty() running will get
cleared again at the end of suspend_and_send_dirty().
Jan
Jan Beulich writes ("Re: [PATCH 04/12] libxenguest: avoid allocating unused deferred-pages bitmap"):
> [stuff]
I have read this conversation several times and it is not clear to me
whether Andrew was saying Jan's patch is bad, or the existing code is
bad.
I'm hesitant to give an ack for an optimisation without understanding
what the implications might be. Andrew, can you explain what will go
wrong if we take Jan's patch?
(I am not really familiar with this area of the code. If necessary I
could go and read it to form my own opinion.)
Thanks,
Ian.