[PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination

David Hildenbrand posted 6 patches 4 years, 6 months ago
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, David Hildenbrand <david@redhat.com>, Juan Quintela <quintela@redhat.com>
There is a newer version of this series
[PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Posted by David Hildenbrand 4 years, 6 months ago
Currently, when someone (i.e., the VM) accesses discarded parts inside a
RAMBlock with a RamDiscardManager managing the corresponding mapped memory
region, postcopy will request migration of the corresponding page from the
source. The source, however, will never answer, because it refuses to
migrate such pages with undefined content ("logically unplugged"): the
pages are never dirty, and get_queued_page() will consequently skip
processing these postcopy requests.

Especially reading discarded ("logically unplugged") ranges is supposed to
work in some setups (for example with current virtio-mem), although it
barely ever happens: still, not placing a page would currently stall the
VM, as it cannot make forward progress.

Let's check the state via the RamDiscardManager (the state e.g.,
of virtio-mem is migrated during precopy) and avoid sending a request
that will never get answered. Place a fresh zero page instead to keep
the VM working. This is the same behavior that would happen
automatically without userfaultfd being active, when accessing virtual
memory regions without populated pages -- "populate on demand".
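
For context, and not part of the patch: on Linux, placing a fresh zeropage
through userfaultfd boils down to a single UFFDIO_ZEROPAGE ioctl, which also
wakes the faulting thread. A minimal standalone sketch, where uffd, haddr and
page_size are assumed to come from the fault handler (QEMU's
postcopy_place_page_zero() does roughly this for small-page blocks):

#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <errno.h>

/* Resolve the fault at haddr with a fresh zeropage; mode 0 also wakes
 * the faulting thread. Returns 0 or a negative errno. */
static int place_zeropage(int uffd, void *haddr, uint64_t page_size)
{
    struct uffdio_zeropage zp = {
        .range.start = (uint64_t)(uintptr_t)haddr,
        .range.len   = page_size,
        .mode        = 0,
    };

    if (ioctl(uffd, UFFDIO_ZEROPAGE, &zp) < 0) {
        return -errno;    /* e.g. -EEXIST if a page was already placed */
    }
    return 0;
}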

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/postcopy-ram.c | 25 +++++++++++++++++++++----
 migration/ram.c          | 23 +++++++++++++++++++++++
 migration/ram.h          |  1 +
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 2e9697bdd2..673f60ce2b 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -671,6 +671,23 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
     return ret;
 }
 
+static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
+                                 ram_addr_t start, uint64_t haddr)
+{
+    /*
+     * Discarded pages (via RamDiscardManager) are never migrated. On unlikely
+     * access, place a zeropage, which will also set the relevant bits in the
+     * recv_bitmap accordingly, so we won't try placing a zeropage twice.
+     */
+    if (ramblock_page_is_discarded(rb, start)) {
+        bool received = ramblock_recv_bitmap_test_byte_offset(rb, start);
+
+        return received ? 0 : postcopy_place_page_zero(mis, (void *)haddr, rb);
+    }
+
+    return migrate_send_rp_req_pages(mis, rb, start, haddr);
+}
+
 /*
  * Callback from shared fault handlers to ask for a page,
  * the page must be specified by a RAMBlock and an offset in that rb
@@ -690,7 +707,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
                                         qemu_ram_get_idstr(rb), rb_offset);
         return postcopy_wake_shared(pcfd, client_addr, rb);
     }
-    migrate_send_rp_req_pages(mis, rb, aligned_rbo, client_addr);
+    postcopy_request_page(mis, rb, aligned_rbo, client_addr);
     return 0;
 }
 
@@ -984,8 +1001,8 @@ retry:
              * Send the request to the source - we want to request one
              * of our host page sizes (which is >= TPS)
              */
-            ret = migrate_send_rp_req_pages(mis, rb, rb_offset,
-                                            msg.arg.pagefault.address);
+            ret = postcopy_request_page(mis, rb, rb_offset,
+                                        msg.arg.pagefault.address);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
                 if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
@@ -993,7 +1010,7 @@ retry:
                     goto retry;
                 } else {
                     /* This is a unavoidable fault */
-                    error_report("%s: migrate_send_rp_req_pages() get %d",
+                    error_report("%s: postcopy_request_page() get %d",
                                  __func__, ret);
                     break;
                 }
diff --git a/migration/ram.c b/migration/ram.c
index 4bf74ae2e1..d7505f5368 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -876,6 +876,29 @@ static uint64_t ramblock_dirty_bitmap_exclude_discarded_pages(RAMBlock *rb)
     return cleared_bits;
 }
 
+/*
+ * Check if a page falls into a discarded range as managed by a
+ * RamDiscardManager responsible for the mapped memory region of the RAMBlock.
+ * Pages inside discarded ranges are never migrated and postcopy has to
+ * place zeropages instead.
+ *
+ * Note: The result is only stable while migrating (precopy/postcopy).
+ */
+bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t offset)
+{
+    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        MemoryRegionSection section = {
+            .mr = rb->mr,
+            .offset_within_region = offset,
+            .size = int128_make64(TARGET_PAGE_SIZE),
+        };
+
+        return !ram_discard_manager_is_populated(rdm, &section);
+    }
+    return false;
+}
+
 /* Called with RCU critical section */
 static void ramblock_sync_dirty_bitmap(RAMState *rs, RAMBlock *rb)
 {
diff --git a/migration/ram.h b/migration/ram.h
index 4833e9fd5b..6e7f12e556 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -72,6 +72,7 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 int64_t ramblock_recv_bitmap_send(QEMUFile *file,
                                   const char *block_name);
 int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
+bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t offset);
 
 /* ram cache */
 int colo_init_ram_cache(void);
-- 
2.31.1


Re: [PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Posted by Peter Xu 4 years, 6 months ago
On Wed, Jul 21, 2021 at 11:27:58AM +0200, David Hildenbrand wrote:
> Currently, when someone (i.e., the VM) accesses discarded parts inside a
> RAMBlock with a RamDiscardManager managing the corresponding mapped memory
> region, postcopy will request migration of the corresponding page from the
> source. The source, however, will never answer, because it refuses to
> migrate such pages with undefined content ("logically unplugged"): the
> pages are never dirty, and get_queued_page() will consequently skip
> processing these postcopy requests.
> 
> Especially reading discarded ("logically unplugged") ranges is supposed to
> work in some setups (for example with current virtio-mem), although it
> barely ever happens: still, not placing a page would currently stall the
> VM, as it cannot make forward progress.
> 
> Let's check the state via the RamDiscardManager (the state e.g.,
> of virtio-mem is migrated during precopy) and avoid sending a request
> that will never get answered. Place a fresh zero page instead to keep
> the VM working. This is the same behavior that would happen
> automatically without userfaultfd being active, when accessing virtual
> memory regions without populated pages -- "populate on demand".
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  migration/postcopy-ram.c | 25 +++++++++++++++++++++----
>  migration/ram.c          | 23 +++++++++++++++++++++++
>  migration/ram.h          |  1 +
>  3 files changed, 45 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 2e9697bdd2..673f60ce2b 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -671,6 +671,23 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
>      return ret;
>  }
>  
> +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
> +                                 ram_addr_t start, uint64_t haddr)
> +{
> +    /*
> +     * Discarded pages (via RamDiscardManager) are never migrated. On unlikely
> +     * access, place a zeropage, which will also set the relevant bits in the
> +     * recv_bitmap accordingly, so we won't try placing a zeropage twice.
> +     */
> +    if (ramblock_page_is_discarded(rb, start)) {
> +        bool received = ramblock_recv_bitmap_test_byte_offset(rb, start);

Will received ever be set with the current code base?  I thought
virtio-mem forbids plug/unplug during the whole lifecycle of migration.

> +
> +        return received ? 0 : postcopy_place_page_zero(mis, (void *)haddr, rb);

(now we can fill up pages in two threads.. but looks thread-safe)

Meanwhile, if this is highly unwanted, maybe it's worth an error_report_once()
so the admin could see something?

> +    }
> +
> +    return migrate_send_rp_req_pages(mis, rb, start, haddr);
> +}
> +
>  /*
>   * Callback from shared fault handlers to ask for a page,
>   * the page must be specified by a RAMBlock and an offset in that rb
> @@ -690,7 +707,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
>                                          qemu_ram_get_idstr(rb), rb_offset);
>          return postcopy_wake_shared(pcfd, client_addr, rb);
>      }
> -    migrate_send_rp_req_pages(mis, rb, aligned_rbo, client_addr);
> +    postcopy_request_page(mis, rb, aligned_rbo, client_addr);
>      return 0;
>  }
>  
> @@ -984,8 +1001,8 @@ retry:
>               * Send the request to the source - we want to request one
>               * of our host page sizes (which is >= TPS)
>               */
> -            ret = migrate_send_rp_req_pages(mis, rb, rb_offset,
> -                                            msg.arg.pagefault.address);
> +            ret = postcopy_request_page(mis, rb, rb_offset,
> +                                        msg.arg.pagefault.address);
>              if (ret) {
>                  /* May be network failure, try to wait for recovery */
>                  if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
> @@ -993,7 +1010,7 @@ retry:
>                      goto retry;
>                  } else {
>                      /* This is a unavoidable fault */
> -                    error_report("%s: migrate_send_rp_req_pages() get %d",
> +                    error_report("%s: postcopy_request_page() get %d",
>                                   __func__, ret);
>                      break;
>                  }
> diff --git a/migration/ram.c b/migration/ram.c
> index 4bf74ae2e1..d7505f5368 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -876,6 +876,29 @@ static uint64_t ramblock_dirty_bitmap_exclude_discarded_pages(RAMBlock *rb)
>      return cleared_bits;
>  }
>  
> +/*
> + * Check if a page falls into a discarded range as managed by a
> + * RamDiscardManager responsible for the mapped memory region of the RAMBlock.
> + * Pages inside discarded ranges are never migrated and postcopy has to
> + * place zeropages instead.
> + *
> + * Note: The result is only stable while migrating (precopy/postcopy).
> + */
> +bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t offset)
> +{
> +    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
> +        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
> +        MemoryRegionSection section = {
> +            .mr = rb->mr,
> +            .offset_within_region = offset,
> +            .size = int128_make64(TARGET_PAGE_SIZE),

rb->page_size?

Although I think it should also work with TARGET_PAGE_SIZE in this specific
case, but maybe still better to use what we have.

> +        };
> +
> +        return !ram_discard_manager_is_populated(rdm, &section);
> +    }
> +    return false;
> +}
> +
>  /* Called with RCU critical section */
>  static void ramblock_sync_dirty_bitmap(RAMState *rs, RAMBlock *rb)
>  {
> diff --git a/migration/ram.h b/migration/ram.h
> index 4833e9fd5b..6e7f12e556 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -72,6 +72,7 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
>  int64_t ramblock_recv_bitmap_send(QEMUFile *file,
>                                    const char *block_name);
>  int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
> +bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t offset);
>  
>  /* ram cache */
>  int colo_init_ram_cache(void);
> -- 
> 2.31.1
> 

-- 
Peter Xu


Re: [PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Posted by David Hildenbrand 4 years, 6 months ago
>> +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
>> +                                 ram_addr_t start, uint64_t haddr)
>> +{
>> +    /*
>> +     * Discarded pages (via RamDiscardManager) are never migrated. On unlikely
>> +     * access, place a zeropage, which will also set the relevant bits in the
>> +     * recv_bitmap accordingly, so we won't try placing a zeropage twice.
>> +     */
>> +    if (ramblock_page_is_discarded(rb, start)) {
>> +        bool received = ramblock_recv_bitmap_test_byte_offset(rb, start);
> 
> Will received ever be set with the current code base?  I thought
> virtio-mem forbids plug/unplug during the whole lifecycle of migration.

received would only be set if you have two CPUs faulting on the same 
address at the same time and the first one already placed a zeropage on 
this code path (as the comment said, that will implicitly set it in the 
receivedmap).

So, pretty unlikely to happen, but if the stars align ... :)

> 
>> +
>> +        return received ? 0 : postcopy_place_page_zero(mis, (void *)haddr, rb);
> 
> (now we can fill up pages in two threads.. but looks thread-safe)
> 
> Meanwhile, if this is highly unwanted, maybe it's worth an error_report_once()
> so the admin could see something?


You mean, if postcopy_place_page_zero() fails?

[...]

>>   
>> +/*
>> + * Check if a page falls into a discarded range as managed by a
>> + * RamDiscardManager responsible for the mapped memory region of the RAMBlock.
>> + * Pages inside discarded ranges are never migrated and postcopy has to
>> + * place zeropages instead.
>> + *
>> + * Note: The result is only stable while migrating (precopy/postcopy).
>> + */
>> +bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t offset)
>> +{
>> +    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
>> +        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
>> +        MemoryRegionSection section = {
>> +            .mr = rb->mr,
>> +            .offset_within_region = offset,
>> +            .size = int128_make64(TARGET_PAGE_SIZE),
> 
> rb->page_size?
> 
> Although I think it should also work with TARGET_PAGE_SIZE in this specific
> case, but maybe still better to use what we have.


If rb->page_size is discarded, TARGET_PAGE_SIZE is certainly discarded 
as well (as TARGET_PAGE_SIZE <= rb->page_size).

But yes, sounds cleaner.

Thanks Peter!

-- 
Thanks,

David / dhildenb


Re: [PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Posted by Peter Xu 4 years, 6 months ago
On Fri, Jul 23, 2021 at 08:36:32PM +0200, David Hildenbrand wrote:
> > > +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
> > > +                                 ram_addr_t start, uint64_t haddr)
> > > +{
> > > +    /*
> > > +     * Discarded pages (via RamDiscardManager) are never migrated. On unlikely
> > > +     * access, place a zeropage, which will also set the relevant bits in the
> > > +     * recv_bitmap accordingly, so we won't try placing a zeropage twice.
> > > +     */
> > > +    if (ramblock_page_is_discarded(rb, start)) {
> > > +        bool received = ramblock_recv_bitmap_test_byte_offset(rb, start);
> > 
> > Will received ever be set with the current code base?  I thought
> > virtio-mem forbids plug/unplug during the whole lifecycle of migration.
> 
> received would only be set if you have two CPUs faulting on the same address
> at the same time and the first one already placed a zeropage on this code
> path (as the comment said, that will implicitly set it in the receivedmap).

Ah I see; or just ignore the error of postcopy_place_page_zero() here because
per my understanding this whole path is not expected after all.

> 
> So, pretty unlikely to happen, but if the stars align ... :)
> 
> > 
> > > +
> > > +        return received ? 0 : postcopy_place_page_zero(mis, (void *)haddr, rb);
> > 
> > (now we can fill up pages in two threads.. but looks thread-safe)
> > 
> > Meanwhile, if this is highly unwanted, maybe it's worth an error_report_once()
> > so the admin could see something?
> 
> 
> You mean, if postcopy_place_page_zero() fails?

I meant ramblock_page_is_discarded() shouldn't really trigger for a sane guest,
right?  Because it means the guest is accessing some unplugged memory.

I wanted to give a heads-up to the admin so he/she knows there's something odd.
Also, that can be a hint for debugging if it's hit for unknown reasons.

Thanks,

-- 
Peter Xu


Re: [PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Posted by David Hildenbrand 4 years, 6 months ago
On 23.07.21 20:52, Peter Xu wrote:
> On Fri, Jul 23, 2021 at 08:36:32PM +0200, David Hildenbrand wrote:
>>>> +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
>>>> +                                 ram_addr_t start, uint64_t haddr)
>>>> +{
>>>> +    /*
>>>> +     * Discarded pages (via RamDiscardManager) are never migrated. On unlikely
>>>> +     * access, place a zeropage, which will also set the relevant bits in the
>>>> +     * recv_bitmap accordingly, so we won't try placing a zeropage twice.
>>>> +     */
>>>> +    if (ramblock_page_is_discarded(rb, start)) {
>>>> +        bool received = ramblock_recv_bitmap_test_byte_offset(rb, start);
>>>
>>> Will received ever be set with the current code base?  I thought
>>> virtio-mem forbids plug/unplug during the whole lifecycle of migration.
>>
>> received would only be set if you have two CPUs faulting on the same address
>> at the same time and the first one already placed a zeropage on this code
>> path (as the comment said, that will implicitly set it in the receivedmap).
> 
> Ah I see; or just ignore the error of postcopy_place_page_zero() here because
> per my understanding this whole path is not expected after all.

See below: if placing went wrong in this corner case, I think we 
would still want to know about it instead of leaving a guest thread 
unable to make progress because nobody would wake it up.

Does that make sense?

> 
>>
>> So, pretty unlikely to happen, but if the stars align ... :)
>>
>>>
>>>> +
>>>> +        return received ? 0 : postcopy_place_page_zero(mis, (void *)haddr, rb);
>>>
>>> (now we can fill up pages in two threads.. but looks thread-safe)
>>>
>>> Meanwhile, if this is highly unwanted, maybe it's worth an error_report_once()
>>> so the admin could see something?
>>
>>
>> You mean, if postcopy_place_page_zero() fails?
> 
> I meant ramblock_page_is_discarded() shouldn't really trigger for a sane guest,
> right?  Because it means the guest is accessing some unplugged memory.

It can happen in corner cases and is valid: with the current virtio-mem 
spec, guests are allowed to read unplugged memory. This will, for 
example, happen on older Linux guests when reading /proc/kcore or (with 
even older guests) when dumping guest memory via kdump. These corner 
cases were the main reason why the spec allows for it -- until we have 
guests properly adjusted such that it won't happen even in corner cases.

A future feature bit will disallow it for the guest: required for 
supporting shmem/hugetlb cleanly. With that in place, I agree that we 
would want to warn in this case!
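
To make that concrete: a hedged sketch of how Peter's error_report_once()
idea could be folded into the patch's new helper once such a feature bit
exists -- this is not what v2 does, and only the error_report_once() call is
new here (it is an existing QEMU helper that prints a given message at most
once):

static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
                                 ram_addr_t start, uint64_t haddr)
{
    if (ramblock_page_is_discarded(rb, start)) {
        bool received = ramblock_recv_bitmap_test_byte_offset(rb, start);

        /* Legal per the current virtio-mem spec, but surprising enough
         * that the admin may want a hint; with a feature bit forbidding
         * it, this would indicate a misbehaving guest. */
        error_report_once("%s: guest accessed discarded memory in RAMBlock"
                          " '%s' at offset 0x" RAM_ADDR_FMT, __func__,
                          qemu_ram_get_idstr(rb), start);

        return received ? 0 : postcopy_place_page_zero(mis, (void *)haddr, rb);
    }

    return migrate_send_rp_req_pages(mis, rb, start, haddr);
}

The _once variant matters here: a guest that legitimately scans unplugged
memory (e.g., via /proc/kcore) would otherwise flood the log on every
faulting page.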

-- 
Thanks,

David / dhildenb


Re: [PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Posted by Peter Xu 4 years, 6 months ago
On Fri, Jul 23, 2021 at 09:01:42PM +0200, David Hildenbrand wrote:
> It can happen in corner cases and is valid: with the current virtio-mem
> spec, guests are allowed to read unplugged memory. This will, for example,
> happen on older Linux guests when reading /proc/kcore or (with even older
> guests) when dumping guest memory via kdump. These corner cases were the
> main reason why the spec allows for it -- until we have guests properly
> adjusted such that it won't happen even in corner cases.
> 
> A future feature bit will disallow it for the guest: required for supporting
> shmem/hugetlb cleanly. With that in place, I agree that we would want to
> warn in this case!

OK that makes sense; with the page_size change, feel free to add:

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


Re: [PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Posted by David Hildenbrand 4 years, 6 months ago
On 24.07.21 00:10, Peter Xu wrote:
> On Fri, Jul 23, 2021 at 09:01:42PM +0200, David Hildenbrand wrote:
>> It can happen in corner cases and is valid: with the current virtio-mem
>> spec, guests are allowed to read unplugged memory. This will, for example,
>> happen on older Linux guests when reading /proc/kcore or (with even older
>> guests) when dumping guest memory via kdump. These corner cases were the
>> main reason why the spec allows for it -- until we have guests properly
>> adjusted such that it won't happen even in corner cases.
>>
>> A future feature bit will disallow it for the guest: required for supporting
>> shmem/hugetlb cleanly. With that in place, I agree that we would want to
>> warn in this case!
> 
> OK that makes sense; with the page_size change, feel free to add:

I just realized that relying on the page_size would be wrong.

We migrate TARGET_PAGE_SIZE chunks and the offset might not be page_size 
aligned. So if we were to replace TARGET_PAGE_SIZE by rb->page_size, we 
might accidentally cover a "too big" range: e.g., with a 2 MiB page_size, 
a TARGET_PAGE-aligned offset of 3 MiB would make the section span 
[3 MiB, 5 MiB), straddling two huge pages that can differ in state.

Makes sense?

> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 


-- 
Thanks,

David / dhildenb


Re: [PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Posted by Peter Xu 4 years, 6 months ago
On Thu, Jul 29, 2021 at 02:14:41PM +0200, David Hildenbrand wrote:
> On 24.07.21 00:10, Peter Xu wrote:
> > On Fri, Jul 23, 2021 at 09:01:42PM +0200, David Hildenbrand wrote:
> > > It can happen in corner cases and is valid: with the current virtio-mem
> > > spec, guests are allowed to read unplugged memory. This will, for example,
> > > happen on older Linux guests when reading /proc/kcore or (with even older
> > > guests) when dumping guest memory via kdump. These corner cases were the
> > > main reason why the spec allows for it -- until we have guests properly
> > > adjusted such that it won't happen even in corner cases.
> > > 
> > > A future feature bit will disallow it for the guest: required for supporting
> > > shmem/hugetlb cleanly. With that in place, I agree that we would want to
> > > warn in this case!
> > 
> > OK that makes sense; with the page_size change, feel free to add:
> 
> I just realized that relying on the page_size would be wrong.
> 
> We migrate TARGET_PAGE_SIZE chunks and the offset might not be page_size
> aligned. So if we were to replace TARGET_PAGE_SIZE by rb->page_size, we
> might accidentally cover a "too big" range.

I'm wondering whether we should make the offset page_size aligned instead.  For
example, note that postcopy_place_page_zero() should only take a
page_size-aligned host addr or UFFDIO_COPY could fail (hugetlb doesn't support
UFFDIO_ZEROPAGE yet).
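
A rough sketch of that variant, assuming the existing QEMU_ALIGN_DOWN() and
qemu_ram_pagesize() helpers -- whether the alignment belongs in
ramblock_page_is_discarded() or in its callers is exactly the open question:

    /* Hypothetical: align the lookup down to the RAMBlock's page size
     * before building the section, so it never straddles huge pages. */
    ram_addr_t aligned = QEMU_ALIGN_DOWN(offset, qemu_ram_pagesize(rb));
    MemoryRegionSection section = {
        .mr = rb->mr,
        .offset_within_region = aligned,
        .size = int128_make64(qemu_ram_pagesize(rb)),
    };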

Btw, does virtio-mem support hugetlbfs now?  With it, the smallest unit
to plug/unplug would be the huge page size (e.g., with a 1G huge page, it
sounds unhelpful to unplug 2M of memory), am I right?

-- 
Peter Xu


Re: [PATCH v2 5/6] migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Posted by David Hildenbrand 4 years, 6 months ago
On 29.07.21 17:52, Peter Xu wrote:
> On Thu, Jul 29, 2021 at 02:14:41PM +0200, David Hildenbrand wrote:
>> On 24.07.21 00:10, Peter Xu wrote:
>>> On Fri, Jul 23, 2021 at 09:01:42PM +0200, David Hildenbrand wrote:
>>>> It can happen in corner cases and is valid: with the current virtio-mem
>>>> spec, guests are allowed to read unplugged memory. This will, for example,
>>>> happen on older Linux guests when reading /proc/kcore or (with even older
>>>> guests) when dumping guest memory via kdump. These corner cases were the
>>>> main reason why the spec allows for it -- until we have guests properly
>>>> adjusted such that it won't happen even in corner cases.
>>>>
>>>> A future feature bit will disallow it for the guest: required for supporting
>>>> shmem/hugetlb cleanly. With that in place, I agree that we would want to
>>>> warn in this case!
>>>
>>> OK that makes sense; with the page_size change, feel free to add:
>>
>> I just realized that relying on the page_size would be wrong.
>>
>> We migrate TARGET_PAGE_SIZE chunks and the offset might not be page_size
>> aligned. So if we were to replace TARGET_PAGE_SIZE by rb->page_size, we
>> might accidentally cover a "too big" range.
> 
> I'm wondering whether we should make the offset page_size aligned instead.  For
> example, note that postcopy_place_page_zero() should only take a
> page_size-aligned host addr or UFFDIO_COPY could fail (hugetlb doesn't
> support UFFDIO_ZEROPAGE yet).

That is true indeed. I'd assume in that case that we would get called 
with the proper offset already, right? Because uffd would only report 
properly aligned pages IIRC.

> 
> Btw, does virtio-mem support hugetlbfs now?  With it, the smallest unit
> to plug/unplug would be the huge page size (e.g., with a 1G huge page, it
> sounds unhelpful to unplug 2M of memory), am I right?

It supports it to 99.9 % I'd say (especially with the dump/tpm/migration 
fixes upstream). The units are handled properly: the unit is at least as 
big as the page size.

So with 1 GiB pages, you have a unit of 1 GiB.


-- 
Thanks,

David / dhildenb