[PATCH v5 0/9] migration/ram: Optimize for virtio-mem via RamDiscardManager

David Hildenbrand posted 9 patches 2 years, 6 months ago
Test checkpatch passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20210904160913.17785-1-david@redhat.com
Maintainers: Peter Xu <peterx@redhat.com>, Juan Quintela <quintela@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, David Hildenbrand <david@redhat.com>
[PATCH v5 0/9] migration/ram: Optimize for virtio-mem via RamDiscardManager
Posted by David Hildenbrand 2 years, 6 months ago
virtio-mem exposes a dynamic amount of memory within RAMBlocks by
coordinating with the VM. Memory within a RAMBlock can either get
plugged and consequently used by the VM, or unplugged and consequently no
longer used by the VM. Logical unplug is realized by discarding the
physical memory backing for virtual memory ranges, similar to memory
ballooning.
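
(For context: in QEMU, discarding the physical memory backing of a range
within a RAMBlock is done via ram_block_discard_range(), which uses
madvise(MADV_DONTNEED) for anonymous memory and hole punching for
file-backed/shmem memory. A minimal, illustrative sketch -- not code from
this series; the helper name is made up:)

/* Assumes the usual QEMU headers, e.g., "qemu/osdep.h", "exec/cpu-common.h". */
static int example_logically_unplug(RAMBlock *rb, uint64_t offset,
                                    uint64_t size)
{
    /* offset/size must be aligned to the device's (un)plug granularity. */
    return ram_block_discard_range(rb, offset, size);
}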

However, there are important differences compared to virtio-balloon:

a) A virtio-mem device only operates on its assigned memory region /
   RAMBlock ("device memory")
b) Initially, all device memory is logically unplugged
c) Virtual machines will never accidentally reuse memory that is currently
   logically unplugged. The spec defines most accesses to unplugged memory
   as "undefined behavior" -- except reading unplugged memory, which is
   currently expected to work, but that will change in the future.
d) The (un)plug granularity is in the range of megabytes -- "memory blocks"
e) The state (plugged/unplugged) of a memory block is always known and
   properly tracked.

Whenever memory blocks within the RAMBlock get (un)plugged, changes are
communicated via the RamDiscardManager to other QEMU subsystems, most
prominently vfio which updates the DMA mapping accordingly. "Unplugging"
corresponds to "discarding" and "plugging" corresponds to "populating".

While migrating (precopy/postcopy), the state of such memory blocks cannot
change, as virtio-mem will reject with "busy" any guest requests that would
change the state of blocks. We don't want to migrate such logically
unplugged memory, because it can result in unintended memory consumption
both on the source (when reading memory from some memory backends) and on
the destination (when writing memory). Further, migration time can be
reduced significantly by skipping logically unplugged blocks, and we avoid
populating unnecessary page tables in Linux.

Right now, virtio-mem reuses the free page hinting infrastructure during
precopy to exclude all logically unplugged ("discarded") parts from the
migration stream. However, there are some scenarios that are not handled
properly and need fixing. Further, there are some ugly corner cases in
postcopy code and background snapshotting code that similarly have to
handle such special RAMBlocks.

Let's reuse the RamDiscardManager infrastructure to essentially handle
precopy, postcopy and background snapshots cleanly, which means:

a) In precopy code, fixing up the initial dirty bitmaps (in the RAMBlock
   and e.g., KVM) to exclude discarded ranges (see the sketch after this
   list).
b) In postcopy code, placing a zeropage when requested to handle a page
   falling into a discarded range -- because the source will never send it.
   Further, fix up the dirty bitmap when overwriting it in recovery mode.
c) In background snapshot code, never populating discarded ranges, not even
   with the shared zeropage, to avoid unintended memory consumption,
   especially in the future with hugetlb and shmem.
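
To illustrate a), here is a rough sketch of the precopy fixup, assuming the
replay_discarded callback introduced in patch #1 (function names are
illustrative; the real series additionally clears the dirty bitmaps in
e.g., KVM):

static void example_clear_section(MemoryRegionSection *section, void *opaque)
{
    RAMBlock *rb = section->mr->ram_block;
    const unsigned long start = section->offset_within_region >>
                                TARGET_PAGE_BITS;
    const unsigned long npages = int128_get64(section->size) >>
                                 TARGET_PAGE_BITS;

    /* Discarded ranges are never migrated: drop them from the bitmap. */
    bitmap_clear(rb->bmap, start, npages);
}

static void example_clear_discarded_pages(RAMBlock *rb)
{
    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
    MemoryRegionSection section = {
        .mr = rb->mr,
        .offset_within_region = 0,
        .size = int128_make64(qemu_ram_get_used_length(rb)),
    };

    if (rdm) {
        ram_discard_manager_replay_discarded(rdm, &section,
                                             example_clear_section, NULL);
    }
}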

Detail: When a virtio-mem device is realized, it registers its RAM
        for migration via vmstate_register_ram(). Further, it sets
        itself as the RamDiscardManager for the corresponding memory
        region of the RAMBlock via memory_region_set_ram_discard_manager().
        Last but not least, memory device code actually maps the
        memory region into guest physical address space. So migration
        code can always properly identify such RAMBlocks.
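
(Illustration: with that in place, migration code can test whether a page
falls into a discarded range roughly as follows -- a sketch against the
existing RamDiscardManager query API; the helper name is illustrative. In
postcopy, the destination places the shared zeropage when such a page is
requested:)

static bool example_page_is_discarded(RAMBlock *rb, ram_addr_t start)
{
    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
        MemoryRegionSection section = {
            .mr = rb->mr,
            .offset_within_region = start,
            .size = int128_make64(qemu_ram_pagesize(rb)),
        };

        return !ram_discard_manager_is_populated(rdm, &section);
    }
    return false;
}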

Tested with precopy/postcopy on shmem, where even reading unpopulated
memory ranges will populate actual memory and not the shared zeropage.
Tested with background snapshots on anonymous memory, because other
backends are not yet supported by upstream Linux.

Ideally, this should all go via the migration tree.

v4 -> v5:
- "migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the
   destination"
-- Use ROUND_DOWN and fix compile warning on 32 bit
-- Use int128_make64() instead of the incorrect int128_get64()
- "migration: Simplify alignment and alignment checks"
-- Use ROUND_DOWN where possible instead of QEMU_ALIGN_DOWN and fix
   compilation warning on 32 bit
- "migration/ram: Factor out populating pages readable in
   ram_block_populate_pages()"
-- Rename functions, add a comment.
- "migration/ram: Handle RAMBlocks with a RamDiscardManager on background
   snapshots"
-- Adjust to changed function names

v3 -> v4:
- Added ACKs
- "migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the
   destination"
-- Use QEMU_ALIGN_DOWN() to align to ram pagesize
- "migration: Simplify alignment and alignment checks"
-- Added
- "migration/ram: Factor out populating pages readable in
   ram_block_populate_pages()"
-- Added
- "migration/ram: Handle RAMBlocks with a RamDiscardManager on background
   snapshots"
-- Simplified due to factored out code

v2 -> v3:
- "migration/ram: Don't passs RAMState to
   migration_clear_memory_region_dirty_bitmap_*()"
-- Added to make the next patch easier to implement
- "migration/ram: Handle RAMBlocks with a RamDiscardManager on the migration
   source"
-- Fixup the dirty bitmaps only initially and during postcopy recovery,
   not after every bitmap sync. Also properly clear the dirty bitmaps e.g.,
   in KVM. [Peter]
- "migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the
   destination"
-- Take care of proper host-page alignment [Peter]

v1 -> v2:
- "migration/ram: Handle RAMBlocks with a RamDiscardManager on the
   migration source"
-- Added a note how it interacts with the clear_bmap and what we might want
   to further optimize in the future when synchronizing bitmaps.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>

David Hildenbrand (9):
  memory: Introduce replay_discarded callback for RamDiscardManager
  virtio-mem: Implement replay_discarded RamDiscardManager callback
  migration/ram: Don't passs RAMState to
    migration_clear_memory_region_dirty_bitmap_*()
  migration/ram: Handle RAMBlocks with a RamDiscardManager on the
    migration source
  virtio-mem: Drop precopy notifier
  migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the
    destination
  migration: Simplify alignment and alignment checks
  migration/ram: Factor out populating pages readable in
    ram_block_populate_pages()
  migration/ram: Handle RAMBlocks with a RamDiscardManager on background
    snapshots

 hw/virtio/virtio-mem.c         |  92 ++++++++++-------
 include/exec/memory.h          |  21 ++++
 include/hw/virtio/virtio-mem.h |   3 -
 migration/migration.c          |   6 +-
 migration/postcopy-ram.c       |  40 ++++++--
 migration/ram.c                | 180 +++++++++++++++++++++++++++++----
 migration/ram.h                |   1 +
 softmmu/memory.c               |  11 ++
 8 files changed, 284 insertions(+), 70 deletions(-)

-- 
2.31.1


Re: [PATCH v5 0/9] migration/ram: Optimize for virtio-mem via RamDiscardManager
Posted by David Hildenbrand 2 years, 6 months ago
On 04.09.21 18:09, David Hildenbrand wrote:
> [...]

Gentle ping.


-- 
Thanks,

David / dhildenb