From: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
To: qemu-devel@nongnu.org
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
 "Dr. David Alan Gilbert", Markus Armbruster, Peter Xu, Andrey Gruzdev
Subject: [RFC PATCH 9/9] migration/snap-tool: Implementation of snapshot loading in postcopy
Date: Wed, 17 Mar 2021 19:32:22 +0300
Message-Id: <20210317163222.182609-10-andrey.gruzdev@virtuozzo.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com>
References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com>

Implement asynchronous snapshot loading on the destination VM using the
standard postcopy migration mechanism. The point of switchover from
precopy to postcopy is chosen simply, based on the percentage of
non-zero pages already loaded in precopy.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
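The switchover rule used below reduces to a simple threshold: keep loading
pages in precopy until a configured percentage of all normal (non-zero)
pages has been loaded, then flip to postcopy. A minimal standalone sketch
of that rule follows; the names are illustrative only and the
--postcopy=<percent> option spelling is assumed, the actual logic lives in
load_prepare_postcopy() and load_switch_to_postcopy() in
qemu-snap-handlers.c.

/* Sketch only: threshold-based precopy -> postcopy switchover decision. */
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int64_t normal_pages;   /* total non-zero pages in the snapshot */
    int64_t precopy_pages;  /* pages to load before switching */
    int64_t loaded_pages;   /* pages loaded so far */
} SketchRamState;

static void sketch_prepare(SketchRamState *rs, int postcopy_percent)
{
    /* Mirrors: precopy_pages = normal_pages * postcopy_percent / 100 */
    rs->precopy_pages = rs->normal_pages * postcopy_percent / 100;
}

static bool sketch_should_switch(const SketchRamState *rs)
{
    /* Mirrors: loaded_pages > precopy_pages */
    return rs->loaded_pages > rs->precopy_pages;
}

int main(void)
{
    SketchRamState rs = { .normal_pages = 1000000, .loaded_pages = 0 };

    sketch_prepare(&rs, 30);  /* e.g. --postcopy=30 (assumed spelling) */

    /* Pretend each precopy iteration loads 4096 pages. */
    while (!sketch_should_switch(&rs)) {
        rs.loaded_pages += 4096;
    }
    printf("switch to postcopy after %" PRId64 " pages\n", rs.loaded_pages);
    return 0;
}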
 include/qemu-snap.h  |  11 +
 qemu-snap-handlers.c | 482 ++++++++++++++++++++++++++++++++++++++++++-
 qemu-snap.c          |  16 ++
 3 files changed, 504 insertions(+), 5 deletions(-)

diff --git a/include/qemu-snap.h b/include/qemu-snap.h
index f9f6db529f..4bf79e0964 100644
--- a/include/qemu-snap.h
+++ b/include/qemu-snap.h
@@ -30,6 +30,8 @@
 #define AIO_BUFFER_SIZE (1024 * 1024)
 /* Max. concurrent AIO tasks */
 #define AIO_TASKS_MAX 8
+/* Max. concurrent AIO tasks in postcopy */
+#define AIO_TASKS_POSTCOPY_MAX 4
 
 typedef struct AioBufferPool AioBufferPool;
 
@@ -103,6 +105,7 @@ typedef struct SnapLoadState {
     BlockBackend *blk; /* Block backend */
 
     QEMUFile *f_fd; /* Outgoing migration stream QEMUFile */
+    QEMUFile *f_rp_fd; /* Return path stream QEMUFile */
     QEMUFile *f_vmstate; /* Block backend vmstate area QEMUFile */
     /*
      * Buffer to keep first few KBs of BDRV vmstate that we stashed at the
@@ -114,6 +117,14 @@ typedef struct SnapLoadState {
     /* AIO buffer pool */
     AioBufferPool *aio_pool;
 
+    bool postcopy; /* From command-line --postcopy */
+    int postcopy_percent; /* From command-line --postcopy */
+    bool in_postcopy; /* Switched to postcopy mode */
+
+    /* Return path listening thread */
+    QemuThread rp_listen_thread;
+    bool has_rp_listen_thread;
+
     /* BDRV vmstate offset of RAM block list section */
     int64_t state_ram_list_offset;
     /* BDRV vmstate offset of the first device section */
diff --git a/qemu-snap-handlers.c b/qemu-snap-handlers.c
index 7dfe950829..ae581b3178 100644
--- a/qemu-snap-handlers.c
+++ b/qemu-snap-handlers.c
@@ -57,6 +57,16 @@ typedef struct RAMPageRef {
     int64_t page; /* Page index in RAM block */
 } RAMPageRef;
 
+/* Page request from destination in postcopy */
+typedef struct RAMPageRequest {
+    RAMBlockDesc *block; /* RAM block */
+    int64_t offset; /* Offset within RAM block */
+    unsigned size; /* Size of request */
+
+    /* Link into ram_state.page_req */
+    QSIMPLEQ_ENTRY(RAMPageRequest) next;
+} RAMPageRequest;
+
 /* State reflecting RAM part of snapshot */
 typedef struct RAMState {
     int64_t page_size; /* Page size */
@@ -64,6 +74,7 @@ typedef struct RAMState {
     int page_bits; /* Page size bits */
 
     int64_t normal_pages; /* Total number of normal (non-zero) pages */
+    int64_t precopy_pages; /* Normal pages to load in precopy */
     int64_t loaded_pages; /* Current number of normal pages loaded */
 
     /* Last RAM block touched by load_send_ram_iterate() */
@@ -73,9 +84,15 @@ typedef struct RAMState {
 
     /* Last RAM block sent by load_send_ram_iterate() */
     RAMBlockDesc *last_sent_block;
+    /* RAM block from last enqueued load-postcopy page request */
+    RAMBlockDesc *last_req_block;
 
     /* List of RAM blocks */
     QSIMPLEQ_HEAD(, RAMBlockDesc) ram_block_list;
+
+    /* Page request queue for load-postcopy */
+    QemuMutex page_req_mutex;
+    QSIMPLEQ_HEAD(, RAMPageRequest) page_req;
 } RAMState;
 
 /* Section handler ops */
@@ -801,6 +818,422 @@ static void load_check_file_errors(SnapLoadState *sn, int *res)
     }
 }
 
+static bool get_queued_page(RAMPageRef *p_ref)
+{
+    RAMState *rs = &ram_state;
+    RAMBlockDesc *block = NULL;
+    int64_t offset;
+
+    if (QSIMPLEQ_EMPTY_ATOMIC(&rs->page_req)) {
+        return false;
+    }
+
+    QEMU_LOCK_GUARD(&rs->page_req_mutex);
+
+    while (!QSIMPLEQ_EMPTY(&rs->page_req)) {
+        RAMPageRequest *entry = QSIMPLEQ_FIRST(&rs->page_req);
+
+        block = entry->block;
+        offset = entry->offset;
+
+        if (entry->size > rs->page_size) {
+            entry->size -= rs->page_size;
+            entry->offset += rs->page_size;
+        } else {
+            QSIMPLEQ_REMOVE_HEAD(&rs->page_req, next);
+            g_free(entry);
+        }
+
+        if (test_bit((offset >> rs->page_bits), block->bitmap)) {
+            p_ref->block = block;
+            p_ref->page = offset >> rs->page_bits;
+            return true;
+        }
+    }
+    return false;
+}
+
+static int queue_page_request(const char *id_str, int64_t offset, unsigned size)
+{
+    RAMState *rs = &ram_state;
+    RAMBlockDesc *block;
+    RAMPageRequest *new_entry;
+
+    if (!id_str) {
+        block = rs->last_req_block;
+        if (!block) {
+            error_report("RP-REQ_PAGES: no previous block");
+            return -EINVAL;
+        }
+    } else {
+        block = ram_block_by_idstr(id_str);
+        if (!block) {
+            error_report("RP-REQ_PAGES: cannot find block '%s'", id_str);
+            return -EINVAL;
+        }
+        rs->last_req_block = block;
+    }
+
+    if (offset + size > block->length) {
+        error_report("RP-REQ_PAGES: offset/size out RAM block end_offset=0x%" PRIx64
+                     " limit=0x%" PRIx64, (offset + size), block->length);
+        return -EINVAL;
+    }
+
+    new_entry = g_new0(RAMPageRequest, 1);
+    new_entry->block = block;
+    new_entry->offset = offset;
+    new_entry->size = size;
+
+    qemu_mutex_lock(&rs->page_req_mutex);
+    QSIMPLEQ_INSERT_TAIL(&rs->page_req, new_entry, next);
+    qemu_mutex_unlock(&rs->page_req_mutex);
+
+    return 0;
+}
+
+/* QEMU_VM_COMMAND sub-commands */
+typedef enum VmSubCmd {
+    MIG_CMD_OPEN_RETURN_PATH = 1,
+    MIG_CMD_POSTCOPY_ADVISE = 3,
+    MIG_CMD_POSTCOPY_LISTEN = 4,
+    MIG_CMD_POSTCOPY_RUN = 5,
+    MIG_CMD_POSTCOPY_RAM_DISCARD = 6,
+    MIG_CMD_PACKAGED = 7,
+} VmSubCmd;
+
+/* Return-path message types */
+typedef enum RpMsgType {
+    MIG_RP_MSG_INVALID = 0,
+    MIG_RP_MSG_SHUT = 1,
+    MIG_RP_MSG_REQ_PAGES_ID = 3,
+    MIG_RP_MSG_REQ_PAGES = 4,
+    MIG_RP_MSG_MAX = 7,
+} RpMsgType;
+
+typedef struct RpMsgArgs {
+    int len;
+    const char *name;
+} RpMsgArgs;
+
+/*
+ * Return-path message length/name indexed by message type.
+ * -1 value stands for variable message length.
+ */
+static RpMsgArgs rp_msg_args[] = {
+    [MIG_RP_MSG_INVALID]      = { .len = -1, .name = "INVALID" },
+    [MIG_RP_MSG_SHUT]         = { .len =  4, .name = "SHUT" },
+    [MIG_RP_MSG_REQ_PAGES_ID] = { .len = -1, .name = "REQ_PAGES_ID" },
+    [MIG_RP_MSG_REQ_PAGES]    = { .len = 12, .name = "REQ_PAGES" },
+    [MIG_RP_MSG_MAX]          = { .len = -1, .name = "MAX" },
+};
+
+/* Return-path message processing thread */
+static void *rp_listen_thread(void *opaque)
+{
+    SnapLoadState *sn = (SnapLoadState *) opaque;
+    QEMUFile *f = sn->f_rp_fd;
+    int res = 0;
+
+    while (!res) {
+        uint8_t h_buf[512];
+        const int h_max_len = sizeof(h_buf);
+        int h_type;
+        int h_len;
+        size_t count;
+
+        h_type = qemu_get_be16(f);
+        h_len = qemu_get_be16(f);
+        /* Make early check for input errors */
+        res = qemu_file_get_error(f);
+        if (res) {
+            break;
+        }
+
+        /* Check message type */
+        if (h_type >= MIG_RP_MSG_MAX || h_type == MIG_RP_MSG_INVALID) {
+            error_report("RP: received invalid message type=%d length=%d",
+                         h_type, h_len);
+            res = -EINVAL;
+            break;
+        }
+
+        /* Check message length */
+        if (rp_msg_args[h_type].len != -1 && h_len != rp_msg_args[h_type].len) {
+            error_report("RP: received '%s' message len=%d expected=%d",
+                         rp_msg_args[h_type].name, h_len, rp_msg_args[h_type].len);
+            res = -EINVAL;
+            break;
+        } else if (h_len > h_max_len) {
+            error_report("RP: received '%s' message len=%d max_len=%d",
+                         rp_msg_args[h_type].name, h_len, h_max_len);
+            res = -EINVAL;
+            break;
+        }
+
+        count = qemu_get_buffer(f, h_buf, h_len);
+        if (count != h_len) {
+            break;
+        }
+
+        switch (h_type) {
+        case MIG_RP_MSG_SHUT:
+        {
+            int shut_error;
+
+            shut_error = be32_to_cpu(*(uint32_t *) h_buf);
+            if (shut_error) {
+                error_report("RP: sibling shutdown error=%d", shut_error);
+            }
+            /* Exit processing loop */
+            res = 1;
+            break;
+        }
+
+        case MIG_RP_MSG_REQ_PAGES:
+        case MIG_RP_MSG_REQ_PAGES_ID:
+        {
+            uint64_t offset;
+            uint32_t size;
+            char *id_str = NULL;
+
+            offset = be64_to_cpu(*(uint64_t *) (h_buf + 0));
+            size = be32_to_cpu(*(uint32_t *) (h_buf + 8));
+
+            if (h_type == MIG_RP_MSG_REQ_PAGES_ID) {
+                int h_parsed_len = rp_msg_args[MIG_RP_MSG_REQ_PAGES].len;
+
+                if (h_len > h_parsed_len) {
+                    int id_len;
+
+                    /* RAM block ID string */
+                    id_len = h_buf[h_parsed_len];
+                    id_str = (char *) &h_buf[h_parsed_len + 1];
+                    id_str[id_len] = 0;
+
+                    h_parsed_len += id_len + 1;
+                }
+
+                if (h_parsed_len != h_len) {
+                    error_report("RP: received '%s' message len=%d expected=%d",
+                                 rp_msg_args[MIG_RP_MSG_REQ_PAGES_ID].name,
+                                 h_len, h_parsed_len);
+                    res = -EINVAL;
+                    break;
+                }
+            }
+
+            res = queue_page_request(id_str, offset, size);
+            break;
+        }
+
+        default:
+            error_report("RP: received unexpected message type=%d len=%d",
+                         h_type, h_len);
+            res = -EINVAL;
+        }
+    }
+
+    if (res >= 0) {
+        res = qemu_file_get_error(f);
+    }
+    if (res) {
+        error_report("RP: listen thread exit error=%d", res);
+    }
+    return NULL;
+}
+
+static void send_command(QEMUFile *f, int cmd, uint16_t len, uint8_t *data)
+{
+    qemu_put_byte(f, QEMU_VM_COMMAND);
+    qemu_put_be16(f, (uint16_t) cmd);
+    qemu_put_be16(f, len);
+
+    qemu_put_buffer_async(f, data, len, false);
+    qemu_fflush(f);
+}
+
+static void send_ram_block_discard(QEMUFile *f, RAMBlockDesc *block)
+{
+    int id_len;
+    int msg_len;
+    uint8_t msg_buf[512];
+
+    id_len = strlen(block->idstr);
+    assert(id_len < 256);
+
+    /* Version, always 0 */
+    msg_buf[0] = 0;
+    /* RAM block ID string length, not including terminating 0 */
+    msg_buf[1] = id_len;
+    /* RAM block ID string with terminating zero */
+    memcpy(msg_buf + 2, block->idstr, (id_len + 1));
+    msg_len = 2 + id_len + 1;
+    /*
+     * We take discard range start offset for the RAM block
+     * from precopy-mode block->last_offset.
+     */
+    stq_be_p(msg_buf + msg_len, block->last_offset);
+    msg_len += 8;
+    /* Discard range length */
+    stq_be_p(msg_buf + msg_len, (block->length - block->last_offset));
+    msg_len += 8;
+
+    send_command(f, MIG_CMD_POSTCOPY_RAM_DISCARD, msg_len, msg_buf);
+}
+
+static int send_ram_each_block_discard(QEMUFile *f)
+{
+    RAMBlockDesc *block;
+    int res = 0;
+
+    QSIMPLEQ_FOREACH(block, &ram_state.ram_block_list, next) {
+        send_ram_block_discard(f, block);
+        res = qemu_file_get_error(f);
+        if (res) {
+            break;
+        }
+    }
+    return res;
+}
+
+static int load_prepare_postcopy(SnapLoadState *sn)
+{
+    QEMUFile *f = sn->f_fd;
+    uint64_t tmp[2];
+    int res;
+
+    /* Set number of pages to load in precopy before switching to postcopy */
+    ram_state.precopy_pages = ram_state.normal_pages *
+                              sn->postcopy_percent / 100;
+
+    /* Send POSTCOPY_ADVISE */
+    tmp[0] = cpu_to_be64(ram_state.page_size);
+    tmp[1] = cpu_to_be64(ram_state.page_size);
+    send_command(f, MIG_CMD_POSTCOPY_ADVISE, 16, (uint8_t *) tmp);
+    /* Open return path on destination */
+    send_command(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
+
+    /*
+     * Check file errors after POSTCOPY_ADVISE command since destination may
+     * already have closed input pipe in case postcopy had not been enabled
+     * in advance.
+     */
+    res = qemu_file_get_error(f);
+    if (!res) {
+        qemu_thread_create(&sn->rp_listen_thread, "return-path-thread",
+                           rp_listen_thread, sn, QEMU_THREAD_JOINABLE);
+        sn->has_rp_listen_thread = true;
+    }
+    return res;
+}
+
+static int load_start_postcopy(SnapLoadState *sn)
+{
+    QIOChannelBuffer *bioc;
+    QEMUFile *fb;
+    int eof_pos;
+    uint32_t length;
+    int res = 0;
+
+    /*
+     * Send RAM discards for each block's unsent part. Without discards,
+     * userfault_fd code on destination will not trigger page requests
+     * as expected. Also, the UFFDIO_COPY/ZEROPAGE ioctl's that are used
+     * to place incoming page in postcopy would give an error if that page
+     * has not faulted with userfault_fd MISSING reason.
+     */
+    res = send_ram_each_block_discard(sn->f_fd);
+    if (res) {
+        return res;
+    }
+
+    /*
+     * To perform a switch to postcopy on destination, we need to send
+     * commands and the device state data in the following order:
+     *   * MIG_CMD_POSTCOPY_LISTEN
+     *   * device state sections
+     *   * MIG_CMD_POSTCOPY_RUN
+     * All of this has to be packaged into a single blob using
+     * MIG_CMD_PACKAGED command. While loading the device state we may
+     * trigger page transfer requests and the fd must be free to process
+     * those, thus the destination must read the whole device state off
+     * the fd before it starts processing it. To wrap it up in a package,
+     * we use QEMU buffer channel object.
+     */
+    bioc = qio_channel_buffer_new(512 * 1024);
+    qio_channel_set_name(QIO_CHANNEL(bioc), "snap-postcopy-buffer");
+    fb = qemu_fopen_channel_output(QIO_CHANNEL(bioc));
+    object_unref(OBJECT(bioc));
+
+    /* First goes MIG_CMD_POSTCOPY_LISTEN command */
+    send_command(fb, MIG_CMD_POSTCOPY_LISTEN, 0, NULL);
+
+    /* Then the rest of device state with optional VMDESC section.. */
+    file_transfer_bytes(fb, sn->f_vmstate,
+                        (sn->state_eof - sn->state_device_offset));
+    qemu_fflush(fb);
+
+    /*
+     * VMDESC json section may be present at the end of the stream
+     * so we'll try to locate it and truncate trailer for postcopy.
+     */
+    eof_pos = bioc->usage - 1;
+    for (int offset = (bioc->usage - 11); offset >= 0; offset--) {
+        if (bioc->data[offset] == QEMU_VM_SECTION_FOOTER &&
+            bioc->data[offset + 5] == QEMU_VM_EOF &&
+            bioc->data[offset + 6] == QEMU_VM_VMDESCRIPTION) {
+            uint32_t json_length;
+            uint32_t expected_length = bioc->usage - (offset + 11);
+
+            json_length = be32_to_cpu(*(uint32_t *) &bioc->data[offset + 7]);
+            if (json_length != expected_length) {
+                error_report("Corrupted VMDESC trailer: length=%" PRIu32
+                             " expected=%" PRIu32, json_length, expected_length);
+                res = -EINVAL;
+                goto fail;
+            }
+
+            eof_pos = offset + 5;
+            break;
+        }
+    }
+
+    /*
+     * In postcopy we need to remove QEMU_VM_EOF token which normally goes
+     * after the last non-iterable device state section before the (optional)
+     * VMDESC json section. This is required to allow snapshot loading to
+     * continue in postcopy after we sent the rest of device state.
+     * VMDESC section also has to be removed from the stream if present.
+     */
+    if (eof_pos >= 0 && bioc->data[eof_pos] == QEMU_VM_EOF) {
+        bioc->usage = eof_pos;
+        bioc->offset = eof_pos;
+    }
+
+    /* And the final MIG_CMD_POSTCOPY_RUN */
+    send_command(fb, MIG_CMD_POSTCOPY_RUN, 0, NULL);
+
+    /* Now send that blob */
+    length = cpu_to_be32(bioc->usage);
+    send_command(sn->f_fd, MIG_CMD_PACKAGED, sizeof(length),
+                 (uint8_t *) &length);
+    qemu_put_buffer_async(sn->f_fd, bioc->data, bioc->usage, false);
+    qemu_fflush(sn->f_fd);
+
+    /*
+     * We set lower limit on the number of AIO in-flight requests
+     * to reduce return path PAGE_REQ processing latencies.
+     */
+    aio_pool_set_max_in_flight(sn->aio_pool, AIO_TASKS_POSTCOPY_MAX);
+    /* Now in postcopy */
+    sn->in_postcopy = true;
+
+fail:
+    qemu_fclose(fb);
+    load_check_file_errors(sn, &res);
+    return res;
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     int compat_flags = (RAM_SAVE_FLAG_MEM_SIZE | RAM_SAVE_FLAG_EOS);
@@ -1007,11 +1440,23 @@ static void coroutine_fn load_buffers_fill_queue(SnapLoadState *sn)
     int64_t offset;
     int64_t limit;
     int64_t pages;
+    bool urgent;
 
-    if (!find_next_unsent_page(&p_ref)) {
+    /*
+     * First need to check if aio_pool_try_acquire_next() will
+     * succeed at least once since we can't revert get_queued_page().
+     */
+    if (!aio_pool_can_acquire_next(sn->aio_pool)) {
         return;
     }
 
+    urgent = get_queued_page(&p_ref);
+    if (!urgent) {
+        if (!find_next_unsent_page(&p_ref)) {
+            return;
+        }
+    }
+
     get_unsent_page_range(&p_ref, &block, &offset, &limit);
 
     do {
@@ -1033,7 +1478,7 @@ static void coroutine_fn load_buffers_fill_queue(SnapLoadState *sn)
         aio_buffer_start_task(buffer, load_buffers_task_co, bdrv_offset, size);
 
         offset += size;
-    } while (offset < limit);
+    } while (!urgent && (offset < limit));
 
     rs->last_block = block;
     rs->last_page = offset >> rs->page_bits;
@@ -1151,9 +1596,14 @@ static int load_send_leader(SnapLoadState *sn)
 
 static int load_send_complete(SnapLoadState *sn)
 {
-    /* Transfer device state to the output pipe */
-    file_transfer_bytes(sn->f_fd, sn->f_vmstate,
-                        (sn->state_eof - sn->state_device_offset));
+    if (!sn->in_postcopy) {
+        /* Transfer device state to the output pipe */
+        file_transfer_bytes(sn->f_fd, sn->f_vmstate,
+                            (sn->state_eof - sn->state_device_offset));
+    } else {
+        /* In postcopy send final QEMU_VM_EOF token */
+        qemu_put_byte(sn->f_fd, QEMU_VM_EOF);
+    }
     qemu_fflush(sn->f_fd);
     return 1;
 }
@@ -1266,6 +1716,11 @@ static int load_state_header(SnapLoadState *sn)
     return 0;
 }
 
+static bool load_switch_to_postcopy(SnapLoadState *sn)
+{
+    return ram_state.loaded_pages > ram_state.precopy_pages;
+}
+
 /* Load snapshot data and send it with outgoing migration stream */
 int coroutine_fn snap_load_state_main(SnapLoadState *sn)
 {
@@ -1283,11 +1738,22 @@ int coroutine_fn snap_load_state_main(SnapLoadState *sn)
     if (res) {
         goto fail;
     }
+    if (sn->postcopy) {
+        res = load_prepare_postcopy(sn);
+        if (res) {
+            goto fail;
+        }
+    }
 
     do {
         res = load_send_ram_iterate(sn);
         /* Make additional check for file errors */
         load_check_file_errors(sn, &res);
+
+        if (!res && sn->postcopy && !sn->in_postcopy &&
+            load_switch_to_postcopy(sn)) {
+            res = load_start_postcopy(sn);
+        }
     } while (!res);
 
     if (res == 1) {
@@ -1313,6 +1779,10 @@ void snap_ram_init_state(int page_bits)
 
     /* Initialize RAM block list head */
     QSIMPLEQ_INIT(&rs->ram_block_list);
+
+    /* Initialize load-postcopy page request queue */
+    qemu_mutex_init(&rs->page_req_mutex);
+    QSIMPLEQ_INIT(&rs->page_req);
 }
 
 /* Destroy snapshot RAM state */
@@ -1326,4 +1796,6 @@ void snap_ram_destroy_state(void)
         g_free(block->bitmap);
         g_free(block);
     }
+    /* Destroy page request mutex */
+    qemu_mutex_destroy(&ram_state.page_req_mutex);
 }
diff --git a/qemu-snap.c b/qemu-snap.c
index c5efbd6803..89fb918cfc 100644
--- a/qemu-snap.c
+++ b/qemu-snap.c
@@ -141,6 +141,10 @@ static void snap_load_destroy_state(void)
 {
     SnapLoadState *sn = snap_load_get_state();
 
+    if (sn->has_rp_listen_thread) {
+        qemu_thread_join(&sn->rp_listen_thread);
+    }
+
     if (sn->aio_pool) {
         aio_pool_free(sn->aio_pool);
     }
@@ -330,6 +334,7 @@ static int snap_load(SnapLoadParams *params)
 {
     SnapLoadState *sn;
     QIOChannel *ioc_fd;
+    QIOChannel *ioc_rp_fd;
     uint8_t *buf;
     size_t count;
     int res = -1;
@@ -338,11 +343,22 @@ static int snap_load(SnapLoadParams *params)
     snap_load_init_state();
     sn = snap_load_get_state();
 
+    sn->postcopy = params->postcopy;
+    sn->postcopy_percent = params->postcopy_percent;
+
     ioc_fd = qio_channel_new_fd(params->fd, NULL);
     qio_channel_set_name(QIO_CHANNEL(ioc_fd), "snap-channel-outgoing");
     sn->f_fd = qemu_fopen_channel_output(ioc_fd);
     object_unref(OBJECT(ioc_fd));
 
+    /* Open return path QEMUFile in case we shall use postcopy */
+    if (params->postcopy) {
+        ioc_rp_fd = qio_channel_new_fd(params->rp_fd, NULL);
+        qio_channel_set_name(QIO_CHANNEL(ioc_rp_fd), "snap-channel-rp");
+        sn->f_rp_fd = qemu_fopen_channel_input(ioc_rp_fd);
+        object_unref(OBJECT(ioc_rp_fd));
+    }
+
     sn->blk = snap_open(params->filename, params->bdrv_flags);
     if (!sn->blk) {
         goto fail;
-- 
2.25.1