From: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
To: qemu-devel@nongnu.org
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
    "Dr. David Alan Gilbert", Markus Armbruster, Peter Xu, Andrey Gruzdev
Subject: [RFC PATCH 8/9] migration/snap-tool: Implementation of snapshot loading in precopy
Date: Wed, 17 Mar 2021 19:32:21 +0300
Message-Id: <20210317163222.182609-9-andrey.gruzdev@virtuozzo.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com>
References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com>

This part implements snapshot loading in precopy mode.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 include/qemu-snap.h  |  24 ++
 qemu-snap-handlers.c | 586 ++++++++++++++++++++++++++++++++++++++++++-
 qemu-snap.c          |  44 +++-
 3 files changed, 649 insertions(+), 5 deletions(-)
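
Notes (not part of the patch itself): the records that send_page_header(),
send_zeropage() and send_pages_from_buffer() below write into the outgoing
pipe follow the page-record layout of the regular 'ram' migration stream,
so that a normal incoming migration can consume them: an 8-byte big-endian
word carrying the page-aligned address with flags in the low bits, a counted
block idstr when RAM_SAVE_FLAG_CONTINUE is not set, and then either the raw
page contents or a single filler byte for an all-zero page. The sketch below
is a self-contained illustration of that layout only; the flag values are
assumed from migration/ram.c and all *_demo names are hypothetical.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RAM_SAVE_FLAG_ZERO      0x02    /* assumed values, see migration/ram.c */
#define RAM_SAVE_FLAG_PAGE      0x08
#define RAM_SAVE_FLAG_CONTINUE  0x20

#define PAGE_SIZE_DEMO          4096

/* Append one big-endian 64-bit word to the stream */
static void put_be64_demo(FILE *out, uint64_t v)
{
    for (int shift = 56; shift >= 0; shift -= 8) {
        fputc((int)((v >> shift) & 0xff), out);
    }
}

/*
 * Emit one page record: (page_addr | flags) as be64, the block idstr as a
 * counted string unless this page continues the previous block, then either
 * the raw page or a single zero byte for an all-zero page.
 */
static void put_page_demo(FILE *out, const char *idstr, uint64_t page_addr,
                          const uint8_t *page, int same_block)
{
    int is_zero = 1;

    for (int i = 0; i < PAGE_SIZE_DEMO; i++) {
        if (page[i]) {
            is_zero = 0;
            break;
        }
    }

    uint64_t flags = (is_zero ? RAM_SAVE_FLAG_ZERO : RAM_SAVE_FLAG_PAGE) |
                     (same_block ? RAM_SAVE_FLAG_CONTINUE : 0);
    put_be64_demo(out, page_addr | flags);

    if (!same_block) {
        size_t len = strlen(idstr);

        assert(len < 256);
        fputc((int)len, out);
        fwrite(idstr, 1, len, out);
    }

    if (is_zero) {
        fputc(0, out);                          /* zero page: one filler byte */
    } else {
        fwrite(page, 1, PAGE_SIZE_DEMO, out);   /* normal page: raw contents */
    }
}

int main(void)
{
    uint8_t page[PAGE_SIZE_DEMO] = { [0] = 0x42 };

    put_page_demo(stdout, "pc.ram", 0x0000, page, 0);  /* first page of a block */
    put_page_demo(stdout, "pc.ram", 0x1000, page, 1);  /* continuation record */
    return 0;
}
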
diff --git a/include/qemu-snap.h b/include/qemu-snap.h
index 570f200c9d..f9f6db529f 100644
--- a/include/qemu-snap.h
+++ b/include/qemu-snap.h
@@ -26,6 +26,11 @@
  */
 #define PAGE_SIZE_MAX 16384
 
+/* Buffer size for RAM chunk loads via AIO buffer_pool */
+#define AIO_BUFFER_SIZE (1024 * 1024)
+/* Max. concurrent AIO tasks */
+#define AIO_TASKS_MAX 8
+
 typedef struct AioBufferPool AioBufferPool;
 
 typedef struct AioBufferStatus {
@@ -96,6 +101,25 @@ typedef struct SnapSaveState {
 
 typedef struct SnapLoadState {
     BlockBackend *blk;              /* Block backend */
+
+    QEMUFile *f_fd;                 /* Outgoing migration stream QEMUFile */
+    QEMUFile *f_vmstate;            /* Block backend vmstate area QEMUFile */
+    /*
+     * Buffer to keep first few KBs of BDRV vmstate that we stashed at the
+     * start. Within this buffer are VM state header, configuration section
+     * and the first 'ram' section with RAM block list.
+     */
+    QIOChannelBuffer *ioc_lbuf;
+
+    /* AIO buffer pool */
+    AioBufferPool *aio_pool;
+
+    /* BDRV vmstate offset of RAM block list section */
+    int64_t state_ram_list_offset;
+    /* BDRV vmstate offset of the first device section */
+    int64_t state_device_offset;
+    /* BDRV vmstate End-Of-File */
+    int64_t state_eof;
 } SnapLoadState;
 
 SnapSaveState *snap_save_get_state(void);
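
The three state_*_offset fields added above describe the fixed header that
load_state_header() reads from the start of the image's vmstate area (the
layout written by the save side earlier in this series): a magic word, the
page size, the count of non-zero pages, then the RAM block list, first
device section and end-of-state offsets, all stored big-endian. A rough,
self-contained reader sketch follows; the struct and the *_demo names are
illustrative only, and the real code also checks the values against
VMSTATE_MAGIC, the target page size and INPLACE_READ_MAX.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t magic;             /* checked against VMSTATE_MAGIC */
    uint32_t page_size;         /* must match the target page size */
    uint64_t normal_pages;      /* non-zero pages across all RAM blocks */
    uint32_t ram_list_offset;   /* offset of the RAM block list section */
    uint32_t device_offset;     /* offset of the first device section */
    uint32_t eof_offset;        /* end of the saved vmstate */
} SnapVmstateHeaderDemo;        /* fields are big-endian on disk */

static uint32_t get_be32_demo(FILE *f)
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++) {
        v = (v << 8) | (uint32_t)(fgetc(f) & 0xff);
    }
    return v;
}

static uint64_t get_be64_demo(FILE *f)
{
    uint64_t hi = get_be32_demo(f);

    return (hi << 32) | get_be32_demo(f);
}

int main(void)
{
    SnapVmstateHeaderDemo h;

    /* Consume the fields in the same order load_state_header() does */
    h.magic = get_be32_demo(stdin);
    h.page_size = get_be32_demo(stdin);
    h.normal_pages = get_be64_demo(stdin);
    h.ram_list_offset = get_be32_demo(stdin);
    h.device_offset = get_be32_demo(stdin);
    h.eof_offset = get_be32_demo(stdin);

    printf("magic=0x%x page_size=%u normal_pages=%llu\n",
           (unsigned)h.magic, (unsigned)h.page_size,
           (unsigned long long)h.normal_pages);
    printf("ram_list=0x%x device=0x%x eof=0x%x\n",
           (unsigned)h.ram_list_offset, (unsigned)h.device_offset,
           (unsigned)h.eof_offset);
    return 0;
}
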
diff --git a/qemu-snap-handlers.c b/qemu-snap-handlers.c
index 4b63d42a29..7dfe950829 100644
--- a/qemu-snap-handlers.c
+++ b/qemu-snap-handlers.c
@@ -41,12 +41,22 @@ typedef struct RAMBlockDesc {
     int64_t length;             /* RAM block used_length */
     int64_t nr_pages;           /* RAM block page count (length >> page_bits) */
 
+    int64_t last_offset;        /* Last offset sent in precopy */
+
     char idstr[256];            /* RAM block id string */
 
+    unsigned long *bitmap;      /* Loaded pages bitmap */
+
     /* Link into ram_list */
     QSIMPLEQ_ENTRY(RAMBlockDesc) next;
 } RAMBlockDesc;
 
+/* Reference to the RAM page with block/page tuple */
+typedef struct RAMPageRef {
+    RAMBlockDesc *block;        /* RAM block containing page */
+    int64_t page;               /* Page index in RAM block */
+} RAMPageRef;
+
 /* State reflecting RAM part of snapshot */
 typedef struct RAMState {
     int64_t page_size;          /* Page size */
@@ -54,6 +64,15 @@ typedef struct RAMState {
     int page_bits;              /* Page size bits */
 
     int64_t normal_pages;       /* Total number of normal (non-zero) pages */
+    int64_t loaded_pages;       /* Current number of normal pages loaded */
+
+    /* Last RAM block touched by load_send_ram_iterate() */
+    RAMBlockDesc *last_block;
+    /* Last page touched by load_send_ram_iterate() */
+    int64_t last_page;
+
+    /* Last RAM block sent by load_send_ram_iterate() */
+    RAMBlockDesc *last_sent_block;
 
     /* List of RAM blocks */
     QSIMPLEQ_HEAD(, RAMBlockDesc) ram_block_list;
@@ -96,19 +115,22 @@ typedef struct SectionHandlers {
 
 /* Forward declarations */
 static int default_save(QEMUFile *f, void *opaque, int version_id);
+static int default_load(QEMUFile *f, void *opaque, int version_id);
 static int ram_save(QEMUFile *f, void *opaque, int version_id);
+static int ram_load(QEMUFile *f, void *opaque, int version_id);
 static int save_state_complete(SnapSaveState *sn);
+static int coroutine_fn load_send_pages_flush(SnapLoadState *sn);
 
 static RAMState ram_state;
 
 static SectionHandlerOps default_handler_ops = {
     .save_section = default_save,
-    .load_section = NULL,
+    .load_section = default_load,
 };
 
 static SectionHandlerOps ram_handler_ops = {
     .save_section = ram_save,
-    .load_section = NULL,
+    .load_section = ram_load,
 };
 
 static SectionHandlers section_handlers = {
@@ -212,6 +234,18 @@ static RAMBlockDesc *ram_block_by_idstr(const char *idstr)
     return NULL;
 }
 
+static RAMBlockDesc *ram_block_by_bdrv_offset(int64_t bdrv_offset)
+{
+    RAMBlockDesc *block;
+
+    QSIMPLEQ_FOREACH(block, &ram_state.ram_block_list, next) {
+        if (ram_bdrv_offset_in_block(block, bdrv_offset)) {
+            return block;
+        }
+    }
+    return NULL;
+}
+
 static RAMBlockDesc *ram_block_from_stream(QEMUFile *f, int flags)
 {
     static RAMBlockDesc *block;
@@ -289,6 +323,36 @@ static void ram_block_list_from_stream(QEMUFile *f, int64_t mem_size)
     }
 }
 
+static void ram_block_list_init_bitmaps(void)
+{
+    RAMBlockDesc *block;
+
+    QSIMPLEQ_FOREACH(block, &ram_state.ram_block_list, next) {
+        block->nr_pages = block->length >> ram_state.page_bits;
+
+        block->bitmap = bitmap_new(block->nr_pages);
+        bitmap_set(block->bitmap, 0, block->nr_pages);
+    }
+}
+
+static inline
+int64_t ram_block_bitmap_find_next(RAMBlockDesc *block, int64_t start)
+{
+    return find_next_bit(block->bitmap, block->nr_pages, start);
+}
+
+static inline
+int64_t ram_block_bitmap_find_next_clear(RAMBlockDesc *block, int64_t start)
+{
+    return find_next_zero_bit(block->bitmap, block->nr_pages, start);
+}
+
+static inline
+void ram_block_bitmap_clear(RAMBlockDesc *block, int64_t start, int64_t count)
+{
+    bitmap_clear(block->bitmap, start, count);
+}
+
 static void save_check_file_errors(SnapSaveState *sn, int *res)
 {
     /* Check for -EIO that indicates plane EOF */
@@ -723,11 +787,517 @@ int coroutine_fn snap_save_state_main(SnapSaveState *sn)
     return sn->status;
 }
 
+static void load_check_file_errors(SnapLoadState *sn, int *res)
+{
+    /* Check file errors even on success */
+    if (*res >= 0 || *res == -EINVAL) {
+        int f_res;
+
+        f_res = qemu_file_get_error(sn->f_fd);
+        if (!f_res) {
+            f_res = qemu_file_get_error(sn->f_vmstate);
+        }
+        *res = f_res ? f_res : *res;
+    }
+}
+
+static int ram_load(QEMUFile *f, void *opaque, int version_id)
+{
+    int compat_flags = (RAM_SAVE_FLAG_MEM_SIZE | RAM_SAVE_FLAG_EOS);
+    int64_t page_mask = ram_state.page_mask;
+    int flags = 0;
+    int res = 0;
+
+    if (version_id != 4) {
+        error_report("Unsupported version %d for 'ram' handler v4", version_id);
+        return -EINVAL;
+    }
+
+    while (!res && !(flags & RAM_SAVE_FLAG_EOS)) {
+        int64_t addr;
+
+        addr = qemu_get_be64(f);
+        flags = addr & ~page_mask;
+        addr &= page_mask;
+
+        if (flags & ~compat_flags) {
+            error_report("RAM page with incompatible flags: offset=0x%" PRIx64
+                         " flags=0x%x", qemu_ftell2(f), flags);
+            res = -EINVAL;
+            break;
+        }
+
+        switch (flags) {
+        case RAM_SAVE_FLAG_MEM_SIZE:
+            /* Fill RAM block list */
+            ram_block_list_from_stream(f, addr);
+            break;
+
+        case RAM_SAVE_FLAG_EOS:
+            /* Normal exit */
+            break;
+
+        default:
+            error_report("RAM page with unknown combination of flags:"
+                         " offset=0x%" PRIx64 " page_addr=0x%" PRIx64
+                         " flags=0x%x", qemu_ftell2(f), addr, flags);
+            res = -EINVAL;
+        }
+
+        /* Check for file errors even if all looks good */
+        if (!res) {
+            res = qemu_file_get_error(f);
+        }
+    }
+    return res;
+}
+
+static int default_load(QEMUFile *f, void *opaque, int version_id)
+{
+    error_report("Section with unknown ID: offset=0x%" PRIx64,
+                 qemu_ftell2(f));
+    return -EINVAL;
+}
+
+static void send_page_header(QEMUFile *f, RAMBlockDesc *block, int64_t offset)
+{
+    uint8_t hdr_buf[512];
+    int hdr_len = 8;
+
+    stq_be_p(hdr_buf, offset);
+    if (!(offset & RAM_SAVE_FLAG_CONTINUE)) {
+        int id_len;
+
+        id_len = strlen(block->idstr);
+        assert(id_len < 256);
+
+        hdr_buf[hdr_len] = id_len;
+        memcpy((hdr_buf + hdr_len + 1), block->idstr, id_len);
+
+        hdr_len += 1 + id_len;
+    }
+
+    qemu_put_buffer(f, hdr_buf, hdr_len);
+}
+
+static void send_zeropage(QEMUFile *f, RAMBlockDesc *block, int64_t offset)
+{
+    send_page_header(f, block, offset | RAM_SAVE_FLAG_ZERO);
+    qemu_put_byte(f, 0);
+}
+
+static int send_pages_from_buffer(QEMUFile *f, AioBuffer *buffer)
+{
+    RAMState *rs = &ram_state;
+    int page_size = rs->page_size;
+    RAMBlockDesc *block = rs->last_sent_block;
+    int64_t bdrv_offset = buffer->status.offset;
+    int64_t flags = RAM_SAVE_FLAG_CONTINUE;
+    int pages = 0;
+
+    /* Need to switch to another RAM block? */
+    if (!ram_bdrv_offset_in_block(block, bdrv_offset)) {
+        /*
+         * Look up the RAM block by BDRV offset since in postcopy we
+         * can issue AIO buffer loads from non-contiguous blocks.
+         */
+        block = ram_block_by_bdrv_offset(bdrv_offset);
+        rs->last_sent_block = block;
+        /* Reset RAM_SAVE_FLAG_CONTINUE */
+        flags = 0;
+    }
+
+    for (int offset = 0; offset < buffer->status.count;
+            offset += page_size, bdrv_offset += page_size) {
+        void *page_buf = buffer->data + offset;
+        int64_t addr;
+
+        addr = ram_block_offset_from_bdrv(block, bdrv_offset);
+
+        if (buffer_is_zero(page_buf, page_size)) {
+            send_zeropage(f, block, (addr | flags));
+        } else {
+            send_page_header(f, block,
+                             (addr | RAM_SAVE_FLAG_PAGE | flags));
+            qemu_put_buffer_async(f, page_buf, page_size, false);
+
+            /* Update non-zero page count */
+            rs->loaded_pages++;
+        }
+        /*
+         * AioBuffer is always within a single RAM block so we need
+         * to set RAM_SAVE_FLAG_CONTINUE here unconditionally.
+         */
+        flags = RAM_SAVE_FLAG_CONTINUE;
+        pages++;
+    }
+
+    /* Need to flush since we use qemu_put_buffer_async() */
+    qemu_fflush(f);
+    return pages;
+}
+
+static bool find_next_unsent_page(RAMPageRef *p_ref)
+{
+    RAMState *rs = &ram_state;
+    RAMBlockDesc *block = rs->last_block;
+    int64_t page = rs->last_page;
+    bool found = false;
+    bool full_round = false;
+
+    if (!block) {
+restart:
+        block = QSIMPLEQ_FIRST(&rs->ram_block_list);
+        page = 0;
+        full_round = true;
+    }
+
+    while (!found && block) {
+        page = ram_block_bitmap_find_next(block, page);
+        if (page >= block->nr_pages) {
+            block = QSIMPLEQ_NEXT(block, next);
+            page = 0;
+            continue;
+        }
+        found = true;
+    }
+
+    if (!found && !full_round) {
+        goto restart;
+    }
+
+    if (found) {
+        p_ref->block = block;
+        p_ref->page = page;
+    }
+    return found;
+}
+
+static inline
+void get_unsent_page_range(RAMPageRef *p_ref, RAMBlockDesc **block,
+                           int64_t *offset, int64_t *limit)
+{
+    int64_t page_limit;
+
+    *block = p_ref->block;
+    *offset = p_ref->page << ram_state.page_bits;
+    page_limit = ram_block_bitmap_find_next_clear(p_ref->block, (p_ref->page + 1));
+    *limit = page_limit << ram_state.page_bits;
+}
+
+static AioBufferStatus coroutine_fn load_buffers_task_co(AioBufferTask *task)
+{
+    SnapLoadState *sn = snap_load_get_state();
+    AioBufferStatus ret;
+    int count;
+
+    count = blk_pread(sn->blk, task->offset, task->buffer->data, task->size);
+
+    ret.offset = task->offset;
+    ret.count = count;
+
+    return ret;
+}
+
+static void coroutine_fn load_buffers_fill_queue(SnapLoadState *sn)
+{
+    RAMState *rs = &ram_state;
+    RAMPageRef p_ref;
+    RAMBlockDesc *block;
+    int64_t offset;
+    int64_t limit;
+    int64_t pages;
+
+    if (!find_next_unsent_page(&p_ref)) {
+        return;
+    }
+
+    get_unsent_page_range(&p_ref, &block, &offset, &limit);
+
+    do {
+        AioBuffer *buffer;
+        int64_t bdrv_offset;
+        int size;
+
+        /* Try to acquire next buffer from the pool */
+        buffer = aio_pool_try_acquire_next(sn->aio_pool);
+        if (!buffer) {
+            break;
+        }
+
+        bdrv_offset = ram_bdrv_from_block_offset(block, offset);
+        assert(bdrv_offset != INVALID_OFFSET);
+
+        /* Get maximum transfer size for current RAM block and offset */
+        size = MIN((limit - offset), buffer->size);
+        aio_buffer_start_task(buffer, load_buffers_task_co, bdrv_offset, size);
+
+        offset += size;
+    } while (offset < limit);
+
+    rs->last_block = block;
+    rs->last_page = offset >> rs->page_bits;
+
+    block->last_offset = offset;
+
+    pages = rs->last_page - p_ref.page;
+    ram_block_bitmap_clear(block, p_ref.page, pages);
+}
+
+static int coroutine_fn load_send_pages(SnapLoadState *sn)
+{
+    AioBuffer *compl_buffer;
+    int pages = 0;
+
+    load_buffers_fill_queue(sn);
+
+    compl_buffer = aio_pool_wait_compl_next(sn->aio_pool);
+    if (compl_buffer) {
+        /* Check AIO completion status */
+        pages = aio_pool_status(sn->aio_pool);
+        if (pages < 0) {
+            return pages;
+        }
+
+        pages = send_pages_from_buffer(sn->f_fd, compl_buffer);
+        aio_buffer_release(compl_buffer);
+    }
+
+    return pages;
+}
+
+static int coroutine_fn load_send_pages_flush(SnapLoadState *sn)
+{
+    AioBuffer *compl_buffer;
+
+    while ((compl_buffer = aio_pool_wait_compl_next(sn->aio_pool))) {
+        int res = aio_pool_status(sn->aio_pool);
+        /* Check AIO completion status */
+        if (res < 0) {
+            return res;
+        }
+
+        send_pages_from_buffer(sn->f_fd, compl_buffer);
+        aio_buffer_release(compl_buffer);
+    }
+
+    return 0;
+}
+
+static void send_section_header_part_end(QEMUFile *f, SectionHandlersEntry *se,
+                                         uint8_t section_type)
+{
+    assert(section_type == QEMU_VM_SECTION_PART ||
+           section_type == QEMU_VM_SECTION_END);
+
+    qemu_put_byte(f, section_type);
+    qemu_put_be32(f, se->state_section_id);
+}
+
+static void send_section_footer(QEMUFile *f, SectionHandlersEntry *se)
+{
+    qemu_put_byte(f, QEMU_VM_SECTION_FOOTER);
+    qemu_put_be32(f, se->state_section_id);
+}
+
+#define YIELD_AFTER_MS 500 /* ms */
+
+static int coroutine_fn load_send_ram_iterate(SnapLoadState *sn)
+{
+    SectionHandlersEntry *se;
+    int64_t t_start;
+    int tmp_res;
+    int res = 1;
+
+    /* Send 'ram' section header with QEMU_VM_SECTION_PART type */
+    se = find_se("ram", 0);
+    send_section_header_part_end(sn->f_fd, se, QEMU_VM_SECTION_PART);
+
+    t_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    for (int iter = 0; res > 0; iter++) {
+        res = load_send_pages(sn);
+
+        if (!(iter & 7)) {
+            int64_t t_cur = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+            if ((t_cur - t_start) > YIELD_AFTER_MS) {
+                break;
+            }
+        }
+    }
+
+    /* Zero return code means that there are no more pages to send */
+    if (res >= 0) {
+        res = res ? 0 : 1;
+    }
+
+    /* Flush AIO buffers since some may still remain unsent */
+    tmp_res = load_send_pages_flush(sn);
+    res = tmp_res ? tmp_res : res;
+
+    /* Send EOS flag before section footer */
+    qemu_put_be64(sn->f_fd, RAM_SAVE_FLAG_EOS);
+    send_section_footer(sn->f_fd, se);
+
+    qemu_fflush(sn->f_fd);
+    return res;
+}
+
+static int load_send_leader(SnapLoadState *sn)
+{
+    qemu_put_buffer(sn->f_fd, (sn->ioc_lbuf->data + VMSTATE_HEADER_SIZE),
+                    sn->state_device_offset);
+    return qemu_file_get_error(sn->f_fd);
+}
+
+static int load_send_complete(SnapLoadState *sn)
+{
+    /* Transfer device state to the output pipe */
+    file_transfer_bytes(sn->f_fd, sn->f_vmstate,
+                        (sn->state_eof - sn->state_device_offset));
+    qemu_fflush(sn->f_fd);
+    return 1;
+}
+
+static int load_section_start_full(SnapLoadState *sn)
+{
+    QEMUFile *f = sn->f_vmstate;
+    int section_id;
+    int instance_id;
+    int version_id;
+    char idstr[256];
+    SectionHandlersEntry *se;
+    int res;
+
+    /* Read section start */
+    section_id = qemu_get_be32(f);
+    if (!qemu_get_counted_string(f, idstr)) {
+        return qemu_file_get_error(f);
+    }
+    instance_id = qemu_get_be32(f);
+    version_id = qemu_get_be32(f);
+
+    se = find_se(idstr, instance_id);
+    if (!se) {
+        se = &section_handlers.default_entry;
+    } else if (version_id > se->version_id) {
+        /* Validate version */
+        error_report("Unsupported version %d for '%s' v%d",
+                     version_id, idstr, se->version_id);
+        return -EINVAL;
+    }
+
+    se->state_section_id = section_id;
+    se->state_version_id = version_id;
+
+    res = se->ops->load_section(f, sn, se->state_version_id);
+    if (res) {
+        return res;
+    }
+
+    /* Finally check section footer */
+    if (!check_section_footer(f, se)) {
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static int load_setup_ramlist(SnapLoadState *sn)
+{
+    QEMUFile *f = sn->f_vmstate;
+    uint8_t section_type;
+    int64_t section_pos;
+    int res;
+
+    section_pos = qemu_ftell2(f);
+
+    /* Read section type token */
+    section_type = qemu_get_byte(f);
+    if (section_type == QEMU_VM_EOF) {
+        error_report("Unexpected EOF token: offset=0x%" PRIx64, section_pos);
+        return -EINVAL;
+    } else if (section_type != QEMU_VM_SECTION_FULL &&
+               section_type != QEMU_VM_SECTION_START) {
+        error_report("Unexpected section type %d: offset=0x%" PRIx64,
+                     section_type, section_pos);
+        return -EINVAL;
+    }
+
+    res = load_section_start_full(sn);
+    if (!res) {
+        ram_block_list_init_bitmaps();
+    }
+    return res;
+}
+
+static int load_state_header(SnapLoadState *sn)
+{
+    QEMUFile *f = sn->f_vmstate;
+    uint32_t v;
+
+    /* Validate specific MAGIC in vmstate area */
+    v = qemu_get_be32(f);
+    if (v != VMSTATE_MAGIC) {
+        error_report("Not a valid VMSTATE");
+        return -EINVAL;
+    }
+    v = qemu_get_be32(f);
+    if (v != ram_state.page_size) {
+        error_report("VMSTATE page size not matching target");
+        return -EINVAL;
+    }
+
+    /* Number of non-zero pages in all RAM blocks */
+    ram_state.normal_pages = qemu_get_be64(f);
+
+    /* VMSTATE area offsets, counted from QEMU_FILE_MAGIC */
+    sn->state_ram_list_offset = qemu_get_be32(f);
+    sn->state_device_offset = qemu_get_be32(f);
+    sn->state_eof = qemu_get_be32(f);
+
+    /* Check that offsets are within the limits */
+    if ((VMSTATE_HEADER_SIZE + sn->state_device_offset) > INPLACE_READ_MAX ||
+            sn->state_device_offset <= sn->state_ram_list_offset) {
+        error_report("Corrupted VMSTATE header");
+        return -EINVAL;
+    }
+
+    /* Skip up to RAM block list section */
+    qemu_file_skip(f, sn->state_ram_list_offset);
+    return 0;
+}
+
 /* Load snapshot data and send it with outgoing migration stream */
 int coroutine_fn snap_load_state_main(SnapLoadState *sn)
 {
-    /* TODO: implement */
-    return 0;
+    int res;
+
+    res = load_state_header(sn);
+    if (res) {
+        goto fail;
+    }
+    res = load_setup_ramlist(sn);
+    if (res) {
+        goto fail;
+    }
+    res = load_send_leader(sn);
+    if (res) {
+        goto fail;
+    }
+
+    do {
+        res = load_send_ram_iterate(sn);
+        /* Make additional check for file errors */
+        load_check_file_errors(sn, &res);
+    } while (!res);
+
+    if (res == 1) {
+        res = load_send_complete(sn);
+    }
+
+fail:
+    load_check_file_errors(sn, &res);
+    /* Replace positive exit code with 0 */
+    return res < 0 ? res : 0;
 }
 
 /* Initialize snapshot RAM state */
@@ -748,4 +1318,12 @@ void snap_ram_init_state(int page_bits)
 /* Destroy snapshot RAM state */
 void snap_ram_destroy_state(void)
 {
+    RAMBlockDesc *block;
+    RAMBlockDesc *next_block;
+
+    /* Free RAM blocks */
+    QSIMPLEQ_FOREACH_SAFE(block, &ram_state.ram_block_list, next, next_block) {
+        g_free(block->bitmap);
+        g_free(block);
+    }
 }
diff --git a/qemu-snap.c b/qemu-snap.c
index a337a7667b..c5efbd6803 100644
--- a/qemu-snap.c
+++ b/qemu-snap.c
@@ -139,7 +139,20 @@ static void snap_load_init_state(void)
 
 static void snap_load_destroy_state(void)
 {
-    /* TODO: implement */
+    SnapLoadState *sn = snap_load_get_state();
+
+    if (sn->aio_pool) {
+        aio_pool_free(sn->aio_pool);
+    }
+    if (sn->ioc_lbuf) {
+        object_unref(OBJECT(sn->ioc_lbuf));
+    }
+    if (sn->f_vmstate) {
+        qemu_fclose(sn->f_vmstate);
+    }
+    if (sn->blk) {
+        blk_unref(sn->blk);
+    }
 }
 
 static BlockBackend *snap_create(const char *filename, int64_t image_size,
@@ -221,6 +234,12 @@ static void coroutine_fn do_snap_load_co(void *opaque)
     SnapTaskState *task_state = opaque;
     SnapLoadState *sn = snap_load_get_state();
 
+    /* Switch to non-blocking mode in coroutine context */
+    qemu_file_set_blocking(sn->f_vmstate, false);
+    qemu_file_set_blocking(sn->f_fd, false);
+    /* Initialize AIO buffer pool in coroutine context */
+    sn->aio_pool = aio_pool_new(DEFAULT_PAGE_SIZE, AIO_BUFFER_SIZE,
+                                AIO_TASKS_MAX);
     /* Enter main routine */
     task_state->ret = snap_load_state_main(sn);
 }
@@ -310,15 +329,37 @@ fail:
 static int snap_load(SnapLoadParams *params)
 {
     SnapLoadState *sn;
+    QIOChannel *ioc_fd;
+    uint8_t *buf;
+    size_t count;
     int res = -1;
 
+    snap_ram_init_state(ctz64(params->page_size));
     snap_load_init_state();
     sn = snap_load_get_state();
 
+    ioc_fd = qio_channel_new_fd(params->fd, NULL);
+    qio_channel_set_name(QIO_CHANNEL(ioc_fd), "snap-channel-outgoing");
+    sn->f_fd = qemu_fopen_channel_output(ioc_fd);
+    object_unref(OBJECT(ioc_fd));
+
     sn->blk = snap_open(params->filename, params->bdrv_flags);
     if (!sn->blk) {
         goto fail;
     }
+    /* Open QEMUFile for BDRV vmstate area */
+    sn->f_vmstate = qemu_fopen_bdrv_vmstate(blk_bs(sn->blk), 0);
+
+    /* Create buffer channel to store leading part of VMSTATE stream */
+    sn->ioc_lbuf = qio_channel_buffer_new(INPLACE_READ_MAX);
+    qio_channel_set_name(QIO_CHANNEL(sn->ioc_lbuf), "snap-leader-buffer");
+
+    count = qemu_peek_buffer(sn->f_vmstate, &buf, INPLACE_READ_MAX, 0);
+    res = qemu_file_get_error(sn->f_vmstate);
+    if (res < 0) {
+        goto fail;
+    }
+    qio_channel_write(QIO_CHANNEL(sn->ioc_lbuf), (char *) buf, count, NULL);
 
     res = run_snap_task(do_snap_load_co);
     if (res) {
@@ -327,6 +368,7 @@ static int snap_load(SnapLoadParams *params)
 
 fail:
     snap_load_destroy_state();
+    snap_ram_destroy_state();
 
     return res;
 }
-- 
2.25.1
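
The loading path above is organized as a bounded fill/drain pipeline:
load_buffers_fill_queue() acquires idle buffers from the AioBufferPool and
starts blk_pread() tasks for the next range of not-yet-sent pages, while
load_send_pages() waits for one completed buffer, re-encodes its pages into
the migration stream and releases the buffer for reuse. The standalone
sketch below mirrors only that control-flow shape; it uses plain synchronous
pread() instead of the coroutine-based AioBufferPool, and every name in it
is hypothetical.

/*
 * Build: gcc -O2 -o pool-demo pool-demo.c
 * Usage: ./pool-demo <snapshot-image> > stream.out
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE_DEMO   (1024 * 1024)   /* mirrors AIO_BUFFER_SIZE */
#define BUF_COUNT_DEMO  8               /* mirrors AIO_TASKS_MAX */

typedef struct {
    char data[BUF_SIZE_DEMO];
    off_t offset;               /* file offset this buffer was read from */
    ssize_t count;              /* bytes read; <= 0 means nothing to send */
    int busy;                   /* buffer holds data waiting to be drained */
} DemoBuffer;

int main(int argc, char **argv)
{
    static DemoBuffer pool[BUF_COUNT_DEMO];
    off_t offset = 0;
    int fd;

    if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
        fprintf(stderr, "usage: %s <snapshot-image>\n", argv[0]);
        return 1;
    }

    for (;;) {
        int active = 0;

        /* Fill stage: give every idle buffer the next chunk to read */
        for (int i = 0; i < BUF_COUNT_DEMO; i++) {
            if (!pool[i].busy) {
                pool[i].offset = offset;
                pool[i].count = pread(fd, pool[i].data, BUF_SIZE_DEMO, offset);
                pool[i].busy = pool[i].count > 0;
                offset += BUF_SIZE_DEMO;
            }
            active += pool[i].busy;
        }
        if (!active) {
            break;              /* nothing left to read or to send */
        }

        /* Drain stage: forward completed buffers to the output stream */
        for (int i = 0; i < BUF_COUNT_DEMO; i++) {
            if (pool[i].busy) {
                fwrite(pool[i].data, 1, (size_t)pool[i].count, stdout);
                pool[i].busy = 0;
            }
        }
    }

    close(fd);
    return 0;
}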