From: Andrey Gruzdev
To: qemu-devel@nongnu.org
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela, "Dr . David Alan Gilbert", Markus Armbruster, Peter Xu, Andrey Gruzdev
Subject: [PATCH v4 3/6] support UFFD write fault processing in ram_save_iterate()
Date: Thu, 26 Nov 2020 18:17:31 +0300
Message-Id: <20201126151734.743849-4-andrey.gruzdev@virtuozzo.com>
In-Reply-To: <20201126151734.743849-1-andrey.gruzdev@virtuozzo.com>
References: <20201126151734.743849-1-andrey.gruzdev@virtuozzo.com>

In this particular implementation the same single migration thread is
responsible for both normal linear dirty page migration and processing UFFD
page fault events.
Processing write faults includes reading the UFFD file descriptor, finding
the respective RAM block and saving the faulting page to the migration stream.
After the page has been saved, write protection can be removed. Since the
asynchronous version of qemu_put_buffer() is expected to be used to save
pages, we also have to flush the migration stream prior to un-protecting
the saved memory range.

Write protection is removed for any previously protected memory chunk
that has hit the migration stream. That holds for pages from the linear
page scan as well as for write fault pages.

Signed-off-by: Andrey Gruzdev
---
 migration/ram.c | 155 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 147 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 3adfd1948d..bcdccdaef7 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1441,6 +1441,76 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset)
     return block;
 }
 
+#ifdef CONFIG_LINUX
+/**
+ * ram_find_block_by_host_address: find RAM block containing host page
+ *
+ * Returns pointer to RAMBlock if found, NULL otherwise
+ *
+ * @rs: current RAM state
+ * @page_address: host page address
+ */
+static RAMBlock *ram_find_block_by_host_address(RAMState *rs, hwaddr page_address)
+{
+    RAMBlock *bs = rs->last_seen_block;
+
+    do {
+        if (page_address >= (hwaddr) bs->host && (page_address + TARGET_PAGE_SIZE) <=
+                ((hwaddr) bs->host + bs->max_length)) {
+            return bs;
+        }
+
+        bs = QLIST_NEXT_RCU(bs, next);
+        if (!bs) {
+            /* Hit the end of the list */
+            bs = QLIST_FIRST_RCU(&ram_list.blocks);
+        }
+    } while (bs != rs->last_seen_block);
+
+    return NULL;
+}
+
+/**
+ * poll_fault_page: try to get next UFFD write fault page and, if pending fault
+ *   is found, return RAM block pointer and page offset
+ *
+ * Returns pointer to the RAMBlock containing faulting page,
+ *   NULL if no write faults are pending
+ *
+ * @rs: current RAM state
+ * @offset: page offset from the beginning of the block
+ */
+static RAMBlock *poll_fault_page(RAMState *rs, ram_addr_t *offset)
+{
+    struct uffd_msg uffd_msg;
+    hwaddr page_address;
+    RAMBlock *bs;
+    int res;
+
+    if (!migrate_background_snapshot()) {
+        return NULL;
+    }
+
+    res = uffd_read_events(rs->uffdio_fd, &uffd_msg, 1);
+    if (res <= 0) {
+        return NULL;
+    }
+
+    page_address = uffd_msg.arg.pagefault.address;
+    bs = ram_find_block_by_host_address(rs, page_address);
+    if (!bs) {
+        /* In case we couldn't find respective block, just unprotect faulting page. */
+        uffd_protect_memory(rs->uffdio_fd, page_address, TARGET_PAGE_SIZE, false);
+        error_report("ram_find_block_by_host_address() failed: address=0x%0lx",
+                page_address);
+        return NULL;
+    }
+
+    *offset = (ram_addr_t) (page_address - (hwaddr) bs->host);
+    return bs;
+}
+#endif /* CONFIG_LINUX */
+
 /**
  * get_queued_page: unqueue a page from the postcopy requests
  *
@@ -1480,6 +1550,16 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
 
     } while (block && !dirty);
 
+#ifdef CONFIG_LINUX
+    if (!block) {
+        /*
+         * Poll write faults too if background snapshot is enabled; that's
+         * when we have vcpus got blocked by the write protected pages.
+         */
+        block = poll_fault_page(rs, &offset);
+    }
+#endif /* CONFIG_LINUX */
+
     if (block) {
         /*
          * As soon as we start servicing pages out of order, then we have
@@ -1753,6 +1833,55 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
     return pages;
 }
 
+/**
+ * ram_save_host_page_pre: ram_save_host_page() pre-notifier
+ *
+ * @rs: current RAM state
+ * @pss: page-search-status structure
+ * @opaque: pointer to receive opaque context value
+ */
+static inline
+void ram_save_host_page_pre(RAMState *rs, PageSearchStatus *pss, void **opaque)
+{
+    *(ram_addr_t *) opaque = pss->page;
+}
+
+/**
+ * ram_save_host_page_post: ram_save_host_page() post-notifier
+ *
+ * @rs: current RAM state
+ * @pss: page-search-status structure
+ * @opaque: opaque context value
+ * @res_override: pointer to the return value of ram_save_host_page(),
+ *   overwritten in case of an error
+ */
+static void ram_save_host_page_post(RAMState *rs, PageSearchStatus *pss,
+        void *opaque, int *res_override)
+{
+    /* Check if page is from UFFD-managed region. */
+    if (pss->block->flags & RAM_UF_WRITEPROTECT) {
+#ifdef CONFIG_LINUX
+        ram_addr_t page_from = (ram_addr_t) opaque;
+        hwaddr page_address = (hwaddr) pss->block->host +
+                              ((hwaddr) page_from << TARGET_PAGE_BITS);
+        hwaddr run_length = (hwaddr) (pss->page - page_from + 1) << TARGET_PAGE_BITS;
+        int res;
+
+        /* Flush async buffers before un-protect. */
+        qemu_fflush(rs->f);
+        /* Un-protect memory range. */
+        res = uffd_protect_memory(rs->uffdio_fd, page_address, run_length, false);
+        /* We don't want to override existing error from ram_save_host_page(). */
+        if (res < 0 && *res_override >= 0) {
+            *res_override = res;
+        }
+#else
+        /* Should never happen */
+        qemu_file_set_error(rs->f, -ENOSYS);
+#endif /* CONFIG_LINUX */
+    }
+}
+
 /**
  * ram_find_and_save_block: finds a dirty page and sends it to f
  *
@@ -1779,14 +1908,14 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage)
         return pages;
     }
 
+    if (!rs->last_seen_block) {
+        rs->last_seen_block = QLIST_FIRST_RCU(&ram_list.blocks);
+    }
+
     pss.block = rs->last_seen_block;
     pss.page = rs->last_page;
     pss.complete_round = false;
 
-    if (!pss.block) {
-        pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
-    }
-
     do {
         again = true;
         found = get_queued_page(rs, &pss);
@@ -1797,7 +1926,11 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage)
         }
 
         if (found) {
+            void *opaque;
+
+            ram_save_host_page_pre(rs, &pss, &opaque);
             pages = ram_save_host_page(rs, &pss, last_stage);
+            ram_save_host_page_post(rs, &pss, opaque, &pages);
         }
     } while (!pages && again);
 
@@ -3864,9 +3997,12 @@ fail:
     rs->uffdio_fd = -1;
     return -1;
 #else
+    /*
+     * Should never happen since we prohibit 'background-snapshot'
+     * capability on non-Linux hosts.
+     */
     rs->uffdio_fd = -1;
-    error_setg(&migrate_get_current()->error,
-            "Background-snapshot not supported on non-Linux hosts");
+    error_setg(&migrate_get_current()->error, QERR_UNDEFINED_ERROR);
     return -1;
 #endif /* CONFIG_LINUX */
 }
@@ -3903,8 +4039,11 @@ void ram_write_tracking_stop(void)
     uffd_close_fd(rs->uffdio_fd);
     rs->uffdio_fd = -1;
 #else
-    error_setg(&migrate_get_current()->error,
-            "Background-snapshot not supported on non-Linux hosts");
+    /*
+     * Should never happen since we prohibit 'background-snapshot'
+     * capability on non-Linux hosts.
+     */
+    error_setg(&migrate_get_current()->error, QERR_UNDEFINED_ERROR);
 #endif /* CONFIG_LINUX */
 }
 
-- 
2.25.1
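Note for readers of the block lookup in ram_find_block_by_host_address(): the search starts at the last seen block and wraps around to the head of the list, so a fault close to the current scan position is usually resolved in a few steps. Below is a minimal standalone sketch of that circular search, not the QEMU code. The Block type, SKETCH_PAGE_SIZE and find_block() are hypothetical stand-ins for RAMBlock, TARGET_PAGE_SIZE and the RCU list macros, and, unlike the patch, the sketch falls back to the list head when no starting block is given.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch only (stand-in for TARGET_PAGE_SIZE). */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for RAMBlock: host mapping [host, host + max_length). */
typedef struct Block {
    uintptr_t host;       /* start of the host mapping */
    size_t max_length;    /* size of the mapping in bytes */
    struct Block *next;   /* NULL at the end of the list */
} Block;

/*
 * Circular search: begin at 'start' (or at 'head' if 'start' is NULL),
 * wrap from the end of the list back to 'head', and stop after one full
 * round. Returns the block that fully contains the page at 'addr', or
 * NULL if no block does. 'start' is assumed to be reachable from 'head'.
 */
static Block *find_block(Block *head, Block *start, uintptr_t addr)
{
    Block *b = start ? start : head;
    Block *first = b;

    if (!b) {
        return NULL;
    }

    do {
        /* addr >= host is checked first, so the subtraction cannot wrap. */
        if (addr >= b->host &&
            addr - b->host + SKETCH_PAGE_SIZE <= b->max_length) {
            return b;
        }
        /* Hit the end of the list: continue from the head. */
        b = b->next ? b->next : head;
    } while (b != first);

    return NULL;
}
```

With three blocks a, b and c linked in that order, a call such as find_block(&a, &b, addr) inspects b, then c, then wraps to a. A page that straddles the end of a block is rejected, which mirrors the upper-bound check in the patch.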