From: Juan Quintela
To: qemu-devel@nongnu.org
Cc: Lukas Straub, Daniel P. Berrangé, Vladimir Sementsov-Ogievskiy,
 Avihai Horon, Alex Bennée, Leonardo Bras, Juan Quintela, Thomas Huth,
 Peter Xu, Markus Armbruster
Subject: [PATCH] migration: Add documentation for backwards compatibility
Date: Wed, 10 May 2023 21:53:41 +0200
Message-Id: <20230510195341.7591-1-quintela@redhat.com>

State the requirements for getting migration to work between QEMU
versions, and then explain how to implement a new feature or default
value without breaking migration.

Signed-off-by: Juan Quintela

---

Hi

I would really appreciate reviews:

- I don't speak the .rst format natively, so let me know what I have
  done wrong.

- English is not my native language either (no points if you had
  guessed that).

- This stuff is obvious to me, so let me know where I have assumed
  things, or where things need to be clarified, explained better, etc.

Thanks, Juan.
---
 docs/devel/migration.rst | 212 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 212 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 6f65c23b47..daa510da42 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -142,6 +142,218 @@ General advice for device developers
 may be different on the destination.
 This can result in the device state being loaded into the wrong device.
 
+How backwards compatibility works
+---------------------------------
+
+When we do migration, we have two QEMU processes: the source and the
+target. They are either the same version (the easy case) or different
+versions (the difficult case).
+
+There are two things that are different, but they have very similar
+names and sometimes get confused:
+
+- QEMU version
+- machine type version
+
+Let's start with a practical example. We start with:
+
+- qemu-system-x86_64 (v5.2), from now on qemu-5.2.
+- qemu-system-x86_64 (v5.1), from now on qemu-5.1.
+
+Related to this are the "latest" machine types defined in each of
+them:
+
+- pc-q35-5.2 (the newest one in qemu-5.2), from now on pc-5.2.
+- pc-q35-5.1 (the newest one in qemu-5.1), from now on pc-5.1.
+
+First of all, migration is only supposed to work if you use the same
+machine type on both source and destination. The QEMU configuration
+also needs to be the same on source and destination, but that is not
+relevant for this section.
+
+I am going to list the combinations that we can have. Let's start
+with the trivial ones, where QEMU is the same on source and
+destination:
+
+1 - qemu-5.2 -M pc-5.2 -> migrates to -> qemu-5.2 -M pc-5.2
+
+    This is the latest QEMU with the latest machine type.
+    This has to work, and if it doesn't work it is a bug.
+
+2 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1
+
+    Exactly the same case as the previous one, but for 5.1.
+    Nothing to see here either.
+
+These are the easiest ones; we will not talk more about them in this
+section.
+
+Now we start with the more interesting cases. Let's start with the
+same QEMU but not the same machine type.
+
+3 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1
+
+    It needs to use the definition of pc-5.1 and the devices as they
+    were configured in 5.1, but this should be easy in the sense that
+    both sides are the same QEMU and both sides have exactly the same
+    idea of what the pc-5.1 machine is.
+
+4 - qemu-5.1 -M pc-5.2 -> migrates to -> qemu-5.1 -M pc-5.2
+
+    This combination is not possible, as qemu-5.1 doesn't understand
+    the pc-5.2 machine type. So there is nothing to worry about here.
+
+Now come the interesting ones, where the two QEMUs are different.
+Notice also that the machine type needs to be pc-5.1, because we have
+the limitation that qemu-5.1 doesn't know pc-5.2. So the possible
+cases are:
+
+5 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1
+
+    This migration is known as newer to older. When we are developing
+    5.2, we need to take care not to break migration to qemu-5.1.
+    Notice that we can't update qemu-5.1 to understand whatever
+    qemu-5.2 decides to change, so it is up to the qemu-5.2 side to
+    make the relevant changes.
+
+6 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1
+
+    This migration is known as older to newer. We need to make sure
+    that we are able to receive migrations from qemu-5.1. The problem
+    is similar to the previous one.
+
+If qemu-5.1 and qemu-5.2 were the same, there would not be any
+compatibility problems. But the reason that we create qemu-5.2 is to
+get new features, devices, defaults, etc.
+
+If a device gets a new feature, or changes a default value, we have
+a problem when we try to migrate between different QEMU
+versions.
+
+So we need a way to tell qemu-5.2 that when we are using machine type
+pc-5.1, it needs to **not** use the feature, to be able to migrate to
+real qemu-5.1.
+
+The equivalent applies when migrating from qemu-5.1 to qemu-5.2:
+qemu-5.2 has to expect that it is not going to get data for the new
+feature, because qemu-5.1 doesn't know about it.
+
+How do we tell QEMU about these device feature changes? In the
+hw/core/machine.c:hw_compat_X_Y arrays.
+
+If we change a default value, we need to put the old value back in
+that array. The device, during initialization, needs to look at
+that array to see what value it needs to use for that feature. And
+what are we going to put in that array? The value of a property.
+
+To create a property for a device, we need to use one of the
+DEFINE_PROP_*() macros. See include/hw/qdev-properties.h to find the
+macros that exist. With them, we set the default value for that
+property, and that is what it is going to get in the latest released
+version. But if we want a different value for a previous version, we
+can change that in the hw_compat_X_Y arrays.
+
+hw_compat_X_Y is an array of records that have the format:
+
+- name_device
+- name_property
+- value
+
+Let's see a practical example.
+
+In qemu-5.2, virtio-blk-device got multi-queue support. This is a
+change that is not backward compatible. In qemu-5.1 it has one
+queue. In qemu-5.2 it has as many queues as there are CPUs in the
+system.
+
+When we are doing migration, if we migrate from a device that has 4
+queues to a device that has only one queue, we don't know where to
+put the extra information for the other 3 queues, and we fail
+migration.
+
+There is a similar problem when we migrate from qemu-5.1, which has
+only one queue, to qemu-5.2: we only send information for one queue,
+but the destination has 4, so 3 queues are not properly initialized
+and anything can happen.
+
+So, how can we address this problem? Easy: just convince qemu-5.2
+that when it is running pc-5.1, it needs to set the number of queues
+for virtio-blk devices to 1.
+
+That way we fix cases 5 and 6.
+
+5 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1
+
+    qemu-5.2 -M pc-5.1 sets number of queues to be 1.
+    qemu-5.1 -M pc-5.1 expects number of queues to be 1.
+
+    Correct. Migration works.
+
+6 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1
+
+    qemu-5.1 -M pc-5.1 sets number of queues to be 1.
+    qemu-5.2 -M pc-5.1 expects number of queues to be 1.
+
+    Correct. Migration works.
+
+And now the other interesting case, case 3. In this case we have:
+
+3 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1
+
+    Here we have the same QEMU on both sides. So it doesn't matter a
+    lot whether we have set the number of queues to 1 or not, because
+    they are the same.
+
+    WRONG!
+
+    Think about what happens if we do one of these double migrations:
+
+    A -> migrates -> B -> migrates -> C
+
+    where:
+
+    - A: qemu-5.1 -M pc-5.1
+    - B: qemu-5.2 -M pc-5.1
+    - C: qemu-5.2 -M pc-5.1
+
+    Migration A -> B is case 6, so the number of queues needs to be 1.
+
+    Migration B -> C is case 3, so we don't care. But actually we
+    do care, because we didn't start the guest in qemu-5.2; it was
+    migrated from qemu-5.1. So to be on the safe side, we need to
+    always use 1 queue when we are using pc-5.1.
+
+Now, how was this done in reality? The following commit shows how it
+was done::
+
+    commit 9445e1e15e66c19e42bea942ba810db28052cd05
+    Author: Stefan Hajnoczi
+    Date:   Tue Aug 18 15:33:47 2020 +0100
+
+        virtio-blk-pci: default num_queues to -smp N
+
+The relevant parts for migration are::
+
+    @@ -1281,7 +1284,8 @@ static Property virtio_blk_properties[] = {
+     #endif
+         DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
+                         true),
+    -    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
+    +    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
+    +                       VIRTIO_BLK_AUTO_NUM_QUEUES),
+         DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),
+
+It changes the default value of num_queues, but it fixes it for old
+machine types so that they keep the right value::
+
+    @@ -31,6 +31,7 @@
+     GlobalProperty hw_compat_5_1[] = {
+     ...
+    +    { "virtio-blk-device", "num-queues", "1"},
+     ...
+     };
+
+
 VMState
 -------
 
-- 
2.40.1