From: zhanghailiang
Date: Wed, 12 Apr 2017 22:05:16 +0800
Message-ID: <1492005921-15664-2-git-send-email-zhang.zhanghailiang@huawei.com>
In-Reply-To:
 <1492005921-15664-1-git-send-email-zhang.zhanghailiang@huawei.com>
References: <1492005921-15664-1-git-send-email-zhang.zhanghailiang@huawei.com>
Subject: [Qemu-devel] [PATCH v4 1/6] docs/block-replication: Add description for shared-disk case
Cc: kwolf@redhat.com, xiecl.fnst@cn.fujitsu.com, zhangchen.fnst@cn.fujitsu.com, Wen Congyang, qemu-block@nongnu.org, zhanghailiang

Introduce the scenario of shared-disk block replication and how
to use it.

Reviewed-by: Changlong Xie
Reviewed-by: Stefan Hajnoczi
Signed-off-by: zhanghailiang
Signed-off-by: Wen Congyang
Signed-off-by: Zhang Chen
---
 docs/block-replication.txt | 139 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 135 insertions(+), 4 deletions(-)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..fbfe005 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
 effort during a vmstate checkpoint, the disk modification operations of
 the Primary disk are asynchronously forwarded to the Secondary node.
 
-== Workflow ==
+== Non-shared disk workflow ==
 The following is the image of block replication workflow:
 
         +----------------------+            +------------------------+
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
     4) Secondary write requests will be buffered in the Disk buffer and it
        will overwrite the existing sector content in the buffer.
 
-== Architecture ==
+== Non-shared disk architecture ==
 We are going to implement block replication from many basic
 blocks that are already in QEMU.
 
@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
 of the NBD server into the secondary disk. So before block replication,
 the primary disk and secondary disk should contain the same data.
 
+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  | (2)Forward and write through |             |
+                  | +--------------------------> | Disk Buffer |
+                  | |                            |             |
+                  | |                            \-------------/
+                  | |(1)read                            |
+                  | |                                   |
+         (3)write | |                                   | backing file
+                  V |                                   |
+    +-----------------------------+                     |
+    |         Shared Disk         |               <-----+
+    +-----------------------------+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Shared disk, the
+       original sector content will be read from Shared disk and
+       forwarded and buffered in the Disk buffer on the secondary site,
+       but it will not overwrite the existing sector content (it could be
+       from either "Secondary Write Requests" or previous COW of "Primary
+       Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+               virtio-blk                ||                              .----------
+              /                          ||                              | Secondary
+             /                           ||                              '----------
+            /                            ||                                virtio-blk
+           /                             ||                                    |
+          |                              ||                              replication(5)
+          |                   NBD  -------->  NBD            (2)               |
+          |                  client      ||  server ---> hidden disk <-- active disk(4)
+          |                     ^        ||                   |
+          |              replication(1)  ||                   |
+          |                     |        ||                   |
+          |   +-----------------'        ||                   |
+         (3)  |drive-backup sync=none    ||                   |
+--------. |   +-----------------+        ||                   |
+Primary | |                     |        ||           backing |
+--------' |                     |        ||                   |
+          V                     |                             |
+    +-------------------------------------------+             |
+    |                shared disk                |  <----------+
+    +-------------------------------------------+
+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) The hidden-disk buffers the original content that is modified by the
+       primary VM. It should also be an empty disk, and the driver supports
+       bdrv_make_empty() and backing file.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the active disk and it
+       will overwrite the existing sector content in the buffer.
+
 == Failure Handling ==
 There are 7 internal errors when block replication is running:
 1. I/O error on primary disk
@@ -145,7 +213,7 @@ d. replication_stop_all()
    things except failover. The caller must hold the I/O mutex lock if it is
    in migration/checkpoint thread.
 
-== Usage ==
+== Non-shared disk usage ==
 Primary:
   -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
          children.0.file.filename=1.raw,\
@@ -234,6 +302,69 @@ Secondary:
   The primary host is down, so we should do the following thing:
   { 'execute': 'nbd-server-stop' }
 
+== Shared disk usage ==
+Primary:
+ -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
+
+Issue qmp command:
+  { 'execute': 'blockdev-add',
+    'arguments': {
+        'driver': 'replication',
+        'node-name': 'rep',
+        'mode': 'primary',
+        'shared-disk-id': 'primary_disk0',
+        'shared-disk': true,
+        'file': {
+            'driver': 'nbd',
+            'export': 'hidden_disk0',
+            'server': {
+                'type': 'inet',
+                'data': {
+                    'host': 'xxx.xxx.xxx.xxx',
+                    'port': 'yyy'
+                }
+            }
+        }
+     }
+  }
+
+Secondary:
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+        backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+        file.driver=qcow2,top-id=active-disk0,\
+        file.file.filename=/mnt/ramfs/active_disk.img,\
+        file.backing=hidden_disk0,shared-disk=on
+
+Issue qmp command:
+1. { 'execute': 'nbd-server-start',
+     'arguments': {
+        'addr': {
+            'type': 'inet',
+            'data': {
+                'host': '0',
+                'port': 'yyy'
+            }
+        }
+     }
+   }
+2. { 'execute': 'nbd-server-add',
+     'arguments': {
+        'device': 'hidden_disk0',
+        'writable': true
+    }
+  }
+
+After Failover:
+Primary:
+  { 'execute': 'x-blockdev-del',
+    'arguments': {
+        'node-name': 'rep'
+    }
+  }
+
+Secondary:
+  {'execute': 'nbd-server-stop' }
+
 TODO:
 1. Continuous block replication
-2. Shared disk
-- 
1.8.3.1
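
Note (not part of the patch): the four steps of the shared-disk workflow
described above can be pictured with a small toy model. This is only a
sketch under stated assumptions, not QEMU code: the class name
SharedDiskModel, its method names, and the use of Python dicts as "disks"
are all hypothetical, and the checkpoint() method assumes that the buffered
content is simply dropped at checkpoint time, as the introduction of the
document states.

```python
class SharedDiskModel:
    """Toy model of the shared-disk workflow; sectors are dict keys.

    Hypothetical illustration only, none of these names exist in QEMU.
    """

    def __init__(self, shared_disk):
        self.shared_disk = shared_disk  # storage seen by both sides
        self.disk_buffer = {}           # Secondary-side Disk buffer

    def primary_write(self, sector, data):
        # (1)+(2): read the original content and forward it to the
        # Secondary, but never overwrite content that is already buffered
        # (from a Secondary write or from an earlier COW).
        if sector not in self.disk_buffer:
            self.disk_buffer[sector] = self.shared_disk.get(sector)
        # (3): only then does the Primary write reach the shared disk.
        self.shared_disk[sector] = data

    def secondary_write(self, sector, data):
        # (4): Secondary writes stay in the buffer and overwrite it.
        self.disk_buffer[sector] = data

    def secondary_read(self, sector):
        # The buffer is consulted first; the shared disk is its backing file.
        if sector in self.disk_buffer:
            return self.disk_buffer[sector]
        return self.shared_disk.get(sector)

    def checkpoint(self):
        # Assumption: buffered content is dropped at checkpoint time.
        self.disk_buffer.clear()
```

With this model, a Primary write to a sector leaves the Secondary's view of
that sector unchanged until the next checkpoint, which is the property the
copy-on-write step (2) is there to provide.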