From: LIZHAOXIN1
To: qemu-devel@nongnu.org, quintela@redhat.com, dgilbert@redhat.com
Cc: LIZHAOXIN1, sunhao2, DENGLINWEN, YANGFENG1
Subject: [PATCH v2] migration/rdma: Use huge page register VM memory
Date: Thu, 10 Jun 2021 15:39:30 +0000

When libvirt drives an RDMA live migration of a VM with a large amount of
memory, deregistering the VM's memory on the source side takes a long time
and lengthens downtime (for a 64G VM, deregistration takes about 400ms).

Although the VM's memory is backed by 2M huge pages, the MLNX driver still
pins and unpins it in 4K pages. Register huge-page-backed RAM blocks with
IBV_ACCESS_ON_DEMAND | IBV_ACCESS_HUGETLB so that the pin and unpin steps
are skipped, which reduces downtime.

Signed-off-by: lizhaoxin
---
v2:
- Add page_size in struct RDMALocalBlock
- Use page_size to determine whether VM uses huge page

diff --git a/migration/rdma.c b/migration/rdma.c
index 1cdb4561f3..703816ebc7 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -215,6 +215,7 @@ typedef struct RDMALocalBlock {
     uint64_t remote_host_addr; /* remote virtual address */
     uint64_t offset;
     uint64_t length;
+    uint64_t page_size;
     struct ibv_mr **pmr;    /* MRs for chunk-level registration */
     struct ibv_mr *mr;      /* MR for non-chunk-level registration */
     uint32_t *remote_keys;  /* rkeys for chunk-level registration */
@@ -565,7 +566,8 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock *rdma_ram_block,
 
 static int rdma_add_block(RDMAContext *rdma, const char *block_name,
                          void *host_addr,
-                         ram_addr_t block_offset, uint64_t length)
+                         ram_addr_t block_offset, uint64_t length,
+                         uint64_t page_size)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     RDMALocalBlock *block;
@@ -595,6 +597,7 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
     block->local_host_addr = host_addr;
     block->offset = block_offset;
     block->length = length;
+    block->page_size = page_size;
     block->index = local->nb_blocks;
     block->src_index = ~0U; /* Filled in by the receipt of the block list */
     block->nb_chunks = ram_chunk_index(host_addr, host_addr + length) + 1UL;
@@ -634,7 +637,8 @@ static int qemu_rdma_init_one_block(RAMBlock *rb, void *opaque)
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t block_offset = qemu_ram_get_offset(rb);
     ram_addr_t length = qemu_ram_get_used_length(rb);
-    return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
+    ram_addr_t page_size = qemu_ram_pagesize(rb);
+    return rdma_add_block(opaque, block_name, host_addr, block_offset, length, page_size);
 }
 
 /*
@@ -1123,13 +1127,25 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
 
     for (i = 0; i < local->nb_blocks; i++) {
-        local->block[i].mr =
-            ibv_reg_mr(rdma->pd,
-                    local->block[i].local_host_addr,
-                    local->block[i].length,
-                    IBV_ACCESS_LOCAL_WRITE |
-                    IBV_ACCESS_REMOTE_WRITE
-                    );
+        if (local->block[i].page_size != qemu_real_host_page_size) {
+            local->block[i].mr =
+                ibv_reg_mr(rdma->pd,
+                        local->block[i].local_host_addr,
+                        local->block[i].length,
+                        IBV_ACCESS_LOCAL_WRITE |
+                        IBV_ACCESS_REMOTE_WRITE |
+                        IBV_ACCESS_ON_DEMAND |
+                        IBV_ACCESS_HUGETLB
+                        );
+        } else {
+            local->block[i].mr =
+                ibv_reg_mr(rdma->pd,
+                        local->block[i].local_host_addr,
+                        local->block[i].length,
+                        IBV_ACCESS_LOCAL_WRITE |
+                        IBV_ACCESS_REMOTE_WRITE
+                        );
+        }
         if (!local->block[i].mr) {
             perror("Failed to register local dest ram block!\n");
             break;