From: bauerchen(陈蒙蒙)
To: qemu-devel
Cc: pbonzini
Subject: Requesting review about optimizing large guest start-up time
Date: Tue, 11 Feb 2020 12:08:18 +0000

From c882b155466313fcd85ac330a45a573e608b0d74 Mon Sep 17 00:00:00 2001
From: bauerchen
Date: Tue, 11 Feb 2020 17:10:35 +0800
Subject: [PATCH] Optimize: large guest start-up in mem-prealloc

[desc]:
    Large-memory VMs start slowly when using -mem-prealloc, and the
    current method has two areas worth optimizing:

    1. mmap() is used to allocate each thread's stack while the page
    clearing threads are being created; it takes mm->mmap_sem for
    write, but the clearing threads already hold it for read, and
    this contention makes thread creation very slow.

    2. The pages are divided poorly among the threads: if we use 64
    threads to split 160 hugepages, 63 threads clear 2 pages each
    while 1 thread clears 34 pages, so the overall speed is limited
    by that last thread.

    To solve the first problem, we add a mutex and condition variable
    to the thread function, and only let the threads start clearing
    pages once all of them have been created.

    For the second problem, we spread the remainder across the
    threads: with 160 hugepages and 64 threads, 32 threads clear
    3 pages each and 32 threads clear 2 pages each.

    (Standalone sketches of both fixes follow the --- line below.)

[test]:
    320G 84-core VM start time can be reduced to 10s
    680G 84-core VM start time can be reduced to 18s

Signed-off-by: bauerchen
Reviewed-by: Pan Rui
Reviewed-by: Ivan Ren
---
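Note on the first fix: below is a minimal standalone sketch of the start
gate, written with plain pthreads rather than QEMU's qemu_mutex/qemu_cond
wrappers (gate_mutex, all_created and worker are hypothetical names, not
from the patch). Every worker blocks on the condition variable until
main() has created all of the threads, so the stack mmap() inside
pthread_create() never has to take mm->mmap_sem for write while a worker
is already faulting pages under the read lock. Unlike the patch, the
sketch updates the flag with the mutex held, the textbook form of this
pattern.

    /* gate.c - hypothetical pthreads demo of the start gate; build with -pthread */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NTHREADS 4

    static pthread_mutex_t gate_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
    static bool all_created;                 /* protected by gate_mutex */

    static void *worker(void *arg)
    {
        /* Block until every worker thread has been created. */
        pthread_mutex_lock(&gate_mutex);
        while (!all_created) {
            pthread_cond_wait(&gate_cond, &gate_mutex);
        }
        pthread_mutex_unlock(&gate_mutex);

        /* ...touch this thread's share of the pages here... */
        printf("worker %d running\n", (int)(intptr_t)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tids[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++) {
            pthread_create(&tids[i], NULL, worker, (void *)(intptr_t)i);
        }

        /* All stacks are mapped now; open the gate for every worker at once. */
        pthread_mutex_lock(&gate_mutex);
        all_created = true;
        pthread_cond_broadcast(&gate_cond);
        pthread_mutex_unlock(&gate_mutex);

        for (i = 0; i < NTHREADS; i++) {
            pthread_join(tids[i], NULL);
        }
        return 0;
    }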
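Note on the second fix: a small hypothetical driver (calc_demo.c below,
mirroring the arithmetic of the calc_page_per_thread() added by the
patch) checks the 160-hugepage / 64-thread example from the description:
avg = 160 / 64 + 1 = 3 and last = 3 * 64 - 160 = 32, so the final 32
threads take avg - 1 = 2 pages each and the first 32 take 3 each, for
32 * 3 + 32 * 2 = 160 pages in total.

    /* calc_demo.c - hypothetical driver, not part of the patch */
    #include <stddef.h>
    #include <stdio.h>

    /* Same remainder-spreading arithmetic as the patch's calc_page_per_thread(). */
    static void calc_page_per_thread(size_t numpages, int memset_threads,
                                     size_t *pages_per_thread)
    {
        int avg = numpages / memset_threads + 1;    /* per-thread count, rounded up */
        int last = avg * memset_threads - numpages; /* threads that take one page less */
        int i;

        for (i = 0; i < memset_threads; i++) {
            pages_per_thread[i] = (memset_threads - i <= last) ? avg - 1 : avg;
        }
    }

    int main(void)
    {
        size_t pages[64], total = 0;
        int i;

        calc_page_per_thread(160, 64, pages);
        for (i = 0; i < 64; i++) {
            total += pages[i];
        }
        /* Expected output: first=3 last=2 total=160 */
        printf("first=%zu last=%zu total=%zu\n", pages[0], pages[63], total);
        return 0;
    }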
 util/oslib-posix.c | 44 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 5a291cc..e97369b 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -76,6 +76,10 @@ static MemsetThread *memset_thread;
 static int memset_num_threads;
 static bool memset_thread_failed;
 
+static QemuMutex page_mutex;
+static QemuCond page_cond;
+static volatile bool thread_create_flag;
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -403,6 +407,14 @@ static void *do_touch_pages(void *arg)
     MemsetThread *memset_args = (MemsetThread *)arg;
     sigset_t set, oldset;
 
+    /* wait until all threads have been created */
+    qemu_mutex_lock(&page_mutex);
+    while (!thread_create_flag) {
+        qemu_cond_wait(&page_cond, &page_mutex);
+    }
+    qemu_mutex_unlock(&page_mutex);
+
+
     /* unblock SIGBUS */
     sigemptyset(&set);
     sigaddset(&set, SIGBUS);
@@ -448,30 +460,46 @@ static inline int get_memset_num_threads(int smp_cpus)
     return ret;
 }
 
+static void calc_page_per_thread(size_t numpages, int memset_threads, size_t *pages_per_thread) {
+    int avg = numpages / memset_threads + 1;
+    int i = 0;
+    int last = avg * memset_threads - numpages;
+    for (i = 0; i < memset_threads; i++)
+    {
+        if (memset_threads - i <= last) {
+            pages_per_thread[i] = avg - 1;
+        } else
+            pages_per_thread[i] = avg;
+    }
+}
+
 static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
                             int smp_cpus)
 {
-    size_t numpages_per_thread;
-    size_t size_per_thread;
+    size_t *numpages_per_thread;
     char *addr = area;
     int i = 0;
 
     memset_thread_failed = false;
+    thread_create_flag = false;
     memset_num_threads = get_memset_num_threads(smp_cpus);
+    numpages_per_thread = g_new0(size_t, memset_num_threads);
     memset_thread = g_new0(MemsetThread, memset_num_threads);
-    numpages_per_thread = (numpages / memset_num_threads);
-    size_per_thread = (hpagesize * numpages_per_thread);
+    calc_page_per_thread(numpages, memset_num_threads, numpages_per_thread);
+
     for (i = 0; i < memset_num_threads; i++) {
         memset_thread[i].addr = addr;
-        memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
-                                    numpages : numpages_per_thread;
+        memset_thread[i].numpages = numpages_per_thread[i];
         memset_thread[i].hpagesize = hpagesize;
         qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
                            do_touch_pages, &memset_thread[i],
                            QEMU_THREAD_JOINABLE);
-        addr += size_per_thread;
-        numpages -= numpages_per_thread;
+        addr += numpages_per_thread[i] * hpagesize;
+        numpages -= numpages_per_thread[i];
     }
+    thread_create_flag = true;
+    qemu_cond_broadcast(&page_cond);
+
     for (i = 0; i < memset_num_threads; i++) {
         qemu_thread_join(&memset_thread[i].pgthread);
     }
-- 
1.8.3.1