From: bauerchen(陈蒙蒙)
To: qemu-devel
Cc: pbonzini
Subject: Requesting review about optimizing large guest start-up time
Date: Tue, 11 Feb 2020 12:08:18 +0000

From c882b155466313fcd85ac330a45a573e608b0d74 Mon Sep 17 00:00:00 2001
From: bauerchen
Date: Tue, 11 Feb 2020 17:10:35 +0800
Subject: [PATCH] Optimize: large guest start-up in mem-prealloc

[desc]:
    Large-memory VMs start slowly when using -mem-prealloc, and the
    current method has two areas worth optimizing:

    1. mmap() is used to allocate each thread's stack while the page
    clearing threads are being created; it takes mm->mmap_sem for
    write, but the clearing threads already hold it for read, and
    this contention makes thread creation very slow.

    2. The pages are divided poorly among the threads: if we use 64
    threads to split 160 hugepages, 63 threads clear 2 pages each
    while 1 thread clears 34 pages, so the overall speed is limited
    by that last thread.

    To solve the first problem, we add a mutex and condition variable
    to the thread function, and only let the threads start clearing
    pages once all of them have been created.

    For the second problem, we spread the remainder across the
    threads: with 160 hugepages and 64 threads, 32 threads clear
    3 pages each and 32 threads clear 2 pages each.

    (Standalone sketches of both fixes follow the --- line below.)

[test]:
    320G 84-core VM start time can be reduced to 10s
    680G 84-core VM start time can be reduced to 18s

Signed-off-by: bauerchen
Reviewed-by: Pan Rui
Reviewed-by: Ivan Ren
---
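Note on the first fix: below is a minimal standalone sketch of the start
gate, written with plain pthreads rather than QEMU's qemu_mutex/qemu_cond
wrappers (gate_mutex, all_created and worker are hypothetical names, not
from the patch). Every worker blocks on the condition variable until
main() has created all of the threads, so the stack mmap() inside
pthread_create() never has to take mm->mmap_sem for write while a worker
is already faulting pages under the read lock. Unlike the patch, the
sketch updates the flag with the mutex held, the textbook form of this
pattern.

    /* gate.c - hypothetical pthreads demo of the start gate; build with -pthread */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NTHREADS 4

    static pthread_mutex_t gate_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
    static bool all_created;                 /* protected by gate_mutex */

    static void *worker(void *arg)
    {
        /* Block until every worker thread has been created. */
        pthread_mutex_lock(&gate_mutex);
        while (!all_created) {
            pthread_cond_wait(&gate_cond, &gate_mutex);
        }
        pthread_mutex_unlock(&gate_mutex);

        /* ...touch this thread's share of the pages here... */
        printf("worker %d running\n", (int)(intptr_t)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tids[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++) {
            pthread_create(&tids[i], NULL, worker, (void *)(intptr_t)i);
        }

        /* All stacks are mapped now; open the gate for every worker at once. */
        pthread_mutex_lock(&gate_mutex);
        all_created = true;
        pthread_cond_broadcast(&gate_cond);
        pthread_mutex_unlock(&gate_mutex);

        for (i = 0; i < NTHREADS; i++) {
            pthread_join(tids[i], NULL);
        }
        return 0;
    }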
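Note on the second fix: a small hypothetical driver (calc_demo.c below,
mirroring the arithmetic of the calc_page_per_thread() added by the
patch) checks the 160-hugepage / 64-thread example from the description:
avg = 160 / 64 + 1 = 3 and last = 3 * 64 - 160 = 32, so the final 32
threads take avg - 1 = 2 pages each and the first 32 take 3 each, for
32 * 3 + 32 * 2 = 160 pages in total.

    /* calc_demo.c - hypothetical driver, not part of the patch */
    #include <stddef.h>
    #include <stdio.h>

    /* Same remainder-spreading arithmetic as the patch's calc_page_per_thread(). */
    static void calc_page_per_thread(size_t numpages, int memset_threads,
                                     size_t *pages_per_thread)
    {
        int avg = numpages / memset_threads + 1;    /* per-thread count, rounded up */
        int last = avg * memset_threads - numpages; /* threads that take one page less */
        int i;

        for (i = 0; i < memset_threads; i++) {
            pages_per_thread[i] = (memset_threads - i <= last) ? avg - 1 : avg;
        }
    }

    int main(void)
    {
        size_t pages[64], total = 0;
        int i;

        calc_page_per_thread(160, 64, pages);
        for (i = 0; i < 64; i++) {
            total += pages[i];
        }
        /* Expected output: first=3 last=2 total=160 */
        printf("first=%zu last=%zu total=%zu\n", pages[0], pages[63], total);
        return 0;
    }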
 util/oslib-posix.c | 44 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 5a291cc..e97369b 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -76,6 +76,10 @@ static MemsetThread *memset_thread;
 static int memset_num_threads;
 static bool memset_thread_failed;
 
+static QemuMutex page_mutex;
+static QemuCond page_cond;
+static volatile bool thread_create_flag;
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -403,6 +407,14 @@ static void *do_touch_pages(void *arg)
     MemsetThread *memset_args = (MemsetThread *)arg;
     sigset_t set, oldset;
 
+    /* wait until all threads have been created */
+    qemu_mutex_lock(&page_mutex);
+    while (!thread_create_flag) {
+        qemu_cond_wait(&page_cond, &page_mutex);
+    }
+    qemu_mutex_unlock(&page_mutex);
+
+
     /* unblock SIGBUS */
     sigemptyset(&set);
     sigaddset(&set, SIGBUS);
@@ -448,30 +460,46 @@ static inline int get_memset_num_threads(int smp_cpus)
     return ret;
 }
 
+static void calc_page_per_thread(size_t numpages, int memset_threads, size_t *pages_per_thread) {
+    int avg = numpages / memset_threads + 1;
+    int i = 0;
+    int last = avg * memset_threads - numpages;
+    for (i = 0; i < memset_threads; i++)
+    {
+        if (memset_threads - i <= last) {
+            pages_per_thread[i] = avg - 1;
+        } else
+            pages_per_thread[i] = avg;
+    }
+}
+
 static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
                             int smp_cpus)
 {
-    size_t numpages_per_thread;
-    size_t size_per_thread;
+    size_t *numpages_per_thread;
     char *addr = area;
     int i = 0;
 
     memset_thread_failed = false;
+    thread_create_flag = false;
     memset_num_threads = get_memset_num_threads(smp_cpus);
+    numpages_per_thread = g_new0(size_t, memset_num_threads);
     memset_thread = g_new0(MemsetThread, memset_num_threads);
-    numpages_per_thread = (numpages / memset_num_threads);
-    size_per_thread = (hpagesize * numpages_per_thread);
+    calc_page_per_thread(numpages, memset_num_threads, numpages_per_thread);
+
     for (i = 0; i < memset_num_threads; i++) {
         memset_thread[i].addr = addr;
-        memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
-                                    numpages : numpages_per_thread;
+        memset_thread[i].numpages = numpages_per_thread[i];
         memset_thread[i].hpagesize = hpagesize;
         qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
                            do_touch_pages, &memset_thread[i],
                            QEMU_THREAD_JOINABLE);
-        addr += size_per_thread;
-        numpages -= numpages_per_thread;
+        addr += numpages_per_thread[i] * hpagesize;
+        numpages -= numpages_per_thread[i];
     }
+    thread_create_flag = true;
+    qemu_cond_broadcast(&page_cond);
+
     for (i = 0; i < memset_num_threads; i++) {
         qemu_thread_join(&memset_thread[i].pgthread);
     }
-- 
1.8.3.1