From: Juergen Gross
To: xen-devel@lists.xenproject.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Juergen Gross, Jonathan Corbet, Boris Ostrovsky, Stefano Stabellini, stable@vger.kernel.org, Marek Marczykowski-Górecki
Subject: [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
Date: Tue, 2 Nov 2021 10:19:44 +0100
Message-Id: <20211102091944.17487-1-jgross@suse.com>

When running as a PVH or HVM guest with actual memory < max memory, the
hypervisor uses "populate on demand" (PoD) in order to allow the guest
to balloon down from its maximum memory size. For this to work correctly
the guest must not touch more memory pages than its target memory size,
as otherwise the PoD cache will be exhausted and the guest will crash
as a result.

In extreme cases ballooning down might currently not be finished before
the init process is started, which can consume lots of memory.

In order to avoid random boot crashes in such cases, add a late init
call to wait for ballooning down to have finished for PVH/HVM guests.

Warn on console if initial ballooning fails, and panic() after stalling
for more than 3 minutes by default. Add a module parameter for changing
this timeout.

Cc:
Reported-by: Marek Marczykowski-Górecki
Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
---
V2:
- add warning and panic() when stalling (Marek Marczykowski-Górecki)
- don't wait if credit > 0
V3:
- issue warning only after ballooning failed (Marek Marczykowski-Górecki)
- make panic() timeout configurable via parameter
V4:
- fix boot parameter (Boris Ostrovsky)
- set new state directly in update_schedule() (Boris Ostrovsky)
---
 .../admin-guide/kernel-parameters.txt         |  7 ++
 drivers/xen/balloon.c                         | 86 ++++++++++++++-----
 2 files changed, 70 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..1396fd2d9031 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6349,6 +6349,13 @@
 			improve timer resolution at the expense of processing
 			more timer interrupts.
 
+	xen.balloon_boot_timeout= [XEN]
+			The time (in seconds) to wait before giving up to boot
+			in case initial ballooning fails to free enough memory.
+			Applies only when running as HVM or PVH guest and
+			started with less memory configured than allowed at
+			max. Default is 180.
+
 	xen.event_eoi_delay=	[XEN]
 			How long to delay EOI handling in case of event
 			storms (jiffies). Default is 10.
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3a50f097ed3e..3a661b7697d4 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -58,6 +58,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -73,6 +74,12 @@
 #include
 #include
 
+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "xen."
+
+static uint __read_mostly balloon_boot_timeout = 180;
+module_param(balloon_boot_timeout, uint, 0444);
+
 static int xen_hotplug_unpopulated;
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
@@ -125,12 +132,12 @@ static struct ctl_table xen_root[] = {
  * BP_ECANCELED: error, balloon operation canceled.
  */
 
-enum bp_state {
+static enum bp_state {
 	BP_DONE,
 	BP_WAIT,
 	BP_EAGAIN,
 	BP_ECANCELED
-};
+} balloon_state = BP_DONE;
 
 /* Main waiting point for xen-balloon thread. */
 static DECLARE_WAIT_QUEUE_HEAD(balloon_thread_wq);
@@ -199,18 +206,15 @@ static struct page *balloon_next_page(struct page *page)
 	return list_entry(next, struct page, lru);
 }
 
-static enum bp_state update_schedule(enum bp_state state)
+static void update_schedule(void)
 {
-	if (state == BP_WAIT)
-		return BP_WAIT;
-
-	if (state == BP_ECANCELED)
-		return BP_ECANCELED;
+	if (balloon_state == BP_WAIT || balloon_state == BP_ECANCELED)
+		return;
 
-	if (state == BP_DONE) {
+	if (balloon_state == BP_DONE) {
 		balloon_stats.schedule_delay = 1;
 		balloon_stats.retry_count = 1;
-		return BP_DONE;
+		return;
 	}
 
 	++balloon_stats.retry_count;
@@ -219,7 +223,8 @@ static enum bp_state update_schedule(enum bp_state state)
 			balloon_stats.retry_count > balloon_stats.max_retry_count) {
 		balloon_stats.schedule_delay = 1;
 		balloon_stats.retry_count = 1;
-		return BP_ECANCELED;
+		balloon_state = BP_ECANCELED;
+		return;
 	}
 
 	balloon_stats.schedule_delay <<= 1;
@@ -227,7 +232,7 @@ static enum bp_state update_schedule(enum bp_state state)
 	if (balloon_stats.schedule_delay > balloon_stats.max_schedule_delay)
 		balloon_stats.schedule_delay = balloon_stats.max_schedule_delay;
 
-	return BP_EAGAIN;
+	balloon_state = BP_EAGAIN;
 }
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
@@ -494,9 +499,9 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
  * Stop waiting if either state is BP_DONE and ballooning action is
  * needed, or if the credit has changed while state is not BP_DONE.
  */
-static bool balloon_thread_cond(enum bp_state state, long credit)
+static bool balloon_thread_cond(long credit)
 {
-	if (state == BP_DONE)
+	if (balloon_state == BP_DONE)
 		credit = 0;
 
 	return current_credit() != credit || kthread_should_stop();
@@ -510,13 +515,12 @@ static bool balloon_thread_cond(enum bp_state state, long credit)
  */
 static int balloon_thread(void *unused)
 {
-	enum bp_state state = BP_DONE;
 	long credit;
 	unsigned long timeout;
 
 	set_freezable();
 	for (;;) {
-		switch (state) {
+		switch (balloon_state) {
 		case BP_DONE:
 		case BP_ECANCELED:
 			timeout = 3600 * HZ;
@@ -532,7 +536,7 @@ static int balloon_thread(void *unused)
 		credit = current_credit();
 
 		wait_event_freezable_timeout(balloon_thread_wq,
-			balloon_thread_cond(state, credit), timeout);
+			balloon_thread_cond(credit), timeout);
 
 		if (kthread_should_stop())
 			return 0;
@@ -543,22 +547,23 @@ static int balloon_thread(void *unused)
 
 		if (credit > 0) {
 			if (balloon_is_inflated())
-				state = increase_reservation(credit);
+				balloon_state = increase_reservation(credit);
 			else
-				state = reserve_additional_memory();
+				balloon_state = reserve_additional_memory();
 		}
 
 		if (credit < 0) {
 			long n_pages;
 
 			n_pages = min(-credit, si_mem_available());
-			state = decrease_reservation(n_pages, GFP_BALLOON);
-			if (state == BP_DONE && n_pages != -credit &&
+			balloon_state = decrease_reservation(n_pages,
+							     GFP_BALLOON);
+			if (balloon_state == BP_DONE && n_pages != -credit &&
 			    n_pages < totalreserve_pages)
-				state = BP_EAGAIN;
+				balloon_state = BP_EAGAIN;
 		}
 
-		state = update_schedule(state);
+		update_schedule();
 
 		mutex_unlock(&balloon_mutex);
 
@@ -765,3 +770,38 @@ static int __init balloon_init(void)
 	return 0;
 }
 subsys_initcall(balloon_init);
+
+static int __init balloon_wait_finish(void)
+{
+	long credit, last_credit = 0;
+	unsigned long last_changed = 0;
+
+	if (!xen_domain())
+		return -ENODEV;
+
+	/* PV guests don't need to wait. */
+	if (xen_pv_domain() || !current_credit())
+		return 0;
+
+	pr_info("Waiting for initial ballooning down having finished.\n");
+
+	while ((credit = current_credit()) < 0) {
+		if (credit != last_credit) {
+			last_changed = jiffies;
+			last_credit = credit;
+		}
+		if (balloon_state == BP_ECANCELED) {
+			pr_warn_once("Initial ballooning failed, %ld pages need to be freed.\n",
+				     -credit);
+			if (jiffies - last_changed >= HZ * balloon_boot_timeout)
+				panic("Initial ballooning failed!\n");
+		}
+
+		schedule_timeout_interruptible(HZ / 10);
+	}
+
+	pr_info("Initial ballooning down finished.\n");
+
+	return 0;
+}
+late_initcall_sync(balloon_wait_finish);
-- 
2.26.2
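
For readers who want to reason about the stall detection in
balloon_wait_finish() outside the kernel, the following is a minimal
user-space sketch of one loop iteration. It is not part of the patch.
The names stall_tracker, wait_verdict and poll_once() are hypothetical,
as is passing the credit, balloon state and tick counter in as plain
arguments instead of reading current_credit(), balloon_state and jiffies.
The sketch only mirrors the decision logic as posted: a negative credit
means ballooning is still pending, any change in credit restarts the
stall timer, and giving up is considered only once the balloon thread
has reported BP_ECANCELED.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bp_state values used in the patch. */
enum bp_state { BP_DONE, BP_WAIT, BP_EAGAIN, BP_ECANCELED };

/* Bookkeeping carried across polls; mirrors the locals in the patch. */
struct stall_tracker {
	long last_credit;		/* credit seen at the previous poll */
	unsigned long last_changed;	/* tick at which credit last changed */
};

/* Hypothetical result type: what the wait loop should do next. */
enum wait_verdict { WAIT_FINISHED, WAIT_CONTINUE, WAIT_GIVE_UP };

/*
 * One iteration of the wait loop, with all inputs passed explicitly.
 * "now" and "timeout_ticks" use the same (assumed) tick unit.
 */
static enum wait_verdict poll_once(struct stall_tracker *t, long credit,
				   enum bp_state state, unsigned long now,
				   unsigned long timeout_ticks)
{
	assert(t != NULL);

	/* Credit >= 0: nothing left to balloon down, stop waiting. */
	if (credit >= 0)
		return WAIT_FINISHED;

	/* Any progress (or regression) restarts the stall timer. */
	if (credit != t->last_credit) {
		t->last_changed = now;
		t->last_credit = credit;
	}

	/*
	 * Give up only if the balloon thread has cancelled its retries
	 * and the credit has not moved for the whole timeout. Unsigned
	 * subtraction keeps the comparison valid across counter wrap.
	 */
	if (state == BP_ECANCELED && now - t->last_changed >= timeout_ticks)
		return WAIT_GIVE_UP;

	return WAIT_CONTINUE;
}
```

A caller would keep one struct stall_tracker initialised to zero, call
poll_once() at a fixed interval (the patch sleeps HZ / 10 between
polls), and treat WAIT_GIVE_UP as the point where the patch calls
panic(). Keeping the timer tied to the last credit change rather than
to the start of the wait means a slow but progressing balloon never
trips the timeout.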