From nobody Sat Feb 7 10:07:54 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org,
 David Hildenbrand, Andrew Morton, "Michael S. Tsirkin", John Hubbard,
 Oscar Salvador, Michal Hocko, Jason Wang, Xuan Zhuo
Subject: [PATCH v1 1/5] mm/memory_hotplug: check for fatal signals only in offline_pages()
Date: Tue, 27 Jun 2023 13:22:16 +0200
Message-Id: <20230627112220.229240-2-david@redhat.com>
In-Reply-To: <20230627112220.229240-1-david@redhat.com>
References: <20230627112220.229240-1-david@redhat.com>

Let's check for fatal signals only. That looks cleaner and still keeps the
documented use case for manual user-space triggered memory offlining
working. From Documentation/admin-guide/mm/memory-hotplug.rst:

	% timeout $TIMEOUT offline_block | failure_handling

In fact, we even document there: "the offlining context can be terminated
by sending a fatal signal".
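For illustration only (not part of the patch), the difference between the
two checks can be sketched as a simplified userspace model. The model_*
names and the bitmask representation are assumptions made for this sketch;
the real helpers signal_pending() and fatal_signal_pending() operate on a
task, and the model treats "fatal" as "SIGKILL is pending":

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Simplified model: bit (sig - 1) set means signal "sig" is pending.
 * The bitmask is an assumption for illustration; the kernel helpers
 * work on a task's flags and pending signal set instead.
 */
unsigned long model_sigmask(int sig)
{
	return 1UL << (sig - 1);
}

/* Models signal_pending(): true if any signal at all is pending. */
bool model_signal_pending(unsigned long pending)
{
	return pending != 0;
}

/* Models fatal_signal_pending(): true only if SIGKILL is pending. */
bool model_fatal_signal_pending(unsigned long pending)
{
	return (pending & model_sigmask(SIGKILL)) != 0;
}
```

In this model, a pending SIGUSR1 alone would have aborted offlining with
the old check but no longer does with the new one, while a pending SIGKILL
aborts it in both cases.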
Signed-off-by: David Hildenbrand
---
 mm/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8e0fa209d533..0d2151df4ee1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1879,7 +1879,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	do {
 		pfn = start_pfn;
 		do {
-			if (signal_pending(current)) {
+			if (fatal_signal_pending(current)) {
 				ret = -EINTR;
 				reason = "signal backoff";
 				goto failed_removal_isolated;
-- 
2.40.1

From nobody Sat Feb 7 10:07:54 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org,
 David Hildenbrand, Andrew Morton, "Michael S. Tsirkin", John Hubbard,
 Oscar Salvador, Michal Hocko, Jason Wang, Xuan Zhuo
Subject: [PATCH v1 2/5] virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
Date: Tue, 27 Jun 2023 13:22:17 +0200
Message-Id: <20230627112220.229240-3-david@redhat.com>
In-Reply-To: <20230627112220.229240-1-david@redhat.com>
References: <20230627112220.229240-1-david@redhat.com>

Let's prepare for offline_and_remove_memory() to return other error codes
that effectively translate to -EBUSY, such as -ETIMEDOUT.
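As a rough illustration of the intended conversion, here is a standalone
userspace model. The name model_convert_offline_rc() is hypothetical and
exists only for this sketch; it is not the driver function:

```c
#include <errno.h>

/*
 * Hypothetical userspace model of the error conversion described above.
 * Success, -ENOMEM and -EINVAL are passed through unchanged; every other
 * error (for example -ETIMEDOUT or -EINTR) is reported as -EBUSY.
 */
int model_convert_offline_rc(int rc)
{
	switch (rc) {
	case 0:
	case -ENOMEM:
	case -EINVAL:
		return rc;
	default:
		return -EBUSY;
	}
}
```

A caller that only distinguishes "busy, retry later" from hard failures
then does not need to know about each new error code individually.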
Signed-off-by: David Hildenbrand
---
 drivers/virtio/virtio_mem.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 835f6cc2fb66..cb8bc6f6aa90 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -750,7 +750,15 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm,
 		dev_dbg(&vm->vdev->dev,
 			"offlining and removing memory failed: %d\n", rc);
 	}
-	return rc;
+
+	switch (rc) {
+	case 0:
+	case -ENOMEM:
+	case -EINVAL:
+		return rc;
+	default:
+		return -EBUSY;
+	}
 }
 
 /*
-- 
2.40.1

From nobody Sat Feb 7 10:07:54 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org,
 David Hildenbrand, Andrew Morton, "Michael S. Tsirkin", John Hubbard,
 Oscar Salvador, Michal Hocko, Jason Wang, Xuan Zhuo
Subject: [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
Date: Tue, 27 Jun 2023 13:22:18 +0200
Message-Id: <20230627112220.229240-4-david@redhat.com>
In-Reply-To: <20230627112220.229240-1-david@redhat.com>
References: <20230627112220.229240-1-david@redhat.com>

John Hubbard writes [1]:

        Some device drivers add memory to the system via memory hotplug.
        When the driver is unloaded, that memory is hot-unplugged.

        However, memory hot unplug can fail. And these days, it fails a
        little too easily, with respect to the above case. Specifically, if
        a signal is pending on the process, hot unplug fails.

        [...]

        So in this case, other things (unmovable pages, un-splittable huge
        pages) can also cause the above problem. However, those are
        demonstrably less common than simply having a pending signal. I've
        got bug reports from users who can trivially reproduce this by
        killing their process with a "kill -9", for example.

Especially with ZONE_MOVABLE, offlining is supposed to work in most cases
when offlining actually hotplugged (not boot) memory, and only fail in rare
corner cases (e.g., some driver holds a reference to a page in
ZONE_MOVABLE, turning it unmovable).

In these corner cases we really don't want to be stuck forever in
offline_and_remove_memory(). But in the general cases, we really want to do
our best to make memory offlining succeed -- in a reasonable timeframe.

Reliably failing in the described case when there is a fatal signal pending
is sub-optimal. The pending signal check is mostly only relevant when user
space explicitly triggers offlining of memory using sysfs device attributes
("state" or "online" attribute), but not when coming via
offline_and_remove_memory().

So let's use a timer instead and ignore fatal signals, because they are not
really expressive for offline_and_remove_memory() users. Let's default to
30 seconds if no timeout was specified, and limit the timeout to 120
seconds.

This change is also valuable for virtio-mem in BBM (Big Block Mode) with
"bbm_safe_unplug=off", to avoid endless loops when stuck forever in
offline_and_remove_memory().

While at it, drop the "extern" from offline_and_remove_memory() to make it
fit into a single line.
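To make the default and the limit concrete, here is a small standalone
model of how the timeout_ms argument is interpreted. The helper name
model_effective_timeout_ms() and the MODEL_MSEC_PER_SEC macro are
assumptions for this sketch and do not exist in the kernel:

```c
/* Stand-in for the kernel's MSEC_PER_SEC so the sketch is standalone. */
#define MODEL_MSEC_PER_SEC 1000U

/*
 * Hypothetical helper (not in the patch): returns the timeout that
 * would effectively be used for a given timeout_ms argument.
 */
unsigned int model_effective_timeout_ms(unsigned int timeout_ms)
{
	/* No timeout specified: fall back to the 30 second default. */
	if (!timeout_ms)
		timeout_ms = 30 * MODEL_MSEC_PER_SEC;
	/* Never wait longer than the 120 second limit. */
	if (timeout_ms > 120 * MODEL_MSEC_PER_SEC)
		timeout_ms = 120 * MODEL_MSEC_PER_SEC;
	return timeout_ms;
}
```

Passing 0 yields 30000 ms, a request of 10000 ms is used as is, and
anything above 120000 ms is clamped to 120000 ms.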
[1] https://lkml.kernel.org/r/20230620011719.155379-1-jhubbard@nvidia.com

Signed-off-by: David Hildenbrand
---
 drivers/virtio/virtio_mem.c    |  2 +-
 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 50 ++++++++++++++++++++++++++++++++--
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index cb8bc6f6aa90..f8792223f1db 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -738,7 +738,7 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm,
 		"offlining and removing memory: 0x%llx - 0x%llx\n", addr,
 		addr + size - 1);
 
-	rc = offline_and_remove_memory(addr, size);
+	rc = offline_and_remove_memory(addr, size, 0);
 	if (!rc) {
 		atomic64_sub(size, &vm->offline_size);
 		/*
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 9fcbf5706595..d5f9e8b5a4a4 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -307,7 +307,7 @@ extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 			 struct zone *zone, struct memory_group *group);
 extern int remove_memory(u64 start, u64 size);
 extern void __remove_memory(u64 start, u64 size);
-extern int offline_and_remove_memory(u64 start, u64 size);
+int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms);
 
 #else
 static inline void try_offline_node(int nid) {}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0d2151df4ee1..ca635121644a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -152,6 +152,22 @@ void put_online_mems(void)
 
 bool movable_node_enabled = false;
 
+/*
+ * Protected by the device hotplug lock: offline_and_remove_memory()
+ * will activate a timer such that offlining cannot be stuck forever.
+ *
+ * With an active timer, fatal signals will be ignored, because they can be
+ * counter-productive when dying user space triggers device unplug/driver
+ * unloading that ends up offlining+removing device memory.
+ */
+static bool mhp_offlining_timer_active;
+static atomic_t mhp_offlining_timer_expired;
+
+static void mhp_offline_timer_fn(struct timer_list *unused)
+{
+	atomic_set(&mhp_offlining_timer_expired, 1);
+}
+
 #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
 int mhp_default_online_type = MMOP_OFFLINE;
 #else
@@ -1879,7 +1895,18 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	do {
 		pfn = start_pfn;
 		do {
-			if (fatal_signal_pending(current)) {
+			/*
+			 * If a timer is active, we're coming via
+			 * offline_and_remove_memory() and want to ignore even
+			 * fatal signals.
+			 */
+			if (mhp_offlining_timer_active) {
+				if (atomic_read(&mhp_offlining_timer_expired)) {
+					ret = -ETIMEDOUT;
+					reason = "timeout";
+					goto failed_removal_isolated;
+				}
+			} else if (fatal_signal_pending(current)) {
 				ret = -EINTR;
 				reason = "signal backoff";
 				goto failed_removal_isolated;
@@ -2232,11 +2259,17 @@ static int try_reonline_memory_block(struct memory_block *mem, void *arg)
  * memory is still in use. Primarily useful for memory devices that logically
  * unplugged all memory (so it's no longer in use) and want to offline + remove
  * that memory.
+ *
+ * offline_and_remove_memory() will not fail on fatal signals. Instead, it will
+ * fail once the timeout has been reached and offlining was not completed. If
+ * no timeout was specified, it will timeout after 30 seconds. The timeout is
+ * limited to 120 seconds.
  */
-int offline_and_remove_memory(u64 start, u64 size)
+int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms)
 {
 	const unsigned long mb_count = size / memory_block_size_bytes();
 	uint8_t *online_types, *tmp;
+	struct timer_list timer;
 	int rc;
 
 	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
@@ -2261,9 +2294,22 @@ int offline_and_remove_memory(u64 start, u64 size)
 
 	lock_device_hotplug();
 
+	if (!timeout_ms)
+		timeout_ms = 30 * MSEC_PER_SEC;
+	timeout_ms = min_t(unsigned int, timeout_ms, 120 * MSEC_PER_SEC);
+
+	timer_setup_on_stack(&timer, mhp_offline_timer_fn, 0);
+	mod_timer(&timer, jiffies + msecs_to_jiffies(timeout_ms));
+	mhp_offlining_timer_active = true;
+
 	tmp = online_types;
 	rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
 
+	timer_delete_sync(&timer);
+	atomic_set(&mhp_offlining_timer_expired, 0);
+	mhp_offlining_timer_active = false;
+	destroy_timer_on_stack(&timer);
+
 	/*
 	 * In case we succeeded to offline all memory, remove it.
 	 * This cannot fail as it cannot get onlined in the meantime.
-- 
2.40.1

From nobody Sat Feb 7 10:07:54 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org,
 David Hildenbrand, Andrew Morton, "Michael S. Tsirkin", John Hubbard,
 Oscar Salvador, Michal Hocko, Jason Wang, Xuan Zhuo
Subject: [PATCH v1 4/5] virtio-mem: set the timeout for offline_and_remove_memory() to 10 seconds
Date: Tue, 27 Jun 2023 13:22:19 +0200
Message-Id: <20230627112220.229240-5-david@redhat.com>
In-Reply-To: <20230627112220.229240-1-david@redhat.com>
References: <20230627112220.229240-1-david@redhat.com>

Currently we use the default (30 seconds); let's reduce it to 10 seconds.
In BBM, we barely deal with blocks larger than 1/2 GiB, and after 10
seconds it's most probably best to give up on that memory block and try
another one (or retry this one later).

In the common fake-offline case where we effectively fake-offline memory
using alloc_contig_range() first (SBM or BBM with bbm_safe_unplug=on), we
expect offline_and_remove_memory() to be blazingly fast and never take
anywhere close to 10 seconds -- so this should only affect BBM with
bbm_safe_unplug=off.

While at it, update the parameter description and the relationship to
unmovable pages.
Signed-off-by: David Hildenbrand
---
 drivers/virtio/virtio_mem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f8792223f1db..7468b4a907e3 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -41,7 +41,7 @@ MODULE_PARM_DESC(bbm_block_size,
 static bool bbm_safe_unplug = true;
 module_param(bbm_safe_unplug, bool, 0444);
 MODULE_PARM_DESC(bbm_safe_unplug,
-	     "Use a safe unplug mechanism in BBM, avoiding long/endless loops");
+	     "Use a safe/fast unplug mechanism in BBM, failing faster on unmovable pages");
 
 /*
  * virtio-mem currently supports the following modes of operation:
@@ -738,7 +738,7 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm,
 		"offlining and removing memory: 0x%llx - 0x%llx\n", addr,
 		addr + size - 1);
 
-	rc = offline_and_remove_memory(addr, size, 0);
+	rc = offline_and_remove_memory(addr, size, 10 * MSEC_PER_SEC);
 	if (!rc) {
 		atomic64_sub(size, &vm->offline_size);
 		/*
-- 
2.40.1

From nobody Sat Feb 7 10:07:54 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org,
 David Hildenbrand, Andrew Morton, "Michael S. Tsirkin", John Hubbard,
 Oscar Salvador, Michal Hocko, Jason Wang, Xuan Zhuo
Subject: [PATCH v1 5/5] virtio-mem: check if the config changed before (fake) offlining memory
Date: Tue, 27 Jun 2023 13:22:20 +0200
Message-Id: <20230627112220.229240-6-david@redhat.com>
In-Reply-To: <20230627112220.229240-1-david@redhat.com>
References: <20230627112220.229240-1-david@redhat.com>

If we repeatedly fail to (fake) offline memory, we won't be sending any
unplug requests to the device. However, we only check if the config changed
when sending such (un)plug requests.

So we could end up trying for a long time to offline memory, even though
the config changed already and we're not supposed to unplug memory anymore.
Let's optimize for that case, identified while testing the
offline_and_remove_memory() timeout and simulating it repeatedly running
into the timeout.

Signed-off-by: David Hildenbrand
---
 drivers/virtio/virtio_mem.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 7468b4a907e3..247fb3e0ce61 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1922,6 +1922,10 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
 	unsigned long start_pfn;
 	int rc;
 
+	/* Stop fake offlining attempts if the config changed. */
+	if (atomic_read(&vm->config_changed))
+		return -EAGAIN;
+
 	start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
 			     sb_id * vm->sbm.sb_size);
 
@@ -2233,6 +2237,10 @@ static int virtio_mem_bbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) {
 		cond_resched();
 
+		/* Stop (fake) offlining attempts if the config changed. */
+		if (atomic_read(&vm->config_changed))
+			return -EAGAIN;
+
 		/*
 		 * As we're holding no locks, these checks are racy,
 		 * but we don't care.
-- 
2.40.1