[PATCH v2] hv_balloon: Fallback to generic_online_page() for non-HV hot added mem

Jacob Pan posted 1 patch 11 months, 2 weeks ago
Posted by Jacob Pan 11 months, 2 weeks ago
The Hyper-V balloon driver installs a custom callback for handling page
onlining operations performed by the memory hotplug subsystem. This
custom callback is global, and overrides the default callback
(generic_online_page) that Linux otherwise uses. The custom callback
properly handles memory that is hot-added by the balloon driver as part
of a Hyper-V hot-add region.

But memory can also be hot-added directly by a device driver for a vPCI
device, particularly a GPU. In that case the custom callback installed by
the balloon driver still runs, but it does not find the page in its hot-add
region list and never onlines it, which can cause driver initialization
failures.

Fix this by having the balloon custom callback call generic_online_page()
when the page isn't part of a Hyper-V hot-add region, thereby falling back
to the default Linux behavior. This allows device driver hot-adds to work
properly. The virtio-mem driver handles similar cases the same way.

Suggested-by: Vikram Sethi <vsethi@nvidia.com>
Tested-by: Michael Frohlich <mfrohlich@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v2: Updated commit message suggested by Michael Kelley.
---
 drivers/hv/hv_balloon.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index a99112e6f0b8..c999daf34108 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -766,16 +766,18 @@ static void hv_online_page(struct page *pg, unsigned int order)
 	struct hv_hotadd_state *has;
 	unsigned long pfn = page_to_pfn(pg);
 
-	guard(spinlock_irqsave)(&dm_device.ha_lock);
-	list_for_each_entry(has, &dm_device.ha_region_list, list) {
-		/* The page belongs to a different HAS. */
-		if (pfn < has->start_pfn ||
-		    (pfn + (1UL << order) > has->end_pfn))
-			continue;
+	scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
+		list_for_each_entry(has, &dm_device.ha_region_list, list) {
+			/* The page belongs to a different HAS. */
+			if (pfn < has->start_pfn ||
+				(pfn + (1UL << order) > has->end_pfn))
+				continue;
 
-		hv_bring_pgs_online(has, pfn, 1UL << order);
-		break;
+			hv_bring_pgs_online(has, pfn, 1UL << order);
+			return;
+		}
 	}
+	generic_online_page(pg, order);
 }
 
 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
-- 
2.34.1
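
The dispatch rule that the patch introduces can be modelled outside the kernel. The following is only a user-space sketch, not the driver code: struct region, enum handler, region_covers() and online_page() are hypothetical names invented for illustration, the list is replaced by a plain array, and locking is left out entirely. It shows the same decision the patched hv_online_page() makes: a block that lies wholly inside a known hot-add region is handled by the balloon path, and everything else falls through to the generic path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hv_hotadd_state: one hot-add region. */
struct region {
	unsigned long start_pfn;	/* first PFN in the region */
	unsigned long end_pfn;		/* one past the last PFN in the region */
};

/* Which path would online the block (illustrative only). */
enum handler {
	HANDLED_BY_BALLOON,	/* models hv_bring_pgs_online() */
	HANDLED_BY_GENERIC,	/* models generic_online_page() */
};

/* True if the block [pfn, pfn + 2^order) lies entirely inside r. */
static bool region_covers(const struct region *r, unsigned long pfn,
			  unsigned int order)
{
	return pfn >= r->start_pfn && pfn + (1UL << order) <= r->end_pfn;
}

/*
 * Walk the known regions; the first one that fully covers the block
 * claims it. If none does, fall back to the generic handler instead
 * of silently doing nothing.
 */
static enum handler online_page(const struct region *regions, size_t n,
				unsigned long pfn, unsigned int order)
{
	assert(regions != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (region_covers(&regions[i], pfn, order))
			return HANDLED_BY_BALLOON;
	}
	return HANDLED_BY_GENERIC;
}
```

In this model a block that straddles the end of a region is not claimed by that region, so it also takes the generic path, which mirrors the range check kept in the patch.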
Re: [PATCH v2] hv_balloon: Fallback to generic_online_page() for non-HV hot added mem
Posted by Wei Liu 11 months, 2 weeks ago
On Tue, Jan 07, 2025 at 10:09:18AM -0800, Jacob Pan wrote:
> The Hyper-V balloon driver installs a custom callback for handling page
> onlining operations performed by the memory hotplug subsystem. This
> custom callback is global, and overrides the default callback
> (generic_online_page) that Linux otherwise uses. The custom callback
> properly handles memory that is hot-added by the balloon driver as part
> of a Hyper-V hot-add region.
> 
> But memory can also be hot-added directly by a device driver for a vPCI
> device, particularly GPUs. In such a case, the custom callback installed by
> the balloon driver runs, but won't find the page in its hot-add region list
> and doesn't online it, which could cause driver initialization failures.
> 
> Fix this by having the balloon custom callback run generic_online_page()
> when the page isn't part of a Hyper-V hot-add region, thereby doing the
> default Linux behavior. This allows device driver hot-adds to work
> properly. Similar cases are handled the same way in the virtio-mem driver.
> 
> Suggested-by: Vikram Sethi <vsethi@nvidia.com>
> Tested-by: Michael Frohlich <mfrohlich@microsoft.com>
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>

Applied to hyperv-next. Thanks!