From nobody Sun Feb 8 19:55:44 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2D607332904 for ; Mon, 26 Jan 2026 11:20:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769426414; cv=none; b=IP/IEzkP/v/cCOo/bzsBcNdQ096ESiI7V3LCLKf1v8GebxsZ7xpYr3eaU7cdNrJBHgJkR/YLCVqJZUs0dgRqjc8BNhwl8M54cnfTe5TfmaocVdoWqwLRNrYZ0BfQcnV+qPN3L/ScDhh65lWlxgB/2Tw2ppPBIMEnz5rCTIEJSqs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769426414; c=relaxed/simple; bh=NMONEHHI44GshRRex6IyUSXkpR8+GSu43tBqdKsMN5s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Fg/s3r1EgDRR0aGIeDrx1Exdq/wYQ2t5r/jwkfXs5FLqZMskWbNmIluoHSPy+W8G+XBRk0jUcEI8YSJadkfsV5scDfGPxXJVa2O8QYxNPpvD4k9U7Eq7yrm17QLu/g2VcraNnzfVXQHrdJxqgfPQZipOQ2kwApoJd+VX4nztzgk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=B2v28cmy; dkim=pass (2048-bit key) header.d=redhat.com header.i=@redhat.com header.b=qdV0g8Md; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="B2v28cmy"; dkim=pass (2048-bit key) header.d=redhat.com header.i=@redhat.com header.b="qdV0g8Md" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; 
t=1769426408; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5SnEHCRs5XWO2ns//bS/06itk+2DVbBvg0YC2GfTFEU=; b=B2v28cmyd1S6AKwhPzrQOFxbjfUUx9p0w/4BxfpGrJ3sWybVtEwhRQDC2CAkapCav0Co8a p8GzJ4VlZff2JPHriRe3qNS9/sBU3KXM95S4UqR7POFtoocwLTSCQkuXrCiNfP5j9IR6/I JHfmctDEe54P8miQoCGx9XvO/DLGiYY= Received: from mail-lf1-f70.google.com (mail-lf1-f70.google.com [209.85.167.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-600-B5jX31f0M0ikV87ZRcgPoQ-1; Mon, 26 Jan 2026 06:20:02 -0500 X-MC-Unique: B5jX31f0M0ikV87ZRcgPoQ-1 X-Mimecast-MFC-AGG-ID: B5jX31f0M0ikV87ZRcgPoQ_1769426401 Received: by mail-lf1-f70.google.com with SMTP id 2adb3069b0e04-59b78ae5ab2so2810884e87.1 for ; Mon, 26 Jan 2026 03:20:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=google; t=1769426400; x=1770031200; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=5SnEHCRs5XWO2ns//bS/06itk+2DVbBvg0YC2GfTFEU=; b=qdV0g8MdZYqgT3Z9ukTxgr8/gVFfRpxaNHErUKfAmDNr3ZI8CH4VHNJSwYCOaUB/YV WYU8tLGD725KqiNo958GB/fepi2cRfNj96NSQhnGgROdYNHQaZ+I8kXojqCmxl/TJhNi xQCaVQh7EkxscHhj/HQeVUMLzm5USqak8nShNnt0cR4kdFNNbCyW0Q3QFbcNyDJcR3ua bLsjl++Qh+2PIo6P0iAe3RvVPt0nq1AGuDSueSaoduz6Bdao8sRczO0+qR3y0xsFg7j8 Gogi9a8QeRskwN2W2s2DSVpEz+le4vMTTUAQf6frIUkSFGDv7oXBC1HpzKFiFdOpD/7Y 9blg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769426400; x=1770031200; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=5SnEHCRs5XWO2ns//bS/06itk+2DVbBvg0YC2GfTFEU=; 
b=CmBEIkRG9jPErWZUmdUb45XHgTrirHBJtUKkffBl1aceQkniPvOO3mnyhKdggZ3b43 rz95OsTJx22QpCbyvIfsrzH9v/+insOlHUNYMIERYrt7BNSTIjnff8uMges0JjU0dY7M sQP3PIooUgUze8tx7hEzFcKDRCMhHHk5L0GxNbOTeCb+KwEaM52atCcoyQ0iqp+gk74n ClbB7nGQRW4POGEVYojDcRHiIpp6Wf+GsED6wp+MmTZAnIWPGra0lOiVyrv3YdqBP4vc oyG/L+c3yNqfwB3k+tN0yY6pmHJCy0bYMSJwTZ6k/2l4+W5Fee3eVH+5E8ctrBoxUJ79 AnHw== X-Gm-Message-State: AOJu0YxMNH9uqgoODcRQDYbEe5b5XVOwvYEd1ZOtWrceFO4GvBLEQP8m e+WVKZh9vlRXE5oyzRkXZ9MAD+yQX+qhql+ahbT23IJRbiwr4/8wmiyh4tK0WlTQ+L4HNCd+4R6 W2Gn6p6lVodgZi9+Cwnr3tnpxM76BE1HuJ7p0mPbhWZ1QRHtNGbJZ8UpoDCbwEZAP X-Gm-Gg: AZuq6aJF3NzMMQ544EqyQ7vkX8YjHKBfW7H9Klf5vCUDfOTYlvBG58KLQjZ40ogpxKJ szebV/bcDxzccYCLEfRf8iUaC8ptk8vIyvUMvuB0rku0rWoiRq/Fb79jUGthrMYC4Nt/pS8hNNp Vakd8hNt8AEV31yxcmTOJ/hROuM6mZvLSvVxLRG67xnD1dSO7ItanWotc2DXg1mtdVbLVmANeTd U235re+7eA9MaH1nsp7SMP073EzVkjbrU9fkWwhd7CKnAMedJGVYltOCqh/W96QttQzUK+Ft8n5 Aaoxk3tfQtLW0t7NxooixSHMMuefneLeFwyYg5/1120a+iwmlO/QMLJUIGtGkMgHdi3LEwjio8r NZHKSig3co1W2WonOxa/U7T19 X-Received: by 2002:a05:6512:104c:b0:59d:e93e:a8ac with SMTP id 2adb3069b0e04-59df361cb0fmr1508232e87.19.1769426400256; Mon, 26 Jan 2026 03:20:00 -0800 (PST) X-Received: by 2002:a05:6512:104c:b0:59d:e93e:a8ac with SMTP id 2adb3069b0e04-59df361cb0fmr1508219e87.19.1769426399669; Mon, 26 Jan 2026 03:19:59 -0800 (PST) Received: from fedora (85-23-51-1.bb.dnainternet.fi. 
[85.23.51.1]) by smtp.gmail.com with ESMTPSA id 2adb3069b0e04-59de48e5f76sm2572221e87.23.2026.01.26.03.19.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Jan 2026 03:19:59 -0800 (PST)
From: mpenttil@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, =?UTF-8?q?Mika=20Penttil=C3=A4?= , David Hildenbrand , Jason Gunthorpe , Leon Romanovsky , Alistair Popple , Balbir Singh , Zi Yan , Matthew Brost
Subject: [PATCH v3 1/3] mm: unified hmm fault and migrate device pagewalk paths
Date: Mon, 26 Jan 2026 13:19:37 +0200
Message-ID: <20260126111939.1332983-2-mpenttil@redhat.com>
X-Mailer: git-send-email 2.50.0
In-Reply-To: <20260126111939.1332983-1-mpenttil@redhat.com>
References: <20260126111939.1332983-1-mpenttil@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: Mika Penttil=C3=A4

Currently, device page faulting and migration do not work well together when you want to do both fault handling and migration at once.

Migrating non-present pages (or pages mapped with incorrect permissions, e.g. COW) to the GPU requires one of the following sequences:

1. hmm_range_fault() - fault in non-present pages with correct permissions, etc.
2. migrate_vma_*() - migrate the pages

Or:

1. migrate_vma_*() - migrate present pages
2. If migrate_vma_*() detects non-present pages:
   a) call hmm_range_fault() to fault pages in
   b) call migrate_vma_*() again to migrate the now-present pages

The problem with the first sequence is that you always have to do two page walks, even though most of the time the pages are present or zero page mappings, so the common case takes a performance hit.

The second sequence is better for the common case, but far worse if pages aren't present, because now you have to walk the page tables three times (once to find that the page is not present, once so hmm_range_fault() can find a non-present page to fault in, and once again to set up the migration). It is also tricky to code correctly.

We should be able to walk the page table once, faulting pages in as required and replacing them with migration entries if requested.

Add a new flag to the HMM APIs, HMM_PFN_REQ_MIGRATE, which tells the page walk to also prepare for migration during fault handling. Also, for the migrate_vma_setup() call paths, add a flag, MIGRATE_VMA_FAULT, which tells migrate to also do fault handling.

Cc: David Hildenbrand
Cc: Jason Gunthorpe
Cc: Leon Romanovsky
Cc: Alistair Popple
Cc: Balbir Singh
Cc: Zi Yan
Cc: Matthew Brost
Suggested-by: Alistair Popple
Signed-off-by: Mika Penttil=C3=A4
---
 include/linux/hmm.h | 19 +-
 include/linux/migrate.h | 27 +-
 mm/Kconfig | 1 +
 mm/hmm.c | 801 +++++++++++++++++++++++++++++++++++++---
 mm/migrate_device.c | 86 ++++-
 5 files changed, 869 insertions(+), 65 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h index db75ffc949a7..e2f53e155af2 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -12,7 +12,7 @@ #include =20 struct mmu_interval_notifier; - +struct migrate_vma; /* * On output: * 0 - The page is faultable and a future call with=20 @@ -27,6 +27,7 @@ struct mmu_interval_notifier; * HMM_PFN_P2PDMA_BUS - Bus mapped P2P transfer * HMM_PFN_DMA_MAPPED - Flag preserved on input-to-output transformation * to mark that page is already DMA mapped + * HMM_PFN_MIGRATE - Migrate PTE installed * * On input: * 0 - Return the current state of the page, do not fault = it. @@ -34,6 +35,7 @@ struct mmu_interval_notifier; * will fail * HMM_PFN_REQ_WRITE - The output must have HMM_PFN_WRITE or hmm_range_fau= lt() * will fail. Must be combined with HMM_PFN_REQ_FAULT. 
+ * HMM_PFN_REQ_MIGRATE - For default_flags, request to migrate to device */ enum hmm_pfn_flags { /* Output fields and flags */ @@ -48,15 +50,25 @@ enum hmm_pfn_flags { HMM_PFN_P2PDMA =3D 1UL << (BITS_PER_LONG - 5), HMM_PFN_P2PDMA_BUS =3D 1UL << (BITS_PER_LONG - 6), =20 - HMM_PFN_ORDER_SHIFT =3D (BITS_PER_LONG - 11), + /* Migrate request */ + HMM_PFN_MIGRATE =3D 1UL << (BITS_PER_LONG - 7), + HMM_PFN_COMPOUND =3D 1UL << (BITS_PER_LONG - 8), + HMM_PFN_ORDER_SHIFT =3D (BITS_PER_LONG - 13), =20 /* Input flags */ HMM_PFN_REQ_FAULT =3D HMM_PFN_VALID, HMM_PFN_REQ_WRITE =3D HMM_PFN_WRITE, + HMM_PFN_REQ_MIGRATE =3D HMM_PFN_MIGRATE, =20 HMM_PFN_FLAGS =3D ~((1UL << HMM_PFN_ORDER_SHIFT) - 1), }; =20 +enum { + /* These flags are carried from input-to-output */ + HMM_PFN_INOUT_FLAGS =3D HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | + HMM_PFN_P2PDMA_BUS, +}; + /* * hmm_pfn_to_page() - return struct page pointed to by a device entry * @@ -107,6 +119,7 @@ static inline unsigned int hmm_pfn_to_map_order(unsigne= d long hmm_pfn) * @default_flags: default flags for the range (write, read, ... see hmm d= oc) * @pfn_flags_mask: allows to mask pfn flags so that only default_flags ma= tter * @dev_private_owner: owner of device private pages + * @migrate: structure for migrating the associated vma */ struct hmm_range { struct mmu_interval_notifier *notifier; @@ -117,12 +130,14 @@ struct hmm_range { unsigned long default_flags; unsigned long pfn_flags_mask; void *dev_private_owner; + struct migrate_vma *migrate; }; =20 /* * Please see Documentation/mm/hmm.rst for how to use the range API. 
*/ int hmm_range_fault(struct hmm_range *range); +int hmm_range_migrate_prepare(struct hmm_range *range, struct migrate_vma = **pargs); =20 /* * HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a ran= ge diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 26ca00c325d9..104eda2dd881 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -3,6 +3,7 @@ #define _LINUX_MIGRATE_H =20 #include +#include #include #include #include @@ -97,6 +98,16 @@ static inline int set_movable_ops(const struct movable_o= perations *ops, enum pag return -ENOSYS; } =20 +enum migrate_vma_info { + MIGRATE_VMA_SELECT_NONE =3D 0, + MIGRATE_VMA_SELECT_COMPOUND =3D MIGRATE_VMA_SELECT_NONE, +}; + +static inline enum migrate_vma_info hmm_select_migrate(struct hmm_range *r= ange) +{ + return MIGRATE_VMA_SELECT_NONE; +} + #endif /* CONFIG_MIGRATION */ =20 #ifdef CONFIG_NUMA_BALANCING @@ -140,11 +151,12 @@ static inline unsigned long migrate_pfn(unsigned long= pfn) return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID; } =20 -enum migrate_vma_direction { +enum migrate_vma_info { MIGRATE_VMA_SELECT_SYSTEM =3D 1 << 0, MIGRATE_VMA_SELECT_DEVICE_PRIVATE =3D 1 << 1, MIGRATE_VMA_SELECT_DEVICE_COHERENT =3D 1 << 2, MIGRATE_VMA_SELECT_COMPOUND =3D 1 << 3, + MIGRATE_VMA_FAULT =3D 1 << 4, }; =20 struct migrate_vma { @@ -182,6 +194,17 @@ struct migrate_vma { struct page *fault_page; }; =20 +static inline enum migrate_vma_info hmm_select_migrate(struct hmm_range *r= ange) +{ + enum migrate_vma_info minfo; + + minfo =3D range->migrate ? range->migrate->flags : 0; + minfo |=3D (range->default_flags & HMM_PFN_REQ_MIGRATE) ? 
+ MIGRATE_VMA_SELECT_SYSTEM : 0; + + return minfo; +} + int migrate_vma_setup(struct migrate_vma *args); void migrate_vma_pages(struct migrate_vma *migrate); void migrate_vma_finalize(struct migrate_vma *migrate); @@ -192,7 +215,7 @@ void migrate_device_pages(unsigned long *src_pfns, unsi= gned long *dst_pfns, unsigned long npages); void migrate_device_finalize(unsigned long *src_pfns, unsigned long *dst_pfns, unsigned long npages); - +void migrate_hmm_range_setup(struct hmm_range *range); #endif /* CONFIG_MIGRATION */ =20 #endif /* _LINUX_MIGRATE_H */ diff --git a/mm/Kconfig b/mm/Kconfig index a992f2203eb9..457539290241 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -661,6 +661,7 @@ config MIGRATION =20 config DEVICE_MIGRATION def_bool MIGRATION && ZONE_DEVICE + select HMM_MIRROR =20 config ARCH_ENABLE_HUGEPAGE_MIGRATION bool diff --git a/mm/hmm.c b/mm/hmm.c index 4ec74c18bef6..ad23cfd8309c 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -27,35 +28,70 @@ #include #include #include +#include =20 #include "internal.h" =20 struct hmm_vma_walk { - struct hmm_range *range; - unsigned long last; + struct mmu_notifier_range mmu_range; + struct vm_area_struct *vma; + struct hmm_range *range; + unsigned long start; + unsigned long end; + unsigned long last; + bool ptelocked; + bool pmdlocked; + spinlock_t *ptl; }; =20 +#define HMM_ASSERT_PTE_LOCKED(hmm_vma_walk, locked) \ + WARN_ON_ONCE(hmm_vma_walk->ptelocked !=3D locked) + +#define HMM_ASSERT_PMD_LOCKED(hmm_vma_walk, locked) \ + WARN_ON_ONCE(hmm_vma_walk->pmdlocked !=3D locked) + +#define HMM_ASSERT_UNLOCKED(hmm_vma_walk) \ + WARN_ON_ONCE(hmm_vma_walk->ptelocked || \ + hmm_vma_walk->pmdlocked) + enum { HMM_NEED_FAULT =3D 1 << 0, HMM_NEED_WRITE_FAULT =3D 1 << 1, HMM_NEED_ALL_BITS =3D HMM_NEED_FAULT | HMM_NEED_WRITE_FAULT, }; =20 -enum { - /* These flags are carried from input-to-output */ - HMM_PFN_INOUT_FLAGS =3D HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | 
- HMM_PFN_P2PDMA_BUS, -}; - static int hmm_pfns_fill(unsigned long addr, unsigned long end, - struct hmm_range *range, unsigned long cpu_flags) + struct hmm_vma_walk *hmm_vma_walk, unsigned long cpu_flags) { + struct hmm_range *range =3D hmm_vma_walk->range; unsigned long i =3D (addr - range->start) >> PAGE_SHIFT; + enum migrate_vma_info minfo; + bool migrate =3D false; + + minfo =3D hmm_select_migrate(range); + if (cpu_flags !=3D HMM_PFN_ERROR) { + if (minfo && (vma_is_anonymous(hmm_vma_walk->vma))) { + cpu_flags |=3D (HMM_PFN_VALID | HMM_PFN_MIGRATE); + migrate =3D true; + } + } + + if (migrate && thp_migration_supported() && + (minfo & MIGRATE_VMA_SELECT_COMPOUND) && + IS_ALIGNED(addr, HPAGE_PMD_SIZE) && + IS_ALIGNED(end, HPAGE_PMD_SIZE)) { + range->hmm_pfns[i] &=3D HMM_PFN_INOUT_FLAGS; + range->hmm_pfns[i] |=3D cpu_flags | HMM_PFN_COMPOUND; + addr +=3D PAGE_SIZE; + i++; + cpu_flags =3D 0; + } =20 for (; addr < end; addr +=3D PAGE_SIZE, i++) { range->hmm_pfns[i] &=3D HMM_PFN_INOUT_FLAGS; range->hmm_pfns[i] |=3D cpu_flags; } + return 0; } =20 @@ -78,6 +114,7 @@ static int hmm_vma_fault(unsigned long addr, unsigned lo= ng end, unsigned int fault_flags =3D FAULT_FLAG_REMOTE; =20 WARN_ON_ONCE(!required_fault); + HMM_ASSERT_UNLOCKED(hmm_vma_walk); hmm_vma_walk->last =3D addr; =20 if (required_fault & HMM_NEED_WRITE_FAULT) { @@ -171,11 +208,11 @@ static int hmm_vma_walk_hole(unsigned long addr, unsi= gned long end, if (!walk->vma) { if (required_fault) return -EFAULT; - return hmm_pfns_fill(addr, end, range, HMM_PFN_ERROR); + return hmm_pfns_fill(addr, end, hmm_vma_walk, HMM_PFN_ERROR); } if (required_fault) return hmm_vma_fault(addr, end, required_fault, walk); - return hmm_pfns_fill(addr, end, range, 0); + return hmm_pfns_fill(addr, end, hmm_vma_walk, 0); } =20 static inline unsigned long hmm_pfn_flags_order(unsigned long order) @@ -208,8 +245,13 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, un= signed long addr, cpu_flags =3D pmd_to_hmm_pfn_flags(range, 
pmd); required_fault =3D hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, cpu_flags); - if (required_fault) + if (required_fault) { + if (hmm_vma_walk->pmdlocked) { + spin_unlock(hmm_vma_walk->ptl); + hmm_vma_walk->pmdlocked =3D false; + } return hmm_vma_fault(addr, end, required_fault, walk); + } =20 pfn =3D pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); for (i =3D 0; addr < end; addr +=3D PAGE_SIZE, i++, pfn++) { @@ -289,14 +331,23 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, u= nsigned long addr, goto fault; =20 if (softleaf_is_migration(entry)) { - pte_unmap(ptep); - hmm_vma_walk->last =3D addr; - migration_entry_wait(walk->mm, pmdp, addr); - return -EBUSY; + if (!hmm_select_migrate(range)) { + HMM_ASSERT_UNLOCKED(hmm_vma_walk); + hmm_vma_walk->last =3D addr; + migration_entry_wait(walk->mm, pmdp, addr); + return -EBUSY; + } else + goto out; } =20 /* Report error for everything else */ - pte_unmap(ptep); + + if (hmm_vma_walk->ptelocked) { + pte_unmap_unlock(ptep, hmm_vma_walk->ptl); + hmm_vma_walk->ptelocked =3D false; + } else + pte_unmap(ptep); + return -EFAULT; } =20 @@ -313,7 +364,12 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, un= signed long addr, if (!vm_normal_page(walk->vma, addr, pte) && !is_zero_pfn(pte_pfn(pte))) { if (hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0)) { - pte_unmap(ptep); + if (hmm_vma_walk->ptelocked) { + pte_unmap_unlock(ptep, hmm_vma_walk->ptl); + hmm_vma_walk->ptelocked =3D false; + } else + pte_unmap(ptep); + return -EFAULT; } new_pfn_flags =3D HMM_PFN_ERROR; @@ -326,7 +382,11 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, un= signed long addr, return 0; =20 fault: - pte_unmap(ptep); + if (hmm_vma_walk->ptelocked) { + pte_unmap_unlock(ptep, hmm_vma_walk->ptl); + hmm_vma_walk->ptelocked =3D false; + } else + pte_unmap(ptep); /* Fault any virtual address we were asked to fault */ return hmm_vma_fault(addr, end, required_fault, walk); } @@ -370,13 +430,18 @@ static int 
hmm_vma_handle_absent_pmd(struct mm_walk *= walk, unsigned long start, required_fault =3D hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0); if (required_fault) { - if (softleaf_is_device_private(entry)) + if (softleaf_is_device_private(entry)) { + if (hmm_vma_walk->pmdlocked) { + spin_unlock(hmm_vma_walk->ptl); + hmm_vma_walk->pmdlocked =3D false; + } return hmm_vma_fault(addr, end, required_fault, walk); + } else return -EFAULT; } =20 - return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR); + return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); } #else static int hmm_vma_handle_absent_pmd(struct mm_walk *walk, unsigned long s= tart, @@ -384,15 +449,491 @@ static int hmm_vma_handle_absent_pmd(struct mm_walk = *walk, unsigned long start, pmd_t pmd) { struct hmm_vma_walk *hmm_vma_walk =3D walk->private; - struct hmm_range *range =3D hmm_vma_walk->range; unsigned long npages =3D (end - start) >> PAGE_SHIFT; =20 if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0)) return -EFAULT; - return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR); + return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); } #endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */ =20 +#ifdef CONFIG_DEVICE_MIGRATION +/** + * migrate_vma_split_folio() - Helper function to split a THP folio + * @folio: the folio to split + * @fault_page: struct page associated with the fault if any + * + * Returns 0 on success + */ +static int migrate_vma_split_folio(struct folio *folio, + struct page *fault_page) +{ + int ret; + struct folio *fault_folio =3D fault_page ? page_folio(fault_page) : NULL; + struct folio *new_fault_folio =3D NULL; + + if (folio !=3D fault_folio) { + folio_get(folio); + folio_lock(folio); + } + + ret =3D split_folio(folio); + if (ret) { + if (folio !=3D fault_folio) { + folio_unlock(folio); + folio_put(folio); + } + return ret; + } + + new_fault_folio =3D fault_page ? 
page_folio(fault_page) : NULL; + + /* + * Ensure the lock is held on the correct + * folio after the split + */ + if (!new_fault_folio) { + folio_unlock(folio); + folio_put(folio); + } else if (folio !=3D new_fault_folio) { + if (new_fault_folio !=3D fault_folio) { + folio_get(new_fault_folio); + folio_lock(new_fault_folio); + } + folio_unlock(folio); + folio_put(folio); + } + + return 0; +} + +static int hmm_vma_handle_migrate_prepare_pmd(const struct mm_walk *walk, + pmd_t *pmdp, + unsigned long start, + unsigned long end, + unsigned long *hmm_pfn) +{ + struct hmm_vma_walk *hmm_vma_walk =3D walk->private; + struct hmm_range *range =3D hmm_vma_walk->range; + struct migrate_vma *migrate =3D range->migrate; + struct folio *fault_folio =3D NULL; + struct folio *folio; + enum migrate_vma_info minfo; + unsigned long i; + int r =3D 0; + + minfo =3D hmm_select_migrate(range); + if (!minfo) + return r; + + WARN_ON_ONCE(!migrate); + HMM_ASSERT_PMD_LOCKED(hmm_vma_walk, true); + + fault_folio =3D migrate->fault_page ? 
+ page_folio(migrate->fault_page) : NULL; + + if (pmd_none(*pmdp)) + return hmm_pfns_fill(start, end, hmm_vma_walk, 0); + + if (!(hmm_pfn[0] & HMM_PFN_VALID)) + goto out; + + if (pmd_trans_huge(*pmdp)) { + if (!(minfo & MIGRATE_VMA_SELECT_SYSTEM)) + goto out; + + folio =3D pmd_folio(*pmdp); + if (is_huge_zero_folio(folio)) + return hmm_pfns_fill(start, end, hmm_vma_walk, 0); + + } else if (!pmd_present(*pmdp)) { + const softleaf_t entry =3D softleaf_from_pmd(*pmdp); + + folio =3D softleaf_to_folio(entry); + + if (!softleaf_is_device_private(entry)) + goto out; + + if (!(minfo & MIGRATE_VMA_SELECT_DEVICE_PRIVATE)) + goto out; + + if (folio->pgmap->owner !=3D migrate->pgmap_owner) + goto out; + + } else { + hmm_vma_walk->last =3D start; + return -EBUSY; + } + + folio_get(folio); + + if (folio !=3D fault_folio && unlikely(!folio_trylock(folio))) { + folio_put(folio); + hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); + return 0; + } + + if (thp_migration_supported() && + (migrate->flags & MIGRATE_VMA_SELECT_COMPOUND) && + (IS_ALIGNED(start, HPAGE_PMD_SIZE) && + IS_ALIGNED(end, HPAGE_PMD_SIZE))) { + + struct page_vma_mapped_walk pvmw =3D { + .ptl =3D hmm_vma_walk->ptl, + .address =3D start, + .pmd =3D pmdp, + .vma =3D walk->vma, + }; + + hmm_pfn[0] |=3D HMM_PFN_MIGRATE | HMM_PFN_COMPOUND; + + r =3D set_pmd_migration_entry(&pvmw, folio_page(folio, 0)); + if (r) { + hmm_pfn[0] &=3D ~(HMM_PFN_MIGRATE | HMM_PFN_COMPOUND); + r =3D -ENOENT; // fallback + goto unlock_out; + } + for (i =3D 1, start +=3D PAGE_SIZE; start < end; start +=3D PAGE_SIZE, i= ++) + hmm_pfn[i] &=3D HMM_PFN_INOUT_FLAGS; + + } else { + r =3D -ENOENT; // fallback + goto unlock_out; + } + + +out: + return r; + +unlock_out: + if (folio !=3D fault_folio) + folio_unlock(folio); + folio_put(folio); + goto out; + +} + +/* + * Install migration entries if migration requested, either from fault + * or migrate paths. 
+ * + */ +static int hmm_vma_handle_migrate_prepare(const struct mm_walk *walk, + pmd_t *pmdp, + pte_t *ptep, + unsigned long addr, + unsigned long *hmm_pfn) +{ + struct hmm_vma_walk *hmm_vma_walk =3D walk->private; + struct hmm_range *range =3D hmm_vma_walk->range; + struct migrate_vma *migrate =3D range->migrate; + struct mm_struct *mm =3D walk->vma->vm_mm; + struct folio *fault_folio =3D NULL; + enum migrate_vma_info minfo; + struct dev_pagemap *pgmap; + bool anon_exclusive; + struct folio *folio; + unsigned long pfn; + struct page *page; + softleaf_t entry; + pte_t pte, swp_pte; + bool writable =3D false; + + // Do we want to migrate at all? + minfo =3D hmm_select_migrate(range); + if (!minfo) + return 0; + + WARN_ON_ONCE(!migrate); + HMM_ASSERT_PTE_LOCKED(hmm_vma_walk, true); + + fault_folio =3D migrate->fault_page ? + page_folio(migrate->fault_page) : NULL; + + pte =3D ptep_get(ptep); + + if (pte_none(pte)) { + // migrate without faulting case + if (vma_is_anonymous(walk->vma)) { + *hmm_pfn &=3D HMM_PFN_INOUT_FLAGS; + *hmm_pfn |=3D HMM_PFN_MIGRATE | HMM_PFN_VALID; + goto out; + } + } + + if (!(hmm_pfn[0] & HMM_PFN_VALID)) + goto out; + + if (!pte_present(pte)) { + /* + * Only care about unaddressable device page special + * page table entry. Other special swap entries are not + * migratable, and we ignore regular swapped page. 
+ */ + entry =3D softleaf_from_pte(pte); + if (!softleaf_is_device_private(entry)) + goto out; + + if (!(minfo & MIGRATE_VMA_SELECT_DEVICE_PRIVATE)) + goto out; + + page =3D softleaf_to_page(entry); + folio =3D page_folio(page); + if (folio->pgmap->owner !=3D migrate->pgmap_owner) + goto out; + + if (folio_test_large(folio)) { + int ret; + + pte_unmap_unlock(ptep, hmm_vma_walk->ptl); + hmm_vma_walk->ptelocked =3D false; + ret =3D migrate_vma_split_folio(folio, + migrate->fault_page); + if (ret) + goto out_error; + return -EAGAIN; + } + + pfn =3D page_to_pfn(page); + if (softleaf_is_device_private_write(entry)) + writable =3D true; + } else { + pfn =3D pte_pfn(pte); + if (is_zero_pfn(pfn) && + (minfo & MIGRATE_VMA_SELECT_SYSTEM)) { + *hmm_pfn =3D HMM_PFN_MIGRATE|HMM_PFN_VALID; + goto out; + } + page =3D vm_normal_page(walk->vma, addr, pte); + if (page && !is_zone_device_page(page) && + !(minfo & MIGRATE_VMA_SELECT_SYSTEM)) { + goto out; + } else if (page && is_device_coherent_page(page)) { + pgmap =3D page_pgmap(page); + + if (!(minfo & + MIGRATE_VMA_SELECT_DEVICE_COHERENT) || + pgmap->owner !=3D migrate->pgmap_owner) + goto out; + } + + folio =3D page ? page_folio(page) : NULL; + if (folio && folio_test_large(folio)) { + int ret; + + pte_unmap_unlock(ptep, hmm_vma_walk->ptl); + hmm_vma_walk->ptelocked =3D false; + + ret =3D migrate_vma_split_folio(folio, + migrate->fault_page); + if (ret) + goto out_error; + return -EAGAIN; + } + + writable =3D pte_write(pte); + } + + if (!page || !page->mapping) + goto out; + + /* + * By getting a reference on the folio we pin it and that blocks + * any kind of migration. Side effect is that it "freezes" the + * pte. + * + * We drop this reference after isolating the folio from the lru + * for non device folio (device folio are not on the lru and thus + * can't be dropped from it). 
+ */ + folio =3D page_folio(page); + folio_get(folio); + + /* + * We rely on folio_trylock() to avoid deadlock between + * concurrent migrations where each is waiting on the others + * folio lock. If we can't immediately lock the folio we fail this + * migration as it is only best effort anyway. + * + * If we can lock the folio it's safe to set up a migration entry + * now. In the common case where the folio is mapped once in a + * single process setting up the migration entry now is an + * optimisation to avoid walking the rmap later with + * try_to_migrate(). + */ + + if (fault_folio =3D=3D folio || folio_trylock(folio)) { + anon_exclusive =3D folio_test_anon(folio) && + PageAnonExclusive(page); + + flush_cache_page(walk->vma, addr, pfn); + + if (anon_exclusive) { + pte =3D ptep_clear_flush(walk->vma, addr, ptep); + + if (folio_try_share_anon_rmap_pte(folio, page)) { + set_pte_at(mm, addr, ptep, pte); + folio_unlock(folio); + folio_put(folio); + goto out; + } + } else { + pte =3D ptep_get_and_clear(mm, addr, ptep); + } + + if (pte_dirty(pte)) + folio_mark_dirty(folio); + + /* Setup special migration page table entry */ + if (writable) + entry =3D make_writable_migration_entry(pfn); + else if (anon_exclusive) + entry =3D make_readable_exclusive_migration_entry(pfn); + else + entry =3D make_readable_migration_entry(pfn); + + if (pte_present(pte)) { + if (pte_young(pte)) + entry =3D make_migration_entry_young(entry); + if (pte_dirty(pte)) + entry =3D make_migration_entry_dirty(entry); + } + + swp_pte =3D swp_entry_to_pte(entry); + if (pte_present(pte)) { + if (pte_soft_dirty(pte)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pte)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } else { + if (pte_swp_soft_dirty(pte)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pte)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } + + set_pte_at(mm, addr, ptep, swp_pte); + folio_remove_rmap_pte(folio, page, walk->vma); + folio_put(folio); + *hmm_pfn 
|=3D HMM_PFN_MIGRATE; + + if (pte_present(pte)) + flush_tlb_range(walk->vma, addr, addr + PAGE_SIZE); + } else + folio_put(folio); +out: + return 0; +out_error: + return -EFAULT; + +} + +static int hmm_vma_walk_split(pmd_t *pmdp, + unsigned long addr, + struct mm_walk *walk) +{ + struct hmm_vma_walk *hmm_vma_walk =3D walk->private; + struct hmm_range *range =3D hmm_vma_walk->range; + struct migrate_vma *migrate =3D range->migrate; + struct folio *folio, *fault_folio; + spinlock_t *ptl; + int ret =3D 0; + + HMM_ASSERT_UNLOCKED(hmm_vma_walk); + + fault_folio =3D (migrate && migrate->fault_page) ? + page_folio(migrate->fault_page) : NULL; + + ptl =3D pmd_lock(walk->mm, pmdp); + if (unlikely(!pmd_trans_huge(*pmdp))) { + spin_unlock(ptl); + goto out; + } + + folio =3D pmd_folio(*pmdp); + if (is_huge_zero_folio(folio)) { + spin_unlock(ptl); + split_huge_pmd(walk->vma, pmdp, addr); + } else { + folio_get(folio); + spin_unlock(ptl); + + if (folio !=3D fault_folio) { + if (unlikely(!folio_trylock(folio))) { + folio_put(folio); + ret =3D -EBUSY; + goto out; + } + } else + folio_put(folio); + + ret =3D split_folio(folio); + if (fault_folio !=3D folio) { + folio_unlock(folio); + folio_put(folio); + } + + } +out: + return ret; +} +#else +static int hmm_vma_handle_migrate_prepare_pmd(const struct mm_walk *walk, + pmd_t *pmdp, + unsigned long start, + unsigned long end, + unsigned long *hmm_pfn) +{ + return 0; +} + +static int hmm_vma_handle_migrate_prepare(const struct mm_walk *walk, + pmd_t *pmdp, + pte_t *pte, + unsigned long addr, + unsigned long *hmm_pfn) +{ + return 0; +} + +static int hmm_vma_walk_split(pmd_t *pmdp, + unsigned long addr, + struct mm_walk *walk) +{ + return 0; +} +#endif + +static int hmm_vma_capture_migrate_range(unsigned long start, + unsigned long end, + struct mm_walk *walk) +{ + struct hmm_vma_walk *hmm_vma_walk =3D walk->private; + struct hmm_range *range =3D hmm_vma_walk->range; + + if (!hmm_select_migrate(range)) + return 0; + + if 
(hmm_vma_walk->vma && (hmm_vma_walk->vma !=3D walk->vma)) + return -ERANGE; + + hmm_vma_walk->vma =3D walk->vma; + hmm_vma_walk->start =3D start; + hmm_vma_walk->end =3D end; + + if (end - start > range->end - range->start) + return -ERANGE; + + if (!hmm_vma_walk->mmu_range.owner) { + mmu_notifier_range_init_owner(&hmm_vma_walk->mmu_range, MMU_NOTIFY_MIGRA= TE, 0, + walk->vma->vm_mm, start, end, + range->dev_private_owner); + mmu_notifier_invalidate_range_start(&hmm_vma_walk->mmu_range); + } + + return 0; +} + static int hmm_vma_walk_pmd(pmd_t *pmdp, unsigned long start, unsigned long end, @@ -403,43 +944,125 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, unsigned long *hmm_pfns =3D &range->hmm_pfns[(start - range->start) >> PAGE_SHIFT]; unsigned long npages =3D (end - start) >> PAGE_SHIFT; + struct mm_struct *mm =3D walk->vma->vm_mm; unsigned long addr =3D start; + enum migrate_vma_info minfo; + unsigned long i; pte_t *ptep; pmd_t pmd; + int r =3D 0; + + minfo =3D hmm_select_migrate(range); =20 again: - pmd =3D pmdp_get_lockless(pmdp); - if (pmd_none(pmd)) - return hmm_vma_walk_hole(start, end, -1, walk); + hmm_vma_walk->ptelocked =3D false; + hmm_vma_walk->pmdlocked =3D false; + + if (minfo) { + hmm_vma_walk->ptl =3D pmd_lock(mm, pmdp); + hmm_vma_walk->pmdlocked =3D true; + pmd =3D pmdp_get(pmdp); + } else + pmd =3D pmdp_get_lockless(pmdp); + + if (pmd_none(pmd)) { + r =3D hmm_vma_walk_hole(start, end, -1, walk); + + if (hmm_vma_walk->pmdlocked) { + spin_unlock(hmm_vma_walk->ptl); + hmm_vma_walk->pmdlocked =3D false; + } + return r; + } =20 if (thp_migration_supported() && pmd_is_migration_entry(pmd)) { - if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0)) { + if (!minfo) { + if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0)) { + hmm_vma_walk->last =3D addr; + pmd_migration_entry_wait(walk->mm, pmdp); + return -EBUSY; + } + } + for (i =3D 0; addr < end; addr +=3D PAGE_SIZE, i++) + hmm_pfns[i] &=3D HMM_PFN_INOUT_FLAGS; + + if 
(hmm_vma_walk->pmdlocked) { + spin_unlock(hmm_vma_walk->ptl); + hmm_vma_walk->pmdlocked =3D false; + } + + return 0; + } + + if (pmd_trans_huge(pmd) || !pmd_present(pmd)) { + + if (!pmd_present(pmd)) { + r =3D hmm_vma_handle_absent_pmd(walk, start, end, hmm_pfns, + pmd); + // If not migrating we are done + if (r || !minfo) { + if (hmm_vma_walk->pmdlocked) { + spin_unlock(hmm_vma_walk->ptl); + hmm_vma_walk->pmdlocked =3D false; + } + return r; + } + } else { + + /* + * No need to take pmd_lock here if not migrating, + * even if some other thread is splitting the huge + * pmd we will get that event through mmu_notifier callback. + * + * So just read pmd value and check again it's a transparent + * huge or device mapping one and compute corresponding pfn + * values. + */ + + if (!minfo) { + pmd =3D pmdp_get_lockless(pmdp); + if (!pmd_trans_huge(pmd)) + goto again; + } + + r =3D hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd); + + // If not migrating we are done + if (r || !minfo) { + if (hmm_vma_walk->pmdlocked) { + spin_unlock(hmm_vma_walk->ptl); + hmm_vma_walk->pmdlocked =3D false; + } + return r; + } + } + + r =3D hmm_vma_handle_migrate_prepare_pmd(walk, pmdp, start, end, hmm_pfn= s); + + if (hmm_vma_walk->pmdlocked) { + spin_unlock(hmm_vma_walk->ptl); + hmm_vma_walk->pmdlocked =3D false; + } + + if (r =3D=3D -ENOENT) { + r =3D hmm_vma_walk_split(pmdp, addr, walk); + if (r) { + /* Split not successful, skip */ + return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); + } + + /* Split successful or "again", reloop */ hmm_vma_walk->last =3D addr; - pmd_migration_entry_wait(walk->mm, pmdp); return -EBUSY; } - return hmm_pfns_fill(start, end, range, 0); - } =20 - if (!pmd_present(pmd)) - return hmm_vma_handle_absent_pmd(walk, start, end, hmm_pfns, - pmd); + return r; =20 - if (pmd_trans_huge(pmd)) { - /* - * No need to take pmd_lock here, even if some other thread - * is splitting the huge pmd we will get that event through - * mmu_notifier callback. 
- * - * So just read pmd value and check again it's a transparent - * huge or device mapping one and compute corresponding pfn - * values. - */ - pmd =3D pmdp_get_lockless(pmdp); - if (!pmd_trans_huge(pmd)) - goto again; + } =20 - return hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd); + if (hmm_vma_walk->pmdlocked) { + spin_unlock(hmm_vma_walk->ptl); + hmm_vma_walk->pmdlocked =3D false; } =20 /* @@ -451,22 +1074,43 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, if (pmd_bad(pmd)) { if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0)) return -EFAULT; - return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR); + return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); } =20 - ptep =3D pte_offset_map(pmdp, addr); + if (minfo) { + ptep =3D pte_offset_map_lock(mm, pmdp, addr, &hmm_vma_walk->ptl); + if (ptep) + hmm_vma_walk->ptelocked =3D true; + } else + ptep =3D pte_offset_map(pmdp, addr); if (!ptep) goto again; + for (; addr < end; addr +=3D PAGE_SIZE, ptep++, hmm_pfns++) { - int r; =20 r =3D hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, hmm_pfns); if (r) { - /* hmm_vma_handle_pte() did pte_unmap() */ + /* hmm_vma_handle_pte() did pte_unmap() / pte_unmap_unlock */ return r; } + + r =3D hmm_vma_handle_migrate_prepare(walk, pmdp, ptep, addr, hmm_pfns); + if (r =3D=3D -EAGAIN) { + HMM_ASSERT_UNLOCKED(hmm_vma_walk); + goto again; + } + if (r) { + hmm_pfns_fill(addr, end, hmm_vma_walk, HMM_PFN_ERROR); + break; + } } - pte_unmap(ptep - 1); + + if (hmm_vma_walk->ptelocked) { + pte_unmap_unlock(ptep - 1, hmm_vma_walk->ptl); + hmm_vma_walk->ptelocked =3D false; + } else + pte_unmap(ptep - 1); + return 0; } =20 @@ -600,6 +1244,11 @@ static int hmm_vma_walk_test(unsigned long start, uns= igned long end, struct hmm_vma_walk *hmm_vma_walk =3D walk->private; struct hmm_range *range =3D hmm_vma_walk->range; struct vm_area_struct *vma =3D walk->vma; + int r; + + r =3D hmm_vma_capture_migrate_range(start, end, walk); + if (r) + return r; =20 if (!(vma->vm_flags & (VM_IO 
| VM_PFNMAP)) && vma->vm_flags & VM_READ) @@ -622,7 +1271,7 @@ static int hmm_vma_walk_test(unsigned long start, unsi= gned long end, (end - start) >> PAGE_SHIFT, 0)) return -EFAULT; =20 - hmm_pfns_fill(start, end, range, HMM_PFN_ERROR); + hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); =20 /* Skip this vma and continue processing the next vma. */ return 1; @@ -652,9 +1301,17 @@ static const struct mm_walk_ops hmm_walk_ops =3D { * the invalidation to finish. * -EFAULT: A page was requested to be valid and could not be made val= id * ie it has no backing VMA or it is illegal to access + * -ERANGE: The range crosses multiple VMAs, or space for hmm_pfns arr= ay + * is too small. * * This is similar to get_user_pages(), except that it can read the page t= ables * without mutating them (ie causing faults). + * + * To migrate after faulting, call hmm_range_fault() with + * HMM_PFN_REQ_MIGRATE set and the range.migrate field initialized. + * After hmm_range_fault() returns, call migrate_hmm_range_setup() instead + * of migrate_vma_setup(), then follow the normal migration call path. + * */ int hmm_range_fault(struct hmm_range *range) { @@ -662,16 +1319,33 @@ int hmm_range_fault(struct hmm_range *range) .range =3D range, .last =3D range->start, }; - struct mm_struct *mm =3D range->notifier->mm; + bool is_fault_path =3D !!range->notifier; + struct mm_struct *mm; int ret; =20 + /* + * + * Could be serving a device fault or coming from the migrate + * entry point. For the former we have not resolved the vma + * yet, and for the latter we don't have a notifier (but have a vma). + * + */ +#ifdef CONFIG_DEVICE_MIGRATION + mm =3D is_fault_path ? range->notifier->mm : range->migrate->vma->vm_mm; +#else + WARN_ON_ONCE(!is_fault_path); + mm =3D range->notifier->mm; +#endif mmap_assert_locked(mm); =20 do { /* If range is no longer valid force retry.
*/ - if (mmu_interval_check_retry(range->notifier, - range->notifier_seq)) - return -EBUSY; + if (is_fault_path && mmu_interval_check_retry(range->notifier, + range->notifier_seq)) { + ret =3D -EBUSY; + break; + } + ret =3D walk_page_range(mm, hmm_vma_walk.last, range->end, &hmm_walk_ops, &hmm_vma_walk); /* @@ -681,6 +1355,19 @@ int hmm_range_fault(struct hmm_range *range) * output, and all >=3D are still at their input values. */ } while (ret =3D=3D -EBUSY); + +#ifdef CONFIG_DEVICE_MIGRATION + if (hmm_select_migrate(range) && range->migrate && + hmm_vma_walk.mmu_range.owner) { + // The migrate_vma path has the following initialized + if (is_fault_path) { + range->migrate->vma =3D hmm_vma_walk.vma; + range->migrate->start =3D range->start; + range->migrate->end =3D hmm_vma_walk.end; + } + mmu_notifier_invalidate_range_end(&hmm_vma_walk.mmu_range); + } +#endif return ret; } EXPORT_SYMBOL(hmm_range_fault); diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 23379663b1e1..bda6320f6242 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -734,7 +734,16 @@ static void migrate_vma_unmap(struct migrate_vma *migr= ate) */ int migrate_vma_setup(struct migrate_vma *args) { + int ret; long nr_pages =3D (args->end - args->start) >> PAGE_SHIFT; + struct hmm_range range =3D { + .notifier =3D NULL, + .start =3D args->start, + .end =3D args->end, + .hmm_pfns =3D args->src, + .dev_private_owner =3D args->pgmap_owner, + .migrate =3D args + }; =20 args->start &=3D PAGE_MASK; args->end &=3D PAGE_MASK; @@ -759,17 +768,25 @@ int migrate_vma_setup(struct migrate_vma *args) args->cpages =3D 0; args->npages =3D 0; =20 - migrate_vma_collect(args); + if (args->flags & MIGRATE_VMA_FAULT) + range.default_flags |=3D HMM_PFN_REQ_FAULT; + + ret =3D hmm_range_fault(&range); + + migrate_hmm_range_setup(&range); =20 - if (args->cpages) - migrate_vma_unmap(args); + /* Remove migration PTEs */ + if (ret) { + migrate_vma_pages(args); + migrate_vma_finalize(args); + } =20 /* * At this 
point pages are locked and unmapped, and thus they have * stable content and can safely be copied to destination memory that * is allocated by the drivers. */ - return 0; + return ret; =20 } EXPORT_SYMBOL(migrate_vma_setup); @@ -1489,3 +1506,64 @@ int migrate_device_coherent_folio(struct folio *foli= o) return 0; return -EBUSY; } + +void migrate_hmm_range_setup(struct hmm_range *range) +{ + + struct migrate_vma *migrate =3D range->migrate; + + if (!migrate) + return; + + migrate->npages =3D (migrate->end - migrate->start) >> PAGE_SHIFT; + migrate->cpages =3D 0; + + for (unsigned long i =3D 0; i < migrate->npages; i++) { + + unsigned long pfn =3D range->hmm_pfns[i]; + + pfn &=3D ~HMM_PFN_INOUT_FLAGS; + + /* + * + * Skip entries that do not have both the valid and migrate flags set. + * + */ + if ((pfn & (HMM_PFN_VALID | HMM_PFN_MIGRATE)) !=3D + (HMM_PFN_VALID | HMM_PFN_MIGRATE)) { + migrate->src[i] =3D 0; + migrate->dst[i] =3D 0; + continue; + } + + migrate->cpages++; + + /* + * + * The zero page is encoded specially: the valid and migrate flags are + * set and the pfn part is zero. Encode it specially for migrate as well. + * + */ + if (pfn =3D=3D (HMM_PFN_VALID | HMM_PFN_MIGRATE)) { + migrate->src[i] =3D MIGRATE_PFN_MIGRATE; + migrate->dst[i] =3D 0; + continue; + } + if (pfn =3D=3D (HMM_PFN_VALID | HMM_PFN_MIGRATE | HMM_PFN_COMPOUND)) { + migrate->src[i] =3D MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND; + migrate->dst[i] =3D 0; + continue; + } + + migrate->src[i] =3D migrate_pfn(page_to_pfn(hmm_pfn_to_page(pfn))) + | MIGRATE_PFN_MIGRATE; + migrate->src[i] |=3D (pfn & HMM_PFN_WRITE) ? MIGRATE_PFN_WRITE : 0; + migrate->src[i] |=3D (pfn & HMM_PFN_COMPOUND) ?
MIGRATE_PFN_COMPOUND : 0; + migrate->dst[i] =3D 0; + } + + if (migrate->cpages) + migrate_vma_unmap(migrate); + +} +EXPORT_SYMBOL(migrate_hmm_range_setup); --=20 2.50.0
From nobody Sun Feb 8 19:55:44 2026 From: mpenttil@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?UTF-8?q?Mika=20Penttil=C3=A4?= , David Hildenbrand , Jason Gunthorpe , Leon Romanovsky , Alistair Popple , Balbir Singh , Zi Yan , Matthew Brost , Marco Pagani Subject: [PATCH v3 2/3] mm: add new testcase for the migrate on fault case Date: Mon, 26 Jan 2026 13:19:38 +0200 Message-ID: <20260126111939.1332983-3-mpenttil@redhat.com> X-Mailer: git-send-email 2.50.0 In-Reply-To: <20260126111939.1332983-1-mpenttil@redhat.com> References: <20260126111939.1332983-1-mpenttil@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Mika Penttil=C3=A4 Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Leon Romanovsky Cc: Alistair Popple Cc: Balbir Singh Cc: Zi Yan Cc: Matthew Brost Signed-off-by: Marco Pagani Signed-off-by: Mika Penttil=C3=A4 --- lib/test_hmm.c | 100 ++++++++++++++++++++++++- lib/test_hmm_uapi.h | 19 ++--- tools/testing/selftests/mm/hmm-tests.c | 54 +++++++++++++ 3 files changed, 163 insertions(+), 10 deletions(-) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 8af169d3873a..b82517cfd616 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -36,6 +36,7 @@ #define DMIRROR_RANGE_FAULT_TIMEOUT 1000 #define DEVMEM_CHUNK_SIZE (256 * 1024 * 1024U) #define DEVMEM_CHUNKS_RESERVE 16 +#define PFNS_ARRAY_SIZE 64 =20 /* * For device_private pages, dpage is just a dummy struct page @@ -145,7 +146,7 @@ static bool dmirror_is_private_zone(struct dmirror_devi= ce *mdevice) HMM_DMIRROR_MEMORY_DEVICE_PRIVATE); } =20 -static enum migrate_vma_direction +static enum migrate_vma_info dmirror_select_device(struct dmirror *dmirror)
{ return (dmirror->mdevice->zone_device_type =3D=3D @@ -1194,6 +1195,99 @@ static int dmirror_migrate_to_device(struct dmirror = *dmirror, return ret; } =20 +static int do_fault_and_migrate(struct dmirror *dmirror, struct hmm_range = *range) +{ + struct migrate_vma *migrate =3D range->migrate; + int ret; + + mmap_read_lock(dmirror->notifier.mm); + + /* Fault-in pages for migration and update device page table */ + ret =3D dmirror_range_fault(dmirror, range); + + pr_debug("Migrating from sys mem to device mem\n"); + migrate_hmm_range_setup(range); + + dmirror_migrate_alloc_and_copy(migrate, dmirror); + migrate_vma_pages(migrate); + dmirror_migrate_finalize_and_map(migrate, dmirror); + migrate_vma_finalize(migrate); + + mmap_read_unlock(dmirror->notifier.mm); + return ret; +} + +static int dmirror_fault_and_migrate_to_device(struct dmirror *dmirror, + struct hmm_dmirror_cmd *cmd) +{ + unsigned long start, size, end, next; + unsigned long src_pfns[PFNS_ARRAY_SIZE] =3D { 0 }; + unsigned long dst_pfns[PFNS_ARRAY_SIZE] =3D { 0 }; + struct migrate_vma migrate =3D { 0 }; + struct hmm_range range =3D { 0 }; + struct dmirror_bounce bounce; + int ret =3D 0; + + /* Whole range */ + start =3D cmd->addr; + size =3D cmd->npages << PAGE_SHIFT; + end =3D start + size; + + if (!mmget_not_zero(dmirror->notifier.mm)) { + ret =3D -EFAULT; + goto out; + } + + migrate.pgmap_owner =3D dmirror->mdevice; + migrate.src =3D src_pfns; + migrate.dst =3D dst_pfns; + + range.migrate =3D &migrate; + range.hmm_pfns =3D src_pfns; + range.pfn_flags_mask =3D 0; + range.default_flags =3D HMM_PFN_REQ_FAULT | HMM_PFN_REQ_MIGRATE; + range.dev_private_owner =3D dmirror->mdevice; + range.notifier =3D &dmirror->notifier; + + for (next =3D start; next < end; next =3D range.end) { + range.start =3D next; + range.end =3D min(end, next + (PFNS_ARRAY_SIZE << PAGE_SHIFT)); + + pr_debug("Fault and migrate range start:%#lx end:%#lx\n", + range.start, range.end); + + ret =3D do_fault_and_migrate(dmirror, &range); + 
if (ret) + goto out_mmput; + } + + /* + * Return the migrated data for verification. + * Only for pages in device zone + */ + ret =3D dmirror_bounce_init(&bounce, start, size); + if (ret) + goto out_mmput; + + mutex_lock(&dmirror->mutex); + ret =3D dmirror_do_read(dmirror, start, end, &bounce); + mutex_unlock(&dmirror->mutex); + if (ret =3D=3D 0) { + ret =3D copy_to_user(u64_to_user_ptr(cmd->ptr), bounce.ptr, bounce.size); + if (ret) + ret =3D -EFAULT; + } + + cmd->cpages =3D bounce.cpages; + dmirror_bounce_fini(&bounce); + + +out_mmput: + mmput(dmirror->notifier.mm); +out: + return ret; +} + static void dmirror_mkentry(struct dmirror *dmirror, struct hmm_range *ran= ge, unsigned char *perm, unsigned long entry) { @@ -1510,6 +1604,10 @@ static long dmirror_fops_unlocked_ioctl(struct file = *filp, ret =3D dmirror_migrate_to_device(dmirror, &cmd); break; =20 + case HMM_DMIRROR_MIGRATE_ON_FAULT_TO_DEV: + ret =3D dmirror_fault_and_migrate_to_device(dmirror, &cmd); + break; + case HMM_DMIRROR_MIGRATE_TO_SYS: ret =3D dmirror_migrate_to_system(dmirror, &cmd); break; diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h index f94c6d457338..0b6e7a419e36 100644 --- a/lib/test_hmm_uapi.h +++ b/lib/test_hmm_uapi.h @@ -29,15 +29,16 @@ struct hmm_dmirror_cmd { }; =20 /* Expose the address space of the calling process through hmm device file= */ -#define HMM_DMIRROR_READ _IOWR('H', 0x00, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_WRITE _IOWR('H', 0x01, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_MIGRATE_TO_DEV _IOWR('H', 0x02, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_MIGRATE_TO_SYS _IOWR('H', 0x03, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_SNAPSHOT _IOWR('H', 0x04, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x05, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_CHECK_EXCLUSIVE _IOWR('H', 0x06, struct hmm_dmirror_cm= d) -#define HMM_DMIRROR_RELEASE _IOWR('H', 0x07, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_FLAGS _IOWR('H', 0x08, struct 
hmm_dmirror_cmd) +#define HMM_DMIRROR_READ _IOWR('H', 0x00, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_WRITE _IOWR('H', 0x01, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_MIGRATE_TO_DEV _IOWR('H', 0x02, struct hmm_dmirror_cm= d) +#define HMM_DMIRROR_MIGRATE_ON_FAULT_TO_DEV _IOWR('H', 0x03, struct hmm_dm= irror_cmd) +#define HMM_DMIRROR_MIGRATE_TO_SYS _IOWR('H', 0x04, struct hmm_dmirror_cm= d) +#define HMM_DMIRROR_SNAPSHOT _IOWR('H', 0x05, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x06, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_CHECK_EXCLUSIVE _IOWR('H', 0x07, struct hmm_dmirror_c= md) +#define HMM_DMIRROR_RELEASE _IOWR('H', 0x08, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_FLAGS _IOWR('H', 0x09, struct hmm_dmirror_cmd) =20 #define HMM_DMIRROR_FLAG_FAIL_ALLOC (1ULL << 0) =20 diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftes= ts/mm/hmm-tests.c index e8328c89d855..c75616875c9e 100644 --- a/tools/testing/selftests/mm/hmm-tests.c +++ b/tools/testing/selftests/mm/hmm-tests.c @@ -277,6 +277,13 @@ static int hmm_migrate_sys_to_dev(int fd, return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_DEV, buffer, npages); } =20 +static int hmm_migrate_on_fault_sys_to_dev(int fd, + struct hmm_buffer *buffer, + unsigned long npages) +{ + return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_ON_FAULT_TO_DEV, buffer, n= pages); +} + static int hmm_migrate_dev_to_sys(int fd, struct hmm_buffer *buffer, unsigned long npages) @@ -1034,6 +1041,53 @@ TEST_F(hmm, migrate) hmm_buffer_free(buffer); } =20 + +/* + * Fault and migrate anonymous memory to device private memory. 
+ */ +TEST_F(hmm, migrate_on_fault) +{ + struct hmm_buffer *buffer; + unsigned long npages; + unsigned long size; + unsigned long i; + int *ptr; + int ret; + + npages =3D ALIGN(HMM_BUFFER_SIZE, self->page_size) >> self->page_shift; + ASSERT_NE(npages, 0); + size =3D npages << self->page_shift; + + buffer =3D malloc(sizeof(*buffer)); + ASSERT_NE(buffer, NULL); + + buffer->fd =3D -1; + buffer->size =3D size; + buffer->mirror =3D malloc(size); + ASSERT_NE(buffer->mirror, NULL); + + buffer->ptr =3D mmap(NULL, size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, + buffer->fd, 0); + ASSERT_NE(buffer->ptr, MAP_FAILED); + + /* Initialize buffer in system memory. */ + for (i =3D 0, ptr =3D buffer->ptr; i < size / sizeof(*ptr); ++i) + ptr[i] =3D i; + + /* Fault and migrate memory to device. */ + ret =3D hmm_migrate_on_fault_sys_to_dev(self->fd, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Check what the device read. */ + for (i =3D 0, ptr =3D buffer->mirror; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], i); + + hmm_buffer_free(buffer); +} + /* * Migrate anonymous memory to device private memory and fault some of it = back * to system memory, then try migrating the resulting mix of system and de= vice --=20 2.50.0
From nobody Sun Feb 8 19:55:44 2026 From: mpenttil@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?UTF-8?q?Mika=20Penttil=C3=A4?= , David Hildenbrand , Jason Gunthorpe , Leon Romanovsky , Alistair Popple , Balbir Singh , Zi Yan , Matthew Brost Subject: [PATCH v3 3/3] mm/migrate_device.c: remove migrate_vma_collect_*() functions Date: Mon, 26 Jan 2026 13:19:39 +0200 Message-ID: <20260126111939.1332983-4-mpenttil@redhat.com> X-Mailer: git-send-email 2.50.0 In-Reply-To: <20260126111939.1332983-1-mpenttil@redhat.com> References: <20260126111939.1332983-1-mpenttil@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Mika Penttil=C3=A4 With the unified fault handling and migrate path, the migrate_vma_collect_*() functions are no longer used. Remove them.
Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Leon Romanovsky Cc: Alistair Popple Cc: Balbir Singh Cc: Zi Yan Cc: Matthew Brost Signed-off-by: Mika Penttil=C3=A4 --- mm/migrate_device.c | 508 -------------------------------------------- 1 file changed, 508 deletions(-) diff --git a/mm/migrate_device.c b/mm/migrate_device.c index bda6320f6242..c896a4d8bca2 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -18,514 +18,6 @@ #include #include "internal.h" =20 -static int migrate_vma_collect_skip(unsigned long start, - unsigned long end, - struct mm_walk *walk) -{ - struct migrate_vma *migrate =3D walk->private; - unsigned long addr; - - for (addr =3D start; addr < end; addr +=3D PAGE_SIZE) { - migrate->dst[migrate->npages] =3D 0; - migrate->src[migrate->npages++] =3D 0; - } - - return 0; -} - -static int migrate_vma_collect_hole(unsigned long start, - unsigned long end, - __always_unused int depth, - struct mm_walk *walk) -{ - struct migrate_vma *migrate =3D walk->private; - unsigned long addr; - - /* Only allow populating anonymous memory. 
*/ - if (!vma_is_anonymous(walk->vma)) - return migrate_vma_collect_skip(start, end, walk); - - if (thp_migration_supported() && - (migrate->flags & MIGRATE_VMA_SELECT_COMPOUND) && - (IS_ALIGNED(start, HPAGE_PMD_SIZE) && - IS_ALIGNED(end, HPAGE_PMD_SIZE))) { - migrate->src[migrate->npages] =3D MIGRATE_PFN_MIGRATE | - MIGRATE_PFN_COMPOUND; - migrate->dst[migrate->npages] =3D 0; - migrate->npages++; - migrate->cpages++; - - /* - * Collect the remaining entries as holes, in case we - * need to split later - */ - return migrate_vma_collect_skip(start + PAGE_SIZE, end, walk); - } - - for (addr =3D start; addr < end; addr +=3D PAGE_SIZE) { - migrate->src[migrate->npages] =3D MIGRATE_PFN_MIGRATE; - migrate->dst[migrate->npages] =3D 0; - migrate->npages++; - migrate->cpages++; - } - - return 0; -} - -/** - * migrate_vma_split_folio() - Helper function to split a THP folio - * @folio: the folio to split - * @fault_page: struct page associated with the fault if any - * - * Returns 0 on success - */ -static int migrate_vma_split_folio(struct folio *folio, - struct page *fault_page) -{ - int ret; - struct folio *fault_folio =3D fault_page ? page_folio(fault_page) : NULL; - struct folio *new_fault_folio =3D NULL; - - if (folio !=3D fault_folio) { - folio_get(folio); - folio_lock(folio); - } - - ret =3D split_folio(folio); - if (ret) { - if (folio !=3D fault_folio) { - folio_unlock(folio); - folio_put(folio); - } - return ret; - } - - new_fault_folio =3D fault_page ? page_folio(fault_page) : NULL; - - /* - * Ensure the lock is held on the correct - * folio after the split - */ - if (!new_fault_folio) { - folio_unlock(folio); - folio_put(folio); - } else if (folio !=3D new_fault_folio) { - if (new_fault_folio !=3D fault_folio) { - folio_get(new_fault_folio); - folio_lock(new_fault_folio); - } - folio_unlock(folio); - folio_put(folio); - } - - return 0; -} - -/** migrate_vma_collect_huge_pmd - collect THP pages without splitting the - * folio for device private pages. 
- * @pmdp: pointer to pmd entry - * @start: start address of the range for migration - * @end: end address of the range for migration - * @walk: mm_walk callback structure - * @fault_folio: folio associated with the fault if any - * - * Collect the huge pmd entry at @pmdp for migration and set the - * MIGRATE_PFN_COMPOUND flag in the migrate src entry to indicate that - * migration will occur at HPAGE_PMD granularity - */ -static int migrate_vma_collect_huge_pmd(pmd_t *pmdp, unsigned long start, - unsigned long end, struct mm_walk *walk, - struct folio *fault_folio) -{ - struct mm_struct *mm = walk->mm; - struct folio *folio; - struct migrate_vma *migrate = walk->private; - spinlock_t *ptl; - int ret; - unsigned long write = 0; - - ptl = pmd_lock(mm, pmdp); - if (pmd_none(*pmdp)) { - spin_unlock(ptl); - return migrate_vma_collect_hole(start, end, -1, walk); - } - - if (pmd_trans_huge(*pmdp)) { - if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) { - spin_unlock(ptl); - return migrate_vma_collect_skip(start, end, walk); - } - - folio = pmd_folio(*pmdp); - if (is_huge_zero_folio(folio)) { - spin_unlock(ptl); - return migrate_vma_collect_hole(start, end, -1, walk); - } - if (pmd_write(*pmdp)) - write = MIGRATE_PFN_WRITE; - } else if (!pmd_present(*pmdp)) { - const softleaf_t entry = softleaf_from_pmd(*pmdp); - - folio = softleaf_to_folio(entry); - - if (!softleaf_is_device_private(entry) || - !(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_PRIVATE) || - (folio->pgmap->owner != migrate->pgmap_owner)) { - spin_unlock(ptl); - return migrate_vma_collect_skip(start, end, walk); - } - - if (softleaf_is_migration(entry)) { - migration_entry_wait_on_locked(entry, ptl); - spin_unlock(ptl); - return -EAGAIN; - } - - if (softleaf_is_device_private_write(entry)) - write = MIGRATE_PFN_WRITE; - } else { - spin_unlock(ptl); - return -EAGAIN; - } - - folio_get(folio); - if (folio != fault_folio && unlikely(!folio_trylock(folio))) { - spin_unlock(ptl); -
folio_put(folio); - return migrate_vma_collect_skip(start, end, walk); - } - - if (thp_migration_supported() && - (migrate->flags & MIGRATE_VMA_SELECT_COMPOUND) && - (IS_ALIGNED(start, HPAGE_PMD_SIZE) && - IS_ALIGNED(end, HPAGE_PMD_SIZE))) { - - struct page_vma_mapped_walk pvmw = { - .ptl = ptl, - .address = start, - .pmd = pmdp, - .vma = walk->vma, - }; - - unsigned long pfn = page_to_pfn(folio_page(folio, 0)); - - migrate->src[migrate->npages] = migrate_pfn(pfn) | write - | MIGRATE_PFN_MIGRATE - | MIGRATE_PFN_COMPOUND; - migrate->dst[migrate->npages++] = 0; - migrate->cpages++; - ret = set_pmd_migration_entry(&pvmw, folio_page(folio, 0)); - if (ret) { - migrate->npages--; - migrate->cpages--; - migrate->src[migrate->npages] = 0; - migrate->dst[migrate->npages] = 0; - goto fallback; - } - migrate_vma_collect_skip(start + PAGE_SIZE, end, walk); - spin_unlock(ptl); - return 0; - } - -fallback: - spin_unlock(ptl); - if (!folio_test_large(folio)) - goto done; - ret = split_folio(folio); - if (fault_folio != folio) - folio_unlock(folio); - folio_put(folio); - if (ret) - return migrate_vma_collect_skip(start, end, walk); - if (pmd_none(pmdp_get_lockless(pmdp))) - return migrate_vma_collect_hole(start, end, -1, walk); - -done: - return -ENOENT; -} - -static int migrate_vma_collect_pmd(pmd_t *pmdp, - unsigned long start, - unsigned long end, - struct mm_walk *walk) -{ - struct migrate_vma *migrate = walk->private; - struct vm_area_struct *vma = walk->vma; - struct mm_struct *mm = vma->vm_mm; - unsigned long addr = start, unmapped = 0; - spinlock_t *ptl; - struct folio *fault_folio = migrate->fault_page ?
- page_folio(migrate->fault_page) : NULL; - pte_t *ptep; - -again: - if (pmd_trans_huge(*pmdp) || !pmd_present(*pmdp)) { - int ret = migrate_vma_collect_huge_pmd(pmdp, start, end, walk, fault_folio); - - if (ret == -EAGAIN) - goto again; - if (ret == 0) - return 0; - } - - ptep = pte_offset_map_lock(mm, pmdp, start, &ptl); - if (!ptep) - goto again; - arch_enter_lazy_mmu_mode(); - ptep += (addr - start) / PAGE_SIZE; - - for (; addr < end; addr += PAGE_SIZE, ptep++) { - struct dev_pagemap *pgmap; - unsigned long mpfn = 0, pfn; - struct folio *folio; - struct page *page; - softleaf_t entry; - pte_t pte; - - pte = ptep_get(ptep); - - if (pte_none(pte)) { - if (vma_is_anonymous(vma)) { - mpfn = MIGRATE_PFN_MIGRATE; - migrate->cpages++; - } - goto next; - } - - if (!pte_present(pte)) { - /* - * Only care about unaddressable device page special - * page table entry. Other special swap entries are not - * migratable, and we ignore regular swapped page. - */ - entry = softleaf_from_pte(pte); - if (!softleaf_is_device_private(entry)) - goto next; - - page = softleaf_to_page(entry); - pgmap = page_pgmap(page); - if (!(migrate->flags & - MIGRATE_VMA_SELECT_DEVICE_PRIVATE) || - pgmap->owner != migrate->pgmap_owner) - goto next; - - folio = page_folio(page); - if (folio_test_large(folio)) { - int ret; - - arch_leave_lazy_mmu_mode(); - pte_unmap_unlock(ptep, ptl); - ret = migrate_vma_split_folio(folio, - migrate->fault_page); - - if (ret) { - if (unmapped) - flush_tlb_range(walk->vma, start, end); - - return migrate_vma_collect_skip(addr, end, walk); - } - - goto again; - } - - mpfn = migrate_pfn(page_to_pfn(page)) | - MIGRATE_PFN_MIGRATE; - if (softleaf_is_device_private_write(entry)) - mpfn |= MIGRATE_PFN_WRITE; - } else { - pfn = pte_pfn(pte); - if (is_zero_pfn(pfn) && - (migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) { - mpfn = MIGRATE_PFN_MIGRATE; - migrate->cpages++; - goto next; - } - page = vm_normal_page(migrate->vma, addr,
pte); - if (page && !is_zone_device_page(page) && - !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) { - goto next; - } else if (page && is_device_coherent_page(page)) { - pgmap = page_pgmap(page); - - if (!(migrate->flags & - MIGRATE_VMA_SELECT_DEVICE_COHERENT) || - pgmap->owner != migrate->pgmap_owner) - goto next; - } - folio = page ? page_folio(page) : NULL; - if (folio && folio_test_large(folio)) { - int ret; - - arch_leave_lazy_mmu_mode(); - pte_unmap_unlock(ptep, ptl); - ret = migrate_vma_split_folio(folio, - migrate->fault_page); - - if (ret) { - if (unmapped) - flush_tlb_range(walk->vma, start, end); - - return migrate_vma_collect_skip(addr, end, walk); - } - - goto again; - } - mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE; - mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0; - } - - if (!page || !page->mapping) { - mpfn = 0; - goto next; - } - - /* - * By getting a reference on the folio we pin it and that blocks - * any kind of migration. Side effect is that it "freezes" the - * pte. - * - * We drop this reference after isolating the folio from the lru - * for non device folio (device folio are not on the lru and thus - * can't be dropped from it). - */ - folio = page_folio(page); - folio_get(folio); - - /* - * We rely on folio_trylock() to avoid deadlock between - * concurrent migrations where each is waiting on the others - * folio lock. If we can't immediately lock the folio we fail this - * migration as it is only best effort anyway. - * - * If we can lock the folio it's safe to set up a migration entry - * now. In the common case where the folio is mapped once in a - * single process setting up the migration entry now is an - * optimisation to avoid walking the rmap later with - * try_to_migrate().
- */ - if (fault_folio == folio || folio_trylock(folio)) { - bool anon_exclusive; - pte_t swp_pte; - - flush_cache_page(vma, addr, pte_pfn(pte)); - anon_exclusive = folio_test_anon(folio) && - PageAnonExclusive(page); - if (anon_exclusive) { - pte = ptep_clear_flush(vma, addr, ptep); - - if (folio_try_share_anon_rmap_pte(folio, page)) { - set_pte_at(mm, addr, ptep, pte); - if (fault_folio != folio) - folio_unlock(folio); - folio_put(folio); - mpfn = 0; - goto next; - } - } else { - pte = ptep_get_and_clear(mm, addr, ptep); - } - - migrate->cpages++; - - /* Set the dirty flag on the folio now the pte is gone. */ - if (pte_dirty(pte)) - folio_mark_dirty(folio); - - /* Setup special migration page table entry */ - if (mpfn & MIGRATE_PFN_WRITE) - entry = make_writable_migration_entry( - page_to_pfn(page)); - else if (anon_exclusive) - entry = make_readable_exclusive_migration_entry( - page_to_pfn(page)); - else - entry = make_readable_migration_entry( - page_to_pfn(page)); - if (pte_present(pte)) { - if (pte_young(pte)) - entry = make_migration_entry_young(entry); - if (pte_dirty(pte)) - entry = make_migration_entry_dirty(entry); - } - swp_pte = swp_entry_to_pte(entry); - if (pte_present(pte)) { - if (pte_soft_dirty(pte)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pte)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); - } else { - if (pte_swp_soft_dirty(pte)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pte)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); - } - set_pte_at(mm, addr, ptep, swp_pte); - - /* - * This is like regular unmap: we remove the rmap and - * drop the folio refcount. The folio won't be freed, as - * we took a reference just above.
- */ - folio_remove_rmap_pte(folio, page, vma); - folio_put(folio); - - if (pte_present(pte)) - unmapped++; - } else { - folio_put(folio); - mpfn = 0; - } - -next: - migrate->dst[migrate->npages] = 0; - migrate->src[migrate->npages++] = mpfn; - } - - /* Only flush the TLB if we actually modified any entries */ - if (unmapped) - flush_tlb_range(walk->vma, start, end); - - arch_leave_lazy_mmu_mode(); - pte_unmap_unlock(ptep - 1, ptl); - - return 0; -} - -static const struct mm_walk_ops migrate_vma_walk_ops = { - .pmd_entry = migrate_vma_collect_pmd, - .pte_hole = migrate_vma_collect_hole, - .walk_lock = PGWALK_RDLOCK, -}; - -/* - * migrate_vma_collect() - collect pages over a range of virtual addresses - * @migrate: migrate struct containing all migration information - * - * This will walk the CPU page table. For each virtual address backed by a - * valid page, it updates the src array and takes a reference on the page, in - * order to pin the page until we lock it and unmap it. - */ -static void migrate_vma_collect(struct migrate_vma *migrate) -{ - struct mmu_notifier_range range; - - /* - * Note that the pgmap_owner is passed to the mmu notifier callback so - * that the registered device driver can skip invalidating device - * private page mappings that won't be migrated. - */ - mmu_notifier_range_init_owner(&range, MMU_NOTIFY_MIGRATE, 0, - migrate->vma->vm_mm, migrate->start, migrate->end, - migrate->pgmap_owner); - mmu_notifier_invalidate_range_start(&range); - - walk_page_range(migrate->vma->vm_mm, migrate->start, migrate->end, - &migrate_vma_walk_ops, migrate); - - mmu_notifier_invalidate_range_end(&range); - migrate->end = migrate->start + (migrate->npages << PAGE_SHIFT); -} - /* * migrate_vma_check_page() - check if page is pinned or not * @page: struct page to check -- 2.50.0