From: mpenttil@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Mika Penttilä, David Hildenbrand,
	Jason Gunthorpe, Leon Romanovsky, Alistair Popple, Balbir Singh,
	Zi Yan, Matthew Brost
Subject: [PATCH v5 3/6] mm/hmm: do the plumbing for HMM to participate in migration
Date: Wed, 11 Feb 2026 10:12:58 +0200
Message-ID: <20260211081301.2940672-4-mpenttil@redhat.com>
X-Mailer: git-send-email 2.50.0
In-Reply-To: <20260211081301.2940672-1-mpenttil@redhat.com>
References: <20260211081301.2940672-1-mpenttil@redhat.com>

From: Mika Penttilä

Do the preparations in hmm_range_fault() and the pagewalk callbacks for the
"collecting" part of migration, which is needed for migration on fault.
These steps include taking the pmd/pte locks when migrating, capturing the
vma for the later migrate actions, and calling the still-dummy
hmm_vma_handle_migrate_prepare_pmd() and hmm_vma_handle_migrate_prepare()
functions from the pagewalk.
Cc: David Hildenbrand
Cc: Jason Gunthorpe
Cc: Leon Romanovsky
Cc: Alistair Popple
Cc: Balbir Singh
Cc: Zi Yan
Cc: Matthew Brost
Suggested-by: Alistair Popple
Signed-off-by: Mika Penttilä
---
 include/linux/migrate.h |  18 +-
 lib/test_hmm.c          |   2 +-
 mm/hmm.c                | 419 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 386 insertions(+), 53 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 8e6c28efd4f8..818272b2a7b5 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -98,6 +98,16 @@ static inline int set_movable_ops(const struct movable_operations *ops, enum pag
 	return -ENOSYS;
 }
 
+enum migrate_vma_info {
+	MIGRATE_VMA_SELECT_NONE = 0,
+	MIGRATE_VMA_SELECT_COMPOUND = MIGRATE_VMA_SELECT_NONE,
+};
+
+static inline enum migrate_vma_info hmm_select_migrate(struct hmm_range *range)
+{
+	return MIGRATE_VMA_SELECT_NONE;
+}
+
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_NUMA_BALANCING
@@ -141,7 +151,7 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
 	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
 }
 
-enum migrate_vma_direction {
+enum migrate_vma_info {
 	MIGRATE_VMA_SELECT_SYSTEM = 1 << 0,
 	MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1,
 	MIGRATE_VMA_SELECT_DEVICE_COHERENT = 1 << 2,
@@ -183,6 +193,12 @@ struct migrate_vma {
 	struct page		*fault_page;
 };
 
+// TODO: enable migration
+static inline enum migrate_vma_info hmm_select_migrate(struct hmm_range *range)
+{
+	return 0;
+}
+
 int migrate_vma_setup(struct migrate_vma *args);
 void migrate_vma_pages(struct migrate_vma *migrate);
 void migrate_vma_finalize(struct migrate_vma *migrate);
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 455a6862ae50..94f1f4cff8b1 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -145,7 +145,7 @@ static bool dmirror_is_private_zone(struct dmirror_device *mdevice)
 			HMM_DMIRROR_MEMORY_DEVICE_PRIVATE);
 }
 
-static enum migrate_vma_direction
+static enum migrate_vma_info
 dmirror_select_device(struct dmirror *dmirror)
 {
 	return (dmirror->mdevice->zone_device_type ==
diff --git a/mm/hmm.c b/mm/hmm.c
index 21ff99379836..22ca89b0a89e 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -27,14 +28,44 @@
 #include
 #include
 #include
+#include
 
 #include "internal.h"
 
 struct hmm_vma_walk {
-	struct hmm_range	*range;
-	unsigned long		last;
+	struct mmu_notifier_range	mmu_range;
+	struct vm_area_struct	*vma;
+	struct hmm_range	*range;
+	unsigned long		start;
+	unsigned long		end;
+	unsigned long		last;
+	/*
+	 * For migration we need pte/pmd
+	 * locked for the handle_* and
+	 * prepare_* regions. While faulting
+	 * we have to drop the locks and
+	 * start again.
+	 * ptelocked and pmdlocked
+	 * hold the state and tells if need
+	 * to drop locks before faulting.
+	 * ptl is the lock held for pte or pmd.
+	 *
+	 */
+	bool			ptelocked;
+	bool			pmdlocked;
+	spinlock_t		*ptl;
 };
 
+#define HMM_ASSERT_PTE_LOCKED(hmm_vma_walk, locked) \
+	WARN_ON_ONCE(hmm_vma_walk->ptelocked != locked)
+
+#define HMM_ASSERT_PMD_LOCKED(hmm_vma_walk, locked) \
+	WARN_ON_ONCE(hmm_vma_walk->pmdlocked != locked)
+
+#define HMM_ASSERT_UNLOCKED(hmm_vma_walk) \
+	WARN_ON_ONCE(hmm_vma_walk->ptelocked || \
+		     hmm_vma_walk->pmdlocked)
+
 enum {
 	HMM_NEED_FAULT = 1 << 0,
 	HMM_NEED_WRITE_FAULT = 1 << 1,
@@ -42,14 +73,37 @@ enum {
 };
 
 static int hmm_pfns_fill(unsigned long addr, unsigned long end,
-			 struct hmm_range *range, unsigned long cpu_flags)
+			 struct hmm_vma_walk *hmm_vma_walk, unsigned long cpu_flags)
 {
+	struct hmm_range *range = hmm_vma_walk->range;
 	unsigned long i = (addr - range->start) >> PAGE_SHIFT;
+	enum migrate_vma_info minfo;
+	bool migrate = false;
+
+	minfo = hmm_select_migrate(range);
+	if (cpu_flags != HMM_PFN_ERROR) {
+		if (minfo && (vma_is_anonymous(hmm_vma_walk->vma))) {
+			cpu_flags |= (HMM_PFN_VALID | HMM_PFN_MIGRATE);
+			migrate = true;
+		}
+	}
+
+	if (migrate && thp_migration_supported() &&
+	    (minfo & MIGRATE_VMA_SELECT_COMPOUND) &&
+	    IS_ALIGNED(addr, HPAGE_PMD_SIZE) &&
+	    IS_ALIGNED(end, HPAGE_PMD_SIZE)) {
+		range->hmm_pfns[i] &= HMM_PFN_INOUT_FLAGS;
+		range->hmm_pfns[i] |= cpu_flags | HMM_PFN_COMPOUND;
+		addr += PAGE_SIZE;
+		i++;
+		cpu_flags = 0;
+	}
 
 	for (; addr < end; addr += PAGE_SIZE, i++) {
 		range->hmm_pfns[i] &= HMM_PFN_INOUT_FLAGS;
 		range->hmm_pfns[i] |= cpu_flags;
 	}
+
 	return 0;
 }
 
@@ -72,6 +126,7 @@ static int hmm_vma_fault(unsigned long addr, unsigned long end,
 	unsigned int fault_flags = FAULT_FLAG_REMOTE;
 
 	WARN_ON_ONCE(!required_fault);
+	HMM_ASSERT_UNLOCKED(hmm_vma_walk);
 	hmm_vma_walk->last = addr;
 
 	if (required_fault & HMM_NEED_WRITE_FAULT) {
@@ -165,11 +220,11 @@ static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
 	if (!walk->vma) {
 		if (required_fault)
 			return -EFAULT;
-		return hmm_pfns_fill(addr, end, range, HMM_PFN_ERROR);
+		return hmm_pfns_fill(addr, end, hmm_vma_walk, HMM_PFN_ERROR);
 	}
 	if (required_fault)
 		return hmm_vma_fault(addr, end, required_fault, walk);
-	return hmm_pfns_fill(addr, end, range, 0);
+	return hmm_pfns_fill(addr, end, hmm_vma_walk, 0);
 }
 
 static inline unsigned long hmm_pfn_flags_order(unsigned long order)
@@ -202,8 +257,13 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
 	cpu_flags = pmd_to_hmm_pfn_flags(range, pmd);
 	required_fault =
 		hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, cpu_flags);
-	if (required_fault)
+	if (required_fault) {
+		if (hmm_vma_walk->pmdlocked) {
+			spin_unlock(hmm_vma_walk->ptl);
+			hmm_vma_walk->pmdlocked = false;
+		}
 		return hmm_vma_fault(addr, end, required_fault, walk);
+	}
 
 	pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
 	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
@@ -283,14 +343,23 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			goto fault;
 
 		if (softleaf_is_migration(entry)) {
-			pte_unmap(ptep);
-			hmm_vma_walk->last = addr;
-			migration_entry_wait(walk->mm, pmdp, addr);
-			return -EBUSY;
+			if (!hmm_select_migrate(range)) {
+				HMM_ASSERT_UNLOCKED(hmm_vma_walk);
+				hmm_vma_walk->last = addr;
+				migration_entry_wait(walk->mm, pmdp, addr);
+				return -EBUSY;
+			} else
+				goto out;
 		}
 
 		/* Report error for everything else */
-		pte_unmap(ptep);
+
+		if (hmm_vma_walk->ptelocked) {
+			pte_unmap_unlock(ptep, hmm_vma_walk->ptl);
+			hmm_vma_walk->ptelocked = false;
+		} else
+			pte_unmap(ptep);
+
 		return -EFAULT;
 	}
 
@@ -307,7 +376,12 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	if (!vm_normal_page(walk->vma, addr, pte) &&
 	    !is_zero_pfn(pte_pfn(pte))) {
 		if (hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0)) {
-			pte_unmap(ptep);
+			if (hmm_vma_walk->ptelocked) {
+				pte_unmap_unlock(ptep, hmm_vma_walk->ptl);
+				hmm_vma_walk->ptelocked = false;
+			} else
+				pte_unmap(ptep);
+
 			return -EFAULT;
 		}
 		new_pfn_flags = HMM_PFN_ERROR;
@@ -320,7 +394,11 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	return 0;
 
 fault:
-	pte_unmap(ptep);
+	if (hmm_vma_walk->ptelocked) {
+		pte_unmap_unlock(ptep, hmm_vma_walk->ptl);
+		hmm_vma_walk->ptelocked = false;
+	} else
+		pte_unmap(ptep);
 	/* Fault any virtual address we were asked to fault */
 	return hmm_vma_fault(addr, end, required_fault, walk);
 }
@@ -364,13 +442,18 @@ static int hmm_vma_handle_absent_pmd(struct mm_walk *walk, unsigned long start,
 	required_fault = hmm_range_need_fault(hmm_vma_walk, hmm_pfns,
 					      npages, 0);
 	if (required_fault) {
-		if (softleaf_is_device_private(entry))
+		if (softleaf_is_device_private(entry)) {
+			if (hmm_vma_walk->pmdlocked) {
+				spin_unlock(hmm_vma_walk->ptl);
+				hmm_vma_walk->pmdlocked = false;
+			}
 			return hmm_vma_fault(addr, end, required_fault, walk);
+		}
 		else
 			return -EFAULT;
 	}
 
-	return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+	return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR);
 }
 #else
 static int hmm_vma_handle_absent_pmd(struct mm_walk *walk, unsigned long start,
@@ -378,15 +461,100 @@ static int hmm_vma_handle_absent_pmd(struct mm_walk *walk, unsigned long start,
 				     pmd_t pmd)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
-	struct hmm_range *range = hmm_vma_walk->range;
 	unsigned long npages = (end - start) >> PAGE_SHIFT;
 
 	if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0))
 		return -EFAULT;
-	return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+	return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR);
 }
 #endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
 
+#ifdef CONFIG_DEVICE_MIGRATION
+static int hmm_vma_handle_migrate_prepare_pmd(const struct mm_walk *walk,
+					      pmd_t *pmdp,
+					      unsigned long start,
+					      unsigned long end,
+					      unsigned long *hmm_pfn)
+{
+	// TODO: implement migration entry insertion
+	return 0;
+}
+
+static int hmm_vma_handle_migrate_prepare(const struct mm_walk *walk,
+					  pmd_t *pmdp,
+					  pte_t *pte,
+					  unsigned long addr,
+					  unsigned long *hmm_pfn)
+{
+	// TODO: implement migration entry insertion
+	return 0;
+}
+
+static int hmm_vma_walk_split(pmd_t *pmdp,
+			      unsigned long addr,
+			      struct mm_walk *walk)
+{
+	// TODO : implement split
+	return 0;
+}
+
+#else
+static int hmm_vma_handle_migrate_prepare_pmd(const struct mm_walk *walk,
+					      pmd_t *pmdp,
+					      unsigned long start,
+					      unsigned long end,
+					      unsigned long *hmm_pfn)
+{
+	return 0;
+}
+
+static int hmm_vma_handle_migrate_prepare(const struct mm_walk *walk,
+					  pmd_t *pmdp,
+					  pte_t *pte,
+					  unsigned long addr,
+					  unsigned long *hmm_pfn)
+{
+	return 0;
+}
+
+static int hmm_vma_walk_split(pmd_t *pmdp,
+			      unsigned long addr,
+			      struct mm_walk *walk)
+{
+	return 0;
+}
+#endif
+
+static int hmm_vma_capture_migrate_range(unsigned long start,
+					 unsigned long end,
+					 struct mm_walk *walk)
+{
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
+
+	if (!hmm_select_migrate(range))
+		return 0;
+
+	if (hmm_vma_walk->vma && (hmm_vma_walk->vma != walk->vma))
+		return -ERANGE;
+
+	hmm_vma_walk->vma = walk->vma;
+	hmm_vma_walk->start = start;
+	hmm_vma_walk->end = end;
+
+	if (end - start > range->end - range->start)
+		return -ERANGE;
+
+	if (!hmm_vma_walk->mmu_range.owner) {
+		mmu_notifier_range_init_owner(&hmm_vma_walk->mmu_range, MMU_NOTIFY_MIGRATE, 0,
+					      walk->vma->vm_mm, start, end,
+					      range->dev_private_owner);
+		mmu_notifier_invalidate_range_start(&hmm_vma_walk->mmu_range);
+	}
+
+	return 0;
+}
+
 static int hmm_vma_walk_pmd(pmd_t *pmdp,
 			    unsigned long start,
 			    unsigned long end,
@@ -397,43 +565,127 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	unsigned long *hmm_pfns =
 		&range->hmm_pfns[(start - range->start) >> PAGE_SHIFT];
 	unsigned long npages = (end - start) >> PAGE_SHIFT;
+	struct mm_struct *mm = walk->vma->vm_mm;
 	unsigned long addr = start;
+	enum migrate_vma_info minfo;
+	unsigned long i;
 	pte_t *ptep;
 	pmd_t pmd;
+	int r = 0;
+
+	minfo = hmm_select_migrate(range);
 
 again:
-	pmd = pmdp_get_lockless(pmdp);
-	if (pmd_none(pmd))
-		return hmm_vma_walk_hole(start, end, -1, walk);
+	hmm_vma_walk->ptelocked = false;
+	hmm_vma_walk->pmdlocked = false;
+
+	if (minfo) {
+		hmm_vma_walk->ptl = pmd_lock(mm, pmdp);
+		hmm_vma_walk->pmdlocked = true;
+		pmd = pmdp_get(pmdp);
+	} else
+		pmd = pmdp_get_lockless(pmdp);
+
+	if (pmd_none(pmd)) {
+		r = hmm_vma_walk_hole(start, end, -1, walk);
+
+		if (hmm_vma_walk->pmdlocked) {
+			spin_unlock(hmm_vma_walk->ptl);
+			hmm_vma_walk->pmdlocked = false;
+		}
+		return r;
+	}
 
 	if (thp_migration_supported() && pmd_is_migration_entry(pmd)) {
-		if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0)) {
+		if (!minfo) {
+			if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0)) {
+				hmm_vma_walk->last = addr;
+				pmd_migration_entry_wait(walk->mm, pmdp);
+				return -EBUSY;
+			}
+		}
+		for (i = 0; addr < end; addr += PAGE_SIZE, i++)
+			hmm_pfns[i] &= HMM_PFN_INOUT_FLAGS;
+
+		if (hmm_vma_walk->pmdlocked) {
+			spin_unlock(hmm_vma_walk->ptl);
+			hmm_vma_walk->pmdlocked = false;
+		}
+
+		return 0;
+	}
+
+	if (pmd_trans_huge(pmd) || !pmd_present(pmd)) {
+
+		if (!pmd_present(pmd)) {
+			r = hmm_vma_handle_absent_pmd(walk, start, end, hmm_pfns,
+						      pmd);
+			// If not migrating we are done
+			if (r || !minfo) {
+				if (hmm_vma_walk->pmdlocked) {
+					spin_unlock(hmm_vma_walk->ptl);
+					hmm_vma_walk->pmdlocked = false;
+				}
+				return r;
+			}
+		}
+
+		if (pmd_trans_huge(pmd)) {
+
+			/*
+			 * No need to take pmd_lock here if not migrating,
+			 * even if some other thread is splitting the huge
+			 * pmd we will get that event through mmu_notifier callback.
+			 *
+			 * So just read pmd value and check again it's a transparent
+			 * huge or device mapping one and compute corresponding pfn
+			 * values.
+			 */
+
+			if (!minfo) {
+				pmd = pmdp_get_lockless(pmdp);
+				if (!pmd_trans_huge(pmd))
+					goto again;
+			}
+
+			r = hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd);
+
+			// If not migrating we are done
+			if (r || !minfo) {
+				if (hmm_vma_walk->pmdlocked) {
+					spin_unlock(hmm_vma_walk->ptl);
+					hmm_vma_walk->pmdlocked = false;
+				}
+				return r;
+			}
+		}
+
+		r = hmm_vma_handle_migrate_prepare_pmd(walk, pmdp, start, end, hmm_pfns);
+
+		if (hmm_vma_walk->pmdlocked) {
+			spin_unlock(hmm_vma_walk->ptl);
+			hmm_vma_walk->pmdlocked = false;
+		}
+
+		if (r == -ENOENT) {
+			r = hmm_vma_walk_split(pmdp, addr, walk);
+			if (r) {
+				/* Split not successful, skip */
+				return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR);
+			}
+
+			/* Split successful or "again", reloop */
 			hmm_vma_walk->last = addr;
-			pmd_migration_entry_wait(walk->mm, pmdp);
 			return -EBUSY;
 		}
-		return hmm_pfns_fill(start, end, range, 0);
-	}
 
-	if (!pmd_present(pmd))
-		return hmm_vma_handle_absent_pmd(walk, start, end, hmm_pfns,
-						 pmd);
+		return r;
 
-	if (pmd_trans_huge(pmd)) {
-		/*
-		 * No need to take pmd_lock here, even if some other thread
-		 * is splitting the huge pmd we will get that event through
-		 * mmu_notifier callback.
-		 *
-		 * So just read pmd value and check again it's a transparent
-		 * huge or device mapping one and compute corresponding pfn
-		 * values.
-		 */
-		pmd = pmdp_get_lockless(pmdp);
-		if (!pmd_trans_huge(pmd))
-			goto again;
+	}
 
-		return hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd);
+	if (hmm_vma_walk->pmdlocked) {
+		spin_unlock(hmm_vma_walk->ptl);
+		hmm_vma_walk->pmdlocked = false;
 	}
 
 	/*
@@ -445,22 +697,43 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	if (pmd_bad(pmd)) {
 		if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0))
 			return -EFAULT;
-		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+		return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR);
 	}
 
-	ptep = pte_offset_map(pmdp, addr);
+	if (minfo) {
+		ptep = pte_offset_map_lock(mm, pmdp, addr, &hmm_vma_walk->ptl);
+		if (ptep)
+			hmm_vma_walk->ptelocked = true;
+	} else
+		ptep = pte_offset_map(pmdp, addr);
 	if (!ptep)
 		goto again;
+
 	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
-		int r;
 
 		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, hmm_pfns);
 		if (r) {
-			/* hmm_vma_handle_pte() did pte_unmap() */
+			/* hmm_vma_handle_pte() did pte_unmap() / pte_unmap_unlock */
 			return r;
 		}
+
+		r = hmm_vma_handle_migrate_prepare(walk, pmdp, ptep, addr, hmm_pfns);
+		if (r == -EAGAIN) {
+			HMM_ASSERT_UNLOCKED(hmm_vma_walk);
+			goto again;
+		}
+		if (r) {
+			hmm_pfns_fill(addr, end, hmm_vma_walk, HMM_PFN_ERROR);
+			break;
+		}
 	}
-	pte_unmap(ptep - 1);
+
+	if (hmm_vma_walk->ptelocked) {
+		pte_unmap_unlock(ptep - 1, hmm_vma_walk->ptl);
+		hmm_vma_walk->ptelocked = false;
+	} else
+		pte_unmap(ptep - 1);
+
 	return 0;
 }
 
@@ -594,6 +867,11 @@ static int hmm_vma_walk_test(unsigned long start, unsigned long end,
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
 	struct vm_area_struct *vma = walk->vma;
+	int r;
+
+	r = hmm_vma_capture_migrate_range(start, end, walk);
+	if (r)
+		return r;
 
 	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)) &&
 	    vma->vm_flags & VM_READ)
@@ -616,7 +894,7 @@ static int hmm_vma_walk_test(unsigned long start, unsigned long end,
 			      (end - start) >> PAGE_SHIFT, 0))
 		return -EFAULT;
 
-	hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+	hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR);
 
 	/* Skip this vma and continue processing the next vma. */
 	return 1;
@@ -646,9 +924,17 @@ static const struct mm_walk_ops hmm_walk_ops = {
  *		the invalidation to finish.
  * -EFAULT:	A page was requested to be valid and could not be made valid
  *		ie it has no backing VMA or it is illegal to access
+ * -ERANGE:	The range crosses multiple VMAs, or space for hmm_pfns array
+ *		is too low.
  *
  * This is similar to get_user_pages(), except that it can read the page tables
  * without mutating them (ie causing faults).
+ *
+ * If want to do migrate after faulting, call hmm_range_fault() with
+ * HMM_PFN_REQ_MIGRATE and initialize range.migrate field.
+ * After hmm_range_fault() call migrate_hmm_range_setup() instead of
+ * migrate_vma_setup() and after that follow normal migrate calls path.
+ *
  */
 int hmm_range_fault(struct hmm_range *range)
 {
@@ -656,16 +942,34 @@ int hmm_range_fault(struct hmm_range *range)
 		.range = range,
 		.last = range->start,
 	};
-	struct mm_struct *mm = range->notifier->mm;
+	struct mm_struct *mm;
+	bool is_fault_path;
 	int ret;
 
+	/*
+	 *
+	 * Could be serving a device fault or come from migrate
+	 * entry point. For the former we have not resolved the vma
+	 * yet, and the latter we don't have a notifier (but have a vma).
+	 *
+	 */
+#ifdef CONFIG_DEVICE_MIGRATION
+	is_fault_path = !!range->notifier;
+	mm = is_fault_path ? range->notifier->mm : range->migrate->vma->vm_mm;
+#else
+	is_fault_path = true;
+	mm = range->notifier->mm;
+#endif
 	mmap_assert_locked(mm);
 
 	do {
 		/* If range is no longer valid force retry. */
-		if (mmu_interval_check_retry(range->notifier,
-					     range->notifier_seq))
-			return -EBUSY;
+		if (is_fault_path && mmu_interval_check_retry(range->notifier,
+							      range->notifier_seq)) {
+			ret = -EBUSY;
+			break;
+		}
+
 		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
 				      &hmm_walk_ops, &hmm_vma_walk);
 		/*
@@ -675,6 +979,19 @@ int hmm_range_fault(struct hmm_range *range)
 		 * output, and all >= are still at their input values.
 		 */
 	} while (ret == -EBUSY);
+
+#ifdef CONFIG_DEVICE_MIGRATION
+	if (hmm_select_migrate(range) && range->migrate &&
+	    hmm_vma_walk.mmu_range.owner) {
+		// The migrate_vma path has the following initialized
+		if (is_fault_path) {
+			range->migrate->vma = hmm_vma_walk.vma;
+			range->migrate->start = range->start;
+			range->migrate->end = hmm_vma_walk.end;
+		}
+		mmu_notifier_invalidate_range_end(&hmm_vma_walk.mmu_range);
+	}
+#endif
 	return ret;
 }
 EXPORT_SYMBOL(hmm_range_fault);
-- 
2.50.0
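
[Editor's sketch] A rough caller-side outline of the flow described in the new
hmm_range_fault() kernel-doc above (fault first, then migrate), for orientation
only. It is not code from this series: HMM_PFN_REQ_MIGRATE, the range->migrate
field and migrate_hmm_range_setup() are only named in the comment and are
defined by other patches, so their exact signatures are assumed here, and the
usual mmap locking and notifier retry handling around hmm_range_fault() is
omitted.

	/*
	 * Hypothetical driver helper. HMM_PFN_REQ_MIGRATE, range->migrate and
	 * migrate_hmm_range_setup() come from other patches in this series;
	 * their signatures are assumed.
	 */
	static int example_fault_then_migrate(struct hmm_range *range,
					      struct migrate_vma *migrate)
	{
		int ret;

		/* Ask the pagewalk to also collect pages for migration. */
		range->default_flags |= HMM_PFN_REQ_MIGRATE;	/* assumed flag */
		range->migrate = migrate;			/* assumed field */

		/* Caller holds mmap_read_lock() and handles notifier retries. */
		ret = hmm_range_fault(range);
		if (ret)
			return ret;

		/* Per the kernel-doc: replaces migrate_vma_setup() on this path. */
		ret = migrate_hmm_range_setup(range);		/* assumed signature */
		if (ret)
			return ret;

		/* Driver copies the collected pages to the device here, then: */
		migrate_vma_pages(migrate);
		migrate_vma_finalize(migrate);
		return 0;
	}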