From: Mika Penttilä
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Mika Penttilä, David Hildenbrand, Jason Gunthorpe, Leon Romanovsky, Alistair Popple, Balbir Singh
Subject: [RFC PATCH 1/4] mm: use current as mmu notifier's owner
Date: Thu, 14 Aug 2025 10:19:26 +0300
Message-ID: <20250814072045.3637192-3-mpenttil@redhat.com>
In-Reply-To: <20250814072045.3637192-1-mpenttil@redhat.com>
References: <20250814072045.3637192-1-mpenttil@redhat.com>

When doing migration combined with device fault handling, detect the case
in the interval notifier. Without this check we would livelock on our own
invalidations while migrating and splitting pages during fault handling.

Note that pgmap_owner, used as the filtering owner in some other code
paths, is not readily available in the split path, so use current as the
owner for this case. Since current and pgmap_owner are both pointers to
memory, one cannot be mistaken for the other.
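For driver authors, the practical consequence is that an interval notifier
callback can now filter out the CLEAR invalidations generated by its own
fault/migrate path, the same way the lib/test_hmm.c hunk below does. The
following is only a minimal sketch, not part of this patch: my_device and
my_interval_invalidate are placeholder names, and the device page-table
invalidation and non-blockable locking a real driver needs are elided.

static bool my_interval_invalidate(struct mmu_interval_notifier *mni,
				   const struct mmu_notifier_range *range,
				   unsigned long cur_seq)
{
	struct my_device *mdev = container_of(mni, struct my_device, notifier);

	/* Skip invalidations we raised ourselves when migrating to the device. */
	if (range->event == MMU_NOTIFY_MIGRATE && range->owner == mdev)
		return true;

	/*
	 * New with this patch: split/migrate invalidations issued from our
	 * own fault handling carry current as the owner, so they can be
	 * ignored instead of livelocking on them.
	 */
	if (range->event == MMU_NOTIFY_CLEAR && range->owner == current)
		return true;

	mmu_interval_set_seq(mni, cur_seq);
	/* ... invalidate the device page table for range->start..range->end ... */
	return true;
}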
Cc: David Hildenbrand
Cc: Jason Gunthorpe
Cc: Leon Romanovsky
Cc: Alistair Popple
Cc: Balbir Singh
Signed-off-by: Mika Penttilä
---
 lib/test_hmm.c   | 5 +++++
 mm/huge_memory.c | 6 +++---
 mm/rmap.c        | 4 ++--
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 761725bc713c..cd5c139213be 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -269,6 +269,11 @@ static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni,
 	    range->owner == dmirror->mdevice)
 		return true;
 
+	if (range->event == MMU_NOTIFY_CLEAR &&
+	    range->owner == current) {
+		return true;
+	}
+
 	if (mmu_notifier_range_blockable(range))
 		mutex_lock(&dmirror->mutex);
 	else if (!mutex_trylock(&dmirror->mutex))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9c38a95e9f09..276e38dd8f68 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3069,9 +3069,9 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
-				address & HPAGE_PMD_MASK,
-				(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
+	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
+				address & HPAGE_PMD_MASK,
+				(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE, current);
 	mmu_notifier_invalidate_range_start(&range);
 	ptl = pmd_lock(vma->vm_mm, pmd);
 	split_huge_pmd_locked(vma, range.start, pmd, freeze);
diff --git a/mm/rmap.c b/mm/rmap.c
index f93ce27132ab..e7829015a40b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2308,8 +2308,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 	 * try_to_unmap() must hold a reference on the page.
 	 */
 	range.end = vma_address_end(&pvmw);
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
-				address, range.end);
+	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
+				address, range.end, current);
 	if (folio_test_hugetlb(folio)) {
 		/*
 		 * If sharing is possible, start and end will be adjusted
-- 
2.50.0
From: Mika Penttilä
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Mika Penttilä, David Hildenbrand, Jason Gunthorpe, Leon Romanovsky, Alistair Popple, Balbir Singh
Subject: [RFC PATCH 2/4] mm: unify fault and migrate device page paths
Date: Thu, 14 Aug 2025 10:19:27 +0300
Message-ID: <20250814072045.3637192-4-mpenttil@redhat.com>
In-Reply-To: <20250814072045.3637192-1-mpenttil@redhat.com>
References: <20250814072045.3637192-1-mpenttil@redhat.com>

As of this writing, the way device page faulting and migration work
together is not optimal when both are wanted at once. Migrating
non-present pages (or pages mapped with insufficient permissions, e.g.
COW) to the GPU requires one of the following sequences:

1. hmm_range_fault() - fault in non-present pages with the correct permissions, etc.
2. migrate_vma_*()   - migrate the pages

Or:

1. migrate_vma_*() - migrate the present pages
2. If migrate_vma_*() detects non-present pages:
   a) call hmm_range_fault() to fault the pages in
   b) call migrate_vma_*() again to migrate the now-present pages

The problem with the first sequence is that it always does two page
walks, even though most of the time the pages are present or zero-page
mappings, so the common case takes a performance hit. The second sequence
is better for the common case, but far worse when pages aren't present,
because the page tables are then walked three times (once to find that a
page is not present, once so hmm_range_fault() can fault it in, and once
again to set up the migration). It is also tricky to code correctly.

We should be able to walk the page tables once, faulting pages in as
required and replacing them with migration entries if requested.

Add a new HMM input flag, HMM_PFN_REQ_MIGRATE, which tells
hmm_range_fault() to also prepare the pages for migration while handling
the fault. For the migrate_vma_setup() call path, add a flag,
MIGRATE_VMA_FAULT, which requests fault handling as part of the migrate;
see the usage sketch below.
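To make the intended call sequence concrete, here is a minimal sketch of
how a driver-side GPU fault handler could use the combined walk. It is not
taken from this series (patch 4 adds the equivalent test in
lib/test_hmm.c); my_device and the two my_*() helpers are hypothetical,
and the usual mmu_interval_read_begin()/retry sequence and error unwinding
are omitted for brevity.

static int my_fault_and_migrate(struct my_device *mdev,
				struct mmu_interval_notifier *notifier,
				unsigned long start, unsigned long end)
{
	/* Caller guarantees the range fits one chunk: end - start <= 64 pages. */
	unsigned long src_pfns[64] = { 0 };
	unsigned long dst_pfns[64] = { 0 };
	struct migrate_vma migrate = {
		.src		= src_pfns,
		.dst		= dst_pfns,
		.pgmap_owner	= mdev,
	};
	struct hmm_range range = {
		.notifier	   = notifier,
		.start		   = start,
		.end		   = end,
		.hmm_pfns	   = src_pfns,
		/* Fault pages in and prepare them for migration in one walk. */
		.default_flags	   = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_MIGRATE,
		.dev_private_owner = mdev,
		.migrate	   = &migrate,
	};
	int ret;

	mmap_read_lock(notifier->mm);

	/* Single page-table walk: faults in and installs migration entries. */
	ret = hmm_range_fault(&range);
	if (ret) {
		mmap_read_unlock(notifier->mm);
		return ret;
	}

	/* Replaces migrate_vma_collect()/migrate_vma_setup() in this scheme. */
	migrate_hmm_range_setup(&range);

	/* From here on, the normal migrate_vma sequence applies. */
	my_alloc_device_pages_and_copy(mdev, &migrate);	/* hypothetical */
	migrate_vma_pages(&migrate);
	my_update_device_page_table(mdev, &migrate);	/* hypothetical */
	migrate_vma_finalize(&migrate);

	mmap_read_unlock(notifier->mm);
	return 0;
}

The test added in patch 4 (dmirror_fault_and_migrate_to_device()) follows
the same sequence, chunking the request into PFNS_ARRAY_SIZE-page pieces.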
Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Leon Romanovsky Cc: Alistair Popple Cc: Balbir Singh Suggested-by: Alistair Popple Signed-off-by: Mika Penttil=C3=A4 --- include/linux/hmm.h | 10 +- include/linux/migrate.h | 6 +- mm/hmm.c | 351 ++++++++++++++++++++++++++++++++++++++-- mm/migrate_device.c | 72 ++++++++- 4 files changed, 420 insertions(+), 19 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index db75ffc949a7..7485e549c675 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -12,7 +12,7 @@ #include =20 struct mmu_interval_notifier; - +struct migrate_vma; /* * On output: * 0 - The page is faultable and a future call with=20 @@ -48,11 +48,14 @@ enum hmm_pfn_flags { HMM_PFN_P2PDMA =3D 1UL << (BITS_PER_LONG - 5), HMM_PFN_P2PDMA_BUS =3D 1UL << (BITS_PER_LONG - 6), =20 - HMM_PFN_ORDER_SHIFT =3D (BITS_PER_LONG - 11), + /* Migrate request */ + HMM_PFN_MIGRATE =3D 1UL << (BITS_PER_LONG - 7), + HMM_PFN_ORDER_SHIFT =3D (BITS_PER_LONG - 12), =20 /* Input flags */ HMM_PFN_REQ_FAULT =3D HMM_PFN_VALID, HMM_PFN_REQ_WRITE =3D HMM_PFN_WRITE, + HMM_PFN_REQ_MIGRATE =3D HMM_PFN_MIGRATE, =20 HMM_PFN_FLAGS =3D ~((1UL << HMM_PFN_ORDER_SHIFT) - 1), }; @@ -107,6 +110,7 @@ static inline unsigned int hmm_pfn_to_map_order(unsigne= d long hmm_pfn) * @default_flags: default flags for the range (write, read, ... see hmm d= oc) * @pfn_flags_mask: allows to mask pfn flags so that only default_flags ma= tter * @dev_private_owner: owner of device private pages + * @migrate: structure for migrating the associated vma */ struct hmm_range { struct mmu_interval_notifier *notifier; @@ -117,12 +121,14 @@ struct hmm_range { unsigned long default_flags; unsigned long pfn_flags_mask; void *dev_private_owner; + struct migrate_vma *migrate; }; =20 /* * Please see Documentation/mm/hmm.rst for how to use the range API. 
*/ int hmm_range_fault(struct hmm_range *range); +int hmm_range_migrate_prepare(struct hmm_range *range, struct migrate_vma = **pargs); =20 /* * HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a ran= ge diff --git a/include/linux/migrate.h b/include/linux/migrate.h index acadd41e0b5c..ab35d0f1f65d 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -3,6 +3,7 @@ #define _LINUX_MIGRATE_H =20 #include +#include #include #include #include @@ -143,10 +144,11 @@ static inline unsigned long migrate_pfn(unsigned long= pfn) return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID; } =20 -enum migrate_vma_direction { +enum migrate_vma_info { MIGRATE_VMA_SELECT_SYSTEM =3D 1 << 0, MIGRATE_VMA_SELECT_DEVICE_PRIVATE =3D 1 << 1, MIGRATE_VMA_SELECT_DEVICE_COHERENT =3D 1 << 2, + MIGRATE_VMA_FAULT =3D 1 << 3, }; =20 struct migrate_vma { @@ -194,7 +196,7 @@ void migrate_device_pages(unsigned long *src_pfns, unsi= gned long *dst_pfns, unsigned long npages); void migrate_device_finalize(unsigned long *src_pfns, unsigned long *dst_pfns, unsigned long npages); - +void migrate_hmm_range_setup(struct hmm_range *range); #endif /* CONFIG_MIGRATION */ =20 #endif /* _LINUX_MIGRATE_H */ diff --git a/mm/hmm.c b/mm/hmm.c index d545e2494994..8cb2b325fa9f 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -33,6 +34,10 @@ struct hmm_vma_walk { struct hmm_range *range; unsigned long last; + struct mmu_notifier_range mmu_range; + struct vm_area_struct *vma; + unsigned long start; + unsigned long end; }; =20 enum { @@ -47,15 +52,33 @@ enum { HMM_PFN_P2PDMA_BUS, }; =20 +static enum migrate_vma_info hmm_want_migrate(struct hmm_range *range) +{ + enum migrate_vma_info minfo; + + minfo =3D range->migrate ? range->migrate->flags : 0; + minfo |=3D (range->default_flags & HMM_PFN_REQ_MIGRATE) ? + MIGRATE_VMA_SELECT_SYSTEM : 0; + + return minfo; +} + static int hmm_pfns_fill(unsigned long addr, unsigned long end, - struct hmm_range *range, unsigned long cpu_flags) + struct hmm_vma_walk *hmm_vma_walk, unsigned long cpu_flags) { + struct hmm_range *range =3D hmm_vma_walk->range; unsigned long i =3D (addr - range->start) >> PAGE_SHIFT; =20 + if (cpu_flags !=3D HMM_PFN_ERROR) + if (hmm_want_migrate(range) && + (vma_is_anonymous(hmm_vma_walk->vma))) + cpu_flags |=3D (HMM_PFN_VALID | HMM_PFN_MIGRATE); + for (; addr < end; addr +=3D PAGE_SIZE, i++) { range->hmm_pfns[i] &=3D HMM_PFN_INOUT_FLAGS; range->hmm_pfns[i] |=3D cpu_flags; } + return 0; } =20 @@ -171,11 +194,11 @@ static int hmm_vma_walk_hole(unsigned long addr, unsi= gned long end, if (!walk->vma) { if (required_fault) return -EFAULT; - return hmm_pfns_fill(addr, end, range, HMM_PFN_ERROR); + return hmm_pfns_fill(addr, end, hmm_vma_walk, HMM_PFN_ERROR); } if (required_fault) return hmm_vma_fault(addr, end, required_fault, walk); - return hmm_pfns_fill(addr, end, range, 0); + return hmm_pfns_fill(addr, end, hmm_vma_walk, 0); } =20 static inline unsigned long hmm_pfn_flags_order(unsigned long order) @@ -326,6 +349,257 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, u= nsigned long addr, return hmm_vma_fault(addr, end, required_fault, walk); } =20 +/* + * Install migration entries if migration requested, either from fault + * or migrate paths. 
+ * + */ +static void hmm_vma_handle_migrate_prepare(const struct mm_walk *walk, + pmd_t *pmdp, + unsigned long addr, + unsigned long *hmm_pfn) +{ + struct hmm_vma_walk *hmm_vma_walk =3D walk->private; + struct hmm_range *range =3D hmm_vma_walk->range; + struct migrate_vma *migrate =3D range->migrate; + struct mm_struct *mm =3D walk->vma->vm_mm; + struct folio *fault_folio =3D NULL; + enum migrate_vma_info minfo; + struct dev_pagemap *pgmap; + bool anon_exclusive; + struct folio *folio; + unsigned long pfn; + struct page *page; + swp_entry_t entry; + pte_t pte, swp_pte; + spinlock_t *ptl; + bool writable =3D false; + pte_t *ptep; + + + // Do we want to migrate at all? + minfo =3D hmm_want_migrate(range); + if (!minfo) + return; + + fault_folio =3D (migrate && migrate->fault_page) ? + page_folio(migrate->fault_page) : NULL; + + ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); + if (!ptep) + return; + + pte =3D ptep_get(ptep); + + if (pte_none(pte)) { + // migrate without faulting case + if (vma_is_anonymous(walk->vma)) + *hmm_pfn =3D HMM_PFN_MIGRATE|HMM_PFN_VALID; + goto out; + } + + if (!(*hmm_pfn & HMM_PFN_VALID)) + goto out; + + if (!pte_present(pte)) { + /* + * Only care about unaddressable device page special + * page table entry. Other special swap entries are not + * migratable, and we ignore regular swapped page. + */ + entry =3D pte_to_swp_entry(pte); + if (!is_device_private_entry(entry)) + goto out; + + // We have already checked that are the pgmap owners + if (!(minfo & MIGRATE_VMA_SELECT_DEVICE_PRIVATE)) + goto out; + + page =3D pfn_swap_entry_to_page(entry); + pfn =3D page_to_pfn(page); + if (is_writable_device_private_entry(entry)) + writable =3D true; + } else { + pfn =3D pte_pfn(pte); + if (is_zero_pfn(pfn) && + (minfo & MIGRATE_VMA_SELECT_SYSTEM)) { + *hmm_pfn =3D HMM_PFN_MIGRATE|HMM_PFN_VALID; + goto out; + } + page =3D vm_normal_page(walk->vma, addr, pte); + if (page && !is_zone_device_page(page) && + !(minfo & MIGRATE_VMA_SELECT_SYSTEM)) { + goto out; + } else if (page && is_device_coherent_page(page)) { + pgmap =3D page_pgmap(page); + + if (!(minfo & + MIGRATE_VMA_SELECT_DEVICE_COHERENT) || + pgmap->owner !=3D migrate->pgmap_owner) + goto out; + } + writable =3D pte_write(pte); + } + + /* FIXME support THP */ + if (!page || !page->mapping || PageTransCompound(page)) + goto out; + + /* + * By getting a reference on the folio we pin it and that blocks + * any kind of migration. Side effect is that it "freezes" the + * pte. + * + * We drop this reference after isolating the folio from the lru + * for non device folio (device folio are not on the lru and thus + * can't be dropped from it). + */ + folio =3D page_folio(page); + folio_get(folio); + + /* + * We rely on folio_trylock() to avoid deadlock between + * concurrent migrations where each is waiting on the others + * folio lock. If we can't immediately lock the folio we fail this + * migration as it is only best effort anyway. + * + * If we can lock the folio it's safe to set up a migration entry + * now. In the common case where the folio is mapped once in a + * single process setting up the migration entry now is an + * optimisation to avoid walking the rmap later with + * try_to_migrate(). 
+ */ + + if (fault_folio =3D=3D folio || folio_trylock(folio)) { + anon_exclusive =3D folio_test_anon(folio) && + PageAnonExclusive(page); + + flush_cache_page(walk->vma, addr, pfn); + + if (anon_exclusive) { + pte =3D ptep_clear_flush(walk->vma, addr, ptep); + + if (folio_try_share_anon_rmap_pte(folio, page)) { + set_pte_at(mm, addr, ptep, pte); + folio_unlock(folio); + folio_put(folio); + goto out; + } + } else { + pte =3D ptep_get_and_clear(mm, addr, ptep); + } + + /* Setup special migration page table entry */ + if (writable) + entry =3D make_writable_migration_entry(pfn); + else if (anon_exclusive) + entry =3D make_readable_exclusive_migration_entry(pfn); + else + entry =3D make_readable_migration_entry(pfn); + + swp_pte =3D swp_entry_to_pte(entry); + if (pte_present(pte)) { + if (pte_soft_dirty(pte)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pte)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } else { + if (pte_swp_soft_dirty(pte)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pte)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } + + set_pte_at(mm, addr, ptep, swp_pte); + folio_remove_rmap_pte(folio, page, walk->vma); + folio_put(folio); + *hmm_pfn |=3D HMM_PFN_MIGRATE; + + if (pte_present(pte)) + flush_tlb_range(walk->vma, addr, addr + PAGE_SIZE); + } else + folio_put(folio); +out: + pte_unmap_unlock(ptep, ptl); + +} + +static int hmm_vma_walk_split(pmd_t *pmdp, + unsigned long addr, + struct mm_walk *walk) +{ + struct hmm_vma_walk *hmm_vma_walk =3D walk->private; + struct hmm_range *range =3D hmm_vma_walk->range; + struct migrate_vma *migrate =3D range->migrate; + struct folio *folio, *fault_folio; + spinlock_t *ptl; + int ret =3D 0; + + fault_folio =3D (migrate && migrate->fault_page) ? + page_folio(migrate->fault_page) : NULL; + + ptl =3D pmd_lock(walk->mm, pmdp); + if (unlikely(!pmd_trans_huge(*pmdp))) { + spin_unlock(ptl); + goto out; + } + + folio =3D pmd_folio(*pmdp); + if (is_huge_zero_folio(folio)) { + spin_unlock(ptl); + split_huge_pmd(walk->vma, pmdp, addr); + } else { + folio_get(folio); + spin_unlock(ptl); + /* FIXME: we don't expect THP for fault_folio */ + if (WARN_ON_ONCE(fault_folio =3D=3D folio)) { + folio_put(folio); + ret =3D -EBUSY; + goto out; + } + if (unlikely(!folio_trylock(folio))) { + folio_put(folio); + ret =3D -EBUSY; + goto out; + } + ret =3D split_folio(folio); + folio_unlock(folio); + folio_put(folio); + } +out: + return ret; +} + +static int hmm_vma_capture_migrate_range(unsigned long start, + unsigned long end, + struct mm_walk *walk) +{ + struct hmm_vma_walk *hmm_vma_walk =3D walk->private; + struct hmm_range *range =3D hmm_vma_walk->range; + + if (!hmm_want_migrate(range)) + return 0; + + if (hmm_vma_walk->vma && (hmm_vma_walk->vma !=3D walk->vma)) + return -ERANGE; + + hmm_vma_walk->vma =3D walk->vma; + hmm_vma_walk->start =3D start; + hmm_vma_walk->end =3D end; + + if (end - start > range->end - range->start) + return -ERANGE; + + if (!hmm_vma_walk->mmu_range.owner) { + mmu_notifier_range_init_owner(&hmm_vma_walk->mmu_range, MMU_NOTIFY_MIGRA= TE, 0, + walk->vma->vm_mm, start, end, + range->dev_private_owner); + mmu_notifier_invalidate_range_start(&hmm_vma_walk->mmu_range); + } + + return 0; +} + static int hmm_vma_walk_pmd(pmd_t *pmdp, unsigned long start, unsigned long end, @@ -351,13 +625,28 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, pmd_migration_entry_wait(walk->mm, pmdp); return -EBUSY; } - return hmm_pfns_fill(start, end, range, 0); + return hmm_pfns_fill(start, end, hmm_vma_walk, 0); } =20 if 
(!pmd_present(pmd)) { if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0)) return -EFAULT; - return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR); + return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); + } + + if (hmm_want_migrate(range) && + pmd_trans_huge(pmd)) { + int r; + + r =3D hmm_vma_walk_split(pmdp, addr, walk); + if (r) { + /* Split not successful, skip */ + return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); + } + + /* Split successful or "again", reloop */ + hmm_vma_walk->last =3D addr; + return -EBUSY; } =20 if (pmd_trans_huge(pmd)) { @@ -386,7 +675,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, if (pmd_bad(pmd)) { if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0)) return -EFAULT; - return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR); + return hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); } =20 ptep =3D pte_offset_map(pmdp, addr); @@ -400,8 +689,11 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, /* hmm_vma_handle_pte() did pte_unmap() */ return r; } + + hmm_vma_handle_migrate_prepare(walk, pmdp, addr, hmm_pfns); } pte_unmap(ptep - 1); + return 0; } =20 @@ -535,6 +827,11 @@ static int hmm_vma_walk_test(unsigned long start, unsi= gned long end, struct hmm_vma_walk *hmm_vma_walk =3D walk->private; struct hmm_range *range =3D hmm_vma_walk->range; struct vm_area_struct *vma =3D walk->vma; + int r; + + r =3D hmm_vma_capture_migrate_range(start, end, walk); + if (r) + return r; =20 if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)) && vma->vm_flags & VM_READ) @@ -557,7 +854,7 @@ static int hmm_vma_walk_test(unsigned long start, unsig= ned long end, (end - start) >> PAGE_SHIFT, 0)) return -EFAULT; =20 - hmm_pfns_fill(start, end, range, HMM_PFN_ERROR); + hmm_pfns_fill(start, end, hmm_vma_walk, HMM_PFN_ERROR); =20 /* Skip this vma and continue processing the next vma. */ return 1; @@ -587,9 +884,17 @@ static const struct mm_walk_ops hmm_walk_ops =3D { * the invalidation to finish. * -EFAULT: A page was requested to be valid and could not be made val= id * ie it has no backing VMA or it is illegal to access + * -ERANGE: The range crosses multiple VMAs, or space for hmm_pfns arr= ay + * is too low. * * This is similar to get_user_pages(), except that it can read the page t= ables * without mutating them (ie causing faults). + * + * If want to do migrate after faultin, call hmm_range_fault() with + * HMM_PFN_REQ_MIGRATE and initialize range.migrate field. + * After hmm_range_fault() call migrate_hmm_range_setup() instead of + * migrate_vma_setup() and after that follow normal migrate calls path. + * */ int hmm_range_fault(struct hmm_range *range) { @@ -597,16 +902,28 @@ int hmm_range_fault(struct hmm_range *range) .range =3D range, .last =3D range->start, }; - struct mm_struct *mm =3D range->notifier->mm; + bool is_fault_path =3D !!range->notifier; + struct mm_struct *mm; int ret; =20 + /* + * + * Could be serving a device fault or come from migrate + * entry point. For the former we have not resolved the vma + * yet, and the latter we don't have a notifier (but have a vma). + * + */ + mm =3D is_fault_path ? range->notifier->mm : range->migrate->vma->vm_mm; mmap_assert_locked(mm); =20 do { /* If range is no longer valid force retry. 
*/ - if (mmu_interval_check_retry(range->notifier, - range->notifier_seq)) - return -EBUSY; + if (is_fault_path && mmu_interval_check_retry(range->notifier, + range->notifier_seq)) { + ret =3D -EBUSY; + break; + } + ret =3D walk_page_range(mm, hmm_vma_walk.last, range->end, &hmm_walk_ops, &hmm_vma_walk); /* @@ -616,6 +933,18 @@ int hmm_range_fault(struct hmm_range *range) * output, and all >=3D are still at their input values. */ } while (ret =3D=3D -EBUSY); + + if (hmm_want_migrate(range) && range->migrate && + hmm_vma_walk.mmu_range.owner) { + // The migrate_vma path has the following initialized + if (is_fault_path) { + range->migrate->vma =3D hmm_vma_walk.vma; + range->migrate->start =3D range->start; + range->migrate->end =3D hmm_vma_walk.end; + } + mmu_notifier_invalidate_range_end(&hmm_vma_walk.mmu_range); + } + return ret; } EXPORT_SYMBOL(hmm_range_fault); diff --git a/mm/migrate_device.c b/mm/migrate_device.c index e05e14d6eacd..87ddc0353165 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -535,7 +535,18 @@ static void migrate_vma_unmap(struct migrate_vma *migr= ate) */ int migrate_vma_setup(struct migrate_vma *args) { + int ret; long nr_pages =3D (args->end - args->start) >> PAGE_SHIFT; + struct hmm_range range =3D { + .notifier =3D NULL, + .start =3D args->start, + .end =3D args->end, + .migrate =3D args, + .hmm_pfns =3D args->src, + .default_flags =3D HMM_PFN_REQ_MIGRATE, + .dev_private_owner =3D args->pgmap_owner, + .migrate =3D args + }; =20 args->start &=3D PAGE_MASK; args->end &=3D PAGE_MASK; @@ -560,17 +571,19 @@ int migrate_vma_setup(struct migrate_vma *args) args->cpages =3D 0; args->npages =3D 0; =20 - migrate_vma_collect(args); + if (args->flags & MIGRATE_VMA_FAULT) + range.default_flags |=3D HMM_PFN_REQ_FAULT; =20 - if (args->cpages) - migrate_vma_unmap(args); + ret =3D hmm_range_fault(&range); + + migrate_hmm_range_setup(&range); =20 /* * At this point pages are locked and unmapped, and thus they have * stable content and can safely be copied to destination memory that * is allocated by the drivers. */ - return 0; + return ret; =20 } EXPORT_SYMBOL(migrate_vma_setup); @@ -1014,3 +1027,54 @@ int migrate_device_coherent_folio(struct folio *foli= o) return 0; return -EBUSY; } + +void migrate_hmm_range_setup(struct hmm_range *range) +{ + + struct migrate_vma *migrate =3D range->migrate; + + if (!migrate) + return; + + migrate->npages =3D (migrate->end - migrate->start) >> PAGE_SHIFT; + migrate->cpages =3D 0; + + for (unsigned long i =3D 0; i < migrate->npages; i++) { + + unsigned long pfn =3D range->hmm_pfns[i]; + + /* + * + * Don't do migration if valid and migrate flags are not both set. + * + */ + if ((pfn & (HMM_PFN_VALID | HMM_PFN_MIGRATE)) !=3D + (HMM_PFN_VALID | HMM_PFN_MIGRATE)) { + migrate->src[i] =3D 0; + migrate->dst[i] =3D 0; + continue; + } + + migrate->cpages++; + + /* + * + * The zero page is encoded in a special way, valid and migrate is + * set, and pfn part is zero. Encode specially for migrate also. + * + */ + if (pfn =3D=3D (HMM_PFN_VALID|HMM_PFN_MIGRATE)) { + migrate->src[i] =3D MIGRATE_PFN_MIGRATE; + continue; + } + + migrate->src[i] =3D migrate_pfn(page_to_pfn(hmm_pfn_to_page(pfn))) + | MIGRATE_PFN_MIGRATE; + migrate->src[i] |=3D (pfn & HMM_PFN_WRITE) ? 
MIGRATE_PFN_WRITE : 0;
+	}
+
+	if (migrate->cpages)
+		migrate_vma_unmap(migrate);
+
+}
+EXPORT_SYMBOL(migrate_hmm_range_setup);
-- 
2.50.0
From: Mika Penttilä
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Mika Penttilä, David Hildenbrand, Jason Gunthorpe, Leon Romanovsky, Alistair Popple, Balbir Singh
Subject: [RFC PATCH 3/4] mm/migrate_device.c: remove migrate_vma_collect_*() functions
Date: Thu, 14 Aug 2025 10:19:28 +0300
Message-ID: <20250814072045.3637192-5-mpenttil@redhat.com>
In-Reply-To: <20250814072045.3637192-1-mpenttil@redhat.com>
References: <20250814072045.3637192-1-mpenttil@redhat.com>

With the unified fault handling and migrate path, the
migrate_vma_collect_*() functions are unused, so remove them.

Cc: David Hildenbrand
Cc: Jason Gunthorpe
Cc: Leon Romanovsky
Cc: Alistair Popple
Cc: Balbir Singh
Signed-off-by: Mika Penttilä
---
 mm/migrate_device.c | 312 +-------------------------------------------
 1 file changed, 1 insertion(+), 311 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 87ddc0353165..0c84dfcd5058 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -15,319 +15,9 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
-static int migrate_vma_collect_skip(unsigned long start,
-				    unsigned long end,
-				    struct mm_walk *walk)
-{
-	struct migrate_vma *migrate = walk->private;
-	unsigned long addr;
-
-	for (addr = start; addr < end; addr += PAGE_SIZE) {
-		migrate->dst[migrate->npages] = 0;
-		migrate->src[migrate->npages++] = 0;
-	}
-
-	return 0;
-}
-
-static int migrate_vma_collect_hole(unsigned long start,
-				    unsigned long end,
-				    __always_unused int depth,
-				    struct mm_walk *walk)
-{
-	struct migrate_vma *migrate = walk->private;
-	unsigned long addr;
-
-	/* Only allow populating anonymous memory.
*/ - if (!vma_is_anonymous(walk->vma)) - return migrate_vma_collect_skip(start, end, walk); - - for (addr =3D start; addr < end; addr +=3D PAGE_SIZE) { - migrate->src[migrate->npages] =3D MIGRATE_PFN_MIGRATE; - migrate->dst[migrate->npages] =3D 0; - migrate->npages++; - migrate->cpages++; - } - - return 0; -} - -static int migrate_vma_collect_pmd(pmd_t *pmdp, - unsigned long start, - unsigned long end, - struct mm_walk *walk) -{ - struct migrate_vma *migrate =3D walk->private; - struct folio *fault_folio =3D migrate->fault_page ? - page_folio(migrate->fault_page) : NULL; - struct vm_area_struct *vma =3D walk->vma; - struct mm_struct *mm =3D vma->vm_mm; - unsigned long addr =3D start, unmapped =3D 0; - spinlock_t *ptl; - pte_t *ptep; - -again: - if (pmd_none(*pmdp)) - return migrate_vma_collect_hole(start, end, -1, walk); - - if (pmd_trans_huge(*pmdp)) { - struct folio *folio; - - ptl =3D pmd_lock(mm, pmdp); - if (unlikely(!pmd_trans_huge(*pmdp))) { - spin_unlock(ptl); - goto again; - } - - folio =3D pmd_folio(*pmdp); - if (is_huge_zero_folio(folio)) { - spin_unlock(ptl); - split_huge_pmd(vma, pmdp, addr); - } else { - int ret; - - folio_get(folio); - spin_unlock(ptl); - /* FIXME: we don't expect THP for fault_folio */ - if (WARN_ON_ONCE(fault_folio =3D=3D folio)) - return migrate_vma_collect_skip(start, end, - walk); - if (unlikely(!folio_trylock(folio))) - return migrate_vma_collect_skip(start, end, - walk); - ret =3D split_folio(folio); - if (fault_folio !=3D folio) - folio_unlock(folio); - folio_put(folio); - if (ret) - return migrate_vma_collect_skip(start, end, - walk); - } - } - - ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); - if (!ptep) - goto again; - arch_enter_lazy_mmu_mode(); - - for (; addr < end; addr +=3D PAGE_SIZE, ptep++) { - struct dev_pagemap *pgmap; - unsigned long mpfn =3D 0, pfn; - struct folio *folio; - struct page *page; - swp_entry_t entry; - pte_t pte; - - pte =3D ptep_get(ptep); - - if (pte_none(pte)) { - if (vma_is_anonymous(vma)) { - mpfn =3D MIGRATE_PFN_MIGRATE; - migrate->cpages++; - } - goto next; - } - - if (!pte_present(pte)) { - /* - * Only care about unaddressable device page special - * page table entry. Other special swap entries are not - * migratable, and we ignore regular swapped page. - */ - entry =3D pte_to_swp_entry(pte); - if (!is_device_private_entry(entry)) - goto next; - - page =3D pfn_swap_entry_to_page(entry); - pgmap =3D page_pgmap(page); - if (!(migrate->flags & - MIGRATE_VMA_SELECT_DEVICE_PRIVATE) || - pgmap->owner !=3D migrate->pgmap_owner) - goto next; - - mpfn =3D migrate_pfn(page_to_pfn(page)) | - MIGRATE_PFN_MIGRATE; - if (is_writable_device_private_entry(entry)) - mpfn |=3D MIGRATE_PFN_WRITE; - } else { - pfn =3D pte_pfn(pte); - if (is_zero_pfn(pfn) && - (migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) { - mpfn =3D MIGRATE_PFN_MIGRATE; - migrate->cpages++; - goto next; - } - page =3D vm_normal_page(migrate->vma, addr, pte); - if (page && !is_zone_device_page(page) && - !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) { - goto next; - } else if (page && is_device_coherent_page(page)) { - pgmap =3D page_pgmap(page); - - if (!(migrate->flags & - MIGRATE_VMA_SELECT_DEVICE_COHERENT) || - pgmap->owner !=3D migrate->pgmap_owner) - goto next; - } - mpfn =3D migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE; - mpfn |=3D pte_write(pte) ? 
MIGRATE_PFN_WRITE : 0; - } - - /* FIXME support THP */ - if (!page || !page->mapping || PageTransCompound(page)) { - mpfn =3D 0; - goto next; - } - - /* - * By getting a reference on the folio we pin it and that blocks - * any kind of migration. Side effect is that it "freezes" the - * pte. - * - * We drop this reference after isolating the folio from the lru - * for non device folio (device folio are not on the lru and thus - * can't be dropped from it). - */ - folio =3D page_folio(page); - folio_get(folio); - - /* - * We rely on folio_trylock() to avoid deadlock between - * concurrent migrations where each is waiting on the others - * folio lock. If we can't immediately lock the folio we fail this - * migration as it is only best effort anyway. - * - * If we can lock the folio it's safe to set up a migration entry - * now. In the common case where the folio is mapped once in a - * single process setting up the migration entry now is an - * optimisation to avoid walking the rmap later with - * try_to_migrate(). - */ - if (fault_folio =3D=3D folio || folio_trylock(folio)) { - bool anon_exclusive; - pte_t swp_pte; - - flush_cache_page(vma, addr, pte_pfn(pte)); - anon_exclusive =3D folio_test_anon(folio) && - PageAnonExclusive(page); - if (anon_exclusive) { - pte =3D ptep_clear_flush(vma, addr, ptep); - - if (folio_try_share_anon_rmap_pte(folio, page)) { - set_pte_at(mm, addr, ptep, pte); - if (fault_folio !=3D folio) - folio_unlock(folio); - folio_put(folio); - mpfn =3D 0; - goto next; - } - } else { - pte =3D ptep_get_and_clear(mm, addr, ptep); - } - - migrate->cpages++; - - /* Set the dirty flag on the folio now the pte is gone. */ - if (pte_dirty(pte)) - folio_mark_dirty(folio); - - /* Setup special migration page table entry */ - if (mpfn & MIGRATE_PFN_WRITE) - entry =3D make_writable_migration_entry( - page_to_pfn(page)); - else if (anon_exclusive) - entry =3D make_readable_exclusive_migration_entry( - page_to_pfn(page)); - else - entry =3D make_readable_migration_entry( - page_to_pfn(page)); - if (pte_present(pte)) { - if (pte_young(pte)) - entry =3D make_migration_entry_young(entry); - if (pte_dirty(pte)) - entry =3D make_migration_entry_dirty(entry); - } - swp_pte =3D swp_entry_to_pte(entry); - if (pte_present(pte)) { - if (pte_soft_dirty(pte)) - swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pte)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); - } else { - if (pte_swp_soft_dirty(pte)) - swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pte)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); - } - set_pte_at(mm, addr, ptep, swp_pte); - - /* - * This is like regular unmap: we remove the rmap and - * drop the folio refcount. The folio won't be freed, as - * we took a reference just above. 
-	 */
-			folio_remove_rmap_pte(folio, page, vma);
-			folio_put(folio);
-
-			if (pte_present(pte))
-				unmapped++;
-		} else {
-			folio_put(folio);
-			mpfn = 0;
-		}
-
-next:
-		migrate->dst[migrate->npages] = 0;
-		migrate->src[migrate->npages++] = mpfn;
-	}
-
-	/* Only flush the TLB if we actually modified any entries */
-	if (unmapped)
-		flush_tlb_range(walk->vma, start, end);
-
-	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(ptep - 1, ptl);
-
-	return 0;
-}
-
-static const struct mm_walk_ops migrate_vma_walk_ops = {
-	.pmd_entry		= migrate_vma_collect_pmd,
-	.pte_hole		= migrate_vma_collect_hole,
-	.walk_lock		= PGWALK_RDLOCK,
-};
-
-/*
- * migrate_vma_collect() - collect pages over a range of virtual addresses
- * @migrate: migrate struct containing all migration information
- *
- * This will walk the CPU page table. For each virtual address backed by a
- * valid page, it updates the src array and takes a reference on the page, in
- * order to pin the page until we lock it and unmap it.
- */
-static void migrate_vma_collect(struct migrate_vma *migrate)
-{
-	struct mmu_notifier_range range;
-
-	/*
-	 * Note that the pgmap_owner is passed to the mmu notifier callback so
-	 * that the registered device driver can skip invalidating device
-	 * private page mappings that won't be migrated.
-	 */
-	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_MIGRATE, 0,
-		migrate->vma->vm_mm, migrate->start, migrate->end,
-		migrate->pgmap_owner);
-	mmu_notifier_invalidate_range_start(&range);
-
-	walk_page_range(migrate->vma->vm_mm, migrate->start, migrate->end,
-			&migrate_vma_walk_ops, migrate);
-
-	mmu_notifier_invalidate_range_end(&range);
-	migrate->end = migrate->start + (migrate->npages << PAGE_SHIFT);
-}
-
 /*
  * migrate_vma_check_page() - check if page is pinned or not
  * @page: struct page to check
-- 
2.50.0
From: Mika Penttilä
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Mika Penttilä, David Hildenbrand, Jason Gunthorpe, Leon Romanovsky, Alistair Popple, Balbir Singh, Marco Pagani
Subject: [RFC PATCH 4/4] mm: add new testcase for the migrate on fault case
Date: Thu, 14 Aug 2025 10:19:29 +0300
Message-ID: <20250814072045.3637192-6-mpenttil@redhat.com>
In-Reply-To: <20250814072045.3637192-1-mpenttil@redhat.com>
References: <20250814072045.3637192-1-mpenttil@redhat.com>

Cc: David Hildenbrand
Cc: Jason Gunthorpe
Cc: Leon Romanovsky
Cc: Alistair Popple
Cc: Balbir Singh
Signed-off-by: Marco Pagani
Signed-off-by: Mika Penttilä
---
 lib/test_hmm.c                         | 100 ++++++++++++++++++++++++-
 lib/test_hmm_uapi.h                    |  17 +++--
 tools/testing/selftests/mm/hmm-tests.c |  53 +++++++++++++
 3 files changed, 161 insertions(+), 9 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index cd5c139213be..e69a5c87e414 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -36,6 +36,7 @@
 #define DMIRROR_RANGE_FAULT_TIMEOUT	1000
 #define DEVMEM_CHUNK_SIZE		(256 * 1024 * 1024U)
 #define DEVMEM_CHUNKS_RESERVE		16
+#define PFNS_ARRAY_SIZE			64
 
 /*
  * For device_private pages, dpage is just a dummy struct page
@@ -143,7 +144,7 @@ static bool dmirror_is_private_zone(struct dmirror_device *mdevice)
 			HMM_DMIRROR_MEMORY_DEVICE_PRIVATE) ?
true : false; } =20 -static enum migrate_vma_direction +static enum migrate_vma_info dmirror_select_device(struct dmirror *dmirror) { return (dmirror->mdevice->zone_device_type =3D=3D @@ -1016,6 +1017,99 @@ static int dmirror_migrate_to_device(struct dmirror = *dmirror, return ret; } =20 +static int do_fault_and_migrate(struct dmirror *dmirror, struct hmm_range = *range) +{ + struct migrate_vma *migrate =3D range->migrate; + int ret; + + mmap_read_lock(dmirror->notifier.mm); + + /* Fault-in pages for migration and update device page table */ + ret =3D dmirror_range_fault(dmirror, range); + + pr_debug("Migrating from sys mem to device mem\n"); + migrate_hmm_range_setup(range); + + dmirror_migrate_alloc_and_copy(migrate, dmirror); + migrate_vma_pages(migrate); + dmirror_migrate_finalize_and_map(migrate, dmirror); + migrate_vma_finalize(migrate); + + mmap_read_unlock(dmirror->notifier.mm); + return ret; +} + +static int dmirror_fault_and_migrate_to_device(struct dmirror *dmirror, + struct hmm_dmirror_cmd *cmd) +{ + unsigned long start, size, end, next; + unsigned long src_pfns[PFNS_ARRAY_SIZE] =3D { 0 }; + unsigned long dst_pfns[PFNS_ARRAY_SIZE] =3D { 0 }; + struct migrate_vma migrate =3D { 0 }; + struct hmm_range range =3D { 0 }; + struct dmirror_bounce bounce; + int ret =3D 0; + + /* Whole range */ + start =3D cmd->addr; + size =3D cmd->npages << PAGE_SHIFT; + end =3D start + size; + + if (!mmget_not_zero(dmirror->notifier.mm)) { + ret =3D -EFAULT; + goto out; + } + + migrate.pgmap_owner =3D dmirror->mdevice; + migrate.src =3D src_pfns; + migrate.dst =3D dst_pfns; + + range.migrate =3D &migrate; + range.hmm_pfns =3D src_pfns; + range.pfn_flags_mask =3D 0; + range.default_flags =3D HMM_PFN_REQ_FAULT | HMM_PFN_REQ_MIGRATE; + range.dev_private_owner =3D dmirror->mdevice; + range.notifier =3D &dmirror->notifier; + + for (next =3D start; next < end; next =3D range.end) { + range.start =3D next; + range.end =3D min(end, next + (PFNS_ARRAY_SIZE << PAGE_SHIFT)); + + pr_debug("Fault and migrate range start:%#lx end:%#lx\n", + range.start, range.end); + + ret =3D do_fault_and_migrate(dmirror, &range); + if (ret) + goto out_mmput; + } + + /* + * Return the migrated data for verification. 
+ * Only for pages in device zone + ***/ + ret =3D dmirror_bounce_init(&bounce, start, size); + if (ret) + goto out_mmput; + + mutex_lock(&dmirror->mutex); + ret =3D dmirror_do_read(dmirror, start, end, &bounce); + mutex_unlock(&dmirror->mutex); + if (ret =3D=3D 0) { + ret =3D copy_to_user(u64_to_user_ptr(cmd->ptr), bounce.ptr, bounce.size); + if (ret) + ret =3D -EFAULT; + } + + cmd->cpages =3D bounce.cpages; + dmirror_bounce_fini(&bounce); + + +out_mmput: + mmput(dmirror->notifier.mm); +out: + return ret; +} + static void dmirror_mkentry(struct dmirror *dmirror, struct hmm_range *ran= ge, unsigned char *perm, unsigned long entry) { @@ -1313,6 +1407,10 @@ static long dmirror_fops_unlocked_ioctl(struct file = *filp, ret =3D dmirror_migrate_to_device(dmirror, &cmd); break; =20 + case HMM_DMIRROR_MIGRATE_ON_FAULT_TO_DEV: + ret =3D dmirror_fault_and_migrate_to_device(dmirror, &cmd); + break; + case HMM_DMIRROR_MIGRATE_TO_SYS: ret =3D dmirror_migrate_to_system(dmirror, &cmd); break; diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h index 8c818a2cf4f6..4266f0a12201 100644 --- a/lib/test_hmm_uapi.h +++ b/lib/test_hmm_uapi.h @@ -29,14 +29,15 @@ struct hmm_dmirror_cmd { }; =20 /* Expose the address space of the calling process through hmm device file= */ -#define HMM_DMIRROR_READ _IOWR('H', 0x00, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_WRITE _IOWR('H', 0x01, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_MIGRATE_TO_DEV _IOWR('H', 0x02, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_MIGRATE_TO_SYS _IOWR('H', 0x03, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_SNAPSHOT _IOWR('H', 0x04, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x05, struct hmm_dmirror_cmd) -#define HMM_DMIRROR_CHECK_EXCLUSIVE _IOWR('H', 0x06, struct hmm_dmirror_cm= d) -#define HMM_DMIRROR_RELEASE _IOWR('H', 0x07, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_READ _IOWR('H', 0x00, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_WRITE _IOWR('H', 0x01, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_MIGRATE_TO_DEV _IOWR('H', 0x02, struct hmm_dmirror_cm= d) +#define HMM_DMIRROR_MIGRATE_ON_FAULT_TO_DEV _IOWR('H', 0x03, struct hmm_dm= irror_cmd) +#define HMM_DMIRROR_MIGRATE_TO_SYS _IOWR('H', 0x04, struct hmm_dmirror_cm= d) +#define HMM_DMIRROR_SNAPSHOT _IOWR('H', 0x05, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x06, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_CHECK_EXCLUSIVE _IOWR('H', 0x07, struct hmm_dmirror_c= md) +#define HMM_DMIRROR_RELEASE _IOWR('H', 0x08, struct hmm_dmirror_cmd) =20 /* * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT. diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftes= ts/mm/hmm-tests.c index 141bf63cbe05..059512f62d29 100644 --- a/tools/testing/selftests/mm/hmm-tests.c +++ b/tools/testing/selftests/mm/hmm-tests.c @@ -272,6 +272,13 @@ static int hmm_migrate_sys_to_dev(int fd, return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_DEV, buffer, npages); } =20 +static int hmm_migrate_on_fault_sys_to_dev(int fd, + struct hmm_buffer *buffer, + unsigned long npages) +{ + return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_ON_FAULT_TO_DEV, buffer, n= pages); +} + static int hmm_migrate_dev_to_sys(int fd, struct hmm_buffer *buffer, unsigned long npages) @@ -998,6 +1005,52 @@ TEST_F(hmm, migrate) hmm_buffer_free(buffer); } =20 +/* + * Fault an migrate anonymous memory to device private memory. 
+ */ +TEST_F(hmm, migrate_on_fault) +{ + struct hmm_buffer *buffer; + unsigned long npages; + unsigned long size; + unsigned long i; + int *ptr; + int ret; + + npages =3D ALIGN(HMM_BUFFER_SIZE, self->page_size) >> self->page_shift; + ASSERT_NE(npages, 0); + size =3D npages << self->page_shift; + + buffer =3D malloc(sizeof(*buffer)); + ASSERT_NE(buffer, NULL); + + buffer->fd =3D -1; + buffer->size =3D size; + buffer->mirror =3D malloc(size); + ASSERT_NE(buffer->mirror, NULL); + + buffer->ptr =3D mmap(NULL, size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, + buffer->fd, 0); + ASSERT_NE(buffer->ptr, MAP_FAILED); + + /* Initialize buffer in system memory. */ + for (i =3D 0, ptr =3D buffer->ptr; i < size / sizeof(*ptr); ++i) + ptr[i] =3D i; + + /* Fault and migrate memory to device. */ + ret =3D hmm_migrate_on_fault_sys_to_dev(self->fd, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Check what the device read. */ + for (i =3D 0, ptr =3D buffer->mirror; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], i); + + hmm_buffer_free(buffer); +} + /* * Migrate anonymous memory to device private memory and fault some of it = back * to system memory, then try migrating the resulting mix of system and de= vice --=20 2.50.0