From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, nouveau@lists.freedesktop.org,
	David Hildenbrand, Andrew Morton, Jérôme Glisse, Jonathan Corbet,
	Alex Shi, Yanteng Si, Karol Herbst, Lyude Paul, Danilo Krummrich,
	David Airlie, Simona Vetter, "Liam R. Howlett", Lorenzo Stoakes,
	Vlastimil Babka, Jann Horn, Pasha Tatashin, Peter Xu,
	Alistair Popple, Jason Gunthorpe
Subject: [PATCH v1 03/12] mm/rmap: convert make_device_exclusive_range() to make_device_exclusive()
Date: Wed, 29 Jan 2025 12:54:01 +0100
Message-ID: <20250129115411.2077152-4-david@redhat.com>
In-Reply-To: <20250129115411.2077152-1-david@redhat.com>
References: <20250129115411.2077152-1-david@redhat.com>

The single "real" user in the tree of make_device_exclusive_range()
always requests making only a single address exclusive. The current
implementation is hard to fix for properly supporting anonymous THP /
large folios and for avoiding messing with rmap walks in weird ways.

So let's always process a single address/page and return folio + page
to minimize page -> folio lookups. This is a preparation for further
changes.

Reject any non-anonymous or hugetlb folios early, directly after GUP.
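
For illustration only (not part of this patch; "mm", "addr" and "owner"
stand in for whatever the driver already has at hand), a caller converts
roughly like this, mirroring the nouveau_svm.c and lib/test_hmm.c changes
below:

	/*
	 * The old calling convention
	 *
	 *	ret = make_device_exclusive_range(mm, addr, addr + PAGE_SIZE,
	 *					  &page, owner);
	 *	if (ret <= 0 || !page)
	 *		return -EINVAL;
	 *
	 * becomes:
	 */
	struct folio *folio;
	struct page *page;

	mmap_read_lock(mm);
	page = make_device_exclusive(mm, addr, owner, &folio);
	mmap_read_unlock(mm);
	if (IS_ERR(page))
		return PTR_ERR(page);

	/* ... program the device ... */

	/* Dropping the folio lock + reference lets CPU access revoke it. */
	folio_unlock(folio);
	folio_put(folio);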
Signed-off-by: David Hildenbrand
Acked-by: Simona Vetter
Reviewed-by: Alistair Popple
---
 Documentation/mm/hmm.rst                    |  2 +-
 Documentation/translations/zh_CN/mm/hmm.rst |  2 +-
 drivers/gpu/drm/nouveau/nouveau_svm.c       |  5 +-
 include/linux/mmu_notifier.h                |  2 +-
 include/linux/rmap.h                        |  5 +-
 lib/test_hmm.c                              | 45 +++++------
 mm/rmap.c                                   | 90 +++++++++++----------
 7 files changed, 75 insertions(+), 76 deletions(-)

diff --git a/Documentation/mm/hmm.rst b/Documentation/mm/hmm.rst
index f6d53c37a2ca..7d61b7a8b65b 100644
--- a/Documentation/mm/hmm.rst
+++ b/Documentation/mm/hmm.rst
@@ -400,7 +400,7 @@ Exclusive access memory
 Some devices have features such as atomic PTE bits that can be used to implement
 atomic access to system memory. To support atomic operations to a shared virtual
 memory page such a device needs access to that page which is exclusive of any
-userspace access from the CPU. The ``make_device_exclusive_range()`` function
+userspace access from the CPU. The ``make_device_exclusive()`` function
 can be used to make a memory range inaccessible from userspace.
 
 This replaces all mappings for pages in the given range with special swap
diff --git a/Documentation/translations/zh_CN/mm/hmm.rst b/Documentation/translations/zh_CN/mm/hmm.rst
index 0669f947d0bc..22c210f4e94f 100644
--- a/Documentation/translations/zh_CN/mm/hmm.rst
+++ b/Documentation/translations/zh_CN/mm/hmm.rst
@@ -326,7 +326,7 @@ devm_memunmap_pages() 和 devm_release_mem_region() 当资源可以绑定到 ``s
 
 一些设备具有诸如原子PTE位的功能，可以用来实现对系统内存的原子访问。为了支持对一
 个共享的虚拟内存页的原子操作，这样的设备需要对该页的访问是排他的，而不是来自CPU
-的任何用户空间访问。 ``make_device_exclusive_range()`` 函数可以用来使一
+的任何用户空间访问。 ``make_device_exclusive()`` 函数可以用来使一
 个内存范围不能从用户空间访问。
 
 这将用特殊的交换条目替换给定范围内的所有页的映射。任何试图访问交换条目的行为都会
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index b4da82ddbb6b..39e3740980bb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -609,10 +609,9 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
 
 		notifier_seq = mmu_interval_read_begin(&notifier->notifier);
 		mmap_read_lock(mm);
-		ret = make_device_exclusive_range(mm, start, start + PAGE_SIZE,
-						  &page, drm->dev);
+		page = make_device_exclusive(mm, start, drm->dev, &folio);
 		mmap_read_unlock(mm);
-		if (ret <= 0 || !page) {
+		if (IS_ERR(page)) {
 			ret = -EINVAL;
 			goto out;
 		}
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index e2dd57ca368b..d4e714661826 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -46,7 +46,7 @@ struct mmu_interval_notifier;
 * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no
 * longer have exclusive access to the page. When sent during creation of an
 * exclusive range the owner will be initialised to the value provided by the
- * caller of make_device_exclusive_range(), otherwise the owner will be NULL.
+ * caller of make_device_exclusive(), otherwise the owner will be NULL.
 */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 683a04088f3f..86425d42c1a9 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -663,9 +663,8 @@ int folio_referenced(struct folio *, int is_locked,
 void try_to_migrate(struct folio *folio, enum ttu_flags flags);
 void try_to_unmap(struct folio *, enum ttu_flags flags);
 
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *arg);
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+				   void *owner, struct folio **foliop);
 
 /* Avoid racy checks */
 #define PVMW_SYNC		(1 << 0)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 056f2e411d7b..9e1b07a227a3 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -780,10 +780,8 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 	unsigned long start, end, addr;
 	unsigned long size = cmd->npages << PAGE_SHIFT;
 	struct mm_struct *mm = dmirror->notifier.mm;
-	struct page *pages[64];
 	struct dmirror_bounce bounce;
-	unsigned long next;
-	int ret;
+	int ret = 0;
 
 	start = cmd->addr;
 	end = start + size;
@@ -795,36 +793,31 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 		return -EINVAL;
 
 	mmap_read_lock(mm);
-	for (addr = start; addr < end; addr = next) {
-		unsigned long mapped = 0;
-		int i;
-
-		next = min(end, addr + (ARRAY_SIZE(pages) << PAGE_SHIFT));
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		struct folio *folio;
+		struct page *page;
 
-		ret = make_device_exclusive_range(mm, addr, next, pages, NULL);
-		/*
-		 * Do dmirror_atomic_map() iff all pages are marked for
-		 * exclusive access to avoid accessing uninitialized
-		 * fields of pages.
-		 */
-		if (ret == (next - addr) >> PAGE_SHIFT)
-			mapped = dmirror_atomic_map(addr, next, pages, dmirror);
-		for (i = 0; i < ret; i++) {
-			if (pages[i]) {
-				unlock_page(pages[i]);
-				put_page(pages[i]);
-			}
+		page = make_device_exclusive(mm, addr, NULL, &folio);
+		if (IS_ERR(page)) {
+			ret = PTR_ERR(page);
+			break;
 		}
 
-		if (addr + (mapped << PAGE_SHIFT) < next) {
-			mmap_read_unlock(mm);
-			mmput(mm);
-			return -EBUSY;
-		}
+		ret = dmirror_atomic_map(addr, addr + PAGE_SIZE, &page, dmirror);
+		/* dmirror_atomic_map() returns the number of pages it mapped. */
+		ret = (ret == 1) ? 0 : -EBUSY;
+		folio_unlock(folio);
+		folio_put(folio);
+
+		if (ret)
+			break;
 	}
 	mmap_read_unlock(mm);
 	mmput(mm);
 
+	if (ret)
+		return -EBUSY;
+
 	/* Return the migrated data for verification. */
 	ret = dmirror_bounce_init(&bounce, start, size);
 	if (ret)
diff --git a/mm/rmap.c b/mm/rmap.c
index 17fbfa61f7ef..676df4fba5b0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2495,70 +2495,78 @@ static bool folio_make_device_exclusive(struct folio *folio,
 		.arg = &args,
 	};
 
-	/*
-	 * Restrict to anonymous folios for now to avoid potential writeback
-	 * issues.
-	 */
-	if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
-		return false;
-
 	rmap_walk(folio, &rwc);
 
 	return args.valid && !folio_mapcount(folio);
 }
 
 /**
- * make_device_exclusive_range() - Mark a range for exclusive use by a device
+ * make_device_exclusive() - Mark an address for exclusive use by a device
 * @mm: mm_struct of associated target process
- * @start: start of the region to mark for exclusive device access
- * @end: end address of region
- * @pages: returns the pages which were successfully marked for exclusive access
+ * @addr: the virtual address to mark for exclusive device access
 * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier to allow filtering
+ * @foliop: folio pointer will be stored here on success.
+ *
+ * This function looks up the page mapped at the given address, grabs a
+ * folio reference, locks the folio and replaces the PTE with a special
+ * device-exclusive non-swap entry, preventing userspace CPU access. The
+ * function will return with the folio locked and referenced.
 *
- * Returns: number of pages found in the range by GUP. A page is marked for
- * exclusive access only if the page pointer is non-NULL.
+ * On fault, these special device-exclusive entries are replaced with the
+ * original PTE under folio lock, after calling MMU notifiers.
 *
- * This function finds ptes mapping page(s) to the given address range, locks
- * them and replaces mappings with special swap entries preventing userspace CPU
- * access. On fault these entries are replaced with the original mapping after
- * calling MMU notifiers.
+ * Only anonymous non-hugetlb folios are supported and the VMA must have
+ * write permissions such that we can fault in the anonymous page writable
+ * in order to mark it exclusive. The caller must hold the mmap_lock in read
+ * mode.
 *
 * A driver using this to program access from a device must use a mmu notifier
 * critical section to hold a device specific lock during programming. Once
 * programming is complete it should drop the page lock and reference after
 * which point CPU access to the page will revoke the exclusive access.
+ *
+ * Returns: pointer to mapped page on success, otherwise a negative error.
 */
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *owner)
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+		void *owner, struct folio **foliop)
 {
-	long npages = (end - start) >> PAGE_SHIFT;
-	long i;
+	struct folio *folio;
+	struct page *page;
+	long npages;
+
+	mmap_assert_locked(mm);
 
-	npages = get_user_pages_remote(mm, start, npages,
+	/*
+	 * Fault in the page writable and try to lock it; note that if the
+	 * address would already be marked for exclusive use by the device,
+	 * the GUP call would undo that first by triggering a fault.
+	 */
+	npages = get_user_pages_remote(mm, addr, 1,
 				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
-				       pages, NULL);
-	if (npages < 0)
-		return npages;
-
-	for (i = 0; i < npages; i++, start += PAGE_SIZE) {
-		struct folio *folio = page_folio(pages[i]);
-		if (PageTail(pages[i]) || !folio_trylock(folio)) {
-			folio_put(folio);
-			pages[i] = NULL;
-			continue;
-		}
+				       &page, NULL);
+	if (npages != 1)
+		return ERR_PTR(npages);
+	folio = page_folio(page);
 
-		if (!folio_make_device_exclusive(folio, mm, start, owner)) {
-			folio_unlock(folio);
-			folio_put(folio);
-			pages[i] = NULL;
-		}
+	if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+
+	if (!folio_trylock(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
 	}
 
-	return npages;
+	if (!folio_make_device_exclusive(folio, mm, addr, owner)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
+	}
+	*foliop = folio;
+	return page;
 }
-EXPORT_SYMBOL_GPL(make_device_exclusive_range);
+EXPORT_SYMBOL_GPL(make_device_exclusive);
 #endif
 
 void __put_anon_vma(struct anon_vma *anon_vma)
-- 
2.48.1
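
As a usage sketch (not part of the patch): the retry pattern a driver is
expected to wrap around make_device_exclusive(), loosely following
nouveau_atomic_range_fault() above. The notifier setup, the device lock
and the programming step are assumptions:

	static int sketch_map_atomic(struct mm_struct *mm, unsigned long addr,
				     struct mmu_interval_notifier *notifier,
				     void *owner)
	{
		struct folio *folio;
		struct page *page;
		unsigned long seq;
		int ret = 0;

		seq = mmu_interval_read_begin(notifier);

		mmap_read_lock(mm);
		page = make_device_exclusive(mm, addr, owner, &folio);
		mmap_read_unlock(mm);
		if (IS_ERR(page))
			return PTR_ERR(page);

		/* Hold a device-specific lock across the notifier check ... */
		if (mmu_interval_read_retry(notifier, seq)) {
			/* Raced with an invalidation; have the caller retry. */
			ret = -EAGAIN;
		} else {
			/* ... while programming the device PTE for @page. */
		}

		/*
		 * Dropping the folio lock and reference allows CPU access to
		 * revoke the exclusive access again.
		 */
		folio_unlock(folio);
		folio_put(folio);
		return ret;
	}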