From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Andrea Arcangeli, John Hubbard, Mike Rapoport, David Hildenbrand, Vlastimil Babka, peterx@redhat.com, "Kirill A. Shutemov", Andrew Morton, Mike Kravetz, James Houghton, Hugh Dickins
Subject: [PATCH 1/7] mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask()
Date: Tue, 13 Jun 2023 17:53:40 -0400
Message-Id: <20230613215346.1022773-2-peterx@redhat.com>
In-Reply-To: <20230613215346.1022773-1-peterx@redhat.com>

Firstly, the no_page_table() call is a no-op for hugetlb, because a
hugetlb page always satisfies:

  - vma_is_anonymous() == false
  - vma->vm_ops->fault != NULL

So we can safely remove it in hugetlb_follow_page_mask(), along with the
page* variable.

Meanwhile, what we do in follow_hugetlb_page() actually makes sense for a
dump: we fault the page in only if the page cache is already allocated.
Let's do the same here for follow_page_mask() on hugetlb.

So far this should have zero effect on real dumps, because those still go
through follow_hugetlb_page().  But it may start to influence
follow_page() users who mimic a "dump page" scenario, hopefully in a good
way.

This also paves the way for unifying the hugetlb gup-slow path.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand
Reviewed-by: Mike Kravetz
---
 mm/gup.c     | 9 ++-------
 mm/hugetlb.c | 9 +++++++++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index dbe96d266670..aa0668505d61 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -781,7 +781,6 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 			      struct follow_page_context *ctx)
 {
 	pgd_t *pgd;
-	struct page *page;
 	struct mm_struct *mm = vma->vm_mm;
 
 	ctx->page_mask = 0;
@@ -794,12 +793,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 	 * hugetlb_follow_page_mask is only for follow_page() handling here.
 	 * Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
 	 */
-	if (is_vm_hugetlb_page(vma)) {
-		page = hugetlb_follow_page_mask(vma, address, flags);
-		if (!page)
-			page = no_page_table(vma, flags);
-		return page;
-	}
+	if (is_vm_hugetlb_page(vma))
+		return hugetlb_follow_page_mask(vma, address, flags);
 
 	pgd = pgd_offset(mm, address);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 270ec0ecd5a1..82dfdd96db4c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6501,6 +6501,15 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 		spin_unlock(ptl);
 out_unlock:
 	hugetlb_vma_unlock_read(vma);
+
+	/*
+	 * Fixup retval for dump requests: if pagecache doesn't exist,
+	 * don't try to allocate a new page but just skip it.
+	 */
+	if (!page && (flags & FOLL_DUMP) &&
+	    !hugetlbfs_pagecache_present(h, vma, address))
+		page = ERR_PTR(-EFAULT);
+
 	return page;
 }
 
-- 
2.40.1
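For context on the return convention above, a minimal sketch (not part of
the patch) of how a dump-style follow_page() user would consume it;
dump_lookup() is a hypothetical wrapper, while follow_page(), IS_ERR() and
the FOLL_* flags are the existing interfaces:

	/* Hypothetical caller: ERR_PTR(-EFAULT) means "dump a hole of
	 * zeroes, don't allocate pagecache"; a real page is data to dump. */
	static struct page *dump_lookup(struct vm_area_struct *vma,
					unsigned long addr)
	{
		struct page *page = follow_page(vma, addr,
						FOLL_GET | FOLL_DUMP);

		if (IS_ERR(page))
			return NULL;	/* no pagecache: emit zeroes */
		return page;		/* emit contents, then put_page() */
	}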
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Andrea Arcangeli, John Hubbard, Mike Rapoport, David Hildenbrand, Vlastimil Babka, peterx@redhat.com, "Kirill A. Shutemov", Andrew Morton, Mike Kravetz, James Houghton, Hugh Dickins
Subject: [PATCH 2/7] mm/hugetlb: Fix hugetlb_follow_page_mask() on permission checks
Date: Tue, 13 Jun 2023 17:53:41 -0400
Message-Id: <20230613215346.1022773-3-peterx@redhat.com>
In-Reply-To: <20230613215346.1022773-1-peterx@redhat.com>

hugetlb_follow_page_mask() was missing permission checks.  For example, a
follow_page() could get the hugetlb page with FOLL_WRITE even if the page
is read-only.  The checks were missing even in the old follow_page_mask(),
as can be seen from before commit 57a196a58421 ("hugetlb: simplify hugetlb
handling in follow_page_mask").

Let's add them: either the need to CoW due to a missing write bit, or
proper CoR (Copy-On-Read) on !AnonExclusive pages over R/O pins, rejecting
the followed page in both cases.  That brings this function closer to
follow_hugetlb_page().

It is doubtful how many of us care about that right now, since FOLL_PIN
follow_page() doesn't really happen at all.  But we will care, and care
more, once slow-gup is switched over to use hugetlb_follow_page_mask().
We will also then care about when to return -EMLINK, which is the
gup-internal API for "we should do CoR".

While at it, switch the try_grab_page() call to use WARN_ON_ONCE(), to
make clear that it should never fail here.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz
---
 mm/hugetlb.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 82dfdd96db4c..9c261921b2cf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6481,8 +6481,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	ptl = huge_pte_lock(h, mm, pte);
 	entry = huge_ptep_get(pte);
 	if (pte_present(entry)) {
-		page = pte_page(entry) +
-			((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+		page = pte_page(entry);
+
+		if (gup_must_unshare(vma, flags, page)) {
+			/* Tell the caller to do Copy-On-Read */
+			page = ERR_PTR(-EMLINK);
+			goto out;
+		}
+
+		if ((flags & FOLL_WRITE) && !pte_write(entry)) {
+			page = NULL;
+			goto out;
+		}
+
+		page += ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+
 		/*
 		 * Note that page may be a sub-page, and with vmemmap
 		 * optimizations the page struct may be read only.
@@ -6492,10 +6505,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 		 * try_grab_page() should always be able to get the page here,
 		 * because we hold the ptl lock and have verified pte_present().
 		 */
-		if (try_grab_page(page, flags)) {
-			page = NULL;
-			goto out;
-		}
+		WARN_ON_ONCE(try_grab_page(page, flags));
 	}
 out:
 	spin_unlock(ptl);
-- 
2.40.1
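To summarize the ordering that the hunk above encodes, an illustrative
condensation (check_follow_perms() is a made-up name; gup_must_unshare(),
pte_write() and the -EMLINK convention are the real interfaces the patch
relies on):

	/* The two new checks, in the order they apply. */
	static struct page *check_follow_perms(struct vm_area_struct *vma,
					       unsigned int flags, pte_t entry,
					       struct page *page)
	{
		/* R/O pin of a possibly-shared anon page: Copy-On-Read first */
		if (gup_must_unshare(vma, flags, page))
			return ERR_PTR(-EMLINK);

		/* Write request on a read-only pte: Copy-On-Write first */
		if ((flags & FOLL_WRITE) && !pte_write(entry))
			return NULL;

		return page;	/* permissions are fine; safe to grab a ref */
	}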
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Andrea Arcangeli, John Hubbard, Mike Rapoport, David Hildenbrand, Vlastimil Babka, peterx@redhat.com, "Kirill A. Shutemov", Andrew Morton, Mike Kravetz, James Houghton, Hugh Dickins
Subject: [PATCH 3/7] mm/hugetlb: Add page_mask for hugetlb_follow_page_mask()
Date: Tue, 13 Jun 2023 17:53:42 -0400
Message-Id: <20230613215346.1022773-4-peterx@redhat.com>
In-Reply-To: <20230613215346.1022773-1-peterx@redhat.com>

follow_page() doesn't need it, but we'll start to need it when unifying
gup for hugetlb.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand
Reviewed-by: Mike Kravetz
---
 include/linux/hugetlb.h | 8 +++++---
 mm/gup.c                | 3 ++-
 mm/hugetlb.c            | 4 +++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 21f942025fec..0d6f389d98de 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -131,7 +131,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
 			    struct vm_area_struct *, struct vm_area_struct *);
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
-				unsigned long address, unsigned int flags);
+				unsigned long address, unsigned int flags,
+				unsigned int *page_mask);
 long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 struct page **, unsigned long *, unsigned long *,
 			 long, unsigned int, int *);
@@ -297,8 +298,9 @@ static inline void adjust_range_if_pmd_sharing_possible(
 {
 }
 
-static inline struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
-				unsigned long address, unsigned int flags)
+static inline struct page *hugetlb_follow_page_mask(
+	struct vm_area_struct *vma, unsigned long address, unsigned int flags,
+	unsigned int *page_mask)
 {
 	BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/
 }
diff --git a/mm/gup.c b/mm/gup.c
index aa0668505d61..8d59ae4554e7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -794,7 +794,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 	 * Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
 	 */
 	if (is_vm_hugetlb_page(vma))
-		return hugetlb_follow_page_mask(vma, address, flags);
+		return hugetlb_follow_page_mask(vma, address, flags,
+						&ctx->page_mask);
 
 	pgd = pgd_offset(mm, address);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9c261921b2cf..f037eaf9d819 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6457,7 +6457,8 @@ static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
 }
 
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
-				unsigned long address, unsigned int flags)
+				unsigned long address, unsigned int flags,
+				unsigned int *page_mask)
 {
 	struct hstate *h = hstate_vma(vma);
 	struct mm_struct *mm = vma->vm_mm;
@@ -6506,6 +6507,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 		 * because we hold the ptl lock and have verified pte_present().
 		 */
 		WARN_ON_ONCE(try_grab_page(page, flags));
+		*page_mask = (1U << huge_page_order(h)) - 1;
 	}
 out:
 	spin_unlock(ptl);
-- 
2.40.1
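Note that ctx->page_mask is measured in pages, not bytes: the THP path in
follow_trans_huge_pmd() sets it to HPAGE_PMD_NR - 1, and __get_user_pages()
consumes it as "page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask)",
which is why the hunk above uses the (1U << huge_page_order(h)) - 1 form.
A userspace demo of the arithmetic, assuming a 2MiB hstate on 4KiB base
pages (huge_page_order == 9):

	#include <stdio.h>

	int main(void)
	{
		unsigned int huge_page_order = 9;	/* 2MiB / 4KiB = 512 pages */
		unsigned int page_mask = (1U << huge_page_order) - 1;

		printf("page_mask = %u\n", page_mask);	/* 511 */
		return 0;
	}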
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Andrea Arcangeli, John Hubbard, Mike Rapoport, David Hildenbrand, Vlastimil Babka, peterx@redhat.com, "Kirill A. Shutemov", Andrew Morton, Mike Kravetz, James Houghton, Hugh Dickins
Subject: [PATCH 4/7] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN
Date: Tue, 13 Jun 2023 17:53:43 -0400
Message-Id: <20230613215346.1022773-5-peterx@redhat.com>
In-Reply-To: <20230613215346.1022773-1-peterx@redhat.com>

FOLL_PIN support through this path is coming: not yet, but soon.  Loosen
the restriction.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f037eaf9d819..31d8f18bc2e4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6467,13 +6467,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	pte_t *pte, entry;
 
-	/*
-	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
-	 * follow_hugetlb_page().
-	 */
-	if (WARN_ON_ONCE(flags & FOLL_PIN))
-		return NULL;
-
 	hugetlb_vma_lock_read(vma);
 	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
 	if (!pte)
-- 
2.40.1
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Andrea Arcangeli, John Hubbard, Mike Rapoport, David Hildenbrand, Vlastimil Babka, peterx@redhat.com, "Kirill A. Shutemov", Andrew Morton, Mike Kravetz, James Houghton, Hugh Dickins
Subject: [PATCH 5/7] mm/gup: Cleanup next_page handling
Date: Tue, 13 Jun 2023 17:53:44 -0400
Message-Id: <20230613215346.1022773-6-peterx@redhat.com>
In-Reply-To: <20230613215346.1022773-1-peterx@redhat.com>

The only path that doesn't use the generic "**pages" handling is the gate
vma.  Make it use the same path, and meanwhile move the next_page label up
so that it covers the "**pages" handling.  This prepares for THP handling
of "**pages".

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lorenzo Stoakes
---
 mm/gup.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 8d59ae4554e7..a2d1b3c4b104 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1135,7 +1135,7 @@ static long __get_user_pages(struct mm_struct *mm,
 		if (!vma && in_gate_area(mm, start)) {
 			ret = get_gate_page(mm, start & PAGE_MASK,
 					gup_flags, &vma,
-					pages ? &pages[i] : NULL);
+					pages ? &page : NULL);
 			if (ret)
 				goto out;
 			ctx.page_mask = 0;
@@ -1205,19 +1205,18 @@ static long __get_user_pages(struct mm_struct *mm,
 				ret = PTR_ERR(page);
 				goto out;
 			}
-
-			goto next_page;
 		} else if (IS_ERR(page)) {
 			ret = PTR_ERR(page);
 			goto out;
 		}
+next_page:
 		if (pages) {
 			pages[i] = page;
 			flush_anon_page(vma, page, start);
 			flush_dcache_page(page);
 			ctx.page_mask = 0;
 		}
-next_page:
+
 		page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
 		if (page_increm > nr_pages)
 			page_increm = nr_pages;
-- 
2.40.1
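As a shape illustration only (a userspace toy, not kernel code): after this
patch the special-case lookup produces its page and falls through the same
next_page bookkeeping, instead of duplicating the pages[] handling:

	#include <stdio.h>

	int main(void)
	{
		int pages[4], page, i;

		for (i = 0; i < 4; i++) {
			if (i == 2) {		/* special-case path (gate vma analogue) */
				page = 100 + i;	/* produce the page... */
				goto next_page;	/* ...then reuse shared bookkeeping */
			}
			page = i;		/* normal lookup path */
	next_page:
			pages[i] = page;	/* shared "**pages" handling */
		}
		for (i = 0; i < 4; i++)
			printf("pages[%d] = %d\n", i, pages[i]);
		return 0;
	}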
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Andrea Arcangeli, John Hubbard, Mike Rapoport, David Hildenbrand, Vlastimil Babka, peterx@redhat.com, "Kirill A. Shutemov", Andrew Morton, Mike Kravetz, James Houghton, Hugh Dickins
Subject: [PATCH 6/7] mm/gup: Accelerate thp gup even for "pages != NULL"
Date: Tue, 13 Jun 2023 17:53:45 -0400
Message-Id: <20230613215346.1022773-7-peterx@redhat.com>
In-Reply-To: <20230613215346.1022773-1-peterx@redhat.com>

The acceleration of THP was done with ctx.page_mask, however it'll be
ignored if "**pages" is non-NULL.

The old optimization was introduced in 2013 in commit 240aadeedc4a ("mm:
accelerate mm_populate() treatment of THP pages").  It didn't explain why
the "**pages" non-NULL case can't be optimized too; possibly the major
goal at that time was mm_populate(), for which this was enough.

Optimize THP for all cases by properly looping over each subpage, doing
cache flushes, and boosting refcounts/pincounts where needed in one go.

This can be verified using gup_test below:

  # chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10

Before:  13992.50 (+-8.75%)
After:     378.50 (+-69.62%)

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a2d1b3c4b104..cdabc8ea783b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1210,16 +1210,38 @@ static long __get_user_pages(struct mm_struct *mm,
 			goto out;
 		}
 next_page:
-		if (pages) {
-			pages[i] = page;
-			flush_anon_page(vma, page, start);
-			flush_dcache_page(page);
-			ctx.page_mask = 0;
-		}
-
 		page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
 		if (page_increm > nr_pages)
 			page_increm = nr_pages;
+
+		if (pages) {
+			struct page *subpage;
+			unsigned int j;
+
+			/*
+			 * This must be a large folio (and doesn't need to
+			 * be the whole folio; it can be part of it), do
+			 * the refcount work for all the subpages too.
+			 * Since we already hold refcount on the head page,
+			 * it should never fail.
+			 *
+			 * NOTE: here the page may not be the head page
+			 * e.g. when start addr is not thp-size aligned.
+			 */
+			if (page_increm > 1)
+				WARN_ON_ONCE(
+				    try_grab_folio(compound_head(page),
+						   page_increm - 1,
+						   foll_flags) == NULL);
+
+			for (j = 0; j < page_increm; j++) {
+				subpage = nth_page(page, j);
+				pages[i+j] = subpage;
+				flush_anon_page(vma, subpage, start + j * PAGE_SIZE);
+				flush_dcache_page(subpage);
+			}
+		}
+
 		i += page_increm;
 		start += page_increm * PAGE_SIZE;
 		nr_pages -= page_increm;
-- 
2.40.1
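A worked example of the page_increm arithmetic that makes this a single
step, assuming a 2MiB THP on 4KiB base pages (page_mask == 511) and a
start address 17 subpages into the huge page:

	#include <stdio.h>

	int main(void)
	{
		unsigned long page_mask = 511;	/* subpages per 2MiB THP - 1 */
		unsigned long subpage = 17;	/* (start >> PAGE_SHIFT) & 511 */
		unsigned long page_increm = 1 + (~subpage & page_mask);

		/* 495 == 512 - 17: all remaining subpages in one go */
		printf("page_increm = %lu\n", page_increm);
		return 0;
	}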
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Andrea Arcangeli, John Hubbard, Mike Rapoport, David Hildenbrand, Vlastimil Babka, peterx@redhat.com, "Kirill A. Shutemov", Andrew Morton, Mike Kravetz, James Houghton, Hugh Dickins
Subject: [PATCH 7/7] mm/gup: Retire follow_hugetlb_page()
Date: Tue, 13 Jun 2023 17:53:46 -0400
Message-Id: <20230613215346.1022773-8-peterx@redhat.com>
In-Reply-To: <20230613215346.1022773-1-peterx@redhat.com>

Now __get_user_pages() should be well prepared to handle THP completely,
as well as hugetlb gup requests, even without hugetlb's special path.
Time to retire follow_hugetlb_page().

Tweak the comments in follow_page_mask() to reflect reality, by dropping
the "follow_page()" description.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/hugetlb.h |  12 ---
 mm/gup.c                |  19 ----
 mm/hugetlb.c            | 223 ----------------------------------------
 3 files changed, 254 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 0d6f389d98de..44e5836eed15 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -133,9 +133,6 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				unsigned long address, unsigned int flags,
 				unsigned int *page_mask);
-long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
-			 struct page **, unsigned long *, unsigned long *,
-			 long, unsigned int, int *);
 void unmap_hugepage_range(struct vm_area_struct *,
 			  unsigned long, unsigned long, struct page *,
 			  zap_flags_t);
@@ -305,15 +302,6 @@ static inline struct page *hugetlb_follow_page_mask(
 	BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/
 }
 
-static inline long follow_hugetlb_page(struct mm_struct *mm,
-			struct vm_area_struct *vma, struct page **pages,
-			unsigned long *position, unsigned long *nr_pages,
-			long i, unsigned int flags, int *nonblocking)
-{
-	BUG();
-	return 0;
-}
-
 static inline int copy_hugetlb_page_range(struct mm_struct *dst,
 					  struct mm_struct *src,
 					  struct vm_area_struct *dst_vma,
diff --git a/mm/gup.c b/mm/gup.c
index cdabc8ea783b..a65b80953b7a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -789,9 +789,6 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 	 * Call hugetlb_follow_page_mask for hugetlb vmas as it will use
 	 * special hugetlb page table walking code.  This eliminates the
 	 * need to check for hugetlb entries in the general walking code.
-	 *
-	 * hugetlb_follow_page_mask is only for follow_page() handling here.
-	 * Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
 	 */
 	if (is_vm_hugetlb_page(vma))
 		return hugetlb_follow_page_mask(vma, address, flags,
 						&ctx->page_mask);
@@ -1149,22 +1146,6 @@ static long __get_user_pages(struct mm_struct *mm,
 			ret = check_vma_flags(vma, gup_flags);
 			if (ret)
 				goto out;
-
-			if (is_vm_hugetlb_page(vma)) {
-				i = follow_hugetlb_page(mm, vma, pages,
-							&start, &nr_pages, i,
-							gup_flags, locked);
-				if (!*locked) {
-					/*
-					 * We've got a VM_FAULT_RETRY
-					 * and we've lost mmap_lock.
-					 * We must stop here.
-					 */
-					BUG_ON(gup_flags & FOLL_NOWAIT);
-					goto out;
-				}
-				continue;
-			}
 		}
 retry:
 		/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 31d8f18bc2e4..b7ff413ff68b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6425,37 +6425,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 }
 #endif /* CONFIG_USERFAULTFD */
 
-static void record_subpages(struct page *page, struct vm_area_struct *vma,
-			    int refs, struct page **pages)
-{
-	int nr;
-
-	for (nr = 0; nr < refs; nr++) {
-		if (likely(pages))
-			pages[nr] = nth_page(page, nr);
-	}
-}
-
-static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
-					       unsigned int flags, pte_t *pte,
-					       bool *unshare)
-{
-	pte_t pteval = huge_ptep_get(pte);
-
-	*unshare = false;
-	if (is_swap_pte(pteval))
-		return true;
-	if (huge_pte_write(pteval))
-		return false;
-	if (flags & FOLL_WRITE)
-		return true;
-	if (gup_must_unshare(vma, flags, pte_page(pteval))) {
-		*unshare = true;
-		return true;
-	}
-	return false;
-}
-
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				unsigned long address, unsigned int flags,
 				unsigned int *page_mask)
@@ -6518,198 +6487,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	return page;
 }
 
-long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
-			 struct page **pages, unsigned long *position,
-			 unsigned long *nr_pages, long i, unsigned int flags,
-			 int *locked)
-{
-	unsigned long pfn_offset;
-	unsigned long vaddr = *position;
-	unsigned long remainder = *nr_pages;
-	struct hstate *h = hstate_vma(vma);
-	int err = -EFAULT, refs;
-
-	while (vaddr < vma->vm_end && remainder) {
-		pte_t *pte;
-		spinlock_t *ptl = NULL;
-		bool unshare = false;
-		int absent;
-		struct page *page;
-
-		/*
-		 * If we have a pending SIGKILL, don't keep faulting pages and
-		 * potentially allocating memory.
-		 */
-		if (fatal_signal_pending(current)) {
-			remainder = 0;
-			break;
-		}
-
-		hugetlb_vma_lock_read(vma);
-		/*
-		 * Some archs (sparc64, sh*) have multiple pte_ts to
-		 * each hugepage.  We have to make sure we get the
-		 * first, for the page indexing below to work.
-		 *
-		 * Note that page table lock is not held when pte is null.
-		 */
-		pte = hugetlb_walk(vma, vaddr & huge_page_mask(h),
-				   huge_page_size(h));
-		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
-
-		/*
-		 * When coredumping, it suits get_dump_page if we just return
-		 * an error where there's an empty slot with no huge pagecache
-		 * to back it.  This way, we avoid allocating a hugepage, and
-		 * the sparse dumpfile avoids allocating disk blocks, but its
-		 * huge holes still show up with zeroes where they need to be.
-		 */
-		if (absent && (flags & FOLL_DUMP) &&
-		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
-			if (pte)
-				spin_unlock(ptl);
-			hugetlb_vma_unlock_read(vma);
-			remainder = 0;
-			break;
-		}
-
-		/*
-		 * We need call hugetlb_fault for both hugepages under migration
-		 * (in which case hugetlb_fault waits for the migration,) and
-		 * hwpoisoned hugepages (in which case we need to prevent the
-		 * caller from accessing to them.) In order to do this, we use
-		 * here is_swap_pte instead of is_hugetlb_entry_migration and
-		 * is_hugetlb_entry_hwpoisoned. This is because it simply covers
-		 * both cases, and because we can't follow correct pages
-		 * directly from any kind of swap entries.
-		 */
-		if (absent ||
-		    __follow_hugetlb_must_fault(vma, flags, pte, &unshare)) {
-			vm_fault_t ret;
-			unsigned int fault_flags = 0;
-
-			if (pte)
-				spin_unlock(ptl);
-			hugetlb_vma_unlock_read(vma);
-
-			if (flags & FOLL_WRITE)
-				fault_flags |= FAULT_FLAG_WRITE;
-			else if (unshare)
-				fault_flags |= FAULT_FLAG_UNSHARE;
-			if (locked) {
-				fault_flags |= FAULT_FLAG_ALLOW_RETRY |
-					FAULT_FLAG_KILLABLE;
-				if (flags & FOLL_INTERRUPTIBLE)
-					fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
-			}
-			if (flags & FOLL_NOWAIT)
-				fault_flags |= FAULT_FLAG_ALLOW_RETRY |
-					FAULT_FLAG_RETRY_NOWAIT;
-			if (flags & FOLL_TRIED) {
-				/*
-				 * Note: FAULT_FLAG_ALLOW_RETRY and
-				 * FAULT_FLAG_TRIED can co-exist
-				 */
-				fault_flags |= FAULT_FLAG_TRIED;
-			}
-			ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
-			if (ret & VM_FAULT_ERROR) {
-				err = vm_fault_to_errno(ret, flags);
-				remainder = 0;
-				break;
-			}
-			if (ret & VM_FAULT_RETRY) {
-				if (locked &&
-				    !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
-					*locked = 0;
-				*nr_pages = 0;
-				/*
-				 * VM_FAULT_RETRY must not return an
-				 * error, it will return zero
-				 * instead.
-				 *
-				 * No need to update "position" as the
-				 * caller will not check it after
-				 * *nr_pages is set to 0.
-				 */
-				return i;
-			}
-			continue;
-		}
-
-		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
-
-		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
-			       !PageAnonExclusive(page), page);
-
-		/*
-		 * If subpage information not requested, update counters
-		 * and skip the same_page loop below.
-		 */
-		if (!pages && !pfn_offset &&
-		    (vaddr + huge_page_size(h) < vma->vm_end) &&
-		    (remainder >= pages_per_huge_page(h))) {
-			vaddr += huge_page_size(h);
-			remainder -= pages_per_huge_page(h);
-			i += pages_per_huge_page(h);
-			spin_unlock(ptl);
-			hugetlb_vma_unlock_read(vma);
-			continue;
-		}
-
-		/* vaddr may not be aligned to PAGE_SIZE */
-		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
-		    (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
-
-		if (pages)
-			record_subpages(nth_page(page, pfn_offset),
-					vma, refs,
-					likely(pages) ? pages + i : NULL);
-
-		if (pages) {
-			/*
-			 * try_grab_folio() should always succeed here,
-			 * because: a) we hold the ptl lock, and b) we've just
-			 * checked that the huge page is present in the page
-			 * tables. If the huge page is present, then the tail
-			 * pages must also be present. The ptl prevents the
-			 * head page and tail pages from being rearranged in
-			 * any way. As this is hugetlb, the pages will never
-			 * be p2pdma or not longterm pinable.  So this page
-			 * must be available at this point, unless the page
-			 * refcount overflowed:
-			 */
-			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
-							 flags))) {
-				spin_unlock(ptl);
-				hugetlb_vma_unlock_read(vma);
-				remainder = 0;
-				err = -ENOMEM;
-				break;
-			}
-		}
-
-		vaddr += (refs << PAGE_SHIFT);
-		remainder -= refs;
-		i += refs;
-
-		spin_unlock(ptl);
-		hugetlb_vma_unlock_read(vma);
-	}
-	*nr_pages = remainder;
-	/*
-	 * setting position is actually required only if remainder is
-	 * not zero but it's faster not to add a "if (remainder)"
-	 * branch.
-	 */
-	*position = vaddr;
-
-	return i ? i : err;
-}
-
 long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)
-- 
2.40.1
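With follow_hugetlb_page() gone, hugetlb GUP takes the common
__get_user_pages() path.  Besides the gup_test run quoted in patch 6, a
userspace sketch that exercises the unified path (assumptions: 2MiB
hugepages are reserved and the filesystem supports O_DIRECT, which pins
the user buffer via GUP; error handling abbreviated):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <string.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 2UL << 20;		/* one 2MiB hugepage */
		void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
				 -1, 0);
		if (buf == MAP_FAILED)
			return 1;
		memset(buf, 0x5a, len);		/* fault the hugepage in */

		/* O_DIRECT write: the kernel GUP-pins buf, which now goes
		 * through hugetlb_follow_page_mask() in the common slow path. */
		int fd = open("testfile", O_CREAT | O_WRONLY | O_DIRECT, 0600);
		if (fd >= 0) {
			ssize_t r = write(fd, buf, len);
			printf("wrote %zd bytes\n", r);
			close(fd);
		}
		munmap(buf, len);
		return 0;
	}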