From nobody Sun Apr 26 16:02:03 2026
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Jue Wang, Manish Mishra, "Dr. David Alan Gilbert",
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
Date: Fri, 24 Jun 2022 17:36:31 +0000
Message-Id: <20220624173656.2033256-2-jthoughton@google.com>
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 01/26] hugetlb: make hstate accessor functions const

This is just a const-correctness change so that the new hugetlb_pte
changes can be const-correct too.
Acked-by: David Rientjes
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mina Almasry
---
 include/linux/hugetlb.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e4cff27d1198..498a4ae3d462 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -715,7 +715,7 @@ static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
 	return hstate_file(vma->vm_file);
 }
 
-static inline unsigned long huge_page_size(struct hstate *h)
+static inline unsigned long huge_page_size(const struct hstate *h)
 {
 	return (unsigned long)PAGE_SIZE << h->order;
 }
@@ -729,27 +729,27 @@ static inline unsigned long huge_page_mask(struct hstate *h)
 	return h->mask;
 }
 
-static inline unsigned int huge_page_order(struct hstate *h)
+static inline unsigned int huge_page_order(const struct hstate *h)
 {
 	return h->order;
 }
 
-static inline unsigned huge_page_shift(struct hstate *h)
+static inline unsigned huge_page_shift(const struct hstate *h)
 {
 	return h->order + PAGE_SHIFT;
 }
 
-static inline bool hstate_is_gigantic(struct hstate *h)
+static inline bool hstate_is_gigantic(const struct hstate *h)
 {
 	return huge_page_order(h) >= MAX_ORDER;
 }
 
-static inline unsigned int pages_per_huge_page(struct hstate *h)
+static inline unsigned int pages_per_huge_page(const struct hstate *h)
 {
 	return 1 << h->order;
 }
 
-static inline unsigned int blocks_per_huge_page(struct hstate *h)
+static inline unsigned int blocks_per_huge_page(const struct hstate *h)
 {
 	return huge_page_size(h) / 512;
 }
-- 
2.37.0.rc0.161.g10f37bed90-goog
From: James Houghton <jthoughton@google.com>
Date: Fri, 24 Jun 2022 17:36:32 +0000
Message-Id: <20220624173656.2033256-3-jthoughton@google.com>
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates

When using HugeTLB high-granularity mapping, we need to go through the
supported hugepage sizes in decreasing order so that we pick the largest
size that works. Consider the case where we're faulting in a 1G hugepage
for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
a PUD. By going through the sizes in decreasing order, we will find that
PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mina Almasry
---
 mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a57e1be41401..5df838d86f32 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -33,6 +33,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/sort.h>
 
 #include <...>
 #include <...>
@@ -48,6 +49,10 @@
 
 int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;
+/*
+ * After hugetlb_init_hstates is called, hstates will be sorted from largest
+ * to smallest.
+ */
 struct hstate hstates[HUGE_MAX_HSTATE];
 
 #ifdef CONFIG_CMA
@@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	kfree(node_alloc_noretry);
 }
 
+static int compare_hstates_decreasing(const void *a, const void *b)
+{
+	const int shift_a = huge_page_shift((const struct hstate *)a);
+	const int shift_b = huge_page_shift((const struct hstate *)b);
+
+	if (shift_a < shift_b)
+		return 1;
+	if (shift_a > shift_b)
+		return -1;
+	return 0;
+}
+
+static void sort_hstates(void)
+{
+	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
+
+	/* Sort from largest to smallest. */
+	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
+	     compare_hstates_decreasing, NULL);
+
+	/*
+	 * We may have changed the location of the default hstate, so we need to
+	 * update it.
+	 */
+	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
+}
+
 static void __init hugetlb_init_hstates(void)
 {
 	struct hstate *h, *h2;
 
-	for_each_hstate(h) {
-		if (minimum_order > huge_page_order(h))
-			minimum_order = huge_page_order(h);
+	sort_hstates();
 
+	/* The last hstate is now the smallest. */
+	minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
+
+	for_each_hstate(h) {
 		/* oversize hugepages were init'ed in early boot */
 		if (!hstate_is_gigantic(h))
 			hugetlb_hstate_alloc_pages(h);
-- 
2.37.0.rc0.161.g10f37bed90-goog
From: James Houghton <jthoughton@google.com>
Date: Fri, 24 Jun 2022 17:36:33 +0000
Message-Id: <20220624173656.2033256-4-jthoughton@google.com>
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift

This allows us to make huge PTEs at shifts other than the hstate shift,
which will be necessary for high-granularity mappings.
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: manish.mishra@nutanix.com
---
 mm/hugetlb.c | 33 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5df838d86f32..0eec34edf3b2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4686,23 +4686,30 @@ const struct vm_operations_struct hugetlb_vm_ops = {
 	.pagesize = hugetlb_vm_op_pagesize,
 };
 
+static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
+				      struct page *page, int writable,
+				      int shift)
+{
+	bool huge = shift > PAGE_SHIFT;
+	pte_t entry = huge ? mk_huge_pte(page, vma->vm_page_prot)
+			   : mk_pte(page, vma->vm_page_prot);
+
+	if (writable)
+		entry = huge ? huge_pte_mkwrite(entry) : pte_mkwrite(entry);
+	else
+		entry = huge ? huge_pte_wrprotect(entry) : pte_wrprotect(entry);
+	pte_mkyoung(entry);
+	if (huge)
+		entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
+	return entry;
+}
+
 static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
-		int writable)
+			   int writable)
 {
-	pte_t entry;
 	unsigned int shift = huge_page_shift(hstate_vma(vma));
 
-	if (writable) {
-		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
-					 vma->vm_page_prot)));
-	} else {
-		entry = huge_pte_wrprotect(mk_huge_pte(page,
-					   vma->vm_page_prot));
-	}
-	entry = pte_mkyoung(entry);
-	entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
-
-	return entry;
+	return make_huge_pte_with_shift(vma, page, writable, shift);
 }
 
 static void set_huge_ptep_writable(struct vm_area_struct *vma,
-- 
2.37.0.rc0.161.g10f37bed90-goog
From: James Houghton <jthoughton@google.com>
Date: Fri, 24 Jun 2022 17:36:34 +0000
Message-Id: <20220624173656.2033256-5-jthoughton@google.com>
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument

This is needed to handle PTL locking with high-granularity mapping. We
won't always be using the PMD-level PTL even if we're using the 2M
hugepage hstate. It's possible that we're dealing with 4K PTEs, in
which case we need to lock the PTL for the 4K PTE.
Signed-off-by: James Houghton <jthoughton@google.com>
---
 arch/powerpc/mm/pgtable.c |  3 ++-
 include/linux/hugetlb.h   | 19 ++++++++++++++-----
 mm/hugetlb.c              |  9 +++++----
 mm/migrate.c              |  3 ++-
 mm/page_vma_mapped.c      |  3 ++-
 5 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index e6166b71d36d..663d591a8f08 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -261,7 +261,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 
 	psize = hstate_get_psize(h);
 #ifdef CONFIG_DEBUG_VM
-	assert_spin_locked(huge_pte_lockptr(h, vma->vm_mm, ptep));
+	assert_spin_locked(huge_pte_lockptr(huge_page_shift(h),
+					    vma->vm_mm, ptep));
 #endif
 
 #else
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 498a4ae3d462..5fe1db46d8c9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -868,12 +868,11 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return modified_mask;
 }
 
-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
-	if (huge_page_size(h) == PMD_SIZE)
+	if (shift == PMD_SHIFT)
 		return pmd_lockptr(mm, (pmd_t *) pte);
-	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
 	return &mm->page_table_lock;
 }
 
@@ -1076,7 +1075,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return 0;
 }
 
-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
 	return &mm->page_table_lock;
@@ -1116,7 +1115,17 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
 {
 	spinlock_t *ptl;
 
-	ptl = huge_pte_lockptr(h, mm, pte);
+	ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte);
+	spin_lock(ptl);
+	return ptl;
+}
+
+static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
+					      struct mm_struct *mm,
+					      pte_t *pte)
+{
+	spinlock_t *ptl;
+
+	ptl = huge_pte_lockptr(shift, mm, pte);
 	spin_lock(ptl);
 	return ptl;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0eec34edf3b2..d6d0d4c03def 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4817,7 +4817,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			continue;
 
 		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(h, src, src_pte);
+		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 		dst_entry = huge_ptep_get(dst_pte);
@@ -4894,7 +4894,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 			/* Install the new huge page if src pte stable */
 			dst_ptl = huge_pte_lock(h, dst, dst_pte);
-			src_ptl = huge_pte_lockptr(h, src, src_pte);
+			src_ptl = huge_pte_lockptr(huge_page_shift(h),
+						   src, src_pte);
 			spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 			entry = huge_ptep_get(src_pte);
 			if (!pte_same(src_pte_old, entry)) {
@@ -4948,7 +4949,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	pte_t pte;
 
 	dst_ptl = huge_pte_lock(h, mm, dst_pte);
-	src_ptl = huge_pte_lockptr(h, mm, src_pte);
+	src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte);
 
 	/*
 	 * We don't have to worry about the ordering of src and dst ptlocks
@@ -6024,7 +6025,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		page_in_pagecache = true;
 	}
 
-	ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
+	ptl = huge_pte_lockptr(huge_page_shift(h), dst_mm, dst_pte);
 	spin_lock(ptl);
 
 	/*
diff --git a/mm/migrate.c b/mm/migrate.c
index e51588e95f57..a8a960992373 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -318,7 +318,8 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 void migration_entry_wait_huge(struct vm_area_struct *vma,
 		struct mm_struct *mm, pte_t *pte)
 {
-	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
+	spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)),
+					   mm, pte);
 	__migration_entry_wait(mm, pte, ptl);
 }
 
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c10f839fc410..8921dd4e41b1 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -174,7 +174,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		if (!pvmw->pte)
 			return false;
 
-		pvmw->ptl = huge_pte_lockptr(hstate, mm, pvmw->pte);
+		pvmw->ptl = huge_pte_lockptr(huge_page_shift(hstate),
+					     mm, pvmw->pte);
 		spin_lock(pvmw->ptl);
 		if (!check_pte(pvmw))
 			return not_found(pvmw);
-- 
2.37.0.rc0.161.g10f37bed90-goog
From: James Houghton <jthoughton@google.com>
Date: Fri, 24 Jun 2022 17:36:35 +0000
Message-Id: <20220624173656.2033256-6-jthoughton@google.com>
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 05/26] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING

This adds the Kconfig to enable or disable high-granularity mapping. It
is enabled by default for architectures that use
ARCH_WANT_GENERAL_HUGETLB. There is also an arch-specific config,
ARCH_HAS_SPECIAL_HUGETLB_HGM, which controls whether the architecture
has been updated to support HGM when it doesn't use general HugeTLB.

Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mina Almasry
Reviewed-by: manish.mishra@nutanix.com
---
 fs/Kconfig | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/Kconfig b/fs/Kconfig
index 5976eb33535f..d76c7d812656 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -268,6 +268,13 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON
 	  to enable optimizing vmemmap pages of HugeTLB by default. It can then
 	  be disabled on the command line via hugetlb_free_vmemmap=off.
 
+config ARCH_HAS_SPECIAL_HUGETLB_HGM
+	bool
+
+config HUGETLB_HIGH_GRANULARITY_MAPPING
+	def_bool ARCH_WANT_GENERAL_HUGETLB || ARCH_HAS_SPECIAL_HUGETLB_HGM
+	depends on HUGETLB_PAGE
+
 config MEMFD_CREATE
 	def_bool TMPFS || HUGETLBFS
 
-- 
2.37.0.rc0.161.g10f37bed90-goog
From: James Houghton <jthoughton@google.com>
Date: Fri, 24 Jun 2022 17:36:36 +0000
Message-Id: <20220624173656.2033256-7-jthoughton@google.com>
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 06/26] mm: make free_p?d_range functions public

This makes them usable for HugeTLB page table freeing operations. With
HugeTLB high-granularity mapping, the page tables for a HugeTLB VMA can
get more complex, and these functions already handle freeing page
tables generally.
Signed-off-by: James Houghton
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
---
 include/linux/mm.h | 7 +++++++
 mm/memory.c        | 8 ++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..07f5da512147 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1847,6 +1847,13 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 
 struct mmu_notifier_range;
 
+void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long addr);
+void free_pmd_range(struct mmu_gather *tlb, pud_t *pud, unsigned long addr,
+		unsigned long end, unsigned long floor, unsigned long ceiling);
+void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, unsigned long addr,
+		unsigned long end, unsigned long floor, unsigned long ceiling);
+void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long addr,
+		unsigned long end, unsigned long floor, unsigned long ceiling);
 void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
 int
diff --git a/mm/memory.c b/mm/memory.c
index 7a089145cad4..bb3b9b5b94fb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -227,7 +227,7 @@ static void check_sync_rss_stat(struct task_struct *task)
 * Note: this doesn't free the actual pages themselves. That
 * has been handled earlier when unmapping all the memory regions.
 */
-static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
+void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
 			   unsigned long addr)
 {
 	pgtable_t token = pmd_pgtable(*pmd);
@@ -236,7 +236,7 @@ static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
 	mm_dec_nr_ptes(tlb->mm);
 }
 
-static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
+inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 		unsigned long addr, unsigned long end,
 		unsigned long floor, unsigned long ceiling)
 {
@@ -270,7 +270,7 @@ static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 	mm_dec_nr_pmds(tlb->mm);
 }
 
-static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
+inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
 		unsigned long addr, unsigned long end,
 		unsigned long floor, unsigned long ceiling)
 {
@@ -304,7 +304,7 @@ static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
 	mm_dec_nr_puds(tlb->mm);
 }
 
-static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
+inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
 		unsigned long addr, unsigned long end,
 		unsigned long floor, unsigned long ceiling)
 {
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:03 2026
Date: Fri, 24 Jun 2022 17:36:37 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-8-jthoughton@google.com>
Subject: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, Dr. David Alan Gilbert, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

After high-granularity mapping, page table entries for HugeTLB pages can
be of any size/type. (For example, we can have a 1G page mapped with a
mix of PMDs and PTEs.) This struct helps keep track of a HugeTLB PTE
after we have done a page table walk.

Without this, we'd have to pass around the "size" of the PTE everywhere.
We effectively did this before; it could be fetched from the hstate,
which we pass around pretty much everywhere.

This commit includes definitions for some basic helper functions that
are used later. These helper functions wrap existing PTE
inspection/modification functions, where the correct version is picked
depending on whether the HugeTLB PTE is actually "huge". (Previously,
all HugeTLB PTEs were "huge".) For example, hugetlb_ptep_get wraps
huge_ptep_get and ptep_get: ptep_get is used when the HugeTLB PTE is
PAGE_SIZE, and huge_ptep_get is used in all other cases.
Signed-off-by: James Houghton
---
 include/linux/hugetlb.h | 84 +++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c            | 57 ++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5fe1db46d8c9..1d4ec9dfdebf 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -46,6 +46,68 @@ enum {
 	__NR_USED_SUBPAGE,
 };
 
+struct hugetlb_pte {
+	pte_t *ptep;
+	unsigned int shift;
+};
+
+static inline
+void hugetlb_pte_init(struct hugetlb_pte *hpte)
+{
+	hpte->ptep = NULL;
+}
+
+static inline
+void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
+			  unsigned int shift)
+{
+	BUG_ON(!ptep);
+	hpte->ptep = ptep;
+	hpte->shift = shift;
+}
+
+static inline
+unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
+{
+	BUG_ON(!hpte->ptep);
+	return 1UL << hpte->shift;
+}
+
+static inline
+unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
+{
+	BUG_ON(!hpte->ptep);
+	return ~(hugetlb_pte_size(hpte) - 1);
+}
+
+static inline
+unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
+{
+	BUG_ON(!hpte->ptep);
+	return hpte->shift;
+}
+
+static inline
+bool hugetlb_pte_huge(const struct hugetlb_pte *hpte)
+{
+	return !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) ||
+		hugetlb_pte_shift(hpte) > PAGE_SHIFT;
+}
+
+static inline
+void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
+{
+	dest->ptep = src->ptep;
+	dest->shift = src->shift;
+}
+
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte);
+bool hugetlb_pte_none(const struct hugetlb_pte *hpte);
+bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
+pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
+void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
+		unsigned long address);
+
 struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
@@ -1130,6 +1192,28 @@ static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
 	return ptl;
 }
 
+static inline
+spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
+{
+	BUG_ON(!hpte->ptep);
+	/*
+	 * Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
+	 * the regular page table lock.
+	 */
+	if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
+		return huge_pte_lockptr(hugetlb_pte_shift(hpte),
+				mm, hpte->ptep);
+	return &mm->page_table_lock;
+}
+
+static inline
+spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
+{
+	spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
+
+	spin_lock(ptl);
+	return ptl;
+}
+
 #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
 extern void __init hugetlb_cma_reserve(int order);
 extern void __init hugetlb_cma_check(void);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d6d0d4c03def..1a1434e29740 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1120,6 +1120,63 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
+{
+	pgd_t pgd;
+	p4d_t p4d;
+	pud_t pud;
+	pmd_t pmd;
+
+	BUG_ON(!hpte->ptep);
+	if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
+		pgd = *(pgd_t *)hpte->ptep;
+		return pgd_present(pgd) && pgd_leaf(pgd);
+	} else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
+		p4d = *(p4d_t *)hpte->ptep;
+		return p4d_present(p4d) && p4d_leaf(p4d);
+	} else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
+		pud = *(pud_t *)hpte->ptep;
+		return pud_present(pud) && pud_leaf(pud);
+	} else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
+		pmd = *(pmd_t *)hpte->ptep;
+		return pmd_present(pmd) && pmd_leaf(pmd);
+	} else if (hugetlb_pte_size(hpte) >= PAGE_SIZE)
+		return pte_present(*hpte->ptep);
+	BUG();
+}
+
+bool hugetlb_pte_none(const struct hugetlb_pte *hpte)
+{
+	if (hugetlb_pte_huge(hpte))
+		return huge_pte_none(huge_ptep_get(hpte->ptep));
+	return pte_none(ptep_get(hpte->ptep));
+}
+
+bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte)
+{
+	if (hugetlb_pte_huge(hpte))
+		return huge_pte_none_mostly(huge_ptep_get(hpte->ptep));
+	return pte_none_mostly(ptep_get(hpte->ptep));
+}
+
+pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte)
+{
+	if (hugetlb_pte_huge(hpte))
+		return huge_ptep_get(hpte->ptep);
+	return ptep_get(hpte->ptep);
+}
+
+void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
+		unsigned long address)
+{
+	unsigned long sz = hugetlb_pte_size(hpte);
+
+	BUG_ON(!hpte->ptep);
+	if (sz > PAGE_SIZE)
+		return huge_pte_clear(mm, address, hpte->ptep, sz);
+	return pte_clear(mm, address, hpte->ptep);
+}
+
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:03 2026
Date: Fri, 24 Jun 2022 17:36:38 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-9-jthoughton@google.com>
Subject: [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, Dr. David Alan Gilbert, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is a helper function for freeing the bits of the page table that
map a particular HugeTLB PTE.

Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |  2 ++
 mm/hugetlb.c            | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 1d4ec9dfdebf..33ba48fac551 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -107,6 +107,8 @@ bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
 pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
 void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
 		unsigned long address);
+void hugetlb_free_range(struct mmu_gather *tlb, const struct hugetlb_pte *hpte,
+		unsigned long start, unsigned long end);
 
 struct hugepage_subpool {
 	spinlock_t lock;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1a1434e29740..a2d2ffa76173 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1120,6 +1120,23 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
+void hugetlb_free_range(struct mmu_gather *tlb, const struct hugetlb_pte *hpte,
+		unsigned long start, unsigned long end)
+{
+	unsigned long floor = start & hugetlb_pte_mask(hpte);
+	unsigned long ceiling = floor + hugetlb_pte_size(hpte);
+
+	if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
+		free_p4d_range(tlb, (pgd_t *)hpte->ptep, start, end, floor, ceiling);
+	} else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
+		free_pud_range(tlb, (p4d_t *)hpte->ptep, start, end, floor, ceiling);
+	} else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
+		free_pmd_range(tlb, (pud_t *)hpte->ptep, start, end, floor, ceiling);
+	} else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
+		free_pte_range(tlb, (pmd_t *)hpte->ptep, start);
+	}
+}
+
 bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
 {
 	pgd_t pgd;
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:03 2026
Date: Fri, 24 Jun 2022 17:36:39 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-10-jthoughton@google.com>
Subject: [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, Dr. David Alan Gilbert, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Currently, this is always true if the VMA is shared. In the future, it's
possible that private mappings will get some or all HGM functionality.
Signed-off-by: James Houghton
Reviewed-by: Mina Almasry
---
 include/linux/hugetlb.h | 10 ++++++++++
 mm/hugetlb.c            |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 33ba48fac551..e7a6b944d0cc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1174,6 +1174,16 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 }
 #endif /* CONFIG_HUGETLB_PAGE */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+/* If HugeTLB high-granularity mappings are enabled for this VMA. */
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+#else
+static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	return false;
+}
+#endif
+
 static inline
 spinlock_t *huge_pte_lock(struct hstate *h, struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a2d2ffa76173..8b10b941458d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6983,6 +6983,14 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	/* All shared VMAs have HGM enabled. */
+	return vma->vm_flags & VM_SHARED;
+}
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /*
 * These functions are overwritable if your architecture needs its own
 * behavior.
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:03 2026
Date: Fri, 24 Jun 2022 17:36:40 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-11-jthoughton@google.com>
Subject: [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, Dr. David Alan Gilbert, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is a helper macro to loop through all the usable page sizes for a
high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will
loop, in descending order, through the page sizes that HugeTLB supports
for this architecture; it always includes PAGE_SIZE.
Signed-off-by: James Houghton
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
---
 mm/hugetlb.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8b10b941458d..557b0afdb503 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6989,6 +6989,16 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 	/* All shared VMAs have HGM enabled. */
 	return vma->vm_flags & VM_SHARED;
 }
+static unsigned int __shift_for_hstate(struct hstate *h)
+{
+	if (h >= &hstates[hugetlb_max_hstate])
+		return PAGE_SHIFT;
+	return huge_page_shift(h);
+}
+#define for_each_hgm_shift(hstate, tmp_h, shift) \
+	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
+			(tmp_h) <= &hstates[hugetlb_max_hstate]; \
+			(tmp_h)++)
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:03 2026
Date: Fri, 24 Jun 2022 17:36:41 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-12-jthoughton@google.com>
Subject: [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, Dr. David Alan Gilbert, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This adds hugetlb_walk_to for architectures that use GENERAL_HUGETLB,
including x86.

Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |  2 ++
 mm/hugetlb.c            | 45 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e7a6b944d0cc..605aa19d8572 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -258,6 +258,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
 		unsigned long sz);
+int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		unsigned long addr, unsigned long sz, bool stop_at_none);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long *addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 557b0afdb503..3ec2a921ee6f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6981,6 +6981,51 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	return (pte_t *)pmd;
 }
 
+int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		unsigned long addr, unsigned long sz, bool stop_at_none)
+{
+	pte_t *ptep;
+
+	if (!hpte->ptep) {
+		pgd_t *pgd = pgd_offset(mm, addr);
+
+		if (!pgd)
+			return -ENOMEM;
+		ptep = (pte_t *)p4d_alloc(mm, pgd, addr);
+		if (!ptep)
+			return -ENOMEM;
+		hugetlb_pte_populate(hpte, ptep, P4D_SHIFT);
+	}
+
+	while (hugetlb_pte_size(hpte) > sz &&
+			!hugetlb_pte_present_leaf(hpte) &&
+			!(stop_at_none && hugetlb_pte_none(hpte))) {
+		if (hpte->shift == PMD_SHIFT) {
+			ptep = pte_alloc_map(mm, (pmd_t *)hpte->ptep, addr);
+			if (!ptep)
+				return -ENOMEM;
+			hpte->shift = PAGE_SHIFT;
+			hpte->ptep = ptep;
+		} else if (hpte->shift == PUD_SHIFT) {
+			ptep = (pte_t *)pmd_alloc(mm, (pud_t *)hpte->ptep,
+					addr);
+			if (!ptep)
+				return -ENOMEM;
+			hpte->shift = PMD_SHIFT;
+			hpte->ptep = ptep;
+		} else if (hpte->shift == P4D_SHIFT) {
+			ptep = (pte_t *)pud_alloc(mm, (p4d_t *)hpte->ptep,
+					addr);
+			if (!ptep)
+				return -ENOMEM;
+			hpte->shift = PUD_SHIFT;
+			hpte->ptep = ptep;
+		} else
+			BUG();
+	}
+	return 0;
+}
+
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:03 2026
From nobody Sun Apr 26 16:02:03 2026
Date: Fri, 24 Jun 2022 17:36:42 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-13-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Jue Wang, Manish Mishra, "Dr. David Alan Gilbert", linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

The new function, hugetlb_split_to_shift, will optimally split the page
table to map a particular address at a particular granularity.

This is useful for punching a hole in the mapping and for mapping small
sections of a HugeTLB page (via UFFDIO_CONTINUE, for example).

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3ec2a921ee6f..eaffe7b4f67c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -102,6 +102,18 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 
+/*
+ * Find the subpage that corresponds to `addr` in `hpage`.
+ */
+static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage,
+				unsigned long addr)
+{
+	size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE;
+
+	BUG_ON(idx >= pages_per_huge_page(h));
+	return &hpage[idx];
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)
@@ -7044,6 +7056,116 @@ static unsigned int __shift_for_hstate(struct hstate *h)
 	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
 			(tmp_h) <= &hstates[hugetlb_max_hstate]; \
 			(tmp_h)++)
+
+/*
+ * Given a particular address, split the HugeTLB PTE that currently maps it
+ * so that, for the given address, the PTE that maps it is `desired_shift`.
+ * This function will always split the HugeTLB PTE optimally.
+ *
+ * For example, given a HugeTLB 1G page that is mapped from VA 0 to 1G,
+ * calling this function with addr=0 and desired_shift=PAGE_SHIFT will
+ * result in these changes to the page table:
+ * 1. The PUD will be split into 2M PMDs.
+ * 2. The first PMD will be split again into 4K PTEs.
+ */
+static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
+			const struct hugetlb_pte *hpte,
+			unsigned long addr, unsigned long desired_shift)
+{
+	unsigned long start, end, curr;
+	unsigned long desired_sz = 1UL << desired_shift;
+	struct hstate *h = hstate_vma(vma);
+	int ret;
+	struct hugetlb_pte new_hpte;
+	struct mmu_notifier_range range;
+	struct page *hpage = NULL;
+	struct page *subpage;
+	pte_t old_entry;
+	struct mmu_gather tlb;
+
+	BUG_ON(!hpte->ptep);
+	BUG_ON(hugetlb_pte_size(hpte) == desired_sz);
+
+	start = addr & hugetlb_pte_mask(hpte);
+	end = start + hugetlb_pte_size(hpte);
+
+	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+
+	BUG_ON(!hpte->ptep);
+	/* This function only works if we are looking at a leaf-level PTE. */
+	BUG_ON(!hugetlb_pte_none(hpte) && !hugetlb_pte_present_leaf(hpte));
+
+	/*
+	 * Clear the PTE so that we will allocate the PT structures when
+	 * walking the page table.
+	 */
+	old_entry = huge_ptep_get_and_clear(mm, start, hpte->ptep);
+
+	if (!huge_pte_none(old_entry))
+		hpage = pte_page(old_entry);
+
+	BUG_ON(!IS_ALIGNED(start, desired_sz));
+	BUG_ON(!IS_ALIGNED(end, desired_sz));
+
+	for (curr = start; curr < end;) {
+		struct hstate *tmp_h;
+		unsigned int shift;
+
+		for_each_hgm_shift(h, tmp_h, shift) {
+			unsigned long sz = 1UL << shift;
+
+			if (!IS_ALIGNED(curr, sz) || curr + sz > end)
+				continue;
+			/*
+			 * If we are including `addr`, we need to make sure we
+			 * are splitting down to the correct size. Go to a
+			 * smaller size if we are not.
+			 */
+			if (curr <= addr && curr + sz > addr &&
+					shift > desired_shift)
+				continue;
+
+			/*
+			 * Continue the page table walk to the level we want,
+			 * allocating PT structures as we go.
+			 */
+			hugetlb_pte_copy(&new_hpte, hpte);
+			ret = hugetlb_walk_to(mm, &new_hpte, curr, sz,
+					/*stop_at_none=*/false);
+			if (ret)
+				goto err;
+			BUG_ON(hugetlb_pte_size(&new_hpte) != sz);
+			if (hpage) {
+				pte_t new_entry;
+
+				subpage = hugetlb_find_subpage(h, hpage, curr);
+				new_entry = make_huge_pte_with_shift(vma, subpage,
+						huge_pte_write(old_entry),
+						shift);
+				set_huge_pte_at(mm, curr, new_hpte.ptep, new_entry);
+			}
+			curr += sz;
+			goto next;
+		}
+		/* We couldn't find a size that worked. */
+		BUG();
+next:
+		continue;
+	}
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+				start, end);
+	mmu_notifier_invalidate_range_start(&range);
+	return 0;
+err:
+	tlb_gather_mmu(&tlb, mm);
+	/* Free any newly allocated page table entries. */
+	hugetlb_free_range(&tlb, hpte, start, curr);
+	/* Restore the old entry. */
+	set_huge_pte_at(mm, start, hpte->ptep, old_entry);
+	tlb_finish_mmu(&tlb);
+	return ret;
+}
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
-- 
2.37.0.rc0.161.g10f37bed90-goog
From nobody Sun Apr 26 16:02:03 2026
Date: Fri, 24 Jun 2022 17:36:43 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-14-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 13/26] hugetlb: add huge_pte_alloc_high_granularity
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Jue Wang, Manish Mishra, "Dr. David Alan Gilbert", linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This function performs a HugeTLB page table walk in which we may need to
split a leaf-level huge PTE into a new page table level.

Consider the case where we want to install 4K inside an empty 1G page:
1. We walk to the PUD and notice that it is pte_none.
2. We split the PUD by calling `hugetlb_split_to_shift`, creating a
   standard PUD that points to PMDs that are all pte_none.
3. We continue the PT walk to find the PMD. We split it just like we
   split the PUD.
4. We find the PTE and give it back to the caller.

To avoid concurrent splitting operations on the same page table entry,
we require that the mapping rwsem is held for writing while collapsing
and for reading when doing a high-granularity PT walk.
Signed-off-by: James Houghton
---
 include/linux/hugetlb.h | 23 ++++++++++++++
 mm/hugetlb.c            | 67 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 605aa19d8572..321f5745d87f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1176,14 +1176,37 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 }
 #endif /* CONFIG_HUGETLB_PAGE */
 
+enum split_mode {
+	HUGETLB_SPLIT_NEVER   = 0,
+	HUGETLB_SPLIT_NONE    = 1 << 0,
+	HUGETLB_SPLIT_PRESENT = 1 << 1,
+	HUGETLB_SPLIT_ALWAYS  = HUGETLB_SPLIT_NONE | HUGETLB_SPLIT_PRESENT,
+};
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
 /* If HugeTLB high-granularity mappings are enabled for this VMA. */
 bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
+				    struct mm_struct *mm,
+				    struct vm_area_struct *vma,
+				    unsigned long addr,
+				    unsigned int desired_sz,
+				    enum split_mode mode,
+				    bool write_locked);
 #else
 static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return false;
 }
+static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
+				    struct mm_struct *mm,
+				    struct vm_area_struct *vma,
+				    unsigned long addr,
+				    unsigned int desired_sz,
+				    enum split_mode mode,
+				    bool write_locked)
+{
+	return -EINVAL;
+}
 #endif
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index eaffe7b4f67c..6e0c5fbfe32c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7166,6 +7166,73 @@ static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *v
 	tlb_finish_mmu(&tlb);
 	return ret;
 }
+
+/*
+ * Similar to huge_pte_alloc except that this can be used to create or walk
+ * high-granularity mappings. It will automatically split existing HugeTLB PTEs
+ * if required by @mode. The resulting HugeTLB PTE will be returned in @hpte.
+ *
+ * The options for @mode are:
+ * - HUGETLB_SPLIT_NEVER   - Never split.
+ * - HUGETLB_SPLIT_NONE    - Split empty PTEs.
+ * - HUGETLB_SPLIT_PRESENT - Split present PTEs.
+ * - HUGETLB_SPLIT_ALWAYS  - Split both empty and present PTEs.
+ */
+int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
+				    struct mm_struct *mm,
+				    struct vm_area_struct *vma,
+				    unsigned long addr,
+				    unsigned int desired_shift,
+				    enum split_mode mode,
+				    bool write_locked)
+{
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	bool has_write_lock = write_locked;
+	unsigned long desired_sz = 1UL << desired_shift;
+	int ret;
+
+	BUG_ON(!hpte);
+
+	if (has_write_lock)
+		i_mmap_assert_write_locked(mapping);
+	else
+		i_mmap_assert_locked(mapping);
+
+retry:
+	ret = 0;
+	hugetlb_pte_init(hpte);
+
+	ret = hugetlb_walk_to(mm, hpte, addr, desired_sz,
+			      !(mode & HUGETLB_SPLIT_NONE));
+	if (ret || hugetlb_pte_size(hpte) == desired_sz)
+		goto out;
+
+	if (
+		((mode & HUGETLB_SPLIT_NONE) && hugetlb_pte_none(hpte)) ||
+		((mode & HUGETLB_SPLIT_PRESENT) &&
+			hugetlb_pte_present_leaf(hpte))
+	   ) {
+		if (!has_write_lock) {
+			i_mmap_unlock_read(mapping);
+			i_mmap_lock_write(mapping);
+			has_write_lock = true;
+			goto retry;
+		}
+		ret = hugetlb_split_to_shift(mm, vma, hpte, addr,
+					     desired_shift);
+	}
+
+out:
+	if (has_write_lock && !write_locked) {
+		/* Drop the write lock. */
+		i_mmap_unlock_write(mapping);
+		i_mmap_lock_read(mapping);
+		has_write_lock = false;
+		goto retry;
+	}
+
+	return ret;
+}
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
-- 
2.37.0.rc0.161.g10f37bed90-goog
From nobody Sun Apr 26 16:02:03 2026
Date: Fri, 24 Jun 2022 17:36:44 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-15-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 14/26] hugetlb: add HGM support for hugetlb_fault and
 hugetlb_no_page
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Jue Wang, Manish Mishra, "Dr. David Alan Gilbert", linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This commit is the first main functional HugeTLB change. Together, these
changes allow the HugeTLB fault path to handle faults on HGM-enabled
VMAs. The two main behaviors that can be done now:
1. Faults can be passed to handle_userfault. (Userspace will want to use
   UFFD_FEATURE_REAL_ADDRESS to get the real address and so know which
   region they should call UFFDIO_CONTINUE on later.)
2. Faults on pages that have been partially mapped (and userfaultfd is
   not being used) will get mapped at the largest possible size. For
   example, if a 1G page has been partially mapped at 2M, and we fault
   on an unmapped 2M section, hugetlb_no_page will create a 2M PMD to
   map the faulting address.

This commit does not handle hugetlb_wp right now, and it doesn't handle
HugeTLB page migration and swap entries.

Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |  12 ++++
 mm/hugetlb.c            | 121 +++++++++++++++++++++++++++++++---------
 2 files changed, 106 insertions(+), 27 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 321f5745d87f..ac4ac8fbd901 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1185,6 +1185,9 @@ enum split_mode {
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
 /* If HugeTLB high-granularity mappings are enabled for this VMA. */
 bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end);
 int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
 				    struct mm_struct *mm,
 				    struct vm_area_struct *vma,
@@ -1197,6 +1200,15 @@ static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return false;
 }
+
+static inline
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end)
+{
+	BUG();
+}
+
 static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
 				    struct mm_struct *mm,
 				    struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6e0c5fbfe32c..da30621656b8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5605,18 +5605,24 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
-			unsigned long address, pte_t *ptep,
+			unsigned long address, struct hugetlb_pte *hpte,
 			pte_t old_pte, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 	int anon_rmap = 0;
 	unsigned long size;
-	struct page *page;
+	struct page *page, *subpage;
 	pte_t new_pte;
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
+	unsigned long haddr_hgm = address & hugetlb_pte_mask(hpte);
 	bool new_page, new_pagecache_page = false;
+	/*
+	 * This page is getting mapped for the first time, in which case we
+	 * want to increment its mapcount.
+	 */
+	bool new_mapping = hpte->shift == huge_page_shift(h);
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -5665,9 +5671,9 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		 * here.  Before returning error, get ptl and make
 		 * sure there really is no pte entry.
 		 */
-		ptl = huge_pte_lock(h, mm, ptep);
+		ptl = hugetlb_pte_lock(mm, hpte);
 		ret = 0;
-		if (huge_pte_none(huge_ptep_get(ptep)))
+		if (hugetlb_pte_none(hpte))
 			ret = vmf_error(PTR_ERR(page));
 		spin_unlock(ptl);
 		goto out;
@@ -5731,18 +5737,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			vma_end_reservation(h, vma, haddr);
 	}
 
-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = hugetlb_pte_lock(mm, hpte);
 	ret = 0;
 	/* If pte changed from under us, retry */
-	if (!pte_same(huge_ptep_get(ptep), old_pte))
+	if (!pte_same(hugetlb_ptep_get(hpte), old_pte))
 		goto backout;
 
-	if (anon_rmap) {
-		ClearHPageRestoreReserve(page);
-		hugepage_add_new_anon_rmap(page, vma, haddr);
-	} else
-		page_dup_file_rmap(page, true);
-	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
+	if (new_mapping) {
+		/* Only increment this page's mapcount if we are mapping it
+		 * for the first time.
+		 */
+		if (anon_rmap) {
+			ClearHPageRestoreReserve(page);
+			hugepage_add_new_anon_rmap(page, vma, haddr);
+		} else
+			page_dup_file_rmap(page, true);
+	}
+
+	subpage = hugetlb_find_subpage(h, page, haddr_hgm);
+	new_pte = make_huge_pte(vma, subpage, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
 	/*
 	 * If this pte was previously wr-protected, keep it wr-protected even
@@ -5750,12 +5763,13 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	 */
 	if (unlikely(pte_marker_uffd_wp(old_pte)))
 		new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
-	set_huge_pte_at(mm, haddr, ptep, new_pte);
+	set_huge_pte_at(mm, haddr_hgm, hpte->ptep, new_pte);
 
-	hugetlb_count_add(pages_per_huge_page(h), mm);
+	hugetlb_count_add(hugetlb_pte_size(hpte) / PAGE_SIZE, mm);
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
+		BUG_ON(hugetlb_pte_size(hpte) != huge_page_size(h));
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_wp(mm, vma, address, ptep, flags, page, ptl);
+		ret = hugetlb_wp(mm, vma, address, hpte->ptep, flags,
+				 page, ptl);
 	}
 
 	spin_unlock(ptl);
@@ -5816,11 +5830,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	u32 hash;
 	pgoff_t idx;
 	struct page *page = NULL;
+	struct page *subpage = NULL;
 	struct page *pagecache_page = NULL;
 	struct hstate *h = hstate_vma(vma);
 	struct address_space *mapping;
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
+	unsigned long haddr_hgm;
+	bool hgm_enabled = hugetlb_hgm_enabled(vma);
+	struct hugetlb_pte hpte;
 
 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
@@ -5866,11 +5884,22 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	hash = hugetlb_fault_mutex_hash(mapping, idx);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
-	entry = huge_ptep_get(ptep);
+	hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
+
+	if (hgm_enabled) {
+		ret = hugetlb_walk_to(mm, &hpte, address,
+				      PAGE_SIZE, /*stop_at_none=*/true);
+		if (ret) {
+			ret = vmf_error(ret);
+			goto out_mutex;
+		}
+	}
+
+	entry = hugetlb_ptep_get(&hpte);
 	/* PTE markers should be handled the same way as none pte */
-	if (huge_pte_none_mostly(entry)) {
-		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
-				      entry, flags);
+	if (hugetlb_pte_none_mostly(&hpte)) {
+		ret = hugetlb_no_page(mm, vma, mapping, idx, address, &hpte,
+				      entry, flags);
 		goto out_mutex;
 	}
 
@@ -5908,14 +5937,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 						      vma, haddr);
 	}
 
-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = hugetlb_pte_lock(mm, &hpte);
 
 	/* Check for a racing update before calling hugetlb_wp() */
-	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
+	if (unlikely(!pte_same(entry, hugetlb_ptep_get(&hpte))))
 		goto out_ptl;
 
+	/* haddr_hgm is the base address of the region that hpte maps. */
+	haddr_hgm = address & hugetlb_pte_mask(&hpte);
+
 	/* Handle userfault-wp first, before trying to lock more pages */
-	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(hugetlb_ptep_get(&hpte)) &&
 	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
 		struct vm_fault vmf = {
 			.vma = vma,
@@ -5939,7 +5971,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * pagecache_page, so here we need take the former one
 	 * when page != pagecache_page or !pagecache_page.
 	 */
-	page = pte_page(entry);
+	subpage = pte_page(entry);
+	page = compound_head(subpage);
 	if (page != pagecache_page)
 		if (!trylock_page(page)) {
 			need_wait_lock = 1;
@@ -5950,7 +5983,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 		if (!huge_pte_write(entry)) {
-			ret = hugetlb_wp(mm, vma, address, ptep, flags,
+			BUG_ON(hugetlb_pte_size(&hpte) != huge_page_size(h));
+			ret = hugetlb_wp(mm, vma, address, hpte.ptep, flags,
 					 pagecache_page, ptl);
 			goto out_put_page;
 		} else if (likely(flags & FAULT_FLAG_WRITE)) {
@@ -5958,9 +5992,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 	}
 	entry = pte_mkyoung(entry);
-	if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
+	if (huge_ptep_set_access_flags(vma, haddr_hgm, hpte.ptep, entry,
 				       flags & FAULT_FLAG_WRITE))
-		update_mmu_cache(vma, haddr, ptep);
+		update_mmu_cache(vma, haddr_hgm, hpte.ptep);
 out_put_page:
 	if (page != pagecache_page)
 		unlock_page(page);
@@ -6951,7 +6985,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);
 		}
 	}
-	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
+	if (!hugetlb_hgm_enabled(vma))
+		BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
 
 	return pte;
 }
@@ -7057,6 +7092,38 @@ static unsigned int __shift_for_hstate(struct hstate *h)
 			(tmp_h) <= &hstates[hugetlb_max_hstate]; \
 			(tmp_h)++)
 
+/*
+ * Allocate a HugeTLB PTE that maps as much of [start, end) as possible with a
+ * single page table entry. The allocated HugeTLB PTE is returned in hpte.
+ */
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end)
+{
+	struct hstate *h = hstate_vma(vma), *tmp_h;
+	unsigned int shift;
+	int ret;
+
+	for_each_hgm_shift(h, tmp_h, shift) {
+		unsigned long sz = 1UL << shift;
+
+		if (!IS_ALIGNED(start, sz) || start + sz > end)
+			continue;
+		ret = huge_pte_alloc_high_granularity(hpte, mm, vma, start,
+						      shift, HUGETLB_SPLIT_NONE,
+						      /*write_locked=*/false);
+		if (ret)
+			return ret;
+
+		if (hpte->shift > shift)
+			return -EEXIST;
+
+		BUG_ON(hpte->shift != shift);
+		return 0;
+	}
+	return -EINVAL;
+}
+
 /*
  * Given a particular address, split the HugeTLB PTE that currently maps it
  * so that, for the given address, the PTE that maps it is `desired_shift`.
--=20 2.37.0.rc0.161.g10f37bed90-goog From nobody Sun Apr 26 16:02:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D203FC43334 for ; Fri, 24 Jun 2022 17:38:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232501AbiFXRin (ORCPT ); Fri, 24 Jun 2022 13:38:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36172 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231667AbiFXRhc (ORCPT ); Fri, 24 Jun 2022 13:37:32 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B92055DF25 for ; Fri, 24 Jun 2022 10:37:31 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-317bfb7aaacso27132397b3.1 for ; Fri, 24 Jun 2022 10:37:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=LoJsmKx0RIJYFCgh3RUmS9FnkZR9nxjSRnqvv6TZGjM=; b=IZS4DM3M3K6LoUVsC8Bl04dfOKEEAWyhZp/PaBKn0d92TIDnzwMDFN7wLf3gZK8KXk Hqp1S+enLGzviCJsO6DpNpGzxKEp+T0yzrdIwA4S415ZdLqTsoPiTRgFb6aOA0kKPBAY zyheLmcN2aPMf7MJVwN4znu5j9nveQnM+HbQGBNnfXtj5p0uhIS5X82WJqumvoFeu78Z kexFoLJRgVhL59k8quEJaOagGrrlQ0qdHEHVyls+GVapUIwLCmrA+dHzKoYHQOi/e8xv vUWXfvm60zyRjMUV+v9MtvRAUv6UlhcNITygQafNxgge+q1x6oNJBsUIR4wKbgjkQNng /GzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=LoJsmKx0RIJYFCgh3RUmS9FnkZR9nxjSRnqvv6TZGjM=; b=qhJaJYFMdgMXch3NHyQbh2BWhh79AeaKbusfwGTc3gYPNmUNXo12QnXnc03E9BFO2M FxG1bauGMQOE360jE9iRTQOuXqwInl7tF6hPDoCODVTZvHA6tICerruHOoyMatsQWRNp 
16LCIE9d7yeBeFEsz4pBzWHejp7U1XBJKlJIRurrcE3wmzASEbMpxY4/njaXim32IUn9 plnlYc0GmrqrKadVuAr/1FRhVTqwCCTHQGac0I+n6M8+XAJxqLrcHBtyk/mNK20ljBQC fuzH+BQyMaW5A9t+hw+oAEvG+BHHtgGdDwShiCk8z2G/plXf7zh5cjSExgMiEf1nPwFz A63w== X-Gm-Message-State: AJIora8WKIvIYBI0GQdzUxqdtQKysUk7jBSrpBQn9S0BdPl5CIax/qX3 Q6LKW8f2xJTnRx+jKQCddUwF2SNuxVZGXJ+4 X-Google-Smtp-Source: AGRyM1se79AgT0Cox1n1PfeD3monM06aR7cLfdIp8zaSZ2VT02A9SECnyrpKDnZIvwtzr+hFf9tsMPJX4L1dTD28 X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a25:7801:0:b0:669:b51b:10d0 with SMTP id t1-20020a257801000000b00669b51b10d0mr331321ybc.204.1656092251057; Fri, 24 Jun 2022 10:37:31 -0700 (PDT) Date: Fri, 24 Jun 2022 17:36:45 +0000 In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com> Message-Id: <20220624173656.2033256-16-jthoughton@google.com> Mime-Version: 1.0 References: <20220624173656.2033256-1-jthoughton@google.com> X-Mailer: git-send-email 2.37.0.rc0.161.g10f37bed90-goog Subject: [RFC PATCH 15/26] hugetlb: make unmapping compatible with high-granularity mappings From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , Jue Wang , Manish Mishra , "Dr . David Alan Gilbert" , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This enlightens __unmap_hugepage_range to deal with high-granularity mappings. This doesn't change its API; it still must be called with hugepage alignment, but it will correctly unmap hugepages that have been mapped at high granularity. Analogous to the mapcount rules introduced by hugetlb_no_page, we only drop mapcount in this case if we are unmapping an entire hugepage in one operation. This is the case when a VMA is destroyed. 
Eventually, functionality here can be expanded to allow users to call
MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is
not done here.

Signed-off-by: James Houghton
---
 include/asm-generic/tlb.h |  6 +--
 mm/hugetlb.c              | 85 ++++++++++++++++++++++++++-------------
 2 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index ff3e82553a76..8daa3ae460d9 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -562,9 +562,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
+#define tlb_remove_huge_tlb_entry(tlb, hpte, address)	\
 	do {							\
-		unsigned long _sz = huge_page_size(h);		\
+		unsigned long _sz = hugetlb_pte_size(&hpte);	\
 		if (_sz >= P4D_SIZE)				\
 			tlb_flush_p4d_range(tlb, address, _sz);	\
 		else if (_sz >= PUD_SIZE)			\
@@ -573,7 +573,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 			tlb_flush_pmd_range(tlb, address, _sz);	\
 		else						\
 			tlb_flush_pte_range(tlb, address, _sz);	\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
+		__tlb_remove_tlb_entry(tlb, hpte.ptep, address);\
 	} while (0)
 
 /**
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index da30621656b8..51fc1d3f122f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5120,24 +5120,20 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
-	pte_t *ptep;
+	struct hugetlb_pte hpte;
 	pte_t pte;
 	spinlock_t *ptl;
-	struct page *page;
+	struct page *hpage, *subpage;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	struct mmu_notifier_range range;
 	bool force_flush = false;
+	bool hgm_enabled = hugetlb_hgm_enabled(vma);
 
 	WARN_ON(!is_vm_hugetlb_page(vma));
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
-	/*
-	 * This is a hugetlb vma, all the pte entries should point
-	 * to huge page.
-	 */
-	tlb_change_page_size(tlb, sz);
 	tlb_start_vma(tlb, vma);
 
 	/*
@@ -5148,25 +5144,43 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
 	mmu_notifier_invalidate_range_start(&range);
 	address = start;
-	for (; address < end; address += sz) {
-		ptep = huge_pte_offset(mm, address, sz);
-		if (!ptep)
+
+	while (address < end) {
+		pte_t *ptep = huge_pte_offset(mm, address, sz);
+
+		if (!ptep) {
+			address += sz;
 			continue;
+		}
+		hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
+		if (hgm_enabled) {
+			int ret = huge_pte_alloc_high_granularity(
+					&hpte, mm, vma, address, PAGE_SHIFT,
+					HUGETLB_SPLIT_NEVER,
+					/*write_locked=*/true);
+			/*
+			 * We will never split anything, so this should always
+			 * succeed.
+			 */
+			BUG_ON(ret);
+		}
 
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+		ptl = hugetlb_pte_lock(mm, &hpte);
+		if (!hgm_enabled && huge_pmd_unshare(
+					mm, vma, &address, hpte.ptep)) {
 			spin_unlock(ptl);
 			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
 			force_flush = true;
-			continue;
+			goto next_hpte;
 		}
 
-		pte = huge_ptep_get(ptep);
-		if (huge_pte_none(pte)) {
+		if (hugetlb_pte_none(&hpte)) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 
+		pte = hugetlb_ptep_get(&hpte);
+
 		/*
 		 * Migrating hugepage or HWPoisoned hugepage is already
 		 * unmapped and its refcount is dropped, so just clear pte here.
@@ -5180,24 +5194,27 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			 */
 			if (pte_swp_uffd_wp_any(pte) &&
 			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-				set_huge_pte_at(mm, address, ptep,
+				set_huge_pte_at(mm, address, hpte.ptep,
 						make_pte_marker(PTE_MARKER_UFFD_WP));
 			else
-				huge_pte_clear(mm, address, ptep, sz);
+				huge_pte_clear(mm, address, hpte.ptep,
+						hugetlb_pte_size(&hpte));
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 
-		page = pte_page(pte);
+		subpage = pte_page(pte);
+		BUG_ON(!subpage);
+		hpage = compound_head(subpage);
 		/*
 		 * If a reference page is supplied, it is because a specific
 		 * page is being unmapped, not a range. Ensure the page we
 		 * are about to unmap is the actual page of interest.
 		 */
 		if (ref_page) {
-			if (page != ref_page) {
+			if (hpage != ref_page) {
 				spin_unlock(ptl);
-				continue;
+				goto next_hpte;
 			}
 			/*
 			 * Mark the VMA as having unmapped its page so that
@@ -5207,25 +5224,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
 		}
 
-		pte = huge_ptep_get_and_clear(mm, address, ptep);
-		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
+		pte = huge_ptep_get_and_clear(mm, address, hpte.ptep);
+		tlb_change_page_size(tlb, hugetlb_pte_size(&hpte));
+		tlb_remove_huge_tlb_entry(tlb, hpte, address);
 		if (huge_pte_dirty(pte))
-			set_page_dirty(page);
+			set_page_dirty(hpage);
 		/* Leave a uffd-wp pte marker if needed */
 		if (huge_pte_uffd_wp(pte) &&
 		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-			set_huge_pte_at(mm, address, ptep,
+			set_huge_pte_at(mm, address, hpte.ptep,
 					make_pte_marker(PTE_MARKER_UFFD_WP));
-		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, vma, true);
+
+		hugetlb_count_sub(hugetlb_pte_size(&hpte)/PAGE_SIZE, mm);
+
+		/*
+		 * If we are unmapping the entire page, remove it from the
+		 * rmap.
+		 */
+		if (IS_ALIGNED(address, sz) && address + sz <= end)
+			page_remove_rmap(hpage, vma, true);
 
 		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, page, huge_page_size(h));
+		tlb_remove_page_size(tlb, subpage, hugetlb_pte_size(&hpte));
 		/*
 		 * Bail out after unmapping reference page if supplied
 		 */
 		if (ref_page)
 			break;
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
 	}
 	mmu_notifier_invalidate_range_end(&range);
 	tlb_end_vma(tlb, vma);
-- 
2.37.0.rc0.161.g10f37bed90-goog
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Jue Wang, Manish Mishra, Dr. David Alan Gilbert,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 24 Jun 2022 17:36:46 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-17-jthoughton@google.com>
Subject: [RFC PATCH 16/26] hugetlb: make hugetlb_change_protection
 compatible with HGM

HugeTLB is now able to change the protection of hugepages that are
mapped at high granularity.

I need to add more of the HugeTLB PTE wrapper functions to clean up this
patch. I'll do this in the next version.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 91 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 62 insertions(+), 29 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 51fc1d3f122f..f9c7daa6c090 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6476,14 +6476,15 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long start = address;
-	pte_t *ptep;
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
-	unsigned long pages = 0, psize = huge_page_size(h);
+	unsigned long base_pages = 0, psize = huge_page_size(h);
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	struct hugetlb_pte hpte;
+	bool hgm_enabled = hugetlb_hgm_enabled(vma);
 
 	/*
 	 * In the case of shared PMDs, the area to flush could be beyond
@@ -6499,28 +6500,38 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 	mmu_notifier_invalidate_range_start(&range);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
-	for (; address < end; address += psize) {
+	while (address < end) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address, psize);
-		if (!ptep)
+		pte_t *ptep = huge_pte_offset(mm, address, huge_page_size(h));
+
+		if (!ptep) {
+			address += huge_page_size(h);
 			continue;
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+		}
+		hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
+		if (hgm_enabled) {
+			int ret = hugetlb_walk_to(mm, &hpte, address, PAGE_SIZE,
+						  /*stop_at_none=*/true);
+			BUG_ON(ret);
+		}
+
+		ptl = hugetlb_pte_lock(mm, &hpte);
+		if (huge_pmd_unshare(mm, vma, &address, hpte.ptep)) {
 			/*
 			 * When uffd-wp is enabled on the vma, unshare
 			 * shouldn't happen at all.  Warn about it if it
 			 * happened due to some reason.
 			 */
 			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
-			pages++;
+			base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 			spin_unlock(ptl);
 			shared_pmd = true;
-			continue;
+			goto next_hpte;
 		}
-		pte = huge_ptep_get(ptep);
+		pte = hugetlb_ptep_get(&hpte);
 		if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 		if (unlikely(is_hugetlb_entry_migration(pte))) {
 			swp_entry_t entry = pte_to_swp_entry(pte);
@@ -6540,12 +6551,13 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 					newpte = pte_swp_mkuffd_wp(newpte);
 				else if (uffd_wp_resolve)
 					newpte = pte_swp_clear_uffd_wp(newpte);
-				set_huge_swap_pte_at(mm, address, ptep,
-						     newpte, psize);
-				pages++;
+				set_huge_swap_pte_at(mm, address, hpte.ptep,
+						     newpte,
+						     hugetlb_pte_size(&hpte));
+				base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 			}
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 		if (unlikely(pte_marker_uffd_wp(pte))) {
 			/*
 			 * This is changing a non-present pte into a none pte,
 			 * no need for huge_ptep_modify_prot_start/commit().
 			 */
 			if (uffd_wp_resolve)
-				huge_pte_clear(mm, address, ptep, psize);
+				huge_pte_clear(mm, address, hpte.ptep, psize);
 		}
-		if (!huge_pte_none(pte)) {
+		if (!hugetlb_pte_none(&hpte)) {
 			pte_t old_pte;
-			unsigned int shift = huge_page_shift(hstate_vma(vma));
-
-			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
-			pte = huge_pte_modify(old_pte, newprot);
-			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
-			if (uffd_wp)
-				pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
-			else if (uffd_wp_resolve)
-				pte = huge_pte_clear_uffd_wp(pte);
-			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
-			pages++;
+			unsigned int shift = hpte.shift;
+			/*
+			 * This is ugly. This will be cleaned up in a future
+			 * version of this series.
+			 */
+			if (shift > PAGE_SHIFT) {
+				old_pte = huge_ptep_modify_prot_start(
+						vma, address, hpte.ptep);
+				pte = huge_pte_modify(old_pte, newprot);
+				pte = arch_make_huge_pte(
+						pte, shift, vma->vm_flags);
+				if (uffd_wp)
+					pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
+				else if (uffd_wp_resolve)
+					pte = huge_pte_clear_uffd_wp(pte);
+				huge_ptep_modify_prot_commit(
+						vma, address, hpte.ptep,
+						old_pte, pte);
+			} else {
+				old_pte = ptep_modify_prot_start(
+						vma, address, hpte.ptep);
+				pte = pte_modify(old_pte, newprot);
+				if (uffd_wp)
+					pte = pte_mkuffd_wp(pte_wrprotect(pte));
+				else if (uffd_wp_resolve)
+					pte = pte_clear_uffd_wp(pte);
+				ptep_modify_prot_commit(
+						vma, address, hpte.ptep, old_pte, pte);
+			}
+			base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 		} else {
 			/* None pte */
 			if (unlikely(uffd_wp))
@@ -6576,6 +6607,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 					make_pte_marker(PTE_MARKER_UFFD_WP));
 		}
 		spin_unlock(ptl);
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
 	}
 	/*
 	 * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
@@ -6597,7 +6630,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	i_mmap_unlock_write(vma->vm_file->f_mapping);
 	mmu_notifier_invalidate_range_end(&range);
 
-	return pages << h->order;
+	return base_pages;
 }
 
 /* Return true if reservation was successful, false otherwise. */
-- 
2.37.0.rc0.161.g10f37bed90-goog

From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Jue Wang, Manish Mishra, Dr. David Alan Gilbert,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 24 Jun 2022 17:36:47 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-18-jthoughton@google.com>
Subject: [RFC PATCH 17/26] hugetlb: update follow_hugetlb_page to
 support HGM

This enables support for GUP, and it is needed for the KVM demand
paging self-test to work.
One important change here is that, before, we never needed to grab the
i_mmap_sem, but now, to prevent someone from collapsing the page tables
out from under us, we grab it for reading when doing high-granularity
PT walks.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 70 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 57 insertions(+), 13 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f9c7daa6c090..aadfcee947cf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6298,14 +6298,18 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long vaddr = *position;
 	unsigned long remainder = *nr_pages;
 	struct hstate *h = hstate_vma(vma);
+	struct address_space *mapping = vma->vm_file->f_mapping;
 	int err = -EFAULT, refs;
+	bool has_i_mmap_sem = false;
 
 	while (vaddr < vma->vm_end && remainder) {
 		pte_t *pte;
 		spinlock_t *ptl = NULL;
 		bool unshare = false;
 		int absent;
+		unsigned long pages_per_hpte;
 		struct page *page;
+		struct hugetlb_pte hpte;
 
 		/*
 		 * If we have a pending SIGKILL, don't keep faulting pages and
@@ -6325,9 +6329,23 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h),
				      huge_page_size(h));
-		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		if (pte) {
+			hugetlb_pte_populate(&hpte, pte, huge_page_shift(h));
+			if (hugetlb_hgm_enabled(vma)) {
+				BUG_ON(has_i_mmap_sem);
+				i_mmap_lock_read(mapping);
+				/*
+				 * Need to hold the mapping semaphore for
+				 * reading to do a HGM walk.
+				 */
+				has_i_mmap_sem = true;
+				hugetlb_walk_to(mm, &hpte, vaddr, PAGE_SIZE,
+						/*stop_at_none=*/true);
+			}
+			ptl = hugetlb_pte_lock(mm, &hpte);
+		}
+
+		absent = !pte || hugetlb_pte_none(&hpte);
 
 		/*
 		 * When coredumping, it suits get_dump_page if we just return
@@ -6338,8 +6356,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (absent && (flags & FOLL_DUMP) &&
		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
-			if (pte)
+			if (pte) {
+				if (has_i_mmap_sem) {
+					i_mmap_unlock_read(mapping);
+					has_i_mmap_sem = false;
+				}
 				spin_unlock(ptl);
+			}
 			remainder = 0;
 			break;
 		}
@@ -6359,8 +6382,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			vm_fault_t ret;
 			unsigned int fault_flags = 0;
 
-			if (pte)
+			if (pte) {
+				if (has_i_mmap_sem) {
+					i_mmap_unlock_read(mapping);
+					has_i_mmap_sem = false;
+				}
 				spin_unlock(ptl);
+			}
 			if (flags & FOLL_WRITE)
 				fault_flags |= FAULT_FLAG_WRITE;
 			else if (unshare)
@@ -6403,8 +6431,11 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			continue;
 		}
 
-		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
+		pfn_offset = (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT;
+		page = pte_page(hugetlb_ptep_get(&hpte));
+		pages_per_hpte = hugetlb_pte_size(&hpte) / PAGE_SIZE;
+		if (hugetlb_hgm_enabled(vma))
+			page = compound_head(page);
 
 		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
			       !PageAnonExclusive(page), page);
@@ -6414,17 +6445,21 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * and skip the same_page loop below.
 		 */
 		if (!pages && !vmas && !pfn_offset &&
-		    (vaddr + huge_page_size(h) < vma->vm_end) &&
-		    (remainder >= pages_per_huge_page(h))) {
-			vaddr += huge_page_size(h);
-			remainder -= pages_per_huge_page(h);
-			i += pages_per_huge_page(h);
+		    (vaddr + pages_per_hpte < vma->vm_end) &&
+		    (remainder >= pages_per_hpte)) {
+			vaddr += pages_per_hpte;
+			remainder -= pages_per_hpte;
+			i += pages_per_hpte;
 			spin_unlock(ptl);
+			if (has_i_mmap_sem) {
+				has_i_mmap_sem = false;
+				i_mmap_unlock_read(mapping);
+			}
 			continue;
 		}
 
 		/* vaddr may not be aligned to PAGE_SIZE */
-		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
+		refs = min3(pages_per_hpte - pfn_offset, remainder,
			    (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
 
 		if (pages || vmas)
@@ -6447,6 +6482,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
							 flags))) {
 				spin_unlock(ptl);
+				if (has_i_mmap_sem) {
+					has_i_mmap_sem = false;
+					i_mmap_unlock_read(mapping);
+				}
 				remainder = 0;
 				err = -ENOMEM;
 				break;
@@ -6458,8 +6497,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		i += refs;
 
 		spin_unlock(ptl);
+		if (has_i_mmap_sem) {
+			has_i_mmap_sem = false;
+			i_mmap_unlock_read(mapping);
+		}
 	}
 	*nr_pages = remainder;
+	BUG_ON(has_i_mmap_sem);
 	/*
 	 * setting position is actually required only if remainder is
 	 * not zero but it's faster not to add a "if (remainder)"
-- 
2.37.0.rc0.161.g10f37bed90-goog
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Jue Wang, Manish Mishra, Dr. David Alan Gilbert,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 24 Jun 2022 17:36:48 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-19-jthoughton@google.com>
Subject: [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for
 walk_hugetlb_range

Although this change is large, it is somewhat straightforward. Before,
all users of walk_hugetlb_range could get the size of the PTE just by
checking the hmask or the mm_walk struct. With HGM, that information is
held in the hugetlb_pte struct, so we provide that instead of the raw
pte_t*.
Signed-off-by: James Houghton
---
 arch/s390/mm/gmap.c      |  8 ++++++--
 fs/proc/task_mmu.c       | 35 +++++++++++++++++++----------------
 include/linux/pagewalk.h |  3 ++-
 mm/damon/vaddr.c         | 34 ++++++++++++++++++----------------
 mm/hmm.c                 |  7 ++++---
 mm/mempolicy.c           | 11 ++++++++---
 mm/mincore.c             |  4 ++--
 mm/mprotect.c            |  6 +++---
 mm/pagewalk.c            | 18 ++++++++++++++++--
 9 files changed, 78 insertions(+), 48 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index b8ae4a4aa2ba..518cebfd72cd 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2620,10 +2620,14 @@ static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 
-static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
-				      unsigned long hmask, unsigned long next,
+static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte,
+				      unsigned long addr, unsigned long next,
 				      struct mm_walk *walk)
 {
+	if (!hugetlb_pte_present_leaf(hpte) ||
+	    hugetlb_pte_size(hpte) != PMD_SIZE)
+		return -EINVAL;
+
-	pmd_t *pmd = (pmd_t *)pte;
+	pmd_t *pmd = (pmd_t *)hpte->ptep;
 	unsigned long start, end;
 	struct page *page = pmd_page(*pmd);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2d04e3470d4c..b2d683f99fa9 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -714,18 +714,19 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
+static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
 {
 	struct mem_size_stats *mss = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	struct page *page = NULL;
+	pte_t pte = hugetlb_ptep_get(hpte);
 
-	if (pte_present(*pte)) {
-		page = vm_normal_page(vma, addr, *pte);
-	} else if (is_swap_pte(*pte)) {
-		swp_entry_t swpent = pte_to_swp_entry(*pte);
+	if (hugetlb_pte_present_leaf(hpte)) {
+		page = vm_normal_page(vma, addr, pte);
+	} else if (is_swap_pte(pte)) {
+		swp_entry_t swpent = pte_to_swp_entry(pte);
 
 		if (is_pfn_swap_entry(swpent))
 			page = pfn_swap_entry_to_page(swpent);
@@ -734,9 +735,9 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 		int mapcount = page_mapcount(page);
 
 		if (mapcount >= 2)
-			mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
+			mss->shared_hugetlb += hugetlb_pte_size(hpte);
 		else
-			mss->private_hugetlb += huge_page_size(hstate_vma(vma));
+			mss->private_hugetlb += hugetlb_pte_size(hpte);
 	}
 	return 0;
 }
@@ -1535,7 +1536,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 
 #ifdef CONFIG_HUGETLB_PAGE
 /* This function walks within one hugetlb entry in the single call */
-static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
+static int pagemap_hugetlb_range(struct hugetlb_pte *hpte,
 				 unsigned long addr, unsigned long end,
 				 struct mm_walk *walk)
 {
@@ -1543,13 +1544,13 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 	struct vm_area_struct *vma = walk->vma;
 	u64 flags = 0, frame = 0;
 	int err = 0;
-	pte_t pte;
+	unsigned long hmask = hugetlb_pte_mask(hpte);
 
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;
 
-	pte = huge_ptep_get(ptep);
-	if (pte_present(pte)) {
+	if (hugetlb_pte_present_leaf(hpte)) {
+		pte_t pte = hugetlb_ptep_get(hpte);
 		struct page *page = pte_page(pte);
 
 		if (!PageAnon(page))
@@ -1565,7 +1566,7 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) +
 				((addr & ~hmask) >> PAGE_SHIFT);
-	} else if (pte_swp_uffd_wp_any(pte)) {
+	} else if (pte_swp_uffd_wp_any(hugetlb_ptep_get(hpte))) {
 		flags |= PM_UFFD_WP;
 	}
 
@@ -1869,17 +1870,19 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 #ifdef CONFIG_HUGETLB_PAGE
-static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
-	unsigned long addr, unsigned long end, struct mm_walk *walk)
+static int gather_hugetlb_stats(struct hugetlb_pte *hpte, unsigned long addr,
+	unsigned long end, struct mm_walk *walk)
 {
-	pte_t huge_pte = huge_ptep_get(pte);
+	pte_t huge_pte = hugetlb_ptep_get(hpte);
 	struct numa_maps *md;
 	struct page *page;
 
-	if (!pte_present(huge_pte))
+	if (!hugetlb_pte_present_leaf(hpte))
 		return 0;
 
 	page = pte_page(huge_pte);
+	if (page != compound_head(page))
+		return 0;
 
 	md = walk->private;
 	gather_stats(page, md, pte_dirty(huge_pte), 1);
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index ac7b38ad5903..0d21e25df37f 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -3,6 +3,7 @@
 #define _LINUX_PAGEWALK_H
 
 #include <linux/mm_types.h>
+#include <linux/hugetlb.h>
 
 struct mm_walk;
 
@@ -47,7 +48,7 @@ struct mm_walk_ops {
 			     unsigned long next, struct mm_walk *walk);
 	int (*pte_hole)(unsigned long addr, unsigned long next,
 			int depth, struct mm_walk *walk);
-	int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
+	int (*hugetlb_entry)(struct hugetlb_pte *hpte,
 			     unsigned long addr, unsigned long next,
 			     struct mm_walk *walk);
 	int (*test_walk)(unsigned long addr, unsigned long next,
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 59e1653799f8..ce50b937dcf2 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -324,14 +324,15 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned long addr,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
+static void damon_hugetlb_mkold(struct hugetlb_pte *hpte, struct mm_struct *mm,
 				struct vm_area_struct *vma, unsigned long addr)
 {
 	bool referenced = false;
-	pte_t entry = huge_ptep_get(pte);
+	pte_t entry = huge_ptep_get(hpte->ptep);
 	struct page *page = pte_page(entry);
+	struct page *hpage = compound_head(page);
 
-	get_page(page);
+	get_page(hpage);
 
 	if (pte_young(entry)) {
 		referenced = true;
@@ -342,18 +343,18 @@ static void damon_hugetlb_mkold(struct hugetlb_pte *hpte, struct mm_struct *mm,
 
 #ifdef CONFIG_MMU_NOTIFIER
 	if (mmu_notifier_clear_young(mm, addr,
-				     addr + huge_page_size(hstate_vma(vma))))
+				     addr + hugetlb_pte_size(hpte)))
 		referenced = true;
 #endif /* CONFIG_MMU_NOTIFIER */
 
 	if (referenced)
-		set_page_young(page);
+		set_page_young(hpage);
 
-	set_page_idle(page);
-	put_page(page);
+	set_page_idle(hpage);
+	put_page(hpage);
 }
 
-static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
+static int damon_mkold_hugetlb_entry(struct hugetlb_pte *hpte,
 				     unsigned long addr, unsigned long end,
 				     struct mm_walk *walk)
 {
@@ -361,12 +362,12 @@ static int damon_mkold_hugetlb_entry(struct hugetlb_pte *hpte,
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(h, walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	ptl = huge_pte_lock_shift(hpte->shift, walk->mm, hpte->ptep);
+	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto out;
 
-	damon_hugetlb_mkold(pte, walk->mm, walk->vma, addr);
+	damon_hugetlb_mkold(hpte, walk->mm, walk->vma, addr);
 
 out:
 	spin_unlock(ptl);
@@ -474,31 +475,32 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
+static int damon_young_hugetlb_entry(struct hugetlb_pte *hpte,
 				     unsigned long addr, unsigned long end,
 				     struct mm_walk *walk)
 {
 	struct damon_young_walk_private *priv = walk->private;
 	struct hstate *h = hstate_vma(walk->vma);
-	struct page *page;
+	struct page *page, *hpage;
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(h, walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	ptl = huge_pte_lock_shift(hpte->shift, walk->mm, hpte->ptep);
+	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto out;
 
 	page = pte_page(entry);
-	get_page(page);
+	hpage = compound_head(page);
+	get_page(hpage);
 
-	if (pte_young(entry) || !page_is_idle(page) ||
+	if (pte_young(entry) || !page_is_idle(hpage) ||
 	    mmu_notifier_test_young(walk->mm, addr)) {
 		*priv->page_sz = huge_page_size(h);
 		priv->young = true;
 	}
 
-	put_page(page);
+	put_page(hpage);
out: spin_unlock(ptl); diff --git a/mm/hmm.c b/mm/hmm.c index 3fd3242c5e50..1ad5d76fa8be 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -472,7 +472,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long = start, unsigned long end, #endif =20 #ifdef CONFIG_HUGETLB_PAGE -static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, +static int hmm_vma_walk_hugetlb_entry(struct hugetlb_pte *hpte, unsigned long start, unsigned long end, struct mm_walk *walk) { @@ -483,11 +483,12 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, uns= igned long hmask, unsigned int required_fault; unsigned long pfn_req_flags; unsigned long cpu_flags; + unsigned long hmask =3D hugetlb_pte_mask(hpte); spinlock_t *ptl; pte_t entry; =20 - ptl =3D huge_pte_lock(hstate_vma(vma), walk->mm, pte); - entry =3D huge_ptep_get(pte); + ptl =3D huge_pte_lock_shift(hpte->shift, walk->mm, hpte->ptep); + entry =3D huge_ptep_get(hpte->ptep); =20 i =3D (start - range->start) >> PAGE_SHIFT; pfn_req_flags =3D range->hmm_pfns[i]; diff --git a/mm/mempolicy.c b/mm/mempolicy.c index d39b01fd52fe..a1d82db7c19f 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -559,7 +559,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned l= ong addr, return addr !=3D end ? -EIO : 0; } =20 -static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask, +static int queue_pages_hugetlb(struct hugetlb_pte *hpte, unsigned long addr, unsigned long end, struct mm_walk *walk) { @@ -571,8 +571,13 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned lo= ng hmask, spinlock_t *ptl; pte_t entry; =20 - ptl =3D huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte); - entry =3D huge_ptep_get(pte); + /* We don't migrate high-granularity HugeTLB mappings for now. 
*/ + if (hugetlb_pte_size(hpte) !=3D + huge_page_size(hstate_vma(walk->vma))) + return -EINVAL; + + ptl =3D hugetlb_pte_lock(walk->mm, hpte); + entry =3D hugetlb_ptep_get(hpte); if (!pte_present(entry)) goto unlock; page =3D pte_page(entry); diff --git a/mm/mincore.c b/mm/mincore.c index fa200c14185f..dc1717dc6a2c 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -22,7 +22,7 @@ #include #include "swap.h" =20 -static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long = addr, +static int mincore_hugetlb(struct hugetlb_pte *hpte, unsigned long addr, unsigned long end, struct mm_walk *walk) { #ifdef CONFIG_HUGETLB_PAGE @@ -33,7 +33,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmas= k, unsigned long addr, * Hugepages under user process are always in RAM and never * swapped out, but theoretically it needs to be checked. */ - present =3D pte && !huge_pte_none(huge_ptep_get(pte)); + present =3D hpte->ptep && !hugetlb_pte_none(hpte); for (; addr !=3D end; vec++, addr +=3D PAGE_SIZE) *vec =3D present; walk->private =3D vec; diff --git a/mm/mprotect.c b/mm/mprotect.c index ba5592655ee3..9c5a35a1c0eb 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -476,12 +476,12 @@ static int prot_none_pte_entry(pte_t *pte, unsigned l= ong addr, 0 : -EACCES; } =20 -static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask, +static int prot_none_hugetlb_entry(struct hugetlb_pte *hpte, unsigned long addr, unsigned long next, struct mm_walk *walk) { - return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ? - 0 : -EACCES; + return pfn_modify_allowed(pte_pfn(*hpte->ptep), + *(pgprot_t *)(walk->private)) ? 
0 : -EACCES; } =20 static int prot_none_test(unsigned long addr, unsigned long next, diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 9b3db11a4d1d..f8e24a0a0179 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -3,6 +3,7 @@ #include #include #include +#include =20 /* * We want to know the real level where a entry is located ignoring any @@ -301,13 +302,26 @@ static int walk_hugetlb_range(unsigned long addr, uns= igned long end, pte_t *pte; const struct mm_walk_ops *ops =3D walk->ops; int err =3D 0; + struct hugetlb_pte hpte; =20 do { - next =3D hugetlb_entry_end(h, addr, end); pte =3D huge_pte_offset(walk->mm, addr & hmask, sz); + if (!pte) { + next =3D hugetlb_entry_end(h, addr, end); + } else { + hugetlb_pte_populate(&hpte, pte, huge_page_shift(h)); + if (hugetlb_hgm_enabled(vma)) { + err =3D hugetlb_walk_to(walk->mm, &hpte, addr, + PAGE_SIZE, + /*stop_at_none=3D*/true); + if (err) + break; + } + next =3D min(addr + hugetlb_pte_size(&hpte), end); + } =20 if (pte) - err =3D ops->hugetlb_entry(pte, hmask, addr, next, walk); + err =3D ops->hugetlb_entry(&hpte, addr, next, walk); else if (ops->pte_hole) err =3D ops->pte_hole(addr, next, -1, walk); =20 --=20 2.37.0.rc0.161.g10f37bed90-goog From nobody Sun Apr 26 16:02:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A5AFC43334 for ; Fri, 24 Jun 2022 17:39:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232531AbiFXRi7 (ORCPT ); Fri, 24 Jun 2022 13:38:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36256 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231817AbiFXRhj (ORCPT ); Fri, 24 Jun 2022 13:37:39 -0400 Received: from mail-vk1-xa4a.google.com (mail-vk1-xa4a.google.com [IPv6:2607:f8b0:4864:20::a4a]) by lindbergh.monkeyblade.net 
Date: Fri, 24 Jun 2022 17:36:49 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id:
<20220624173656.2033256-20-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Jue Wang, Manish Mishra, "Dr . David Alan Gilbert", linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This allows fork() to work with high-granularity mappings. The page
table structure is copied such that partially mapped regions will remain
partially mapped in the same way for the new process.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 74 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 59 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index aadfcee947cf..0ec2f231524e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4851,7 +4851,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry, dst_entry;
-	struct page *ptepage;
+	struct hugetlb_pte src_hpte, dst_hpte;
+	struct page *ptepage, *hpage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(src_vma->vm_flags);
 	struct hstate *h = hstate_vma(src_vma);
@@ -4878,17 +4879,44 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		i_mmap_lock_read(mapping);
 	}

-	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
+	addr = src_vma->vm_start;
+	while (addr < src_vma->vm_end) {
 		spinlock_t *src_ptl, *dst_ptl;
+		unsigned long hpte_sz;
 		src_pte = huge_pte_offset(src, addr, sz);
-		if (!src_pte)
+		if (!src_pte) {
+			addr += sz;
 			continue;
+		}
 		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
 			break;
 		}

+		hugetlb_pte_populate(&src_hpte, src_pte, huge_page_shift(h));
+		hugetlb_pte_populate(&dst_hpte, dst_pte, huge_page_shift(h));
+
+		if (hugetlb_hgm_enabled(src_vma)) {
+			BUG_ON(!hugetlb_hgm_enabled(dst_vma));
+			ret = hugetlb_walk_to(src, &src_hpte, addr,
+					      PAGE_SIZE, /*stop_at_none=*/true);
+			if (ret)
+				break;
+			ret = huge_pte_alloc_high_granularity(
+					&dst_hpte, dst, dst_vma, addr,
+					hugetlb_pte_shift(&src_hpte),
+					HUGETLB_SPLIT_NONE,
+					/*write_locked=*/false);
+			if (ret)
+				break;
+
+			src_pte = src_hpte.ptep;
+			dst_pte = dst_hpte.ptep;
+		}
+
+		hpte_sz = hugetlb_pte_size(&src_hpte);
+
 		/*
 		 * If the pagetables are shared don't copy or take references.
 		 * dst_pte == src_pte is the common case of src/dest sharing.
@@ -4899,16 +4927,19 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		 * after taking the lock below.
 		 */
 		dst_entry = huge_ptep_get(dst_pte);
-		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
+		if ((dst_pte == src_pte) || !hugetlb_pte_none(&dst_hpte)) {
+			addr += hugetlb_pte_size(&src_hpte);
 			continue;
+		}

-		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
+		dst_ptl = hugetlb_pte_lock(dst, &dst_hpte);
+		src_ptl = hugetlb_pte_lockptr(src, &src_hpte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 		dst_entry = huge_ptep_get(dst_pte);
again:
-		if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
+		if (hugetlb_pte_none(&src_hpte) ||
+		    !hugetlb_pte_none(&dst_hpte)) {
 			/*
 			 * Skip if src entry none.  Also, skip in the
 			 * unlikely case dst entry !none as this implies
@@ -4931,11 +4962,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			if (userfaultfd_wp(src_vma) && uffd_wp)
 				entry = huge_pte_mkuffd_wp(entry);
 			set_huge_swap_pte_at(src, addr, src_pte,
-					     entry, sz);
+					     entry, hpte_sz);
 		}
 		if (!userfaultfd_wp(dst_vma) && uffd_wp)
 			entry = huge_pte_clear_uffd_wp(entry);
-		set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
+		set_huge_swap_pte_at(dst, addr, dst_pte, entry,
+				     hpte_sz);
 	} else if (unlikely(is_pte_marker(entry))) {
 		/*
 		 * We copy the pte marker only if the dst vma has
@@ -4946,7 +4978,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		} else {
 			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
-			get_page(ptepage);
+			hpage = compound_head(ptepage);
+			get_page(hpage);

 			/*
 			 * Failing to duplicate the anon rmap is a rare case
@@ -4959,9 +4992,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 * sleep during the process.
 			 */
 			if (!PageAnon(ptepage)) {
-				page_dup_file_rmap(ptepage, true);
+				/* Only dup_rmap once for a page */
+				if (IS_ALIGNED(addr, sz))
+					page_dup_file_rmap(hpage, true);
 			} else if (page_try_dup_anon_rmap(ptepage, true,
							  src_vma)) {
+				if (hugetlb_hgm_enabled(src_vma)) {
+					ret = -EINVAL;
+					break;
+				}
+				BUG_ON(!IS_ALIGNED(addr, hugetlb_pte_size(&src_hpte)));
 				pte_t src_pte_old = entry;
 				struct page *new;

@@ -4970,13 +5010,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				/* Do not use reserve as it's private owned */
 				new = alloc_huge_page(dst_vma, addr, 1);
 				if (IS_ERR(new)) {
-					put_page(ptepage);
+					put_page(hpage);
 					ret = PTR_ERR(new);
 					break;
 				}
-				copy_user_huge_page(new, ptepage, addr, dst_vma,
+				copy_user_huge_page(new, hpage, addr, dst_vma,
 						    npages);
-				put_page(ptepage);
+				put_page(hpage);

 				/* Install the new huge page if src pte stable */
 				dst_ptl = huge_pte_lock(h, dst, dst_pte);
@@ -4994,6 +5034,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 					hugetlb_install_page(dst_vma, dst_pte, addr, new);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
+				addr += hugetlb_pte_size(&src_hpte);
 				continue;
 			}

@@ -5010,10 +5051,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}

 			set_huge_pte_at(dst, addr, dst_pte, entry);
-			hugetlb_count_add(npages, dst);
+			hugetlb_count_add(
+					hugetlb_pte_size(&dst_hpte) / PAGE_SIZE,
+					dst);
 		}
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
+		addr += hugetlb_pte_size(&src_hpte);
 	}

 	if (cow) {
--
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:03 2026
Date: Fri, 24 Jun 2022 17:36:50 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-21-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Jue Wang, Manish Mishra, "Dr . David Alan Gilbert", linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

The changes here are very similar to the changes made to
hugetlb_no_page, where we do a high-granularity page table walk and do
accounting slightly differently because we are mapping only a piece of
a page.
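The accounting difference above is plain shift arithmetic; the sketch below is a userspace model, not kernel code — the `hugetlb_pte_*` and subpage-lookup names mirror this series' API but are reimplemented here under the assumption of 4 KiB base pages and a 2 MiB hugepage:

```python
# Userspace model (not kernel code) of the hugetlb_pte size/index math
# used when only a piece of a hugepage is mapped. Assumes 4 KiB base
# pages; names mirror the series' API but are reimplemented here.
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT

def hugetlb_pte_size(shift):
    """Bytes covered by one mapping entry at this level."""
    return 1 << shift

def hugetlb_pte_mask(shift):
    """Mask that rounds an address down to this entry's boundary."""
    return ~(hugetlb_pte_size(shift) - 1)

def find_subpage_index(addr, huge_shift):
    """Index of the 4 KiB subpage, within the hugepage, that maps addr."""
    return (addr & ~hugetlb_pte_mask(huge_shift)) >> PAGE_SHIFT

# hugetlb_count_add() charges size/PAGE_SIZE pages: a 4 KiB piece of a
# 2 MiB page (shift 21) accounts 1 base page instead of 512.
assert hugetlb_pte_size(12) // PAGE_SIZE == 1
assert hugetlb_pte_size(21) // PAGE_SIZE == 512
assert find_subpage_index((5 << 21) + 7 * PAGE_SIZE, 21) == 7
```

This is why the patch replaces `hugetlb_count_add(pages_per_huge_page(h), ...)` with a count derived from the actual mapping size.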
Signed-off-by: James Houghton
---
 fs/userfaultfd.c        |  3 +++
 include/linux/hugetlb.h |  6 +++--
 mm/hugetlb.c            | 54 +++++++++++++++++++++-----------------
 mm/userfaultfd.c        | 57 +++++++++++++++++++++++++++++----------
 4 files changed, 82 insertions(+), 38 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e943370107d0..77c1b8a7d0b9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -245,6 +245,9 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 	if (!ptep)
 		goto out;

+	if (hugetlb_hgm_enabled(vma))
+		goto out;
+
 	ret = false;
 	pte = huge_ptep_get(ptep);

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ac4ac8fbd901..c207b1ac6195 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -221,13 +221,15 @@ unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 #ifdef CONFIG_USERFAULTFD
-int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
+int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
+				struct hugetlb_pte *dst_hpte,
 				struct vm_area_struct *dst_vma,
 				unsigned long dst_addr,
 				unsigned long src_addr,
 				enum mcopy_atomic_mode mode,
 				struct page **pagep,
-				bool wp_copy);
+				bool wp_copy,
+				bool new_mapping);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0ec2f231524e..09fa57599233 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5808,6 +5808,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			vma_end_reservation(h, vma, haddr);
 	}

+	/* This lock will get pretty expensive at 4K. */
 	ptl = hugetlb_pte_lock(mm, hpte);
 	ret = 0;
 	/* If pte changed from under us, retry */
@@ -6098,24 +6099,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
  * modifications for huge pages.
  */
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
-			    pte_t *dst_pte,
+			    struct hugetlb_pte *dst_hpte,
 			    struct vm_area_struct *dst_vma,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
 			    enum mcopy_atomic_mode mode,
 			    struct page **pagep,
-			    bool wp_copy)
+			    bool wp_copy,
+			    bool new_mapping)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct hstate *h = hstate_vma(dst_vma);
 	struct address_space *mapping = dst_vma->vm_file->f_mapping;
+	unsigned long haddr = dst_addr & huge_page_mask(h);
 	pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
 	unsigned long size;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	pte_t _dst_pte;
 	spinlock_t *ptl;
 	int ret = -ENOMEM;
-	struct page *page;
+	struct page *page, *subpage;
 	int writable;
 	bool page_in_pagecache = false;

@@ -6130,12 +6133,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		 * a non-missing case. Return -EEXIST.
 		 */
 		if (vm_shared &&
-		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+		    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
 			ret = -EEXIST;
 			goto out;
 		}

-		page = alloc_huge_page(dst_vma, dst_addr, 0);
+		page = alloc_huge_page(dst_vma, haddr, 0);
 		if (IS_ERR(page)) {
 			ret = -ENOMEM;
 			goto out;
@@ -6151,13 +6154,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			/* Free the allocated page which may have
 			 * consumed a reservation.
 			 */
-			restore_reserve_on_error(h, dst_vma, dst_addr, page);
+			restore_reserve_on_error(h, dst_vma, haddr, page);
 			put_page(page);

 			/* Allocate a temporary page to hold the copied
 			 * contents.
 			 */
-			page = alloc_huge_page_vma(h, dst_vma, dst_addr);
+			page = alloc_huge_page_vma(h, dst_vma, haddr);
 			if (!page) {
 				ret = -ENOMEM;
 				goto out;
@@ -6171,14 +6174,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		}
 	} else {
 		if (vm_shared &&
-		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+		    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
 			put_page(*pagep);
 			ret = -EEXIST;
 			*pagep = NULL;
 			goto out;
 		}

-		page = alloc_huge_page(dst_vma, dst_addr, 0);
+		page = alloc_huge_page(dst_vma, haddr, 0);
 		if (IS_ERR(page)) {
 			ret = -ENOMEM;
 			*pagep = NULL;
@@ -6216,8 +6219,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		page_in_pagecache = true;
 	}

-	ptl = huge_pte_lockptr(huge_page_shift(h), dst_mm, dst_pte);
-	spin_lock(ptl);
+	ptl = hugetlb_pte_lock(dst_mm, dst_hpte);

 	/*
 	 * Recheck the i_size after holding PT lock to make sure not
@@ -6239,14 +6241,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 * registered, we firstly wr-protect a none pte which has no page cache
 	 * page backing it, then access the page.
 	 */
-	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
+	if (!hugetlb_pte_none_mostly(dst_hpte))
 		goto out_release_unlock;

-	if (vm_shared) {
-		page_dup_file_rmap(page, true);
-	} else {
-		ClearHPageRestoreReserve(page);
-		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
+	if (new_mapping) {
+		if (vm_shared) {
+			page_dup_file_rmap(page, true);
+		} else {
+			ClearHPageRestoreReserve(page);
+			hugepage_add_new_anon_rmap(page, dst_vma, haddr);
+		}
 	}

 	/*
@@ -6258,7 +6262,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;

-	_dst_pte = make_huge_pte(dst_vma, page, writable);
+	subpage = hugetlb_find_subpage(h, page, dst_addr);
+	if (subpage != page)
+		BUG_ON(!hugetlb_hgm_enabled(dst_vma));
+
+	_dst_pte = make_huge_pte(dst_vma, subpage, writable);
 	/*
 	 * Always mark UFFDIO_COPY page dirty; note that this may not be
 	 * extremely important for hugetlbfs for now since swapping is not
@@ -6271,14 +6279,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (wp_copy)
 		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);

-	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+	set_huge_pte_at(dst_mm, dst_addr, dst_hpte->ptep, _dst_pte);

-	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
-					 dst_vma->vm_flags & VM_WRITE);
-	hugetlb_count_add(pages_per_huge_page(h), dst_mm);
+	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_hpte->ptep,
+			_dst_pte, dst_vma->vm_flags & VM_WRITE);
+	hugetlb_count_add(hugetlb_pte_size(dst_hpte) / PAGE_SIZE, dst_mm);

 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(dst_vma, dst_addr, dst_pte);
+	update_mmu_cache(dst_vma, dst_addr, dst_hpte->ptep);

 	spin_unlock(ptl);
 	if (!is_continue)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 4f4892a5f767..ee40d98068bf 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -310,14 +310,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 {
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	ssize_t err;
-	pte_t *dst_pte;
 	unsigned long src_addr, dst_addr;
 	long copied;
 	struct page *page;
-	unsigned long vma_hpagesize;
+	unsigned long vma_hpagesize, vma_altpagesize;
 	pgoff_t idx;
 	u32 hash;
 	struct address_space *mapping;
+	bool use_hgm = hugetlb_hgm_enabled(dst_vma) &&
+		mode == MCOPY_ATOMIC_CONTINUE;
+	struct hstate *h = hstate_vma(dst_vma);

 	/*
 	 * There is no default zero huge page for all huge page sizes as
@@ -335,12 +337,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	copied = 0;
 	page = NULL;
 	vma_hpagesize = vma_kernel_pagesize(dst_vma);
+	if (use_hgm)
+		vma_altpagesize = PAGE_SIZE;
+	else
+		vma_altpagesize = vma_hpagesize;

 	/*
 	 * Validate alignment based on huge page size
 	 */
 	err = -EINVAL;
-	if (dst_start & (vma_hpagesize - 1) || len & (vma_hpagesize - 1))
+	if (dst_start & (vma_altpagesize - 1) || len & (vma_altpagesize - 1))
 		goto out_unlock;

retry:
@@ -361,6 +367,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		vm_shared = dst_vma->vm_flags & VM_SHARED;
 	}

+	BUG_ON(!vm_shared && use_hgm);
+
 	/*
 	 * If not shared, ensure the dst_vma has a anon_vma.
 	 */
@@ -371,11 +379,13 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}

 	while (src_addr < src_start + len) {
+		struct hugetlb_pte hpte;
+		bool new_mapping;
 		BUG_ON(dst_addr >= dst_start + len);

 		/*
 		 * Serialize via i_mmap_rwsem and hugetlb_fault_mutex.
-		 * i_mmap_rwsem ensures the dst_pte remains valid even
+		 * i_mmap_rwsem ensures the hpte.ptep remains valid even
 		 * in the case of shared pmds. fault mutex prevents
 		 * races with other faulting threads.
 		 */
@@ -383,27 +393,47 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		i_mmap_lock_read(mapping);
 		idx = linear_page_index(dst_vma, dst_addr);
 		hash = hugetlb_fault_mutex_hash(mapping, idx);
+		/* This lock will get expensive at 4K. */
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);

-		err = -ENOMEM;
-		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
-		if (!dst_pte) {
+		err = 0;
+
+		pte_t *ptep = huge_pte_alloc(dst_mm, dst_vma, dst_addr,
+					     vma_hpagesize);
+		if (!ptep)
+			err = -ENOMEM;
+		else {
+			hugetlb_pte_populate(&hpte, ptep,
+					     huge_page_shift(h));
+			/*
+			 * If the hstate-level PTE is not none, then a mapping
+			 * was previously established.
+			 * The per-hpage mutex prevents double-counting.
+			 */
+			new_mapping = hugetlb_pte_none(&hpte);
+			if (use_hgm)
+				err = hugetlb_alloc_largest_pte(&hpte, dst_mm, dst_vma,
+								dst_addr,
+								dst_start + len);
+		}
+
+		if (err) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}

 		if (mode != MCOPY_ATOMIC_CONTINUE &&
-		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
+		    !hugetlb_pte_none_mostly(&hpte)) {
 			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}

-		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
+		err = hugetlb_mcopy_atomic_pte(dst_mm, &hpte, dst_vma,
 					       dst_addr, src_addr, mode, &page,
-					       wp_copy);
+					       wp_copy, new_mapping);

 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -413,6 +443,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		if (unlikely(err == -ENOENT)) {
 			mmap_read_unlock(dst_mm);
 			BUG_ON(!page);
+			BUG_ON(hpte.shift != huge_page_shift(h));

 			err = copy_huge_page_from_user(page,
 						(const void __user *)src_addr,
@@ -430,9 +461,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 			BUG_ON(page);

 		if (!err) {
-			dst_addr += vma_hpagesize;
-			src_addr += vma_hpagesize;
-			copied += vma_hpagesize;
+			dst_addr += hugetlb_pte_size(&hpte);
+			src_addr += hugetlb_pte_size(&hpte);
+			copied += hugetlb_pte_size(&hpte);

 			if (fatal_signal_pending(current))
 				err = -EINTR;
--
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:04 2026
Date: Fri, 24 Jun 2022 17:36:51 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-22-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 21/26] hugetlb: add hugetlb_collapse
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Jue Wang, Manish Mishra, "Dr . David Alan Gilbert", linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This is what implements MADV_COLLAPSE for HugeTLB pages. This is a
necessary extension to the UFFDIO_CONTINUE changes.
When userspace finishes mapping an entire hugepage with UFFDIO_CONTINUE,
the kernel has no mechanism to automatically collapse the page table to
map the whole hugepage normally. We require userspace to inform us that
they would like the hugepages to be collapsed; they do this with
MADV_COLLAPSE.

If userspace has mapped only some of a hugepage with UFFDIO_CONTINUE,
hugetlb_collapse will cause the requested range to be mapped as if it
had been UFFDIO_CONTINUE'd already.

Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |  7 ++++
 mm/hugetlb.c            | 88 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c207b1ac6195..438057dc3b75 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1197,6 +1197,8 @@ int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
 					unsigned int desired_sz,
 					enum split_mode mode,
 					bool write_locked);
+int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long start, unsigned long end);
 #else
 static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
@@ -1221,6 +1223,11 @@ static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
 {
 	return -EINVAL;
 }
+static inline int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+	return -EINVAL;
+}
 #endif
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 09fa57599233..70bb3a1342d9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7280,6 +7280,94 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
 	return -EINVAL;
 }
 
+/*
+ * Collapse the address range from @start to @end to be mapped optimally.
+ *
+ * This is only valid for shared mappings. The main use case for this function
+ * is following UFFDIO_CONTINUE. If a user UFFDIO_CONTINUEs an entire hugepage
+ * by calling UFFDIO_CONTINUE once for each 4K region, the kernel doesn't know
+ * to collapse the mapping after the final UFFDIO_CONTINUE. Instead, we leave
+ * it up to userspace to tell us to do so, via MADV_COLLAPSE.
+ *
+ * Any holes in the mapping will be filled. If there is no page in the
+ * pagecache for a region we're collapsing, the PTEs will be cleared.
+ */
+int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+	struct hstate *h = hstate_vma(vma);
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	struct mmu_notifier_range range;
+	struct mmu_gather tlb;
+	struct hstate *tmp_h;
+	unsigned int shift;
+	unsigned long curr = start;
+	int ret = 0;
+	struct page *hpage, *subpage;
+	pgoff_t idx;
+	bool writable = vma->vm_flags & VM_WRITE;
+	bool shared = vma->vm_flags & VM_SHARED;
+	pte_t entry;
+
+	/*
+	 * This is only supported for shared VMAs, because we need to look up
+	 * the page to use for any PTEs we end up creating.
+	 */
+	if (!shared)
+		return -EINVAL;
+
+	i_mmap_assert_write_locked(mapping);
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
+				start, end);
+	mmu_notifier_invalidate_range_start(&range);
+	tlb_gather_mmu(&tlb, mm);
+
+	while (curr < end) {
+		for_each_hgm_shift(h, tmp_h, shift) {
+			unsigned long sz = 1UL << shift;
+			struct hugetlb_pte hpte;
+
+			if (!IS_ALIGNED(curr, sz) || curr + sz > end)
+				continue;
+
+			hugetlb_pte_init(&hpte);
+			ret = hugetlb_walk_to(mm, &hpte, curr, sz,
+					/*stop_at_none=*/false);
+			if (ret)
+				goto out;
+			if (hugetlb_pte_size(&hpte) >= sz)
+				goto hpte_finished;
+
+			idx = vma_hugecache_offset(h, vma, curr);
+			hpage = find_lock_page(mapping, idx);
+			hugetlb_free_range(&tlb, &hpte, curr,
+					curr + hugetlb_pte_size(&hpte));
+			if (!hpage) {
+				hugetlb_pte_clear(mm, &hpte, curr);
+				goto hpte_finished;
+			}
+
+			subpage = hugetlb_find_subpage(h, hpage, curr);
+			entry = make_huge_pte_with_shift(vma, subpage,
+					writable, shift);
+			set_huge_pte_at(mm, curr, hpte.ptep, entry);
+			unlock_page(hpage);
+hpte_finished:
+			curr += hugetlb_pte_size(&hpte);
+			goto next;
+		}
+		ret = -EINVAL;
+		goto out;
+next:
+		continue;
+	}
+out:
+	tlb_finish_mmu(&tlb);
+	mmu_notifier_invalidate_range_end(&range);
+	return ret;
+}
+
 /*
  * Given a particular address, split the HugeTLB PTE that currently maps it
  * so that, for the given address, the PTE that maps it is `desired_shift`.
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:04 2026
Date: Fri, 24 Jun 2022 17:36:52 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-23-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 22/26] madvise: add uapi for HugeTLB HGM collapse: MADV_COLLAPSE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, "Dr . David Alan Gilbert", linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This commit is co-opting the same madvise mode that is being introduced
by zokeefe@google.com to manually collapse THPs[1].

As with the rest of the high-granularity mapping support, MADV_COLLAPSE
is only supported for shared VMAs right now.
[1] https://lore.kernel.org/linux-mm/20220604004004.954674-10-zokeefe@google.com/

Signed-off-by: James Houghton
---
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/madvise.c                           | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6c1aa92a92e4..b686920ca731 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -77,6 +77,8 @@
 
 #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
 
+#define MADV_COLLAPSE	25	/* collapse an address range into hugepages */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/madvise.c b/mm/madvise.c
index d7b4f2602949..c624c0f02276 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -59,6 +59,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_FREE:
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
+	case MADV_COLLAPSE:
 		return 0;
 	default:
 		/* be safe, default to 1. list exceptions explicitly */
@@ -981,6 +982,20 @@ static long madvise_remove(struct vm_area_struct *vma,
 	return error;
 }
 
+static int madvise_collapse(struct vm_area_struct *vma,
+			struct vm_area_struct **prev,
+			unsigned long start, unsigned long end)
+{
+	bool shared = vma->vm_flags & VM_SHARED;
+	*prev = vma;
+
+	/* Only allow collapsing for HGM-enabled, shared mappings. */
+	if (!is_vm_hugetlb_page(vma) || !hugetlb_hgm_enabled(vma) || !shared)
+		return -EINVAL;
+
+	return hugetlb_collapse(vma->vm_mm, vma, start, end);
+}
+
 /*
  * Apply an madvise behavior to a region of a vma. madvise_update_vma
  * will handle splitting a vm area into separate areas, each area with its own
@@ -1011,6 +1026,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
 		return madvise_populate(vma, prev, start, end, behavior);
+	case MADV_COLLAPSE:
+		return madvise_collapse(vma, prev, start, end);
 	case MADV_NORMAL:
 		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
 		break;
@@ -1158,6 +1175,9 @@ madvise_behavior_valid(int behavior)
 #ifdef CONFIG_MEMORY_FAILURE
 	case MADV_SOFT_OFFLINE:
 	case MADV_HWPOISON:
+#endif
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	case MADV_COLLAPSE:
 #endif
 		return true;
 
@@ -1351,6 +1371,9 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
  *		triggering read faults if required
  *  MADV_POPULATE_WRITE - populate (prefault) page tables writable by
  *		triggering write faults if required
+ *  MADV_COLLAPSE - collapse a high-granularity HugeTLB mapping into huge
+ *		mappings. This is useful after an entire hugepage has been
+ *		mapped with individual small UFFDIO_CONTINUE operations.
 *
 * return values:
 *  zero    - success
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:04 2026
Date: Fri, 24 Jun 2022 17:36:53 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-24-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 23/26] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, "Dr . David Alan Gilbert", linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is so that userspace is aware that their kernel was compiled with
HugeTLB high-granularity mapping, and that UFFDIO_CONTINUE operations
down to PAGE_SIZE-aligned chunks are valid.
Signed-off-by: James Houghton
---
 fs/userfaultfd.c                 | 7 ++++++-
 include/uapi/linux/userfaultfd.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 77c1b8a7d0b9..59bfdb7a67e0 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1935,10 +1935,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		goto err_out;
 	/* report all available features and ioctls to userland */
 	uffdio_api.features = UFFD_API_FEATURES;
+
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 	uffdio_api.features &= ~(UFFD_FEATURE_MINOR_HUGETLBFS |
 				 UFFD_FEATURE_MINOR_SHMEM);
-#endif
+#ifndef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS_HGM;
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 	uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
 #endif
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 7d32b1e797fb..50fbcb0bcba0 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -32,6 +32,7 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
+			   UFFD_FEATURE_MINOR_HUGETLBFS_HGM |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
 			   UFFD_FEATURE_EXACT_ADDRESS |		\
 			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
@@ -213,6 +214,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
+#define UFFD_FEATURE_MINOR_HUGETLBFS_HGM	(1<<13)
 	__u64 features;
 
 	__u64 ioctls;
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:04 2026
Date: Fri, 24 Jun 2022 17:36:54 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-25-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 24/26] arm64/hugetlb: add support for high-granularity mappings
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, "Dr . David Alan Gilbert", linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is included in this RFC to demonstrate how an architecture that
doesn't use ARCH_WANT_GENERAL_HUGETLB can be updated to support HugeTLB
high-granularity mappings: an architecture just needs to implement
hugetlb_walk_to.
Signed-off-by: James Houghton
---
 arch/arm64/Kconfig          |  1 +
 arch/arm64/mm/hugetlbpage.c | 63 +++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1652a9800ebe..74108713a99a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -99,6 +99,7 @@ config ARM64
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+	select ARCH_HAS_SPECIAL_HUGETLB_HGM
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e2a5ec9fdc0d..1901818bed9d 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -281,6 +281,69 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 		set_pte(ptep, pte);
 }
 
+int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		unsigned long addr, unsigned long sz, bool stop_at_none)
+{
+	pgd_t *pgdp;
+	p4d_t *p4dp;
+	pte_t *ptep;
+
+	if (!hpte->ptep) {
+		pgdp = pgd_offset(mm, addr);
+		p4dp = p4d_offset(pgdp, addr);
+		if (!p4dp)
+			return -ENOMEM;
+		hugetlb_pte_populate(hpte, (pte_t *)p4dp, P4D_SHIFT);
+	}
+
+	while (hugetlb_pte_size(hpte) > sz &&
+			!hugetlb_pte_present_leaf(hpte) &&
+			!(stop_at_none && hugetlb_pte_none(hpte))) {
+		if (hpte->shift == PMD_SHIFT) {
+			unsigned long rounded_addr = sz == CONT_PTE_SIZE
+					? addr & CONT_PTE_MASK
+					: addr;
+
+			ptep = pte_offset_kernel((pmd_t *)hpte->ptep,
+					rounded_addr);
+			if (!ptep)
+				return -ENOMEM;
+			if (sz == CONT_PTE_SIZE)
+				hpte->shift = CONT_PTE_SHIFT;
+			else
+				hpte->shift = pte_cont(*ptep) ? CONT_PTE_SHIFT
+						: PAGE_SHIFT;
+			hpte->ptep = ptep;
+		} else if (hpte->shift == PUD_SHIFT) {
+			pud_t *pudp = (pud_t *)hpte->ptep;
+
+			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
+
+			if (!ptep)
+				return -ENOMEM;
+			if (sz == CONT_PMD_SIZE)
+				hpte->shift = CONT_PMD_SHIFT;
+			else
+				hpte->shift = pte_cont(*ptep) ? CONT_PMD_SHIFT
+						: PMD_SHIFT;
+			hpte->ptep = ptep;
+		} else if (hpte->shift == P4D_SHIFT) {
+			ptep = (pte_t *)pud_alloc(mm, (p4d_t *)hpte->ptep, addr);
+			if (!ptep)
+				return -ENOMEM;
+			hpte->shift = PUD_SHIFT;
+			hpte->ptep = ptep;
+		} else
+			/*
+			 * This also catches the cases of CONT_PMD_SHIFT and
+			 * CONT_PTE_SHIFT. Those PTEs should always be leaves.
+			 */
+			BUG();
+	}
+
+	return 0;
+}
+
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long addr, unsigned long sz)
 {
-- 
2.37.0.rc0.161.g10f37bed90-goog

From nobody Sun Apr 26 16:02:04 2026
Date: Fri, 24 Jun 2022 17:36:55 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-26-jthoughton@google.com>
References: <20220624173656.2033256-1-jthoughton@google.com>
Subject: [RFC PATCH 25/26] selftests: add HugeTLB HGM to userfaultfd selftest
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, "Dr . David Alan Gilbert", linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

It behaves just like the regular shared HugeTLB configuration, except
that it uses 4K pages instead of hugepages.

This doesn't test collapsing yet. I'll add a test for that for v1.

Signed-off-by: James Houghton
---
 tools/testing/selftests/vm/userfaultfd.c | 61 ++++++++++++++++++++----
 1 file changed, 51 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 0bdfc1955229..9cbb959519a6 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -64,7 +64,7 @@
 
 #ifdef __NR_userfaultfd
 
-static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size;
+static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size, hpage_size;
 
 #define BOUNCE_RANDOM		(1<<0)
 #define BOUNCE_RACINGFAULTS	(1<<1)
@@ -72,9 +72,10 @@ static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size;
 #define BOUNCE_POLL		(1<<3)
 static int bounces;
 
-#define TEST_ANON	1
-#define TEST_HUGETLB	2
-#define TEST_SHMEM	3
+#define TEST_ANON		1
+#define TEST_HUGETLB		2
+#define TEST_HUGETLB_HGM	3
+#define TEST_SHMEM		4
 static int test_type;
 
 /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
@@ -85,6 +86,7 @@ static volatile bool test_uffdio_zeropage_eexist = true;
 static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
+static bool test_uffdio_copy = true;
 
 static bool map_shared;
 static int shm_fd;
@@ -140,12 +142,17 @@ static void usage(void)
 	fprintf(stderr, "\nUsage: ./userfaultfd <test type> <MiB> <bounces> "
 		"[hugetlbfs_file]\n\n");
 	fprintf(stderr, "Supported <test type>: anon, hugetlb, "
anon, hugetlb, " - "hugetlb_shared, shmem\n\n"); + "hugetlb_shared, hugetlb_shared_hgm, shmem\n\n"); fprintf(stderr, "Examples:\n\n"); fprintf(stderr, "%s", examples); exit(1); } =20 +static bool test_is_hugetlb(void) +{ + return test_type =3D=3D TEST_HUGETLB || test_type =3D=3D TEST_HUGETLB_HGM; +} + #define _err(fmt, ...) \ do { \ int ret =3D errno; \ @@ -348,7 +355,7 @@ static struct uffd_test_ops *uffd_test_ops; =20 static inline uint64_t uffd_minor_feature(void) { - if (test_type =3D=3D TEST_HUGETLB && map_shared) + if (test_is_hugetlb() && map_shared) return UFFD_FEATURE_MINOR_HUGETLBFS; else if (test_type =3D=3D TEST_SHMEM) return UFFD_FEATURE_MINOR_SHMEM; @@ -360,7 +367,7 @@ static uint64_t get_expected_ioctls(uint64_t mode) { uint64_t ioctls =3D UFFD_API_RANGE_IOCTLS; =20 - if (test_type =3D=3D TEST_HUGETLB) + if (test_is_hugetlb()) ioctls &=3D ~(1 << _UFFDIO_ZEROPAGE); =20 if (!((mode & UFFDIO_REGISTER_MODE_WP) && test_uffdio_wp)) @@ -1116,6 +1123,12 @@ static int userfaultfd_events_test(void) char c; struct uffd_stats stats =3D { 0 }; =20 + if (!test_uffdio_copy) { + printf("Skipping userfaultfd events test " + "(test_uffdio_copy=3Dfalse)\n"); + return 0; + } + printf("testing events (fork, remap, remove): "); fflush(stdout); =20 @@ -1169,6 +1182,12 @@ static int userfaultfd_sig_test(void) char c; struct uffd_stats stats =3D { 0 }; =20 + if (!test_uffdio_copy) { + printf("Skipping userfaultfd signal test " + "(test_uffdio_copy=3Dfalse)\n"); + return 0; + } + printf("testing signal delivery: "); fflush(stdout); =20 @@ -1438,6 +1457,12 @@ static int userfaultfd_stress(void) pthread_attr_init(&attr); pthread_attr_setstacksize(&attr, 16*1024*1024); =20 + if (!test_uffdio_copy) { + printf("Skipping userfaultfd stress test " + "(test_uffdio_copy=3Dfalse)\n"); + bounces =3D 0; + } + while (bounces--) { printf("bounces: %d, mode:", bounces); if (bounces & BOUNCE_RANDOM) @@ -1598,6 +1623,13 @@ static void set_test_type(const char *type) uffd_test_ops =3D 
&hugetlb_uffd_test_ops; /* Minor faults require shared hugetlb; only enable here. */ test_uffdio_minor =3D true; + } else if (!strcmp(type, "hugetlb_shared_hgm")) { + map_shared =3D true; + test_type =3D TEST_HUGETLB_HGM; + uffd_test_ops =3D &hugetlb_uffd_test_ops; + /* Minor faults require shared hugetlb; only enable here. */ + test_uffdio_minor =3D true; + test_uffdio_copy =3D false; } else if (!strcmp(type, "shmem")) { map_shared =3D true; test_type =3D TEST_SHMEM; @@ -1607,8 +1639,10 @@ static void set_test_type(const char *type) err("Unknown test type: %s", type); } =20 + hpage_size =3D default_huge_page_size(); if (test_type =3D=3D TEST_HUGETLB) - page_size =3D default_huge_page_size(); + // TEST_HUGETLB_HGM gets small pages. + page_size =3D hpage_size; else page_size =3D sysconf(_SC_PAGE_SIZE); =20 @@ -1658,19 +1692,26 @@ int main(int argc, char **argv) nr_cpus =3D sysconf(_SC_NPROCESSORS_ONLN); nr_pages_per_cpu =3D atol(argv[2]) * 1024*1024 / page_size / nr_cpus; + if (test_type =3D=3D TEST_HUGETLB_HGM) + /* + * `page_size` refers to the page_size we can use in + * UFFDIO_CONTINUE. We still need nr_pages to be appropriately + * aligned, so align it here. 
+ */ + nr_pages_per_cpu -=3D nr_pages_per_cpu % (hpage_size / page_size); if (!nr_pages_per_cpu) { _err("invalid MiB"); usage(); } + nr_pages =3D nr_pages_per_cpu * nr_cpus; =20 bounces =3D atoi(argv[3]); if (bounces <=3D 0) { _err("invalid bounces"); usage(); } - nr_pages =3D nr_pages_per_cpu * nr_cpus; =20 - if (test_type =3D=3D TEST_HUGETLB && map_shared) { + if (test_is_hugetlb() && map_shared) { if (argc < 5) usage(); huge_fd =3D open(argv[4], O_CREAT | O_RDWR, 0755); --=20 2.37.0.rc0.161.g10f37bed90-goog From nobody Sun Apr 26 16:02:04 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68F6DC433EF for ; Fri, 24 Jun 2022 17:39:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232473AbiFXRj0 (ORCPT ); Fri, 24 Jun 2022 13:39:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232304AbiFXRiC (ORCPT ); Fri, 24 Jun 2022 13:38:02 -0400 Received: from mail-ua1-x94a.google.com (mail-ua1-x94a.google.com [IPv6:2607:f8b0:4864:20::94a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CD5E96F4BB for ; Fri, 24 Jun 2022 10:37:46 -0700 (PDT) Received: by mail-ua1-x94a.google.com with SMTP id g1-20020ab00e01000000b00379820aee7cso1007801uak.18 for ; Fri, 24 Jun 2022 10:37:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=u+0c8IGQItY5NCCATolNVsWOPdUIr5MDdbCK/q0TIvo=; b=D90ixVslphWwRS1UyK82N/Fu4Pbp7RfYXJAXQqDLm0AYhlK59/wd1jqEmTakmfoAdM bCaJEmUAqS6tCX2dak5OsRfQg3/S32jkFNOOv5AhzS6KIq2Y8/bxRLDG5Ynhh7Q4ZWRf T28FEgQTuQAmTAzbinSDCUL5FqalKu06/tPW5zgoGnZaHOERoYdGxV1amlIWNDhV6gNx 
Date: Fri, 24 Jun 2022 17:36:56 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-27-jthoughton@google.com>
Mime-Version: 1.0
References: <20220624173656.2033256-1-jthoughton@google.com>
X-Mailer: git-send-email 2.37.0.rc0.161.g10f37bed90-goog
Subject: [RFC PATCH 26/26] selftests: add HugeTLB HGM to KVM demand paging selftest
From: James Houghton
To: Mike Kravetz , Muchun Song , Peter Xu
Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , Jue Wang , Manish Mishra , "Dr . David Alan Gilbert" , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
Precedence: bulk
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

This doesn't address collapsing yet, and it only works with the MINOR mode (UFFDIO_CONTINUE).

Signed-off-by: James Houghton
---
 tools/testing/selftests/kvm/include/test_util.h |  2 ++
 tools/testing/selftests/kvm/lib/kvm_util.c      |  2 +-
 tools/testing/selftests/kvm/lib/test_util.c     | 14 ++++++++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 99e0dcdc923f..6209e44981a7 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -87,6 +87,7 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
 	VM_MEM_SRC_SHARED_HUGETLB,
+	VM_MEM_SRC_SHARED_HUGETLB_HGM,
 	NUM_SRC_TYPES,
 };
 
@@ -105,6 +106,7 @@ size_t get_def_hugetlb_pagesz(void);
 const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i);
 size_t get_backing_src_pagesz(uint32_t i);
 bool is_backing_src_hugetlb(uint32_t i);
+bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type);
 void backing_src_help(const char *flag);
 enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
 long get_run_delay(void);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 1665a220abcb..382f8fb75b7f 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -993,7 +993,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	region->fd = -1;
 	if (backing_src_is_shared(src_type))
 		region->fd = kvm_memfd_alloc(region->mmap_size,
-					     src_type == VM_MEM_SRC_SHARED_HUGETLB);
+					     is_backing_src_shared_hugetlb(src_type));
 
 	region->mmap_start = mmap(NULL, region->mmap_size,
 				  PROT_READ | PROT_WRITE,
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 6d23878bbfe1..710dc42077fe 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -254,6 +254,13 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 		 */
 		.flag = MAP_SHARED,
 	},
+	[VM_MEM_SRC_SHARED_HUGETLB_HGM] = {
+		/*
+		 * Identical to shared_hugetlb except for the name.
+		 */
+		.name = "shared_hugetlb_hgm",
+		.flag = MAP_SHARED,
+	},
 	};
 	_Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
 		       "Missing new backing src types?");
@@ -272,6 +279,7 @@ size_t get_backing_src_pagesz(uint32_t i)
 	switch (i) {
 	case VM_MEM_SRC_ANONYMOUS:
 	case VM_MEM_SRC_SHMEM:
+	case VM_MEM_SRC_SHARED_HUGETLB_HGM:
 		return getpagesize();
 	case VM_MEM_SRC_ANONYMOUS_THP:
 		return get_trans_hugepagesz();
@@ -288,6 +296,12 @@ bool is_backing_src_hugetlb(uint32_t i)
 	return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB);
 }
 
+bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type)
+{
+	return src_type == VM_MEM_SRC_SHARED_HUGETLB ||
+	       src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM;
+}
+
 static void print_available_backing_src_types(const char *prefix)
 {
 	int i;
-- 
2.37.0.rc0.161.g10f37bed90-goog