Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 66F16C433F5
	for <linux-kernel@archiver.kernel.org>; Wed, 16 Mar 2022 02:27:15 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1350581AbiCPC2Z (ORCPT
        <rfc822;linux-kernel@archiver.kernel.org>);
        Tue, 15 Mar 2022 22:28:25 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55068 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231411AbiCPC2X (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 15 Mar 2022 22:28:23 -0400
Received: from spam.unicloud.com (gw.haihefund.cn [220.194.70.58])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A7AFD3B29F
        for <linux-kernel@vger.kernel.org>;
 Tue, 15 Mar 2022 19:27:08 -0700 (PDT)
Received: from eage.unicloud.com ([220.194.70.35])
        by spam.unicloud.com with ESMTP id 22G2Qo1x028879;
        Wed, 16 Mar 2022 10:26:50 +0800 (GMT-8)
        (envelope-from luofei@unicloud.com)
Received: from zgys-ex-mb09.Unicloud.com (10.10.0.24) by
 zgys-ex-mb10.Unicloud.com (10.10.0.6) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id
 15.1.2375.17; Wed, 16 Mar 2022 10:26:49 +0800
Received: from zgys-ex-mb09.Unicloud.com ([fe80::eda0:6815:ca71:5aa]) by
 zgys-ex-mb09.Unicloud.com ([fe80::eda0:6815:ca71:5aa%16]) with mapi id
 15.01.2375.017; Wed, 16 Mar 2022 10:26:49 +0800
From: 罗飞 <luofei@unicloud.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
        Muchun Song <songmuchun@bytedance.com>
CC: Andrew Morton <akpm@linux-foundation.org>,
        Linux Memory Management List <linux-mm@kvack.org>,
        LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] hugetlbfs: fix description about atomic allocation of
 vmemmap pages when free huge page
Thread-Topic: [PATCH] hugetlbfs: fix description about atomic allocation of
 vmemmap pages when free huge page
Thread-Index: AQHYOCSEJQtF7F2NS0Ol60FQZC1p/Ky/6v2AgACCawCAANlxmw==
Date: Wed, 16 Mar 2022 02:26:49 +0000
Message-ID: <380ea251d3084917aaaf6cc973a857e1@unicloud.com>
References: <20220315042355.362810-1-luofei@unicloud.com>
 <CAMZfGtWjnhZLVmRD0BSpMbAWr_vD5BCj5s0ARfNHpHeAAGWYjA@mail.gmail.com>,<a56e0ea8-3b11-8239-d39c-ed33e479427e@oracle.com>
In-Reply-To: <a56e0ea8-3b11-8239-d39c-ed33e479427e@oracle.com>
Accept-Language: zh-CN, en-US
Content-Language: zh-CN
x-originating-ip: [10.10.1.7]
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MAIL: spam.unicloud.com 22G2Qo1x028879
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

>>>
>>> No matter what context update_and_free_page() is called in,
>>> the flag for allocating the vmemmap page is fixed
>>> (GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE), and no atomic
>>> allocation is involved, so the description of atomicity here
>>> is somewhat inappropriate.
>>>
>>> and the atomic parameter naming of update_and_free_page() is
>>> somewhat misleading.
>>>
>>> Signed-off-by: luofei <luofei@unicloud.com>
>>> ---
>>>  mm/hugetlb.c | 10 ++++------
>>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index f8ca7cca3c1a..239ef82b7897 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1570,8 +1570,8 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>>>
>>>  /*
>>>   * As update_and_free_page() can be called under any context, so we cannot
>>> - * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
>>> - * actual freeing in a workqueue to prevent from using GFP_ATOMIC to allocate
>>> + * use GFP_ATOMIC to allocate vmemmap pages. However, we can defer the
>>> + * actual freeing in a workqueue to prevent waits caused by allocating
>>>   * the vmemmap pages.
>>>   *
>>>   * free_hpage_workfn() locklessly retrieves the linked list of pages to be
>>> @@ -1617,16 +1617,14 @@ static inline void flush_free_hpage_work(struct hstate *h)
>>>  }
>>>
>>>  static void update_and_free_page(struct hstate *h, struct page *page,
>>> -                                bool atomic)
>>> +                                bool delay)
>>
>> Hi luofei,
>>
>> At least, I don't agree with this change.  The "atomic" means if the
>> caller is under atomic context instead of whether using atomic
>> GFP_MASK.  The "delay" seems to tell the caller that it can undelay
>> the allocation even if it is under atomic context (actually, it has no
>> choice).  But "atomic" can indicate the user is being asked to tell us
>> if it is under atomic context.
>
>There may be some confusion since GFP_ATOMIC is mentioned in the comments
>and GFP_ATOMIC is not used in the allocation of vmemmap pages.  IIRC,
>the use of GFP_ATOMIC was discussed at one time but dismissed because of
>undesired side effects such as dipping into "atomic reserves".
>
>How about an update to the comments as follows (sorry mailer may mess up
>formatting)?
>
>diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>index f8ca7cca3c1a..6a4d27e24b21 100644
>--- a/mm/hugetlb.c
>+++ b/mm/hugetlb.c
>@@ -1569,10 +1569,12 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>}
>
>/*
>- * As update_and_free_page() can be called under any context, so we cannot
>- * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
>- * actual freeing in a workqueue to prevent from using GFP_ATOMIC to allocate
>- * the vmemmap pages.
>+ * Freeing hugetlb pages is done in update_and_free_page().  When freeing a
>+ * hugetlb page, vmemmap pages may need to be allocated.  The routine
>+ * alloc_huge_page_vmemmap() can possibly sleep as it uses GFP_KERNEL.
>+ * However, update_and_free_page() can be called under any context.  To
>+ * avoid the possibility of sleeping in a context where sleeping is not
>+ * allowed, defer the actual freeing in a workqueue where sleeping is allowed.
> *
>  * free_hpage_workfn() locklessly retrieves the linked list of pages to be
>  * freed and frees them one-by-one. As the page->mapping pointer is going
>@@ -1616,6 +1618,10 @@ static inline void flush_free_hpage_work(struct hstate *h)
>                 flush_work(&free_hpage_work);
> }
>
>+/*
>+ * atomic == true indicates called from a context where sleeping is
>+ * not allowed.
>+ */
> static void update_and_free_page(struct hstate *h, struct page *page,
>                                 bool atomic)
> {
>@@ -1625,7 +1631,8 @@ static void update_and_free_page(struct hstate *h, struct page *page,
>         }
>
>        /*
>-        * Defer freeing to avoid using GFP_ATOMIC to allocate vmemmap pages.
>+        * Defer freeing to avoid possible sleeping when allocating
>+        * vmemmap pages.
>          *
>          * Only call schedule_work() if hpage_freelist is previously
>          * empty. Otherwise, schedule_work() had been called but the workfn

The comments made it clearer for me. I will revise the patch. Thanks :)
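For readers following the naming debate: Muchun's point is that `atomic` describes the caller's context rather than a GFP mask, and Mike's revised comment says freeing is deferred only so that a possibly-sleeping vmemmap allocation never runs where sleeping is forbidden. The minimal userspace sketch here models that decision. Everything in it is hypothetical: `struct model_page`, its `needs_vmemmap` flag (assuming, as "may need to be allocated" suggests, that some pages need no vmemmap allocation), and the `model_*` helpers are illustrative stand-ins, not kernel APIs, and boolean flags replace the real allocation and workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a huge page being freed (not struct page). */
struct model_page {
	bool needs_vmemmap;	/* assumption: vmemmap must be allocated first */
	bool freed;		/* set by the direct (may-sleep) path */
	bool deferred;		/* set when handed to the worker */
};

/*
 * Stand-in for the direct path.  In the kernel, allocating vmemmap
 * pages with GFP_KERNEL may sleep, so this must only run where
 * sleeping is allowed.
 */
static void model_free_now(struct model_page *page)
{
	page->freed = true;
}

/* Stand-in for queueing the page to a workqueue, where sleeping is allowed. */
static void model_defer(struct model_page *page)
{
	page->deferred = true;
}

/*
 * 'atomic' describes the CALLER: true means "called from a context
 * where sleeping is not allowed".  It does not choose a GFP mask; the
 * allocation uses the same (possibly sleeping) flags either way.
 */
static void model_update_and_free_page(struct model_page *page, bool atomic)
{
	/* Each page is handed off exactly once. */
	assert(!page->freed && !page->deferred);

	if (!page->needs_vmemmap || !atomic) {
		/* Nothing to allocate, or sleeping is fine: free directly. */
		model_free_now(page);
		return;
	}
	/* Would have to allocate where sleeping is forbidden: defer. */
	model_defer(page);
}
```

With `atomic == true` and a page that needs vmemmap, the model defers; with `atomic == false` it frees directly, regardless of which GFP flags the allocation itself would use.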
________________________________________
From: Mike Kravetz <mike.kravetz@oracle.com>
Sent: March 16, 2022 5:16:14
To: Muchun Song; 罗飞
Cc: Andrew Morton; Linux Memory Management List; LKML
Subject: Re: [PATCH] hugetlbfs: fix description about atomic allocation of vmemmap pages when free huge page

On 3/15/22 06:29, Muchun Song wrote:
> On Tue, Mar 15, 2022 at 12:24 PM luofei <luofei@unicloud.com> wrote:
>>
>> No matter what context update_and_free_page() is called in,
>> the flag for allocating the vmemmap page is fixed
>> (GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE), and no atomic
>> allocation is involved, so the description of atomicity here
>> is somewhat inappropriate.
>>
>> and the atomic parameter naming of update_and_free_page() is
>> somewhat misleading.
>>
>> Signed-off-by: luofei <luofei@unicloud.com>
>> ---
>>  mm/hugetlb.c | 10 ++++------
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index f8ca7cca3c1a..239ef82b7897 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1570,8 +1570,8 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>>
>>  /*
>>   * As update_and_free_page() can be called under any context, so we cannot
>> - * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
>> - * actual freeing in a workqueue to prevent from using GFP_ATOMIC to allocate
>> + * use GFP_ATOMIC to allocate vmemmap pages. However, we can defer the
>> + * actual freeing in a workqueue to prevent waits caused by allocating
>>   * the vmemmap pages.
>>   *
>>   * free_hpage_workfn() locklessly retrieves the linked list of pages to be
>> @@ -1617,16 +1617,14 @@ static inline void flush_free_hpage_work(struct hstate *h)
>>  }
>>
>>  static void update_and_free_page(struct hstate *h, struct page *page,
>> -                                bool atomic)
>> +                                bool delay)
>
> Hi luofei,
>
> At least, I don't agree with this change.  The "atomic" means if the
> caller is under atomic context instead of whether using atomic
> GFP_MASK.  The "delay" seems to tell the caller that it can undelay
> the allocation even if it is under atomic context (actually, it has no
> choice).  But "atomic" can indicate the user is being asked to tell us
> if it is under atomic context.

There may be some confusion since GFP_ATOMIC is mentioned in the comments
and GFP_ATOMIC is not used in the allocation of vmemmap pages.  IIRC,
the use of GFP_ATOMIC was discussed at one time but dismissed because of
undesired side effects such as dipping into "atomic reserves".

How about an update to the comments as follows (sorry mailer may mess up
formatting)?

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f8ca7cca3c1a..6a4d27e24b21 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1569,10 +1569,12 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
 }

 /*
- * As update_and_free_page() can be called under any context, so we cannot
- * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
- * actual freeing in a workqueue to prevent from using GFP_ATOMIC to allocate
- * the vmemmap pages.
+ * Freeing hugetlb pages is done in update_and_free_page().  When freeing a
+ * hugetlb page, vmemmap pages may need to be allocated.  The routine
+ * alloc_huge_page_vmemmap() can possibly sleep as it uses GFP_KERNEL.
+ * However, update_and_free_page() can be called under any context.  To
+ * avoid the possibility of sleeping in a context where sleeping is not
+ * allowed, defer the actual freeing in a workqueue where sleeping is allowed.
  *
  * free_hpage_workfn() locklessly retrieves the linked list of pages to be
  * freed and frees them one-by-one. As the page->mapping pointer is going
@@ -1616,6 +1618,10 @@ static inline void flush_free_hpage_work(struct hstate *h)
                flush_work(&free_hpage_work);
 }

+/*
+ * atomic == true indicates called from a context where sleeping is
+ * not allowed.
+ */
 static void update_and_free_page(struct hstate *h, struct page *page,
                                 bool atomic)
 {
@@ -1625,7 +1631,8 @@ static void update_and_free_page(struct hstate *h, struct page *page,
        }

        /*
-        * Defer freeing to avoid using GFP_ATOMIC to allocate vmemmap pages.
+        * Defer freeing to avoid possible sleeping when allocating
+        * vmemmap pages.
         *
         * Only call schedule_work() if hpage_freelist is previously
         * empty. Otherwise, schedule_work() had been called but the workfn

--
Mike Kravetz
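The last hunk's comment ("Only call schedule_work() if hpage_freelist is previously empty") describes a lock-free hand-off to the worker. A minimal userspace model of that pattern is sketched here, using C11 atomics in place of the kernel's llist helpers. The names `model_freelist`, `model_llist_add()`, `model_defer_free()`, `model_free_workfn()` and the `model_work_scheduled` counter are hypothetical; the kernel reuses a field of `struct page` for the link (the comment above mentions `page->mapping`) rather than a dedicated node type, and a counter stands in for `schedule_work()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node standing in for a page waiting to be freed. */
struct model_node {
	struct model_node *next;
};

/* Lock-free stack head, modelled loosely on the kernel's llist. */
static _Atomic(struct model_node *) model_freelist = NULL;
static int model_work_scheduled;	/* counts schedule_work() stand-in calls */

/* Push a node; return true if the list was empty beforehand. */
static bool model_llist_add(struct model_node *node)
{
	struct model_node *first = atomic_load(&model_freelist);

	do {
		node->next = first;
	} while (!atomic_compare_exchange_weak(&model_freelist, &first, node));
	return first == NULL;
}

/*
 * Defer a page: "schedule" the worker only on the empty -> non-empty
 * transition.  If the list was already non-empty, a worker has been
 * scheduled but has not yet taken the list, so it will see this node.
 */
static void model_defer_free(struct model_node *node)
{
	assert(node != NULL);
	if (model_llist_add(node))
		model_work_scheduled++;
}

/*
 * Worker: detach the whole list in one atomic step, then walk it
 * entry by entry.  Returns how many entries were processed.
 */
static int model_free_workfn(void)
{
	struct model_node *node = atomic_exchange(&model_freelist, NULL);
	int n = 0;

	while (node) {
		struct model_node *next = node->next;	/* read before "freeing" */

		n++;
		node = next;
	}
	return n;
}
```

Pushing two pages before the worker runs triggers only one "schedule", and the worker drains both in a single pass; a later push onto the now-empty list schedules again.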