From nobody Fri Apr 3 06:41:00 2026
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Sven Schnelle,
 Michal Hocko, Peter Xu, Naoya Horiguchi, "Aneesh Kumar K . V",
 Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso,
 Prakash Sangappa, James Houghton, Mina Almasry, Pasha Tatashin,
 Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH v2 1/9] hugetlbfs: revert use i_mmap_rwsem to address page fault/truncate race
Date: Wed, 14 Sep 2022 15:18:02 -0700
Message-Id: <20220914221810.95771-2-mike.kravetz@oracle.com>
In-Reply-To: <20220914221810.95771-1-mike.kravetz@oracle.com>
References: <20220914221810.95771-1-mike.kravetz@oracle.com>

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") added code to take i_mmap_rwsem in read mode for the
duration of fault processing. The use of i_mmap_rwsem to prevent
fault/truncate races depends on this. However, this has been shown to
cause performance/scaling issues. As a result, that code will be
reverted. Since the use of i_mmap_rwsem to address page fault/truncate
races depends on this, it must also be reverted.

In a subsequent patch, code will be added to detect the fault/truncate
race and back out operations as required.

Signed-off-by: Mike Kravetz
Reviewed-by: Miaohe Lin
---
 fs/hugetlbfs/inode.c | 30 +++++++++---------------------
 mm/hugetlb.c         | 22 +++++++++++-----------
 2 files changed, 20 insertions(+), 32 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index f7a5b5124d8a..a32031e751d1 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -419,9 +419,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
  * In this case, we first scan the range and release found pages.
  * After releasing pages, hugetlb_unreserve_pages cleans up region/reserve
  * maps and global counts. Page faults can not race with truncation
- * in this routine. hugetlb_no_page() holds i_mmap_rwsem and prevents
- * page faults in the truncated range by checking i_size. i_size is
- * modified while holding i_mmap_rwsem.
+ * in this routine. hugetlb_no_page() prevents page faults in the
+ * truncated range. It checks i_size before allocation, and again after
+ * with the page table lock for the page held. The same lock must be
+ * acquired to unmap a page.
  * hole punch is indicated if end is not LLONG_MAX
  * In the hole punch case we scan the range and release found pages.
  * Only when releasing a page is the associated region/reserve map
@@ -451,16 +452,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			u32 hash = 0;
 
 			index = folio->index;
-			if (!truncate_op) {
-				/*
-				 * Only need to hold the fault mutex in the
-				 * hole punch case. This prevents races with
-				 * page faults. Races are not possible in the
-				 * case of truncation.
-				 */
-				hash = hugetlb_fault_mutex_hash(mapping, index);
-				mutex_lock(&hugetlb_fault_mutex_table[hash]);
-			}
+			hash = hugetlb_fault_mutex_hash(mapping, index);
+			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 			/*
 			 * If folio is mapped, it was faulted in after being
@@ -504,8 +497,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			}
 
 			folio_unlock(folio);
-			if (!truncate_op)
-				mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
 		folio_batch_release(&fbatch);
 		cond_resched();
@@ -543,8 +535,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	BUG_ON(offset & ~huge_page_mask(h));
 	pgoff = offset >> PAGE_SHIFT;
 
-	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
+	i_mmap_lock_write(mapping);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
 				      ZAP_FLAG_DROP_MARKER);
@@ -703,11 +695,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		/* addr is the offset within the file (zero based) */
 		addr = index * hpage_size;
 
-		/*
-		 * fault mutex taken here, protects against fault path
-		 * and hole punch. inode_lock previously taken protects
-		 * against truncation.
-		 */
+		/* mutex taken here, fault path and hole punch */
 		hash = hugetlb_fault_mutex_hash(mapping, index);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c6b53bcf823d..6c97b97aa252 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5559,17 +5559,15 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}
 
 	/*
-	 * We can not race with truncation due to holding i_mmap_rwsem.
-	 * i_size is modified when holding i_mmap_rwsem, so check here
-	 * once for faults beyond end of file.
+	 * Use page lock to guard against racing truncation
+	 * before we get page_table_lock.
 	 */
-	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
-		goto out;
-
 	new_page = false;
 	page = find_lock_page(mapping, idx);
 	if (!page) {
+		size = i_size_read(mapping->host) >> huge_page_shift(h);
+		if (idx >= size)
+			goto out;
 		/* Check for page in userfault range */
 		if (userfaultfd_missing(vma)) {
 			ret = hugetlb_handle_userfault(vma, mapping, idx,
@@ -5665,6 +5663,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}
 
 	ptl = huge_pte_lock(h, mm, ptep);
+	size = i_size_read(mapping->host) >> huge_page_shift(h);
+	if (idx >= size)
+		goto backout;
+
 	ret = 0;
 	/* If pte changed from under us, retry */
 	if (!pte_same(huge_ptep_get(ptep), old_pte))
@@ -5773,10 +5775,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	/*
 	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep. This serves two purposes:
-	 * 1) It prevents huge_pmd_unshare from being called elsewhere
-	 *    and making the ptep no longer valid.
-	 * 2) It synchronizes us with i_size modifications during truncation.
+	 * until finished with ptep. This prevents huge_pmd_unshare from
+	 * being called elsewhere and making the ptep no longer valid.
 	 *
 	 * ptep could have already be assigned via huge_pte_offset. That
 	 * is OK, as huge_pte_alloc will return the same value unless
-- 
2.37.2

From nobody Fri Apr 3 06:41:00 2026
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Sven Schnelle,
 Michal Hocko, Peter Xu, Naoya Horiguchi, "Aneesh Kumar K . V",
 Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso,
 Prakash Sangappa, James Houghton, Mina Almasry, Pasha Tatashin,
 Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH v2 2/9] hugetlbfs: revert use i_mmap_rwsem for more pmd sharing synchronization
Date: Wed, 14 Sep 2022 15:18:03 -0700
Message-Id: <20220914221810.95771-3-mike.kravetz@oracle.com>
In-Reply-To: <20220914221810.95771-1-mike.kravetz@oracle.com>
References: <20220914221810.95771-1-mike.kravetz@oracle.com>

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") added code to take i_mmap_rwsem in read mode for the
duration of fault processing. However, this has been shown to cause
performance/scaling issues. Revert the code and go back to only taking
the semaphore in huge_pmd_share during the fault path.

Keep the code that takes i_mmap_rwsem in write mode before calling
try_to_unmap as this is required if huge_pmd_unshare is called.

NOTE: Reverting this code does expose the following race condition.

Faulting thread                                 Unsharing thread
...                                                  ...
ptep = huge_pte_offset()
      or
ptep = huge_pte_alloc()
...
                                                i_mmap_lock_write
                                                lock page table
ptep invalid   <------------------------        huge_pmd_unshare()
Could be in a previously                        unlock_page_table
sharing process or worse                        i_mmap_unlock_write
...
ptl = huge_pte_lock(ptep)
get/update pte
set_pte_at(pte, ptep)

It is unknown if the above race was ever experienced by a user. It was
discovered via code inspection when initially addressed.

In subsequent patches, a new synchronization mechanism will be added to
coordinate pmd sharing and eliminate this race.

Signed-off-by: Mike Kravetz
Reviewed-by: Miaohe Lin
---
 fs/hugetlbfs/inode.c |  2 --
 mm/hugetlb.c         | 77 +++++++-------------------------------------
 mm/rmap.c            |  8 +----
 mm/userfaultfd.c     | 11 ++-----
 4 files changed, 15 insertions(+), 83 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a32031e751d1..dfb735a91bbb 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -467,9 +467,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			if (unlikely(folio_mapped(folio))) {
 				BUG_ON(truncate_op);
 
-				mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 				i_mmap_lock_write(mapping);
-				mutex_lock(&hugetlb_fault_mutex_table[hash]);
 				hugetlb_vmdelete_list(&mapping->i_mmap,
 					index * pages_per_huge_page(h),
 					(index + 1) * pages_per_huge_page(h),
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6c97b97aa252..00fba195a439 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4769,7 +4769,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	struct hstate *h = hstate_vma(src_vma);
 	unsigned long sz = huge_page_size(h);
 	unsigned long npages = pages_per_huge_page(h);
-	struct address_space *mapping = src_vma->vm_file->f_mapping;
 	struct mmu_notifier_range range;
 	unsigned long last_addr_mask;
 	int ret = 0;
@@ -4781,14 +4780,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		mmu_notifier_invalidate_range_start(&range);
 		mmap_assert_write_locked(src);
 		raw_write_seqcount_begin(&src->write_protect_seq);
-	} else {
-		/*
-		 * For shared mappings i_mmap_rwsem must be held to call
-		 * huge_pte_alloc, otherwise the returned ptep could go
-		 * away if part of a shared pmd and another thread calls
-		 * huge_pmd_unshare.
-		 */
-		i_mmap_lock_read(mapping);
 	}
 
 	last_addr_mask = hugetlb_mask_last_page(h);
@@ -4935,8 +4926,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	if (cow) {
 		raw_write_seqcount_end(&src->write_protect_seq);
 		mmu_notifier_invalidate_range_end(&range);
-	} else {
-		i_mmap_unlock_read(mapping);
 	}
 
 	return ret;
@@ -5345,29 +5334,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * may get SIGKILLed if it later faults.
 	 */
 	if (outside_reserve) {
-		struct address_space *mapping = vma->vm_file->f_mapping;
-		pgoff_t idx;
-		u32 hash;
-
 		put_page(old_page);
-		/*
-		 * Drop hugetlb_fault_mutex and i_mmap_rwsem before
-		 * unmapping. unmapping needs to hold i_mmap_rwsem
-		 * in write mode. Dropping i_mmap_rwsem in read mode
-		 * here is OK as COW mappings do not interact with
-		 * PMD sharing.
-		 *
-		 * Reacquire both after unmap operation.
-		 */
-		idx = vma_hugecache_offset(h, vma, haddr);
-		hash = hugetlb_fault_mutex_hash(mapping, idx);
-		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-		i_mmap_unlock_read(mapping);
-
 		unmap_ref_private(mm, vma, old_page, haddr);
-
-		i_mmap_lock_read(mapping);
-		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 		spin_lock(ptl);
 		ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 		if (likely(ptep &&
@@ -5522,9 +5490,7 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 	 */
 	hash = hugetlb_fault_mutex_hash(mapping, idx);
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-	i_mmap_unlock_read(mapping);
 	ret = handle_userfault(&vmf, reason);
-	i_mmap_lock_read(mapping);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 	return ret;
@@ -5759,11 +5725,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
-		/*
-		 * Since we hold no locks, ptep could be stale. That is
-		 * OK as we are only making decisions based on content and
-		 * not actually modifying content here.
-		 */
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
 			migration_entry_wait_huge(vma, ptep);
@@ -5771,31 +5732,20 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
+	} else {
+		ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
+		if (!ptep)
+			return VM_FAULT_OOM;
 	}
 
-	/*
-	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep. This prevents huge_pmd_unshare from
-	 * being called elsewhere and making the ptep no longer valid.
-	 *
-	 * ptep could have already be assigned via huge_pte_offset. That
-	 * is OK, as huge_pte_alloc will return the same value unless
-	 * something has changed.
-	 */
 	mapping = vma->vm_file->f_mapping;
-	i_mmap_lock_read(mapping);
-	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
-	if (!ptep) {
-		i_mmap_unlock_read(mapping);
-		return VM_FAULT_OOM;
-	}
+	idx = vma_hugecache_offset(h, vma, haddr);
 
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
 	 * get spurious allocation failures if two CPUs race to instantiate
 	 * the same page in the page cache.
 	 */
-	idx = vma_hugecache_offset(h, vma, haddr);
 	hash = hugetlb_fault_mutex_hash(mapping, idx);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
@@ -5860,7 +5810,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			put_page(pagecache_page);
 		}
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-		i_mmap_unlock_read(mapping);
 		return handle_userfault(&vmf, VM_UFFD_WP);
 	}
 
@@ -5904,7 +5853,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 out_mutex:
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-	i_mmap_unlock_read(mapping);
 	/*
 	 * Generally it's safe to hold refcount during waiting page lock. But
 	 * here we just wait to defer the next page fault to avoid busy loop and
@@ -6744,12 +6692,10 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
  * Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc()
  * and returns the corresponding pte. While this is not necessary for the
  * !shared pmd case because we can allocate the pmd later as well, it makes the
- * code much cleaner.
- *
- * This routine must be called with i_mmap_rwsem held in at least read mode if
- * sharing is possible. For hugetlbfs, this prevents removal of any page
- * table entries associated with the address space. This is important as we
- * are setting up sharing based on existing page table entries (mappings).
+ * code much cleaner. pmd allocation is essential for the shared case because
+ * pud has to be populated inside the same i_mmap_rwsem section - otherwise
+ * racing tasks could either miss the sharing (see huge_pte_offset) or select a
+ * bad pmd for sharing.
  */
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
@@ -6763,7 +6709,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	i_mmap_assert_locked(mapping);
+	i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -6793,6 +6739,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
+	i_mmap_unlock_read(mapping);
 	return pte;
 }
 
@@ -6803,7 +6750,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
  * indicated by page_count > 1, unmap is achieved by clearing pud and
  * decrementing the ref count. If count == 1, the pte page is not shared.
  *
- * Called with page table lock held and i_mmap_rwsem held in write mode.
+ * Called with page table lock held.
  *
  * returns: 1 successfully unmapped a shared pte page
  *	    0 the underlying pte page is not shared, or it is the last user
diff --git a/mm/rmap.c b/mm/rmap.c
index 08d552ea4ceb..d17d68a9b15b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -23,10 +23,9 @@
  * inode->i_rwsem	(while writing or truncating, not reading or faulting)
  *   mm->mmap_lock
  *     mapping->invalidate_lock (in filemap_fault)
- *       page->flags PG_locked (lock_page)   * (see hugetlbfs below)
+ *       page->flags PG_locked (lock_page)
  *         hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share)
  *           mapping->i_mmap_rwsem
- *             hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
  *             anon_vma->rwsem
  *               mm->page_table_lock or pte_lock
  *                 swap_lock (in swap_duplicate, swap_info_get)
@@ -45,11 +44,6 @@
  * anon_vma->rwsem,mapping->i_mmap_rwsem   (memory_failure, collect_procs_anon)
  *   ->tasklist_lock
  *     pte map lock
- *
- * * hugetlbfs PageHuge() pages take locks in this order:
- *   mapping->i_mmap_rwsem
- *     hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
- *       page->flags PG_locked (lock_page)
  */
 
 #include
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9c035be2148b..0fdbd2c05587 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -379,14 +379,10 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
-		 * Serialize via i_mmap_rwsem and hugetlb_fault_mutex.
-		 * i_mmap_rwsem ensures the dst_pte remains valid even
-		 * in the case of shared pmds. fault mutex prevents
-		 * races with other faulting threads.
+		 * Serialize via hugetlb_fault_mutex.
 		 */
-		mapping = dst_vma->vm_file->f_mapping;
-		i_mmap_lock_read(mapping);
 		idx = linear_page_index(dst_vma, dst_addr);
+		mapping = dst_vma->vm_file->f_mapping;
 		hash = hugetlb_fault_mutex_hash(mapping, idx);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
@@ -394,7 +390,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
 		if (!dst_pte) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}
 
@@ -402,7 +397,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
 			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}
 
@@ -411,7 +405,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 						wp_copy);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-		i_mmap_unlock_read(mapping);
 
 		cond_resched();
 
-- 
2.37.2

From nobody Fri Apr 3 06:41:00 2026
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Sven Schnelle,
 Michal Hocko, Peter Xu, Naoya Horiguchi, "Aneesh Kumar K . V",
 Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso,
 Prakash Sangappa, James Houghton, Mina Almasry, Pasha Tatashin,
 Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH v2 3/9] hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache
Date: Wed, 14 Sep 2022 15:18:04 -0700
Message-Id: <20220914221810.95771-4-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220914221810.95771-1-mike.kravetz@oracle.com>
References: <20220914221810.95771-1-mike.kravetz@oracle.com>
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

remove_huge_page removes a hugetlb page from the page cache. Change to
hugetlb_delete_from_page_cache as it is a more descriptive name.
huge_add_to_page_cache is global in scope, but only deals with hugetlb
pages. For consistency and clarity, rename to hugetlb_add_to_page_cache.

Signed-off-by: Mike Kravetz
Reviewed-by: Miaohe Lin
---
 fs/hugetlbfs/inode.c    | 21 ++++++++++-----------
 include/linux/hugetlb.h |  2 +-
 mm/hugetlb.c            |  8 ++++----
 3 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dfb735a91bbb..edd69cc43ca5 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -364,7 +364,7 @@ static int hugetlbfs_write_end(struct file *file, struct address_space *mapping,
 	return -EINVAL;
 }
 
-static void remove_huge_page(struct page *page)
+static void hugetlb_delete_from_page_cache(struct page *page)
 {
 	ClearPageDirty(page);
 	ClearPageUptodate(page);
@@ -478,15 +478,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			folio_lock(folio);
 			/*
 			 * We must free the huge page and remove from page
-			 * cache (remove_huge_page) BEFORE removing the
-			 * region/reserve map (hugetlb_unreserve_pages). In
-			 * rare out of memory conditions, removal of the
-			 * region/reserve map could fail. Correspondingly,
-			 * the subpool and global reserve usage count can need
-			 * to be adjusted.
+			 * cache BEFORE removing the region/reserve map
+			 * (hugetlb_unreserve_pages). In rare out of memory
+			 * conditions, removal of the region/reserve map could
+			 * fail. Correspondingly, the subpool and global
+			 * reserve usage count can need to be adjusted.
 			 */
 			VM_BUG_ON(HPageRestoreReserve(&folio->page));
-			remove_huge_page(&folio->page);
+			hugetlb_delete_from_page_cache(&folio->page);
 			freed++;
 			if (!truncate_op) {
 				if (unlikely(hugetlb_unreserve_pages(inode,
@@ -723,7 +722,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		}
 		clear_huge_page(page, addr, pages_per_huge_page(h));
 		__SetPageUptodate(page);
-		error = huge_add_to_page_cache(page, mapping, index);
+		error = hugetlb_add_to_page_cache(page, mapping, index);
 		if (unlikely(error)) {
 			restore_reserve_on_error(h, &pseudo_vma, addr, page);
 			put_page(page);
@@ -735,7 +734,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 
 		SetHPageMigratable(page);
 		/*
-		 * unlock_page because locked by huge_add_to_page_cache()
+		 * unlock_page because locked by hugetlb_add_to_page_cache()
 		 * put_page() due to reference from alloc_huge_page()
 		 */
 		unlock_page(page);
@@ -980,7 +979,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	pgoff_t index = page->index;
 
-	remove_huge_page(page);
+	hugetlb_delete_from_page_cache(page);
 	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
 		hugetlb_fix_reserve_counts(inode);
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 890f7b6a2eff..0ce916d1afca 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -665,7 +665,7 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
-int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
+int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
 			pgoff_t idx);
 void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address, struct page *page);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 00fba195a439..eb38ae3e7a83 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5429,7 +5429,7 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
 	return page != NULL;
 }
 
-int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
+int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
 			   pgoff_t idx)
 {
 	struct folio *folio = page_folio(page);
@@ -5568,7 +5568,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		new_page = true;
 
 		if (vma->vm_flags & VM_MAYSHARE) {
-			int err = huge_add_to_page_cache(page, mapping, idx);
+			int err = hugetlb_add_to_page_cache(page, mapping, idx);
 			if (err) {
 				/*
 				 * err can't be -EEXIST which implies someone
@@ -5980,11 +5980,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 
 		/*
 		 * Serialization between remove_inode_hugepages() and
-		 * huge_add_to_page_cache() below happens through the
+		 * hugetlb_add_to_page_cache() below happens through the
 		 * hugetlb_fault_mutex_table that here must be hold by
 		 * the caller.
 		 */
-		ret = huge_add_to_page_cache(page, mapping, idx);
+		ret = hugetlb_add_to_page_cache(page, mapping, idx);
 		if (ret)
 			goto out_release_nounlock;
 		page_in_pagecache = true;
-- 
2.37.2

From nobody Fri Apr 3 06:41:00 2026
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Sven Schnelle,
 Michal Hocko, Peter Xu, Naoya Horiguchi, "Aneesh Kumar K . V",
 Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso,
 Prakash Sangappa, James Houghton, Mina Almasry, Pasha Tatashin,
 Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH v2 4/9] hugetlb: create remove_inode_single_folio to remove single file folio
Date: Wed, 14 Sep 2022 15:18:05 -0700
Message-Id: <20220914221810.95771-5-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220914221810.95771-1-mike.kravetz@oracle.com>
References: <20220914221810.95771-1-mike.kravetz@oracle.com>
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Create the new routine remove_inode_single_folio that will remove a
single folio from a file. This is refactored code from
remove_inode_hugepages. It checks for the uncommon case in which the
folio is still mapped and unmaps it.

No functional change. This refactoring will be put to use and expanded
upon in subsequent patches.

Signed-off-by: Mike Kravetz
Reviewed-by: Miaohe Lin
---
 fs/hugetlbfs/inode.c | 105 ++++++++++++++++++++++++++-----------------
 1 file changed, 63 insertions(+), 42 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index edd69cc43ca5..7112a9a9f54d 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -411,6 +411,60 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 	}
 }
 
+/*
+ * Called with hugetlb fault mutex held.
+ * Returns true if page was actually removed, false otherwise.
+ */
+static bool remove_inode_single_folio(struct hstate *h, struct inode *inode,
+					struct address_space *mapping,
+					struct folio *folio, pgoff_t index,
+					bool truncate_op)
+{
+	bool ret = false;
+
+	/*
+	 * If folio is mapped, it was faulted in after being
+	 * unmapped in caller. Unmap (again) while holding
+	 * the fault mutex. The mutex will prevent faults
+	 * until we finish removing the folio.
+	 */
+	if (unlikely(folio_mapped(folio))) {
+		i_mmap_lock_write(mapping);
+		hugetlb_vmdelete_list(&mapping->i_mmap,
+					index * pages_per_huge_page(h),
+					(index + 1) * pages_per_huge_page(h),
+					ZAP_FLAG_DROP_MARKER);
+		i_mmap_unlock_write(mapping);
+	}
+
+	folio_lock(folio);
+	/*
+	 * After locking page, make sure mapping is the same.
+	 * We could have raced with page fault populate and
+	 * backout code.
+	 */
+	if (folio_mapping(folio) == mapping) {
+		/*
+		 * We must remove the folio from page cache before removing
+		 * the region/ reserve map (hugetlb_unreserve_pages). In
+		 * rare out of memory conditions, removal of the region/reserve
+		 * map could fail. Correspondingly, the subpool and global
+		 * reserve usage count can need to be adjusted.
+		 */
+		VM_BUG_ON(HPageRestoreReserve(&folio->page));
+		hugetlb_delete_from_page_cache(&folio->page);
+		ret = true;
+		if (!truncate_op) {
+			if (unlikely(hugetlb_unreserve_pages(inode, index,
+								index + 1, 1)))
+				hugetlb_fix_reserve_counts(inode);
+		}
+	}
+
+	folio_unlock(folio);
+	return ret;
+}
+
 /*
  * remove_inode_hugepages handles two distinct cases: truncation and hole
  * punch. There are subtle differences in operation for each case.
@@ -418,11 +472,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
  * truncation is indicated by end of range being LLONG_MAX
  *	In this case, we first scan the range and release found pages.
  *	After releasing pages, hugetlb_unreserve_pages cleans up region/reserve
- *	maps and global counts. Page faults can not race with truncation
- *	in this routine. hugetlb_no_page() prevents page faults in the
- *	truncated range. It checks i_size before allocation, and again after
- *	with the page table lock for the page held. The same lock must be
- *	acquired to unmap a page.
+ *	maps and global counts. Page faults can race with truncation.
+ *	During faults, hugetlb_no_page() checks i_size before page allocation,
+ *	and again after obtaining page table lock. It will 'back out'
+ *	allocations in the truncated range.
  * hole punch is indicated if end is not LLONG_MAX
  *	In the hole punch case we scan the range and release found pages.
  *	Only when releasing a page is the associated region/reserve map
@@ -456,44 +509,12 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 			/*
-			 * If folio is mapped, it was faulted in after being
-			 * unmapped in caller. Unmap (again) now after taking
-			 * the fault mutex. The mutex will prevent faults
-			 * until we finish removing the folio.
-			 *
-			 * This race can only happen in the hole punch case.
-			 * Getting here in a truncate operation is a bug.
+			 * Remove folio that was part of folio_batch.
 			 */
-			if (unlikely(folio_mapped(folio))) {
-				BUG_ON(truncate_op);
-
-				i_mmap_lock_write(mapping);
-				hugetlb_vmdelete_list(&mapping->i_mmap,
-					index * pages_per_huge_page(h),
-					(index + 1) * pages_per_huge_page(h),
-					ZAP_FLAG_DROP_MARKER);
-				i_mmap_unlock_write(mapping);
-			}
-
-			folio_lock(folio);
-			/*
-			 * We must free the huge page and remove from page
-			 * cache BEFORE removing the region/reserve map
-			 * (hugetlb_unreserve_pages). In rare out of memory
-			 * conditions, removal of the region/reserve map could
-			 * fail. Correspondingly, the subpool and global
-			 * reserve usage count can need to be adjusted.
-			 */
-			VM_BUG_ON(HPageRestoreReserve(&folio->page));
-			hugetlb_delete_from_page_cache(&folio->page);
-			freed++;
-			if (!truncate_op) {
-				if (unlikely(hugetlb_unreserve_pages(inode,
-							index, index + 1, 1)))
-					hugetlb_fix_reserve_counts(inode);
-			}
-
-			folio_unlock(folio);
+			if (remove_inode_single_folio(h, inode, mapping, folio,
+							index, truncate_op))
+				freed++;
+
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
 		folio_batch_release(&fbatch);
-- 
2.37.2

From nobody Fri Apr 3 06:41:00 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06789ECAAD3 for ; Wed, 14 Sep 2022 22:19:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229599AbiINWTw (ORCPT ); Wed, 14 Sep 2022 18:19:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229709AbiINWTW (ORCPT ); Wed, 14 Sep 2022 18:19:22 -0400 Received: from
mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACE9779EFD for ; Wed, 14 Sep 2022 15:19:20 -0700 (PDT) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28EMAARp026129; Wed, 14 Sep 2022 22:18:33 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2022-7-12; bh=c5CV29ucnX6P7BSExad26jDY9ce0i4MfDdk5OL0pR+4=; b=y1yf3FkwprIzarb0yhWit8irhp79vPw/iGZt6ErczAtgkGwAgDa1vrqd3XrSEmjEcBW/ 4yIblnZrWg5Lq84vOGDU14STENVGKL9LjIeLF1/WNVPn5/BRaFcxi2NsMwLQpRwsAeGp I8raYYun2NMuCg7FW0NPbh9HjXAAN6MilCM/MVnJNt5N2XM3ZWH2/SOUS36gtBH+0LKu ItjlYD6CGEVLNQu3LKtS22noxXpzThpgnStXCEZbxywqLBDA40BkCQHAAN3drxgtOwCB 6OvRRaYVruovpemTn26o6hQtEn6MhwRJXkO6rYGiUeWQLl/mb4P/UzAtB0bBsivIDbkp KQ== Received: from iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta01.appoci.oracle.com [130.35.100.223]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3jjxypbqr6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Sep 2022 22:18:33 +0000 Received: from pps.filterd (iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (8.17.1.5/8.17.1.5) with ESMTP id 28EM1D1C035494; Wed, 14 Sep 2022 22:18:32 GMT Received: from nam11-bn8-obe.outbound.protection.outlook.com (mail-bn8nam11lp2169.outbound.protection.outlook.com [104.47.58.169]) by iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 3jjyehtdu5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Sep 2022 22:18:32 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=PWLjzSJ8uiB6pDYXfUXVDz0jnU6pTCV7K1boAA0r3jgVaeI9cyc4mN4HxHDyORQtbVWPWfRr+iJkGBEXfNoHTO87ZVwB864UKYpCLa27HeBqxuih5/Rll2LDnADEiBr52ltl2shceyKAKDFjoSbPitChxq0E5Apbq7AMHW/d4dzN3RortlAWOHvUyRNbEvd+53q+bquxmDnfdORMaO4K2IlJwT1AflczgkFfY+z44j8b2SpNTnNDG8Zgeh+/IZYmJ04QAzok1lyDI4/eqEg97K3g2WshdDgox1af5iK5Q4xPlC/aNSqm4NItJsrurAsvS+dAkcYTZ2SqF+/meAGkqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=c5CV29ucnX6P7BSExad26jDY9ce0i4MfDdk5OL0pR+4=; b=Mbn2pD9dj2yCro6F40fT/OyN/1IRkITaPnhMe+9jkQ9pStxnyeETPSTWlQ/sShTURG3tbP6ep8hPWfHgoW4JWHlBeSpUiNUe3v3+bDfvD2ofmDsjZ85CrFh0qKhkWPSufXS3YdbxLX7zBGyLD6rX/P2aYeiRkE48rPhXVX0kRjuyJtJJCG7Zk5WeXt2mPMZqHmnTOB+oU5rB1QeaAwnKgVKUfH0KVsMDGdMf6bYvEpBVDpJiFhMKiD1Wu81KTotXHwsH1HtAPQK6fSYQgMabadKma6wnauUDAT4aTHGScM69P4/D9mvhlfAHbYURw+UpcQPA91Qu6/KutLo8QZoIkw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=c5CV29ucnX6P7BSExad26jDY9ce0i4MfDdk5OL0pR+4=; b=Vv2V0Ecj4I6Qej1KdNudLjrEYJLEzuA668EHVocTAqkErBwIqo2Byp1rTdZ8up0zfIBoSH0/14xBdwMPaK5FVWS9/s+xXr7nj2aQm1txI7x/2GsiAWg+n/SVrJSPN5xPWPj9REMdcY+wrJ4nEm+xs4CqcU1l2Py5Xw4lKvJIlXM= Received: from BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) by SA2PR10MB4745.namprd10.prod.outlook.com (2603:10b6:806:11b::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5612.23; Wed, 14 Sep 2022 22:18:30 +0000 Received: from BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::e9d2:a804:e53a:779a]) by 
BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::e9d2:a804:e53a:779a%6]) with mapi id 15.20.5612.022; Wed, 14 Sep 2022 22:18:30 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , Miaohe Lin , David Hildenbrand , Sven Schnelle , Michal Hocko , Peter Xu , Naoya Horiguchi , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Pasha Tatashin , Axel Rasmussen , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [PATCH v2 5/9] hugetlb: rename vma_shareable() and refactor code Date: Wed, 14 Sep 2022 15:18:06 -0700 Message-Id: <20220914221810.95771-6-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220914221810.95771-1-mike.kravetz@oracle.com> References: <20220914221810.95771-1-mike.kravetz@oracle.com> Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: MW4PR03CA0153.namprd03.prod.outlook.com (2603:10b6:303:8d::8) To BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BY5PR10MB4196:EE_|SA2PR10MB4745:EE_ X-MS-Office365-Filtering-Correlation-Id: 2fa0f240-db29-43b2-05a6-08da969f0de6 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
7AvGPbLdzoFj1uc71qBP6dbl3v7Xm7E/DJ4ArChkH7V0w7TZaTRWX6BAgHsG/d1AT4ynDjw8F+f4oAgfAyUz77XZt//8LolIAyajtwjj6L5D5fO7htRoBQlGF0j9+yxUpbHfnrp8Fl2Sm6gMa951koqPZedKdNRna7+BBl4ui/1mv5ER89P1Ij+y1/LqtAB3b1HT8zPdLHLDgXsRYt9Learp6H7ila3ps6Qkukxi6mT89lj8fb3xAttE8ycX1t/1geNtznXS6C4Sh1UT2v1DfTwj91do3wWGW8qN8/vL37QKa7xWPBksK3DHhMTsLVHgMMpB+1W55XHEeMqo0gZSFXU8uqzCVvraLmAfYgnLY68TBGUcVo+0OP1TVKjuy1WrGeECc7iZ9CedNpgbgtoF78MeFWGaKtZLd1TwJ9otj3pJFtUyjJJqIUBszdIyrcOWyIHLk/ZcjZPiSOTAuIFWDbDAMjhuo6NUp+78yBOT5m7HVB2ksbdk0wHY9OvQk2pI9NQYM5XgwDv5oo03XYs2wRiQ8hMD24mBLZrn/C2IQ1U01KJEOQoWK3a+8p1looTjPXGyht+Dj2ofoDGwt1R2HnwJyImc9lOBr6+DrPvJ+0ODpJqQHRUOUEnFNg5yFXY2t5HsHGBU+4hfnYbdTfi8F87RfvNVh2dsVn2S53yBCbJGnYim1vXVc5jeQg64nxENX3Et1oEVWM5LXJJbtmfCRA== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BY5PR10MB4196.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(376002)(346002)(39860400002)(136003)(366004)(396003)(451199015)(478600001)(38100700002)(54906003)(316002)(36756003)(6506007)(8676002)(4326008)(66946007)(83380400001)(186003)(2616005)(1076003)(5660300002)(41300700001)(107886003)(66476007)(66556008)(86362001)(8936002)(6666004)(6486002)(7416002)(2906002)(26005)(6512007)(44832011);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?ABmSBjnD+dy0WguvxoTWhK5crCfRpyyyiOVa6aXEU0Oj4AEE3UYBvgBEB0uq?= =?us-ascii?Q?6NNXSl71/WHrg62H2CABw5cbw9Kf/cyScFOUy9Q+ym+Qrj3G9Bze/aVj1d7P?= =?us-ascii?Q?0gr4AlgbJRRbekjSZUFFUGiOZO8wWZ3BXgb8eP8155kLPF/NmEd0MQaUYOH4?= =?us-ascii?Q?Jv1pq07UB0T5dl93ojpIYtc663NaKLyzIsZYL51mNPoLpQNmTo19x1Le6wqm?= =?us-ascii?Q?cqxyhtRGeSmdXiM5XGhL0gHbgHOGdxDy31d3Lr7PpT1V4Avf/Pq7S2hGrvPn?= =?us-ascii?Q?x3ao39o7QrIpXczlKCkTZ3U6+O+VAvhXTl1DT2qR7PsRwf1IQ0Pi1nMhPC+3?= =?us-ascii?Q?cEq0A010uZ6Rtm+ofShsplSn6peLCZvpMLinOPoF16RlVZKtwfYGTR/XlddQ?= =?us-ascii?Q?7hGxwlURs37ZuQuVfTL/Zq8kQmfmnXNOSpLsR3QrYxihZQ/vBhnlDuMxBfyH?= 
=?us-ascii?Q?eGscSzXKG9RWUxPnv2XjAfGSrLCQBtOGPaR6o+RRbYRP6SA1yJVEYeOHjgDV?= =?us-ascii?Q?CoJxRAS7B2CD7Kp6xYon2ud5MJ9tzhP2zh/AEe4Q5hgZFo6/niglhk3T5ASh?= =?us-ascii?Q?pXDnHmvIKe6Gn+JxjpEDoyWdIw2wwIgsgMz2O7JaPhskivYHydXrbWDkghxV?= =?us-ascii?Q?9+V9ffVtmtYY/PUExaZr47rchT3PlWKJKw6OgDcBxPPaTXmju9XPtnEgF0C9?= =?us-ascii?Q?/IRcbcZLIbTxA+2hkj4RJaCDpJhwGgf29PAOEJ+J4TGRMbXQZ5eo4HBawPqa?= =?us-ascii?Q?VK7kQFHUViiul7qcBF0MX/y3WgQmA7I54OMzyf6B1OXEkLp7bB2/tHxGcpnT?= =?us-ascii?Q?BcDOZC87Lt4IJLG/CL7nlhVCSkjI/KNNwz0/bGW+JT9dPDWnoNbxo+TXoxd1?= =?us-ascii?Q?SXk48a3yUsbuuY5UDOZj3jBQnANze081dAxEv+VPGyfqf4rpEmbCIqyoVe0E?= =?us-ascii?Q?ONRY0V3iGGGjz2QJDGae9s75q9yyKQTVr4wZFT39Fob/Bj4H8YkuujnK4nyQ?= =?us-ascii?Q?0cTU+82AAYMC9Pn1ZAMjE0kXBzriSSuWlvvGVOilAPW3PIhdEk+j7sG/hgaI?= =?us-ascii?Q?rehq5wh9noHxkypP9q436zRaHOgzljvOucm7DiamMnkvc5rfLWegKSINPWqS?= =?us-ascii?Q?pnHwzoudoiUHi76YG392z8aL8GV6zh3bN/go6L5NUyAuGl5wcGvQfN1/1+dW?= =?us-ascii?Q?iHCa/W+/QBcc7W/p0AOexCUTR/VS9kc+p28VniT7zqdmdTM2ReJ4ohEjr0qY?= =?us-ascii?Q?fZXul8haQzpyuJLX/UhRgEI1i7sx2iN0r+RNBQFQfkuH1qa3uKAON3+pIrQO?= =?us-ascii?Q?OrPPD83Q1N6BcgFFD1HPwKjuEjFA/FtLTmTuZ4VL376UIQiUXIgY6674E6gR?= =?us-ascii?Q?LA/mmbN1kbjCocCYA9/RhRjHSxdVNTgQOGPyXvZn93Y5AQaiCozKdVLGRboS?= =?us-ascii?Q?G+YRSKiZm71PYM+m0uctzsqngkxrVTmyRthZ+klXzyrId22p59JhmbWs+nSJ?= =?us-ascii?Q?Qk5gx2/1IIrk7kL42bUBcVEYQe4fHhF8zOOVyh7ahlXlvxTdg2OpvbKYINKR?= =?us-ascii?Q?Lr05UPa+QzumGzSjsB78ukB1rdozpNtwBpYxffpuDqkwhGcZeuMym3kGAWV1?= =?us-ascii?Q?vA=3D=3D?= X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 2fa0f240-db29-43b2-05a6-08da969f0de6 X-MS-Exchange-CrossTenant-AuthSource: BY5PR10MB4196.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Sep 2022 22:18:30.4272 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: 
mPk2CMG5DbWfOS5iILOJobbNQZ/ch/MWrRxmfleo2S6V3PohxLgn4ELZiabsQ5LFwJwGeORbwhyJ04TMCTEa8g== X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA2PR10MB4745 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-09-14_09,2022-09-14_04,2022-06-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 malwarescore=0 mlxlogscore=999 bulkscore=0 phishscore=0 spamscore=0 suspectscore=0 adultscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2208220000 definitions=main-2209140108 X-Proofpoint-ORIG-GUID: P8wqMPzQxSCpFJXxg0I-dCr1o4h08_Tf X-Proofpoint-GUID: P8wqMPzQxSCpFJXxg0I-dCr1o4h08_Tf Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Rename the routine vma_shareable to vma_addr_pmd_shareable as it is checking a specific address within the vma. Refactor code to check if an aligned range is shareable as this will be needed in a subsequent patch. 
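
For readers following along outside the kernel tree, the address check
described above can be sketched in user space.  This is only an
illustration under stated assumptions: the sketch_* names, the stand-in
struct, and the 1 GiB SKETCH_PUD_SIZE (the x86-64 value with 4 KiB base
pages) are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: one PUD entry covers 1 GiB. */
#define SKETCH_PUD_SIZE (UINT64_C(1) << 30)
#define SKETCH_PUD_MASK (~(SKETCH_PUD_SIZE - 1))

/* Hypothetical stand-in for the few vm_area_struct fields the check uses. */
struct sketch_vma {
	uint64_t vm_start;	/* first address in the mapping */
	uint64_t vm_end;	/* one past the last address */
	bool may_share;		/* stands in for VM_MAYSHARE */
};

/* Same idea as range_in_vma(): [start, end) must lie inside the vma. */
static bool sketch_range_in_vma(const struct sketch_vma *vma,
				uint64_t start, uint64_t end)
{
	return vma->vm_start <= start && end <= vma->vm_end;
}

/* Range form: the caller supplies an already PUD-aligned range. */
static bool sketch_aligned_range_shareable(const struct sketch_vma *vma,
					   uint64_t start, uint64_t end)
{
	return vma->may_share && sketch_range_in_vma(vma, start, end);
}

/* Address form: derive the PUD-aligned range that contains addr. */
static bool sketch_addr_shareable(const struct sketch_vma *vma, uint64_t addr)
{
	uint64_t start = addr & SKETCH_PUD_MASK;
	uint64_t end = start + SKETCH_PUD_SIZE;

	return sketch_aligned_range_shareable(vma, start, end);
}
```

An address qualifies only when the whole PUD-sized range around it fits
inside the vma, which is why a mapping that starts partway into a PUD
range cannot share the page table covering that first partial range.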
Signed-off-by: Mike Kravetz
Reviewed-by: Miaohe Lin
---
 mm/hugetlb.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index eb38ae3e7a83..8117bc299c46 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6639,26 +6639,33 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 	return saddr;
 }
 
-static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
+static bool __vma_aligned_range_pmd_shareable(struct vm_area_struct *vma,
+				unsigned long start, unsigned long end)
 {
-	unsigned long base = addr & PUD_MASK;
-	unsigned long end = base + PUD_SIZE;
-
 	/*
 	 * check on proper vm_flags and page table alignment
 	 */
-	if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, base, end))
+	if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, start, end))
 		return true;
 	return false;
 }
 
+static bool vma_addr_pmd_shareable(struct vm_area_struct *vma,
+						unsigned long addr)
+{
+	unsigned long start = addr & PUD_MASK;
+	unsigned long end = start + PUD_SIZE;
+
+	return __vma_aligned_range_pmd_shareable(vma, start, end);
+}
+
 bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
 {
 #ifdef CONFIG_USERFAULTFD
 	if (uffd_disable_huge_pmd_share(vma))
 		return false;
 #endif
-	return vma_shareable(vma, addr);
+	return vma_addr_pmd_shareable(vma, addr);
 }
 
 /*
-- 
2.37.2

From nobody Fri Apr 3 06:41:00 2026
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Miaohe Lin, David Hildenbrand, Sven Schnelle,
 Michal Hocko, Peter Xu, Naoya Horiguchi, "Aneesh Kumar K . V",
 Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso,
 Prakash Sangappa, James Houghton, Mina Almasry, Pasha Tatashin,
 Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [PATCH v2 6/9] hugetlb: add vma based lock for pmd sharing
Date: Wed, 14 Sep 2022 15:18:07 -0700
Message-Id: <20220914221810.95771-7-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220914221810.95771-1-mike.kravetz@oracle.com>
References: <20220914221810.95771-1-mike.kravetz@oracle.com>
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Allocate a new hugetlb_vma_lock structure and hang it off
vm_private_data for synchronization use by vmas that could be involved
in pmd sharing.  This data structure contains a rw semaphore that is
the primary tool used for synchronization.

This new structure is ref counted, so that it can exist when NOT
attached to a vma.  This is only helpful in resolving lock ordering
issues where code may need to obtain the vma_lock while there is no
guarantee the vma will not go away.  By obtaining a ref on the
structure, it can be guaranteed that at least the rw semaphore will not
go away.

Only add infrastructure for the new lock here.  Actual use will be added
in subsequent patches.
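
The lifetime rule described above can be sketched in user space as
follows.  This is a simplified illustration, not the kernel code: the
sketch_* names are hypothetical, a C11 atomic counter stands in for
struct kref, a plain integer stands in for the rw semaphore, and
sketch_live_locks exists only so the behaviour can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_vma;

/* Hypothetical analogue of hugetlb_vma_lock. */
struct sketch_vma_lock {
	atomic_int refs;		/* stands in for struct kref */
	int rw_sema_placeholder;	/* stands in for the rw semaphore */
	struct sketch_vma *vma;		/* NULL once detached from the vma */
};

/* Hypothetical analogue of a vma; only the private pointer matters here. */
struct sketch_vma {
	struct sketch_vma_lock *private_data;
};

static int sketch_live_locks;	/* illustration only: allocated lock count */

/* Allocate the lock and attach it; the vma owns the initial reference. */
static bool sketch_lock_alloc(struct sketch_vma *vma)
{
	struct sketch_vma_lock *lock = malloc(sizeof(*lock));

	if (!lock)
		return false;
	atomic_init(&lock->refs, 1);
	lock->rw_sema_placeholder = 0;
	lock->vma = vma;
	vma->private_data = lock;
	sketch_live_locks++;
	return true;
}

/* Take an extra reference so the structure outlives the vma if needed. */
static void sketch_lock_get(struct sketch_vma_lock *lock)
{
	atomic_fetch_add(&lock->refs, 1);
}

/* Drop a reference; the structure is freed only when the last one goes. */
static void sketch_lock_put(struct sketch_vma_lock *lock)
{
	if (atomic_fetch_sub(&lock->refs, 1) == 1) {
		free(lock);
		sketch_live_locks--;
	}
}

/* Detach from the vma (e.g. at close) and drop the vma's reference. */
static void sketch_lock_detach(struct sketch_vma *vma)
{
	struct sketch_vma_lock *lock = vma->private_data;

	if (!lock)
		return;
	lock->vma = NULL;
	vma->private_data = NULL;
	sketch_lock_put(lock);
}
```

A caller that took its own reference before the vma went away can still
safely touch the lock structure afterwards, and learns from the NULL vma
pointer that the vma is gone.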
Signed-off-by: Mike Kravetz
Reviewed-by: Miaohe Lin
---
 include/linux/hugetlb.h |  43 ++++++++-
 kernel/fork.c           |   6 +-
 mm/hugetlb.c            | 202 ++++++++++++++++++++++++++++++++++++----
 mm/rmap.c               |   8 +-
 4 files changed, 235 insertions(+), 24 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 0ce916d1afca..6a1bd172f943 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -114,6 +114,12 @@ struct file_region {
 #endif
 };
 
+struct hugetlb_vma_lock {
+	struct kref refs;
+	struct rw_semaphore rw_sema;
+	struct vm_area_struct *vma;
+};
+
 extern struct resv_map *resv_map_alloc(void);
 void resv_map_release(struct kref *ref);
 
@@ -126,7 +132,7 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
 						long min_hpages);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
-void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
+void hugetlb_dup_vma_private(struct vm_area_struct *vma);
 void clear_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
 int hugetlb_overcommit_handler(struct ctl_table *, int, void *, size_t *,
@@ -214,6 +220,14 @@ struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
 struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
 			     pgd_t *pgd, int flags);
 
+void hugetlb_vma_lock_read(struct vm_area_struct *vma);
+void hugetlb_vma_unlock_read(struct vm_area_struct *vma);
+void hugetlb_vma_lock_write(struct vm_area_struct *vma);
+void hugetlb_vma_unlock_write(struct vm_area_struct *vma);
+int hugetlb_vma_trylock_write(struct vm_area_struct *vma);
+void hugetlb_vma_assert_locked(struct vm_area_struct *vma);
+void hugetlb_vma_lock_release(struct kref *kref);
+
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pud);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
@@ -225,7 +239,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
 
 #else /* !CONFIG_HUGETLB_PAGE */
 
-static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
+static inline void hugetlb_dup_vma_private(struct vm_area_struct *vma)
 {
 }
 
@@ -336,6 +350,31 @@ static inline int prepare_hugepage_range(struct file *file,
 	return -EINVAL;
 }
 
+static inline void hugetlb_vma_lock_read(struct vm_area_struct *vma)
+{
+}
+
+static inline void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
+{
+}
+
+static inline void hugetlb_vma_lock_write(struct vm_area_struct *vma)
+{
+}
+
+static inline void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
+{
+}
+
+static inline int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
+{
+	return 1;
+}
+
+static inline void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
+{
+}
+
 static inline int pmd_huge(pmd_t pmd)
 {
 	return 0;
diff --git a/kernel/fork.c b/kernel/fork.c
index b3399184706c..e85e923537a2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -677,12 +677,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		}
 
 		/*
-		 * Clear hugetlb-related page reserves for children. This only
-		 * affects MAP_PRIVATE mappings. Faults generated by the child
-		 * are not guaranteed to succeed, even if read-only
+		 * Copy/update hugetlb private vma information.
 		 */
 		if (is_vm_hugetlb_page(tmp))
-			reset_vma_resv_huge_pages(tmp);
+			hugetlb_dup_vma_private(tmp);
 
 		/* Link the vma into the MT */
 		mas.index = tmp->vm_start;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8117bc299c46..616be891b798 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -90,6 +90,8 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
+static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
+static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
@@ -858,7 +860,7 @@ __weak unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
  * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
  * is guaranteed to have their future faults succeed.
  *
- * With the exception of reset_vma_resv_huge_pages() which is called at fork(),
+ * With the exception of hugetlb_dup_vma_private() which is called at fork(),
  * the reserve counters are updated with the hugetlb_lock held. It is safe
  * to reset the VMA at fork() time as it is not in use yet and there is no
  * chance of the global counters getting corrupted as a result of the values.
@@ -1005,12 +1007,20 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
 	return (get_vma_private_data(vma) & flag) != 0;
 }
 
-/* Reset counters to 0 and clear all HPAGE_RESV_* flags */
-void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
+void hugetlb_dup_vma_private(struct vm_area_struct *vma)
 {
 	VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma);
+	/*
+	 * Clear vm_private_data
+	 * - For MAP_PRIVATE mappings, this is the reserve map which does
+	 *   not apply to children.  Faults generated by the children are
+	 *   not guaranteed to succeed, even if read-only.
+	 * - For shared mappings this is a per-vma semaphore that may be
+	 *   allocated in a subsequent call to hugetlb_vm_op_open.
+	 */
+	vma->vm_private_data = (void *)0;
 	if (!(vma->vm_flags & VM_MAYSHARE))
-		vma->vm_private_data = (void *)0;
+		return;
 }
 
 /*
@@ -1041,7 +1051,7 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
 		kref_put(&reservations->refs, resv_map_release);
 	}
 
-	reset_vma_resv_huge_pages(vma);
+	hugetlb_dup_vma_private(vma);
 }
 
 /* Returns true if the VMA has associated reserve pages */
@@ -4622,16 +4632,21 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma)
 		resv_map_dup_hugetlb_cgroup_uncharge_info(resv);
 		kref_get(&resv->refs);
 	}
+
+	hugetlb_vma_lock_alloc(vma);
 }
 
 static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 {
 	struct hstate *h = hstate_vma(vma);
-	struct resv_map *resv = vma_resv_map(vma);
+	struct resv_map *resv;
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	unsigned long reserve, start, end;
 	long gbl_reserve;
 
+	hugetlb_vma_lock_free(vma);
+
+	resv = vma_resv_map(vma);
 	if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		return;
 
@@ -6438,6 +6453,11 @@ bool hugetlb_reserve_pages(struct inode *inode,
 		return false;
 	}
 
+	/*
+	 * vma specific semaphore used for pmd sharing synchronization
+	 */
+	hugetlb_vma_lock_alloc(vma);
+
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
 	 * attempt will be made for VM_NORESERVE to allocate a page
@@ -6461,12 +6481,11 @@ bool hugetlb_reserve_pages(struct inode *inode,
 		resv_map = inode_resv_map(inode);
 
 		chg = region_chg(resv_map, from, to, &regions_needed);
-
 	} else {
 		/* Private mapping. */
 		resv_map = resv_map_alloc();
 		if (!resv_map)
-			return false;
+			goto out_err;
 
 		chg = to - from;
 
@@ -6561,6 +6580,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
 		hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
 					    chg * pages_per_huge_page(h), h_cg);
 out_err:
+	hugetlb_vma_lock_free(vma);
 	if (!vma || vma->vm_flags & VM_MAYSHARE)
 		/* Only call region_abort if the region_chg succeeded but the
 		 * region_add failed or didn't run.
@@ -6640,14 +6660,34 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 }
 
 static bool __vma_aligned_range_pmd_shareable(struct vm_area_struct *vma,
-				unsigned long start, unsigned long end)
+				unsigned long start, unsigned long end,
+				bool check_vma_lock)
 {
+#ifdef CONFIG_USERFAULTFD
+	if (uffd_disable_huge_pmd_share(vma))
+		return false;
+#endif
 	/*
 	 * check on proper vm_flags and page table alignment
 	 */
-	if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, start, end))
-		return true;
-	return false;
+	if (!(vma->vm_flags & VM_MAYSHARE))
+		return false;
+	if (check_vma_lock && !vma->vm_private_data)
+		return false;
+	if (!range_in_vma(vma, start, end))
+		return false;
+	return true;
+}
+
+static bool vma_pmd_shareable(struct vm_area_struct *vma)
+{
+	unsigned long start = ALIGN(vma->vm_start, PUD_SIZE),
+		      end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
+
+	if (start >= end)
+		return false;
+
+	return __vma_aligned_range_pmd_shareable(vma, start, end, false);
 }
 
 static bool vma_addr_pmd_shareable(struct vm_area_struct *vma,
@@ -6656,15 +6696,11 @@ static bool vma_addr_pmd_shareable(struct vm_area_struct *vma,
 	unsigned long start = addr & PUD_MASK;
 	unsigned long end = start + PUD_SIZE;
 
-	return __vma_aligned_range_pmd_shareable(vma, start, end);
+	return __vma_aligned_range_pmd_shareable(vma, start, end, true);
 }
 
 bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
 {
-#ifdef CONFIG_USERFAULTFD
-	if (uffd_disable_huge_pmd_share(vma))
-		return false;
-#endif
 	return vma_addr_pmd_shareable(vma, addr);
 }
 
@@ -6695,6 +6731,130 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 		*end = ALIGN(*end, PUD_SIZE);
 }
 
+static bool __vma_shareable_flags_pmd(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) &&
+		vma->vm_private_data;
+}
+
+void hugetlb_vma_lock_read(struct vm_area_struct *vma)
+{
+	if (__vma_shareable_flags_pmd(vma)) {
+		struct hugetlb_vma_lock
*vma_lock =3D vma->vm_private_data; + + down_read(&vma_lock->rw_sema); + } +} + +void hugetlb_vma_unlock_read(struct vm_area_struct *vma) +{ + if (__vma_shareable_flags_pmd(vma)) { + struct hugetlb_vma_lock *vma_lock =3D vma->vm_private_data; + + up_read(&vma_lock->rw_sema); + } +} + +void hugetlb_vma_lock_write(struct vm_area_struct *vma) +{ + if (__vma_shareable_flags_pmd(vma)) { + struct hugetlb_vma_lock *vma_lock =3D vma->vm_private_data; + + down_write(&vma_lock->rw_sema); + } +} + +void hugetlb_vma_unlock_write(struct vm_area_struct *vma) +{ + if (__vma_shareable_flags_pmd(vma)) { + struct hugetlb_vma_lock *vma_lock =3D vma->vm_private_data; + + up_write(&vma_lock->rw_sema); + } +} + +int hugetlb_vma_trylock_write(struct vm_area_struct *vma) +{ + struct hugetlb_vma_lock *vma_lock =3D vma->vm_private_data; + + if (!__vma_shareable_flags_pmd(vma)) + return 1; + + return down_write_trylock(&vma_lock->rw_sema); +} + +void hugetlb_vma_assert_locked(struct vm_area_struct *vma) +{ + if (__vma_shareable_flags_pmd(vma)) { + struct hugetlb_vma_lock *vma_lock =3D vma->vm_private_data; + + lockdep_assert_held(&vma_lock->rw_sema); + } +} + +void hugetlb_vma_lock_release(struct kref *kref) +{ + struct hugetlb_vma_lock *vma_lock =3D container_of(kref, + struct hugetlb_vma_lock, refs); + + kfree(vma_lock); +} + +static void hugetlb_vma_lock_free(struct vm_area_struct *vma) +{ + /* + * Only present in sharable vmas. See comment in + * __unmap_hugepage_range_final about how VM_SHARED could + * be set without VM_MAYSHARE. As a result, we need to + * check if either is set in the free path. + */ + if (!vma || !(vma->vm_flags & (VM_MAYSHARE | VM_SHARED))) + return; + + if (vma->vm_private_data) { + struct hugetlb_vma_lock *vma_lock =3D vma->vm_private_data; + + /* + * vma_lock structure may or not be released, but it + * certainly will no longer be attached to vma so clear + * pointer. 
+ */ + vma_lock->vma =3D NULL; + kref_put(&vma_lock->refs, hugetlb_vma_lock_release); + vma->vm_private_data =3D NULL; + } +} + +static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma) +{ + struct hugetlb_vma_lock *vma_lock; + + /* Only establish in (flags) sharable vmas */ + if (!vma || !(vma->vm_flags & VM_MAYSHARE)) + return; + + /* Should never get here with non-NULL vm_private_data */ + if (vma->vm_private_data) + return; + + /* Check size/alignment for pmd sharing possible */ + if (!vma_pmd_shareable(vma)) + return; + + vma_lock =3D kmalloc(sizeof(*vma_lock), GFP_KERNEL); + if (!vma_lock) + /* + * If we can not allocate structure, then vma can not + * participate in pmd sharing. + */ + return; + + kref_init(&vma_lock->refs); + init_rwsem(&vma_lock->rw_sema); + vma_lock->vma =3D vma; + vma->vm_private_data =3D vma_lock; +} + /* * Search for a shareable pmd page for hugetlb. In any case calls pmd_allo= c() * and returns the corresponding pte. While this is not necessary for the @@ -6781,6 +6941,14 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm= _area_struct *vma, } =20 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ +static void hugetlb_vma_lock_free(struct vm_area_struct *vma) +{ +} + +static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma) +{ +} + pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, pud_t *pud) { diff --git a/mm/rmap.c b/mm/rmap.c index d17d68a9b15b..744faaef0489 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -24,7 +24,7 @@ * mm->mmap_lock * mapping->invalidate_lock (in filemap_fault) * page->flags PG_locked (lock_page) - * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) + * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share, see hugetlbfs be= low) * mapping->i_mmap_rwsem * anon_vma->rwsem * mm->page_table_lock or pte_lock @@ -44,6 +44,12 @@ * anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_= anon) * ->tasklist_lock * pte map lock + * + * hugetlbfs PageHuge() take locks in 
this order: + * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) + * vma_lock (hugetlb specific lock for pmd_sharing) + * mapping->i_mmap_rwsem (also used for hugetlb pmd sharing) + * page->flags PG_locked (lock_page) */ =20 #include --=20 2.37.2 From nobody Fri Apr 3 06:41:00 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA0C5ECAAD3 for ; Wed, 14 Sep 2022 22:20:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229731AbiINWUL (ORCPT ); Wed, 14 Sep 2022 18:20:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52898 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229592AbiINWTh (ORCPT ); Wed, 14 Sep 2022 18:19:37 -0400 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 44E7F21BF for ; Wed, 14 Sep 2022 15:19:31 -0700 (PDT) Received: from pps.filterd (m0246631.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28EMIhCf027338; Wed, 14 Sep 2022 22:18:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2022-7-12; bh=YgYkt0QuP9ElyM3GYKSupA5PKhlljwrykt+N//yr6Sk=; b=O0aprwbDZJxPyxCjRnInoT10YiNSrEC8lpjxXpaFpDS8ijPvdi7R/Oe6vMXJ77SSWnjn 3KrYET85Gm6G1OPoaTzCflbChyL4fjhy7hyln/1xkxJiVhWJ6j2mTkh1qQ30zJqp73f5 ZHARUHHVGR5V1UWo+Y+PbzbXebvaTTy5WThG4hQJX7XxXBWYRFKuFgUL7OdcQza62urK k36wbLrIvJS5v2tK6+fqO8x24ZFfR0Jk8f0gOZPM+/9RoSbgYuJhBDYQhK+ri2ZCpDPq ciw00yFISGLRmSrmV5zpHthMzHX2kz4YD8QvTuGaoLJ0G80jxFOH/tNOBRMB+cacjnIX 1Q== Received: from iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com 
(iadpaimrmta03.appoci.oracle.com [130.35.103.27]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3jjxyduppy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Sep 2022 22:18:37 +0000 Received: from pps.filterd (iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (8.17.1.5/8.17.1.5) with ESMTP id 28ELuQLB009613; Wed, 14 Sep 2022 22:18:37 GMT Received: from nam11-bn8-obe.outbound.protection.outlook.com (mail-bn8nam11lp2169.outbound.protection.outlook.com [104.47.58.169]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 3jjyedadwh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Sep 2022 22:18:37 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=YdHiT0oeWnUkBaGuBpns4akjaLB6DpfplDu7t0to26RGIdrnmx7gyswyzktCLiNq4fDdOlYT5dNU/ZSKrABW8EWFB8HFd7KOALNw9pR2B3hsJxtUPGGQYCERlsVWriY1CGgeQSQb2uH1ch9sTxrV5b2bO0UdEi6cqS9UK8uCNtkRoDPTjcVVLOic+saoZSC7Sqj3YNUFcsNlEBgjLdtQdV++L9AUdDmhMVI326x1ndgfM1TDvZR4+xFCsiwF7UbkqUyoYv28FxA9WnBEeVXlRj36d/ly2LJ/I8jDYTewOh/cbVRYSSdkZEetH/Ek79i6fenMcmQS6eskjK9PIwNr7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=YgYkt0QuP9ElyM3GYKSupA5PKhlljwrykt+N//yr6Sk=; b=AaojvL71ykzR8wbHHp2vCWRLH4YT6Q8nV+fReKomNP0HzM6khh7EQjvAWoltQ2Z3gLg8W7o/OyiMH/++sos+xaAhLJAXgL4R1cEkXuegG2C9Gvyj0rbFUWK5MW6l4DhqK709AF8GY1hEnsiAmu9UBT0WWTQkh/GIhmSJE6Z8Z/p+BlfdX0yVnIo4ifozlNlONI7ct0m8IDbPDoMnqVJ2bKFCGeO6zjAs8UUXuOJJ83GxDnUPD35BQr6N9zTYU3pOK10LwIHskkysXNPj588AgHtnty9xry7h43tbS+cNS0NprIqc6B6CqOa/zW0800CMwsxDQ6R5b5vx2Q+9P9Ilfg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; 
dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=YgYkt0QuP9ElyM3GYKSupA5PKhlljwrykt+N//yr6Sk=; b=O93Zdu38aIhHKC96gJs4lDNvZhBeFmiU/8q3gdAoQDMhNvV2qTAQJZa91L7zLJpup+s5TPomsZbZVpNogauj7WQ0VRcT3K8+P29+LkfP8t4QK1rX3CCFnKL1lr6xxMgZv4t2vqFjgJp9nv/MhE/gxAkxkSk32N0jlR4sSc0P9yw= Received: from BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) by CH2PR10MB4390.namprd10.prod.outlook.com (2603:10b6:610:af::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5612.22; Wed, 14 Sep 2022 22:18:35 +0000 Received: from BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::e9d2:a804:e53a:779a]) by BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::e9d2:a804:e53a:779a%6]) with mapi id 15.20.5612.022; Wed, 14 Sep 2022 22:18:34 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , Miaohe Lin , David Hildenbrand , Sven Schnelle , Michal Hocko , Peter Xu , Naoya Horiguchi , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . 
Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Pasha Tatashin , Axel Rasmussen , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [PATCH v2 7/9] hugetlb: create hugetlb_unmap_file_folio to unmap single file folio Date: Wed, 14 Sep 2022 15:18:08 -0700 Message-Id: <20220914221810.95771-8-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220914221810.95771-1-mike.kravetz@oracle.com> References: <20220914221810.95771-1-mike.kravetz@oracle.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

Create the new routine hugetlb_unmap_file_folio that will unmap a single
file folio. This is refactored code from hugetlb_vmdelete_list. It is
modified to do locking within the routine itself and check whether the
page is mapped within a specific vma before unmapping.

This refactoring will be put to use and expanded upon in a subsequent
patch adding vma specific locking.

Signed-off-by: Mike Kravetz
Reviewed-by: Miaohe Lin
---
 fs/hugetlbfs/inode.c | 123 +++++++++++++++++++++++++++++++++----------
 1 file changed, 94 insertions(+), 29 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 7112a9a9f54d..3bb1772fce2f 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -371,6 +371,94 @@ static void hugetlb_delete_from_page_cache(struct page= *page) delete_from_page_cache(page); } =20 +/* + * Called with i_mmap_rwsem held for inode based vma maps. This makes + * sure vma (and vm_mm) will not go away. We also hold the hugetlb fault + * mutex for the page in the mapping. So, we can not race with page being + * faulted into the vma. + */ +static bool hugetlb_vma_maps_page(struct vm_area_struct *vma, + unsigned long addr, struct page *page) +{ + pte_t *ptep, pte; + + ptep =3D huge_pte_offset(vma->vm_mm, addr, + huge_page_size(hstate_vma(vma))); + + if (!ptep) + return false; + + pte =3D huge_ptep_get(ptep); + if (huge_pte_none(pte) || !pte_present(pte)) + return false; + + if (pte_page(pte) =3D=3D page) + return true; + + return false; +} + +/* + * Can vma_offset_start/vma_offset_end overflow on 32-bit arches? + * No, because the interval tree returns us only those vmas + * which overlap the truncated area starting at pgoff, + * and no vma on a 32-bit arch can span beyond the 4GB.
+ */ +static unsigned long vma_offset_start(struct vm_area_struct *vma, pgoff_t = start) +{ + if (vma->vm_pgoff < start) + return (start - vma->vm_pgoff) << PAGE_SHIFT; + else + return 0; +} + +static unsigned long vma_offset_end(struct vm_area_struct *vma, pgoff_t en= d) +{ + unsigned long t_end; + + if (!end) + return vma->vm_end; + + t_end =3D ((end - vma->vm_pgoff) << PAGE_SHIFT) + vma->vm_start; + if (t_end > vma->vm_end) + t_end =3D vma->vm_end; + return t_end; +} + +/* + * Called with hugetlb fault mutex held. Therefore, no more mappings to + * this folio can be created while executing the routine. + */ +static void hugetlb_unmap_file_folio(struct hstate *h, + struct address_space *mapping, + struct folio *folio, pgoff_t index) +{ + struct rb_root_cached *root =3D &mapping->i_mmap; + struct page *page =3D &folio->page; + struct vm_area_struct *vma; + unsigned long v_start; + unsigned long v_end; + pgoff_t start, end; + + start =3D index * pages_per_huge_page(h); + end =3D (index + 1) * pages_per_huge_page(h); + + i_mmap_lock_write(mapping); + + vma_interval_tree_foreach(vma, root, start, end - 1) { + v_start =3D vma_offset_start(vma, start); + v_end =3D vma_offset_end(vma, end); + + if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page)) + continue; + + unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, + NULL, ZAP_FLAG_DROP_MARKER); + } + + i_mmap_unlock_write(mapping); +} + static void hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t = end, zap_flags_t zap_flags) @@ -383,30 +471,13 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pg= off_t start, pgoff_t end, * an inclusive "last". */ vma_interval_tree_foreach(vma, root, start, end ? end - 1 : ULONG_MAX) { - unsigned long v_offset; + unsigned long v_start; unsigned long v_end; =20 - /* - * Can the expression below overflow on 32-bit arches? 
- * No, because the interval tree returns us only those vmas - * which overlap the truncated area starting at pgoff, - * and no vma on a 32-bit arch can span beyond the 4GB. - */ - if (vma->vm_pgoff < start) - v_offset =3D (start - vma->vm_pgoff) << PAGE_SHIFT; - else - v_offset =3D 0; - - if (!end) - v_end =3D vma->vm_end; - else { - v_end =3D ((end - vma->vm_pgoff) << PAGE_SHIFT) - + vma->vm_start; - if (v_end > vma->vm_end) - v_end =3D vma->vm_end; - } + v_start =3D vma_offset_start(vma, start); + v_end =3D vma_offset_end(vma, end); =20 - unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end, + unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, NULL, zap_flags); } } @@ -428,14 +499,8 @@ static bool remove_inode_single_folio(struct hstate *h= , struct inode *inode, * the fault mutex. The mutex will prevent faults * until we finish removing the folio. */ - if (unlikely(folio_mapped(folio))) { - i_mmap_lock_write(mapping); - hugetlb_vmdelete_list(&mapping->i_mmap, - index * pages_per_huge_page(h), - (index + 1) * pages_per_huge_page(h), - ZAP_FLAG_DROP_MARKER); - i_mmap_unlock_write(mapping); - } + if (unlikely(folio_mapped(folio))) + hugetlb_unmap_file_folio(h, mapping, folio, index); =20 folio_lock(folio); /* --=20 2.37.2 From nobody Fri Apr 3 06:41:00 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12DC5C6FA82 for ; Wed, 14 Sep 2022 22:19:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229750AbiINWTm (ORCPT ); Wed, 14 Sep 2022 18:19:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229719AbiINWTZ (ORCPT ); Wed, 14 Sep 2022 18:19:25 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com 
[205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F115D86C37 for ; Wed, 14 Sep 2022 15:19:20 -0700 (PDT) Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28EMIml5014889; Wed, 14 Sep 2022 22:18:48 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2022-7-12; bh=qcAxmGJ/kx4xobBtSUuE0kBJ49vUn3mtf24AnfwZG6g=; b=K8sBhoI/A2A36R0sZzVz2jIda/5acJfAQ/pCbG2Q9H0Uxft5w4bLAGHuR1uH3fAI9mCN UZE7jv/DGfV22AM01mrqiya/Rw1CJbkQ3LDKy+RK3DXP44fuN1an7wIjxFL6frIHNAi7 XUlrsHMF4tnj/TiNGbGw6AnnKHnoGIf67VULqaV0NWeIscENf4TbN9kxEHvLXeQDMzVh 7drImZl/9O3C88WfJDlsh9S3LdL0m9sPPDGRHeBmzOydCO4Sc17mX3uox53aZWWv7uiY c7ZGX6WeF1NFhFFbAi3kfmr5aHCC/uEeZKhwGNb8WNom4nJMjGKscNJQcwukOT/kW7M6 xQ== Received: from phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta03.appoci.oracle.com [138.1.37.129]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3jjxyr3shy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Sep 2022 22:18:48 +0000 Received: from pps.filterd (phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.5/8.17.1.5) with ESMTP id 28EJAjla035442; Wed, 14 Sep 2022 22:18:40 GMT Received: from nam11-bn8-obe.outbound.protection.outlook.com (mail-bn8nam11lp2168.outbound.protection.outlook.com [104.47.58.168]) by phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3jjym095mr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Sep 2022 22:18:39 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=FTPkg1VbFtBtOvowUuoiLIJ4o52U5O3DffB60FGh01wJZJuI5kkb8Yc0vwV3prdTDQ35DdaImPIhEw1cOK4evu0T+xp4djVaZ0RbApSEtCmryy3fkJDJTIhDI8/tFuXDPAetJVunF4sdqW8kx79eAuLnGLfjFHfrrEQJygviTXJD7LHy7e9M1cFoDT2m4/J8eZqBJ9YDjYOIuQZyP2t0JWc5qTJZBaL5T3x4Okbw61t8bcimvLzU6P86VhjWe/zcYbzfki8jhMPKXp0JSl4fLnQWSrYqK1cK9M2uhVRz5RXB4LsXJCJ52EXTaS/nsopZnY6in8ZtZiuYwP/By7X/Gw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=qcAxmGJ/kx4xobBtSUuE0kBJ49vUn3mtf24AnfwZG6g=; b=WH2ptc79/H+9gqWNo/Z8ItwtuZj81HFJ1BVf7jcZNrrnkipoqbVYgyr9WSsAfTdk4B/ZUUIwJRBGpXPBBwD+YNS29mDzzJBI7ThQDMhzA8zuaYSvO1xsLoaO7girrsJs3uv0SS9kk+pI8IW3TgA3sBe5YRqJKhjCcE4HRt6A1xsviC63j5aFXeVm8DdyvXA0Yvs23fU+DpL91ckpcPrcDbnbcycJDMJE3VuesdUX3V56yWJojM8mskxhRYb0rR/v+4OfHapoIR8wJ9qwUcl5R4BmOysXb3oysjY+ZYqY2Q5KZxwDyDvrj4alZwm8aJYIbH/3Vsa6NomlECfa4qRT2Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=qcAxmGJ/kx4xobBtSUuE0kBJ49vUn3mtf24AnfwZG6g=; b=SuLL9MI3xmSKZo5pEeWCqqfLOkREjxmKMUt1OkS0K363DfbT63iw8NHoU70OT2ou61hX3sSvHAUFkbfmH3iEiHjTbUNRM/Vr9JDGiAcGhfUuByBanJqUjh9bBjICF5sCLHsm+ZnOUJosNhgc2AXZRvzdX1z7hisMLxLI+shxuGw= Received: from BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) by CH2PR10MB4390.namprd10.prod.outlook.com (2603:10b6:610:af::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5612.22; Wed, 14 Sep 2022 22:18:37 +0000 Received: from BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::e9d2:a804:e53a:779a]) by 
BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::e9d2:a804:e53a:779a%6]) with mapi id 15.20.5612.022; Wed, 14 Sep 2022 22:18:37 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , Miaohe Lin , David Hildenbrand , Sven Schnelle , Michal Hocko , Peter Xu , Naoya Horiguchi , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Pasha Tatashin , Axel Rasmussen , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [PATCH v2 8/9] hugetlb: use new vma_lock for pmd sharing synchronization Date: Wed, 14 Sep 2022 15:18:09 -0700 Message-Id: <20220914221810.95771-9-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220914221810.95771-1-mike.kravetz@oracle.com> References: <20220914221810.95771-1-mike.kravetz@oracle.com> Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: MW2PR2101CA0023.namprd21.prod.outlook.com (2603:10b6:302:1::36) To BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BY5PR10MB4196:EE_|CH2PR10MB4390:EE_ X-MS-Office365-Filtering-Correlation-Id: 064c2c8f-7d6f-45a4-ba93-08da969f11ae X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
I4NRcuXUXb2IHGyaHEL5cHbegzmua74UKcx3S1OuwRK6hPPQkkhBxpB8ltUcBSSxa/uOyR1Dv9567+ClEa46eSJZO69QNru6Xp8yy3HNiBqRDfFzrQI7/9QxepQfxsrnHwvTr+tdd5xOmnO57KoPEAEllG48Owpi/dTmCIwW1UnVdwIcNsVDuZpmyH5Rp34oIRlEjo+KzuwX01TlQIKB3QNhPyQdv8AjeU3fBYNbpxneQfA1IfFQPruelKpJvfGxeBpgxMgNvFlXTZNHamtJocyzdmIZTUv4GJKb+3YsXhKGlwgWKUmPts/Gf9k8TiqG+Pd/+Bcp9xGoVvGJx76CyfAvIZcByq+G6zwm8tecjcCH0st5BH/QidR2FJ6Xx3HTERPobR8wSvd3//6emzLPkIkEyCsIrBl/hHgDiGnJ1tYDdLiFsgXP5P1DCHTEEUrsZ8VYdwhhCPXPzNVpHKKsBFwW7635quu0R70gNovSBtgs6f9edcK1xSmM8ft1WmI6DKucKCXx8+CgaoCvQFGW5UgNJWnfmath1YMm2EAWsDq24jiyX0hKKZAAp3Hu34ij2rxwblGRqqcMZoPxpPCbWuAiXvbYD6KWcfQjLmjZ2YwGBDDApIIcxuN2TZCNfqEyiH5de9atS5FoD618FzaYwZLgbNabrvgpfkAzJ62eA3KYJ3G6R+Un+1EW6d9bMqyztz3DNj9aqvJcVWEQYWN2VA== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BY5PR10MB4196.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(366004)(376002)(346002)(396003)(39860400002)(136003)(451199015)(6512007)(7416002)(6666004)(86362001)(66556008)(36756003)(8676002)(107886003)(83380400001)(6486002)(66946007)(54906003)(30864003)(44832011)(66476007)(8936002)(5660300002)(316002)(41300700001)(6506007)(478600001)(186003)(2906002)(38100700002)(26005)(4326008)(1076003)(2616005);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?Qlg2DX4u8ZWJOGcQr0JXQwTqXzdNssr3GDFpFuRoSlY49MDnLceUVRvRmERT?= =?us-ascii?Q?0+pmCHlorLhoAd32LJM62r185prNK3FzyyaYODb1drlmXDRPKarxRy65Yu/E?= =?us-ascii?Q?s4gdTQ47YBg7mRlpla8+dhmAJCZxxy/CncaRH/L40qfR5v3dcm0EcCZk8xw3?= =?us-ascii?Q?QQ3EdKOKjcLWBNzMQ41gLX6fiUXn71omHSL3g6cp28ibYd76587cRvQtKyjt?= =?us-ascii?Q?1qNIRg8ku6v6IooYWQZAEwJb96N+Xnwm1nnTMr2eTrbyCuiQdzFCgAXX1Tt3?= =?us-ascii?Q?/H4GCX7mXUJORgdAQuM7TI3nEE0/mALauJs2iRZu+mgyLk55S+p8XBBClajm?= =?us-ascii?Q?KbWy/tK0TmOgxeH4I0itxPI1qQi3f/83cKOg8Vp0cLaTVW/HCSJ/r212DDaR?= =?us-ascii?Q?5V+j6j23c2KZYLDqBc83WbU8YW1WC8E9irEtFcGAjW0YJX43XxRwWdYoUXCy?= 
=?us-ascii?Q?5UPLk01wsbSUqllxTS54F1RXjUsx7r+gpRujYkDdd+7ZQsXl+5LbM1YbNKZG?= =?us-ascii?Q?5LLOpWpxRJZTrE0TepbVAXG3RPRVV3gqeQBqGgrOpQtWI6j+BUFVTzkn/OM9?= =?us-ascii?Q?m+66IaQjRuESYRgs1dJt5JQbT0aYje5pemtGGBLwACef/1TcQK9We9Uvc9Mn?= =?us-ascii?Q?GW05qYdeCO8bPP5okuNxrPwW6dYowq2vAcj1yOV+WmDB9GkL5choLfrhm84v?= =?us-ascii?Q?F4JDtriw9T0DuJ+SljooukY8yum18jttU9Bzf95jerMlZ7ZlxnGYiBdE5YY+?= =?us-ascii?Q?WVRlmzwNAq+CYYriXlU7+AiOzFLR/oX0THOehVM7i/6JpfPPtUjMB8zReTP5?= =?us-ascii?Q?Q8FXAEYDYmSwH18aXeBoo93iI8eZ30zz75vpAYBN/jIxUIL1HHx2PsA9I8C9?= =?us-ascii?Q?7MQCkgDLUjSPv1SP+G1EktMutCBmSyN/vpBppQX8H4yIVpIxkYvaXuAvVvdg?= =?us-ascii?Q?Bq7r4bpu0U2PHtrlSyrh5yEAHOb5C5wTIu2VVtUmHab4ci8DJ8NT3PNfUJMM?= =?us-ascii?Q?fg4EGj7rcYOAPZwpZKvwUjRDmZ2I/gb5mHah4YC/0OQDXC/1PEDElp11/YKc?= =?us-ascii?Q?O4iCdlcJaIrAMiXbUNPsBRPnB1qms7qymSWeavtSn9fLWJsbV81d0YUqjFSj?= =?us-ascii?Q?tBy60sS0Auq9gwWDN1kqmMz7hmu31B5Y4jtFCO+IGmq6f2yBdPPaqoYMquGK?= =?us-ascii?Q?SvcMwGQ+dfyxuT85Mho7WS3Wd4pevwHSFmhWNsqlM2/hLr2WlFLsAc6MiaER?= =?us-ascii?Q?Se+i+EQGs6F8jAsvPEisY6Qk8MsybBJ8U4PN7Y/Hb2JN+ZZXoMXrQstT/IJJ?= =?us-ascii?Q?7kmrcEN86qjixLDirpMg2SxIRDC/ADf50TtgYJtQZJAjuonqIQMtSJQ/bWQD?= =?us-ascii?Q?TuCN6msESqRzqIs0vrpQF+Pc6qHgNP3hKWtSEZM+Ats98AaCrfZZololL+O2?= =?us-ascii?Q?hYADTXTFTab/yvxmdQcWqMDgulQQc1vrzdrnFYjNWlqwavUyqBGQC4d1B6Fn?= =?us-ascii?Q?j87no2EnJDzjV06QH1bRVHg5eC8K5zHbjazPIsdFgoOxn9vJlA11aSJ1p+og?= =?us-ascii?Q?9iYXUHrXbJQzP4eUjTs32+VAptFbnr25qf/MXGiQhEi5EVS+3WmWZc7sCwYC?= =?us-ascii?Q?FA=3D=3D?= X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 064c2c8f-7d6f-45a4-ba93-08da969f11ae X-MS-Exchange-CrossTenant-AuthSource: BY5PR10MB4196.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Sep 2022 22:18:36.8070 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: 
hh6G3RK8F5rSPr8UqkM5iYqyURrjVg9cLF5F2Lh6F1+WFt/5Fkxfg3Z4PgiiEGQqyns4wzC01qw7d16TP3kKpA==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH2PR10MB4390
X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-09-14_09,2022-09-14_04,2022-06-22_01
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 spamscore=0 adultscore=0 malwarescore=0 suspectscore=0 bulkscore=0 mlxlogscore=999 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2208220000 definitions=main-2209140108
X-Proofpoint-GUID: 1mm892UOSCJjKI1Q_BlwJwWFsjY6DIb7
X-Proofpoint-ORIG-GUID: 1mm892UOSCJjKI1Q_BlwJwWFsjY6DIb7
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The new hugetlb vma lock is used to address this race:

Faulting thread                                 Unsharing thread
...                                                  ...
ptep =3D huge_pte_offset()
      or
ptep =3D huge_pte_alloc()
...
                                                i_mmap_lock_write
                                                lock page table
ptep invalid   <------------------------        huge_pmd_unshare()
Could be in a previously                        unlock_page_table
sharing process or worse                        i_mmap_unlock_write
...

The vma_lock is used as follows:
- During fault processing. The lock is acquired in read mode before
  doing a page table lock and allocation (huge_pte_alloc). The lock is
  held until code is finished with the page table entry (ptep).
- The lock must be held in write mode whenever huge_pmd_unshare is
  called.

Lock ordering issues come into play when unmapping a page from all
vmas mapping the page. The i_mmap_rwsem must be held to search for the
vmas, and the vma lock must be held before calling unmap which will
call huge_pmd_unshare. This is done today in:
- try_to_migrate_one and try_to_unmap_ for page migration and memory
  error handling. In these routines we 'try' to obtain the vma lock and
  fail to unmap if unsuccessful. Calling routines already deal with the
  failure of unmapping.
- hugetlb_vmdelete_list for truncation and hole punch. This routine
  also tries to acquire the vma lock. If it fails, it skips the
  unmapping. However, we cannot have file truncation or hole punch
  fail because of contention. After hugetlb_vmdelete_list, truncation
  and hole punch call remove_inode_hugepages. remove_inode_hugepages
  checks for mapped pages and calls hugetlb_unmap_file_folio to unmap
  them. hugetlb_unmap_file_folio is designed to drop locks and reacquire
  them in the correct order to guarantee unmap success.

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c |  66 +++++++++++++++++++++++++++-
 mm/hugetlb.c         | 102 +++++++++++++++++++++++++++++++++++++++----
 mm/memory.c          |   2 +
 mm/rmap.c            | 100 +++++++++++++++++++++++++++---------------
 mm/userfaultfd.c     |   9 +++-
 5 files changed, 233 insertions(+), 46 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 3bb1772fce2f..009ae539b9b2 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -434,6 +434,7 @@ static void hugetlb_unmap_file_folio(struct hstate *h, struct folio *folio, pgoff_t index) { struct rb_root_cached *root =3D &mapping->i_mmap; + struct hugetlb_vma_lock *vma_lock; struct page *page =3D &folio->page; struct vm_area_struct *vma; unsigned long v_start; @@ -444,7 +445,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h, end =3D (index + 1) * pages_per_huge_page(h); =20 i_mmap_lock_write(mapping); - +retry: + vma_lock =3D NULL; vma_interval_tree_foreach(vma, root, start, end - 1) { v_start =3D vma_offset_start(vma, start); v_end =3D vma_offset_end(vma, end); @@ -452,11 +454,63 @@ static void hugetlb_unmap_file_folio(struct hstate *h, if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page)) continue; =20 + if (!hugetlb_vma_trylock_write(vma)) { + vma_lock =3D vma->vm_private_data; + /* + * If we can not get vma lock, we need to drop + * immap_sema and take locks in order.
First, + * take a ref on the vma_lock structure so that + * we can be guaranteed it will not go away when + * dropping immap_sema. + */ + kref_get(&vma_lock->refs); + break; + } + unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, NULL, ZAP_FLAG_DROP_MARKER); + hugetlb_vma_unlock_write(vma); } =20 i_mmap_unlock_write(mapping); + + if (vma_lock) { + /* + * Wait on vma_lock. We know it is still valid as we have + * a reference. We must 'open code' vma locking as we do + * not know if vma_lock is still attached to vma. + */ + down_write(&vma_lock->rw_sema); + i_mmap_lock_write(mapping); + + vma =3D vma_lock->vma; + if (!vma) { + /* + * If lock is no longer attached to vma, then just + * unlock, drop our reference and retry looking for + * other vmas. + */ + up_write(&vma_lock->rw_sema); + kref_put(&vma_lock->refs, hugetlb_vma_lock_release); + goto retry; + } + + /* + * vma_lock is still attached to vma. Check to see if vma + * still maps page and if so, unmap. + */ + v_start =3D vma_offset_start(vma, start); + v_end =3D vma_offset_end(vma, end); + if (hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page)) + unmap_hugepage_range(vma, vma->vm_start + v_start, + v_end, NULL, + ZAP_FLAG_DROP_MARKER); + + kref_put(&vma_lock->refs, hugetlb_vma_lock_release); + hugetlb_vma_unlock_write(vma); + + goto retry; + } } =20 static void @@ -474,11 +528,21 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pg= off_t start, pgoff_t end, unsigned long v_start; unsigned long v_end; =20 + if (!hugetlb_vma_trylock_write(vma)) + continue; + v_start =3D vma_offset_start(vma, start); v_end =3D vma_offset_end(vma, end); =20 unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, NULL, zap_flags); + + /* + * Note that vma lock only exists for shared/non-private + * vmas. Therefore, lock is not held when calling + * unmap_hugepage_range for private vmas. 
+ */ + hugetlb_vma_unlock_write(vma); } } =20 diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 616be891b798..e8cbc0f7cdaa 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4795,6 +4795,14 @@ int copy_hugetlb_page_range(struct mm_struct *dst, s= truct mm_struct *src, mmu_notifier_invalidate_range_start(&range); mmap_assert_write_locked(src); raw_write_seqcount_begin(&src->write_protect_seq); + } else { + /* + * For shared mappings the vma lock must be held before + * calling huge_pte_offset in the src vma. Otherwise, the + * returned ptep could go away if part of a shared pmd and + * another thread calls huge_pmd_unshare. + */ + hugetlb_vma_lock_read(src_vma); } =20 last_addr_mask =3D hugetlb_mask_last_page(h); @@ -4941,6 +4949,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, st= ruct mm_struct *src, if (cow) { raw_write_seqcount_end(&src->write_protect_seq); mmu_notifier_invalidate_range_end(&range); + } else { + hugetlb_vma_unlock_read(src_vma); } =20 return ret; @@ -4999,6 +5009,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *v= ma, mmu_notifier_invalidate_range_start(&range); last_addr_mask =3D hugetlb_mask_last_page(h); /* Prevent race with file truncation */ + hugetlb_vma_lock_write(vma); i_mmap_lock_write(mapping); for (; old_addr < old_end; old_addr +=3D sz, new_addr +=3D sz) { src_pte =3D huge_pte_offset(mm, old_addr, sz); @@ -5030,6 +5041,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *v= ma, flush_tlb_range(vma, old_end - len, old_end); mmu_notifier_invalidate_range_end(&range); i_mmap_unlock_write(mapping); + hugetlb_vma_unlock_write(vma); =20 return len + old_addr - old_end; } @@ -5349,8 +5361,29 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, s= truct vm_area_struct *vma, * may get SIGKILLed if it later faults. */ if (outside_reserve) { + struct address_space *mapping =3D vma->vm_file->f_mapping; + pgoff_t idx; + u32 hash; + put_page(old_page); + /* + * Drop hugetlb_fault_mutex and vma_lock before + * unmapping. 
unmapping needs to hold vma_lock + * in write mode. Dropping vma_lock in read mode + * here is OK as COW mappings do not interact with + * PMD sharing. + * + * Reacquire both after unmap operation. + */ + idx =3D vma_hugecache_offset(h, vma, haddr); + hash =3D hugetlb_fault_mutex_hash(mapping, idx); + hugetlb_vma_unlock_read(vma); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + unmap_ref_private(mm, vma, old_page, haddr); + + mutex_lock(&hugetlb_fault_mutex_table[hash]); + hugetlb_vma_lock_read(vma); spin_lock(ptl); ptep =3D huge_pte_offset(mm, haddr, huge_page_size(h)); if (likely(ptep && @@ -5499,14 +5532,16 @@ static inline vm_fault_t hugetlb_handle_userfault(s= truct vm_area_struct *vma, }; =20 /* - * hugetlb_fault_mutex and i_mmap_rwsem must be + * vma_lock and hugetlb_fault_mutex must be * dropped before handling userfault. Reacquire * after handling fault to make calling code simpler. */ + hugetlb_vma_unlock_read(vma); hash =3D hugetlb_fault_mutex_hash(mapping, idx); mutex_unlock(&hugetlb_fault_mutex_table[hash]); ret =3D handle_userfault(&vmf, reason); mutex_lock(&hugetlb_fault_mutex_table[hash]); + hugetlb_vma_lock_read(vma); =20 return ret; } @@ -5740,6 +5775,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struc= t vm_area_struct *vma, =20 ptep =3D huge_pte_offset(mm, haddr, huge_page_size(h)); if (ptep) { + /* + * Since we hold no locks, ptep could be stale. That is + * OK as we are only making decisions based on content and + * not actually modifying content here. 
+ */ entry =3D huge_ptep_get(ptep); if (unlikely(is_hugetlb_entry_migration(entry))) { migration_entry_wait_huge(vma, ptep); @@ -5747,23 +5787,35 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, stru= ct vm_area_struct *vma, } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h)); - } else { - ptep =3D huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); - if (!ptep) - return VM_FAULT_OOM; } =20 - mapping =3D vma->vm_file->f_mapping; - idx =3D vma_hugecache_offset(h, vma, haddr); - /* * Serialize hugepage allocation and instantiation, so that we don't * get spurious allocation failures if two CPUs race to instantiate * the same page in the page cache. */ + mapping =3D vma->vm_file->f_mapping; + idx =3D vma_hugecache_offset(h, vma, haddr); hash =3D hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); =20 + /* + * Acquire vma lock before calling huge_pte_alloc and hold + * until finished with ptep. This prevents huge_pmd_unshare from + * being called elsewhere and making the ptep no longer valid. + * + * ptep could have already be assigned via huge_pte_offset. That + * is OK, as huge_pte_alloc will return the same value unless + * something has changed. 
+ */ + hugetlb_vma_lock_read(vma); + ptep =3D huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); + if (!ptep) { + hugetlb_vma_unlock_read(vma); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + return VM_FAULT_OOM; + } + entry =3D huge_ptep_get(ptep); /* PTE markers should be handled the same way as none pte */ if (huge_pte_none_mostly(entry)) { @@ -5824,6 +5876,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct= vm_area_struct *vma, unlock_page(pagecache_page); put_page(pagecache_page); } + hugetlb_vma_unlock_read(vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); return handle_userfault(&vmf, VM_UFFD_WP); } @@ -5867,6 +5920,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct= vm_area_struct *vma, put_page(pagecache_page); } out_mutex: + hugetlb_vma_unlock_read(vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); /* * Generally it's safe to hold refcount during waiting page lock. But @@ -6329,8 +6383,9 @@ unsigned long hugetlb_change_protection(struct vm_are= a_struct *vma, flush_cache_range(vma, range.start, range.end); =20 mmu_notifier_invalidate_range_start(&range); - last_addr_mask =3D hugetlb_mask_last_page(h); + hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); + last_addr_mask =3D hugetlb_mask_last_page(h); for (; address < end; address +=3D psize) { spinlock_t *ptl; ptep =3D huge_pte_offset(mm, address, psize); @@ -6429,6 +6484,7 @@ unsigned long hugetlb_change_protection(struct vm_are= a_struct *vma, * See Documentation/mm/mmu_notifier.rst */ i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); mmu_notifier_invalidate_range_end(&range); =20 return pages << h->order; @@ -6930,6 +6986,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_= area_struct *vma, pud_t *pud =3D pud_offset(p4d, addr); =20 i_mmap_assert_write_locked(vma->vm_file->f_mapping); + hugetlb_vma_assert_locked(vma); BUG_ON(page_count(virt_to_page(ptep)) =3D=3D 0); if (page_count(virt_to_page(ptep)) =3D=3D 1) return 
0; @@ -6941,6 +6998,31 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm= _area_struct *vma, } =20 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ +void hugetlb_vma_lock_read(struct vm_area_struct *vma) +{ +} + +void hugetlb_vma_unlock_read(struct vm_area_struct *vma) +{ +} + +void hugetlb_vma_lock_write(struct vm_area_struct *vma) +{ +} + +void hugetlb_vma_unlock_write(struct vm_area_struct *vma) +{ +} + +int hugetlb_vma_trylock_write(struct vm_area_struct *vma) +{ + return 1; +} + +void hugetlb_vma_assert_locked(struct vm_area_struct *vma) +{ +} + static void hugetlb_vma_lock_free(struct vm_area_struct *vma) { } @@ -7318,6 +7400,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *= vma) mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, start, end); mmu_notifier_invalidate_range_start(&range); + hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); for (address =3D start; address < end; address +=3D PUD_SIZE) { ptep =3D huge_pte_offset(mm, address, sz); @@ -7329,6 +7412,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *= vma) } flush_hugetlb_tlb_range(vma, start, end); i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); /* * No need to call mmu_notifier_invalidate_range(), see * Documentation/mm/mmu_notifier.rst. diff --git a/mm/memory.c b/mm/memory.c index c4c3c2fd4f45..118e5f023597 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1685,10 +1685,12 @@ static void unmap_single_vma(struct mmu_gather *tlb, if (vma->vm_file) { zap_flags_t zap_flags =3D details ? 
details->zap_flags : 0; + hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); __unmap_hugepage_range_final(tlb, vma, start, end, NULL, zap_flags); i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); } } else unmap_page_range(tlb, vma, start, end, details); diff --git a/mm/rmap.c b/mm/rmap.c index 744faaef0489..2ec925e5fa6a 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1554,24 +1554,39 @@ static bool try_to_unmap_one(struct folio *folio, s= truct vm_area_struct *vma, * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly * do this outside rmap routines. + * + * We also must hold hugetlb vma_lock in write mode. + * Lock order dictates acquiring vma_lock BEFORE + * i_mmap_rwsem. We can only try lock here and fail + * if unsuccessful. */ - VM_BUG_ON(!anon && !(flags & TTU_RMAP_LOCKED)); - if (!anon && huge_pmd_unshare(mm, vma, address, pvmw.pte)) { - flush_tlb_range(vma, range.start, range.end); - mmu_notifier_invalidate_range(mm, range.start, - range.end); - - /* - * The ref count of the PMD page was dropped - * which is part of the way map counting - * is done for shared PMDs. Return 'true' - * here. When there is no other sharing, - * huge_pmd_unshare returns false and we will - * unmap the actual page and drop map count - * to zero. - */ - page_vma_mapped_walk_done(&pvmw); - break; + if (!anon) { + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + if (!hugetlb_vma_trylock_write(vma)) { + page_vma_mapped_walk_done(&pvmw); + ret =3D false; + break; + } + if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) { + hugetlb_vma_unlock_write(vma); + flush_tlb_range(vma, + range.start, range.end); + mmu_notifier_invalidate_range(mm, + range.start, range.end); + /* + * The ref count of the PMD page was + * dropped which is part of the way map + * counting is done for shared PMDs. + * Return 'true' here. 
When there is + * no other sharing, huge_pmd_unshare + * returns false and we will unmap the + * actual page and drop map count + * to zero. + */ + page_vma_mapped_walk_done(&pvmw); + break; + } + hugetlb_vma_unlock_write(vma); } pteval =3D huge_ptep_clear_flush(vma, address, pvmw.pte); } else { @@ -1929,26 +1944,41 @@ static bool try_to_migrate_one(struct folio *folio,= struct vm_area_struct *vma, * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly * do this outside rmap routines. + * + * We also must hold hugetlb vma_lock in write mode. + * Lock order dictates acquiring vma_lock BEFORE + * i_mmap_rwsem. We can only try lock here and + * fail if unsuccessful. */ - VM_BUG_ON(!anon && !(flags & TTU_RMAP_LOCKED)); - if (!anon && huge_pmd_unshare(mm, vma, address, pvmw.pte)) { - flush_tlb_range(vma, range.start, range.end); - mmu_notifier_invalidate_range(mm, range.start, - range.end); - - /* - * The ref count of the PMD page was dropped - * which is part of the way map counting - * is done for shared PMDs. Return 'true' - * here. When there is no other sharing, - * huge_pmd_unshare returns false and we will - * unmap the actual page and drop map count - * to zero. - */ - page_vma_mapped_walk_done(&pvmw); - break; + if (!anon) { + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + if (!hugetlb_vma_trylock_write(vma)) { + page_vma_mapped_walk_done(&pvmw); + ret =3D false; + break; + } + if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) { + hugetlb_vma_unlock_write(vma); + flush_tlb_range(vma, + range.start, range.end); + mmu_notifier_invalidate_range(mm, + range.start, range.end); + + /* + * The ref count of the PMD page was + * dropped which is part of the way map + * counting is done for shared PMDs. + * Return 'true' here. When there is + * no other sharing, huge_pmd_unshare + * returns false and we will unmap the + * actual page and drop map count + * to zero. 
+ */ + page_vma_mapped_walk_done(&pvmw); + break; + } + hugetlb_vma_unlock_write(vma); } - /* Nuke the hugetlb page table entry */ pteval =3D huge_ptep_clear_flush(vma, address, pvmw.pte); } else { diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 0fdbd2c05587..e24e8a47ce8a 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -379,16 +379,21 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb= (struct mm_struct *dst_mm, BUG_ON(dst_addr >=3D dst_start + len); =20 /* - * Serialize via hugetlb_fault_mutex. + * Serialize via vma_lock and hugetlb_fault_mutex. + * vma_lock ensures the dst_pte remains valid even + * in the case of shared pmds. fault mutex prevents + * races with other faulting threads. */ idx =3D linear_page_index(dst_vma, dst_addr); mapping =3D dst_vma->vm_file->f_mapping; hash =3D hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); + hugetlb_vma_lock_read(dst_vma); =20 err =3D -ENOMEM; dst_pte =3D huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize); if (!dst_pte) { + hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out_unlock; } @@ -396,6 +401,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(s= truct mm_struct *dst_mm, if (mode !=3D MCOPY_ATOMIC_CONTINUE && !huge_pte_none_mostly(huge_ptep_get(dst_pte))) { err =3D -EEXIST; + hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out_unlock; } @@ -404,6 +410,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(s= truct mm_struct *dst_mm, dst_addr, src_addr, mode, &page, wp_copy); =20 + hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); =20 cond_resched(); --=20 2.37.2 From nobody Fri Apr 3 06:41:00 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2D521ECAAD3 
for ; Wed, 14 Sep 2022 22:21:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229491AbiINWVZ (ORCPT ); Wed, 14 Sep 2022 18:21:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55650 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229644AbiINWUu (ORCPT ); Wed, 14 Sep 2022 18:20:50 -0400 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFCCC80F6E for ; Wed, 14 Sep 2022 15:20:09 -0700 (PDT) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28EMAARs026129; Wed, 14 Sep 2022 22:18:41 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2022-7-12; bh=1Ip1504nE2rI5Mk0abFz7kLwYsrBv+Ysx7YZ81uNmX8=; b=uJZZLrnmYukpGEomRcMim6RLb3aHVVCFqbCAuhEETrj7jNDEMzzaS4l6gxCWfLE6HGNk pGRhFKxS8ngiJIgZ5ccer5MGyr03hrCeDX9pLCgHsvFMc+PQBlGP8sBynxJ55sVOuOKN ktGUGL7KrnrU8V+CUfMRWIMKM2edPzPewzWnSgrb7A4NZaKjDsDF+Yyp9uLsKuyb9lSa oD/yGHungRpHJtI4EXD4jdvnJH4BvuyAPwhZactRCpp3kW/mGrC/s/BwLRxCIDT67pjQ O0aVbql/8x0oCpiKRpcn07JkwcrzUr8vd+RiYu2lv1UxwYj4k16LgCuStb6FX98pOqze 2g== Received: from phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta03.appoci.oracle.com [138.1.37.129]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3jjxypbqrb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Sep 2022 22:18:41 +0000 Received: from pps.filterd (phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.5/8.17.1.5) with ESMTP id 28EJAjlb035442; Wed, 14 Sep 2022 22:18:40 GMT Received: from nam11-bn8-obe.outbound.protection.outlook.com 
(mail-bn8nam11lp2168.outbound.protection.outlook.com [104.47.58.168]) by phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3jjym095mr-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Sep 2022 22:18:40 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=SRoooO7fE/ST+x289RSUQ9p4OZwYdZ/lGZg5258R/MhzVjG2Nc0uiGDBVJYMzQfgAjws68YAW5acrMzt3fNG91yR4KpXLxsjVjYTMASaWpegm+ApdAm8T4zQQfo9trLxoltc2ooEUIYE3Rj3qqPBmUvyIfs3U2lrez5onIJVmZJK/DJez6MO6YNZOFDwsmBBU8IqX37CDwgUAEi2Shr4SJzo7fC1cSLe3hHcey95LF9n+M8h1U9nrYYopcnM6WxZU6iCTLp+r1NnwrndsarPnTROCfd2HxNE0ot+SrtlUG5KitNfH3cco18PPaZjjAV0MDdjDAFvcyb0Q1k6TAxO2w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=1Ip1504nE2rI5Mk0abFz7kLwYsrBv+Ysx7YZ81uNmX8=; b=LlbwGiuFT4N8Y/w2joC8RV/j0WoGKynoJagxRFhb0J/SQHCNQZnqD+gkrLeh4nBRUfcwHqjD0EkqWm5yE+t0m0BB5MeyQ+hnsqW+hRuuF8N2MG4nXUoamora8YAqmDodqlST2MHhZizaTTKTIXb4ECF5snBPnSf6jCjRYGw+m9X25NLBOnfV6dfa5tIt0cMf2xu6v31dazzm7RI/YOQ237MQmEnUfzWnQ14sqq8ExSy8T6FbtRqugWA1byzVd0mDjoQsGtFS0Bykc9+4BaKvfO5WSfwB0e8pEBYkqlwESOtXGC5iKpnt+sD/uvokDPxEfPZKB9D5mX9MZj97kfHBlw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=1Ip1504nE2rI5Mk0abFz7kLwYsrBv+Ysx7YZ81uNmX8=; b=b3fkDfIN38SrB0Mb/YQVaeUGsoneSBmthJcuJ4Of3LBLOY5eDQxKnBSS354ZcrvvXdi5unwD8GdcK2PvJKz9aBe0edvln0446DJrzFntCkiQdCAKrAVJ4Uqmg1CJGlQLZYcrekMOcr8oIHuSMY8IP8rUYU2s2xYFEmufTM0zuMI= Received: from 
BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) by CH2PR10MB4390.namprd10.prod.outlook.com (2603:10b6:610:af::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5612.22; Wed, 14 Sep 2022 22:18:39 +0000 Received: from BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::e9d2:a804:e53a:779a]) by BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::e9d2:a804:e53a:779a%6]) with mapi id 15.20.5612.022; Wed, 14 Sep 2022 22:18:39 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , Miaohe Lin , David Hildenbrand , Sven Schnelle , Michal Hocko , Peter Xu , Naoya Horiguchi , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Pasha Tatashin , Axel Rasmussen , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [PATCH v2 9/9] hugetlb: clean up code checking for fault/truncation races Date: Wed, 14 Sep 2022 15:18:10 -0700 Message-Id: <20220914221810.95771-10-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220914221810.95771-1-mike.kravetz@oracle.com> References: <20220914221810.95771-1-mike.kravetz@oracle.com> Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: MW4P222CA0008.NAMP222.PROD.OUTLOOK.COM (2603:10b6:303:114::13) To BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BY5PR10MB4196:EE_|CH2PR10MB4390:EE_ X-MS-Office365-Filtering-Correlation-Id: 1e7743fd-5bd0-4c36-1b01-08da969f130f X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
PTFLaIG2WlgsxGmKzQac3tPw/zJzljptv29fCCCGc03Btw76ntnzuSBBjDvZgRkQyiriUyEfrWKBOhb/2UFGHSTyJONKCxYZVKJ03kJvPAwpOhk1eEJrgk/yHKUk9hwwNkwOY2h48r4g0XUUmucpLXZeOLqZl+pCC0ZWw4IZdcF2ZwgY8SjUXOgES4bT/G0jLjzgGtEM8DpyduXpLpFuC08BmZM61nhtLwCQvjW8BM4Bz6jk2vQPtnaKloIKUobZx/8hxoOBCByfxUvu4v659W7ZRcq42ZjghOdStYM4RxPuN6M13YHh0LcWHqGYZvtY6hyt1v9F3tqu8r12hF7fLtWfgmhlzOBWicbmkQzYEoVu5MWyH0EP5aCYBK3UFfLGk+XQPrKm3XKtp1zVmsn4XvfDWzKbeSZu0iikhm3s70DUNaNrZ66bZfzFFCjzmgdz6cY/vKFMaNm8WZQbsK8Sjkjgsl8DHUub8N0+1U6HPHmJ8l1Ns3AvRj+GgChTb5IIEyG94CtCy1JEumLziz0ws6ZlhgAmD+DP7J6U+J58TtkaJkhDLiZX12svKF9t4ddhGmsQhpDcb4cdzZk9Dn9I4eEHMEv7inwWjlfrHyhAj47wNB/ApSyfwjnw+hHQjRieUKiMuyXWf936Mr57/Y2WLsRupXzpZY9a7B+/UEFK/7V62neE+XrZ1+p1r3Le/jlO6CIS20X2UVVXdOuV2dzWZg== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BY5PR10MB4196.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(366004)(376002)(346002)(396003)(39860400002)(136003)(451199015)(6512007)(7416002)(6666004)(86362001)(66556008)(36756003)(8676002)(107886003)(83380400001)(6486002)(66946007)(54906003)(44832011)(66476007)(8936002)(5660300002)(316002)(41300700001)(6506007)(478600001)(186003)(2906002)(38100700002)(26005)(4326008)(1076003)(2616005);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?8r9+KVXHc3zgim8RH2NLp1PfvqiGZHIBjFW03EbQUv65Hgr/4BsSv1vp/03O?= =?us-ascii?Q?ice7b0M+tDD1SXJ7MPQUrHjoJ1I41hr9sFMXoSD4R428lXm31VN5uo9A3WcP?= =?us-ascii?Q?Oo/0W3tlH9KQ0oZTpMxF1qCeCkUQKytYmVa9c2+uwXHhYz1weQ5W+fX4fhDN?= =?us-ascii?Q?o1IhOxoTR97YzpXHFaM5rSSUAMbdz5PMHZ/Yj2+ivSCwUhb5wGiGVI+5jNFK?= =?us-ascii?Q?2Nj0jLP/ogcg9NBdj6S4TtNoRGr2pRhOKncH7V+kxDhKgO7IkXKc9M76ORZP?= =?us-ascii?Q?bNvSqzULQ5Qt3lME+DNe+UBAF1vhHK3mBj2kOcyjaIipx9n1feN4HwKseXZ2?= =?us-ascii?Q?Yfa7EUQ0O6F7O8KRsKE5cNVbmBq+KmmG3vvd4EoNK5PLqMnLOkJzQXTmz8Q4?= =?us-ascii?Q?j8+Kh67DD3bz4+DqYJ4yHHS+YNFd2eM4uRdIsDFMxcr3T36deFQdaUvmdn/7?= 
=?us-ascii?Q?lC42NEL0pjERfHc1l5ezzaUdHR+HAAIEsDBLgoOApPqffkFkrOK2NIiqQi4J?= =?us-ascii?Q?lj/BcNcRJ5NjLEWcR5uhoHdbHBzA6GAF+2WuemVUkf3TyN3bdq90My5gQvMc?= =?us-ascii?Q?0lsN/QPZj5OQDVTSANCFIzo6dE5MAwXWp25FntlSMt8vMYfRPjrdF94PtK6q?= =?us-ascii?Q?XYTQSCANNHzpvdLgJcj20FjRwDzKx8z6fACzBAtJWj/hD3t/Uni1q4Jlqw1J?= =?us-ascii?Q?Fut2gPMLlr1xutTAp9ljrJvsSrYxVixC3+xkMJf8ZrIl+OgWsJeHmd/uu1IH?= =?us-ascii?Q?WLX0qX9TlKJQmKXNXYW8I1mDLs61Toyrg0irYxyrI26dhVeHuG1vJ3SMD0M3?= =?us-ascii?Q?aeUgNCOrpGo8kfwV06mB7syPU5dEZrVoB5itM0Nx5gETecOczU1deguewamb?= =?us-ascii?Q?UCvkBkDhFixU/3QbVnCMBqbyXgbVnoyADcb7ujKlJ9yZANQ5Fiy1FV585irO?= =?us-ascii?Q?WvVEWg3L0J/a8vuFfKBqyKWJ94fhFOpeIMWPC3NK8uJesFwrxB9cGIcaZEDG?= =?us-ascii?Q?vEDUz7phepkM8A5dGM23FwKMUp/1ViHzSZlw3UaHLB9BGszK7wMNa1JCfD5u?= =?us-ascii?Q?drHbSoJWqpr4k87nkNmQbFnCKHWxu7i/f2LWkjD4LWck5/uZE/ZA8uYOETVh?= =?us-ascii?Q?EDLUygF5zyUWgT4RqCmJN6FEdac+goo7GE3pGDsY9FvExRP7LLmAref3wGbf?= =?us-ascii?Q?DWf1736dysjoRP5UruLtqSKLR7IpLLVZxaaBVrrexsp/DqqhjlTXXgEiwFB8?= =?us-ascii?Q?ZijXChcdTu8fhCP2XmF6BpnnazUxkWOKTtFeh8BDEyuuPVpwjLNSqO8roJEH?= =?us-ascii?Q?Bv8OCRMaBtWgMXj0+4j/82fl2XF4vNZgxpLZ0bnHqsDqfDq0E4riRvrcTtIn?= =?us-ascii?Q?vm2n7mrkmiDgdOhDsKCxibSlWf0tJN4b3/06HuPxUDqc2i3Cf0QMs5jqQcww?= =?us-ascii?Q?zhloP+OeQx/dJI+EMQD75hBrEe8QUax3qwKeFmPZQ7V8SaGeXBknyBrCUuA0?= =?us-ascii?Q?720FYkU/Zu3dja9gZnlrplhhVvFWAgmusy/biCFk0Zx7U07JG5OaosPSK1uf?= =?us-ascii?Q?1I2CwoddnekCtOLQYNS7Fu5czJKT1cwWrMIbCbKr8fSy5orUQkRM5BbGD4Cd?= =?us-ascii?Q?Ew=3D=3D?= X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 1e7743fd-5bd0-4c36-1b01-08da969f130f X-MS-Exchange-CrossTenant-AuthSource: BY5PR10MB4196.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Sep 2022 22:18:39.1007 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: 
cLu97R1moiteSWs3Bor2WiRp5gSM8bAmnqM2xvCQ3mjdzz2zQeVTBvkXUdCDOdTIYRUbmY6eM4YiOvxovkA7xA==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH2PR10MB4390
X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-09-14_09,2022-09-14_04,2022-06-22_01
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 spamscore=0 adultscore=0 malwarescore=0 suspectscore=0 bulkscore=0 mlxlogscore=999 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2208220000 definitions=main-2209140108
X-Proofpoint-ORIG-GUID: uuq0JspoIzDwcGOp_ylP0s1lJboeLOqg
X-Proofpoint-GUID: uuq0JspoIzDwcGOp_ylP0s1lJboeLOqg
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

With the new hugetlb vma lock in place, it can also be used to handle
page fault races with file truncation. The lock is taken in read mode
at the beginning of the fault code path. During truncation, it is taken
in write mode for each vma which has the file mapped. The file's size
(i_size) is modified before taking the vma lock to unmap.

How are races handled?

The page fault code checks i_size early in processing after taking the
vma lock. If the fault is beyond i_size, the fault is aborted. If the
fault is not beyond i_size, the fault will continue and a new page will
be added to the file. It could be that truncation code modifies i_size
after the check in fault code. That is OK, as truncation code will soon
remove the page. The truncation code will wait until the fault is
finished, as it must obtain the vma lock in write mode.

This patch cleans up/removes late checks in the fault paths that try to
back out pages racing with truncation. As noted above, we just let the
truncation code remove the pages.
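
For illustration only (not part of the patch): a minimal user-space
sketch of the ordering described above, replayed as one deterministic
interleaving. Everything in it is an assumption made for the example:
struct toy_file, the reader/writer counters standing in for the vma
lock, and all function names are hypothetical, and a failed lock
attempt returns false where the kernel would sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical stand-in for the inode, its page cache and the vma lock. */
struct toy_file {
	size_t i_size;          /* file size, in huge pages */
	bool   present[NPAGES]; /* page cache: is this index populated? */
	int    readers;         /* faults holding the lock in read mode */
	bool   writer;          /* truncation holding the lock in write mode */
};

static bool lock_read(struct toy_file *f)
{
	if (f->writer)
		return false;
	f->readers++;
	return true;
}

static void unlock_read(struct toy_file *f)
{
	assert(f->readers > 0);
	f->readers--;
}

static bool trylock_write(struct toy_file *f)
{
	if (f->writer || f->readers)
		return false;
	f->writer = true;
	return true;
}

static void unlock_write(struct toy_file *f)
{
	f->writer = false;
}

/* Fault, first half: take the lock in read mode, then check i_size once. */
static bool fault_begin(struct toy_file *f, size_t idx)
{
	if (!lock_read(f))
		return false;   /* a real fault would wait here */
	if (idx >= f->i_size) {
		unlock_read(f); /* beyond end of file: abort the fault */
		return false;
	}
	return true;            /* lock stays held until fault_finish() */
}

/* Fault, second half: no late i_size recheck, just add the page. */
static void fault_finish(struct toy_file *f, size_t idx)
{
	f->present[idx] = true;
	unlock_read(f);
}

/* Truncation updates i_size first, before taking the lock. */
static void truncate_set_size(struct toy_file *f, size_t new_size)
{
	f->i_size = new_size;
}

/* Then it takes the lock in write mode and removes pages past i_size. */
static bool truncate_remove_pages(struct toy_file *f)
{
	if (!trylock_write(f))
		return false;   /* real code would block until faults drain */
	for (size_t i = f->i_size; i < NPAGES; i++)
		f->present[i] = false;
	unlock_write(f);
	return true;
}

/* Replay the race; true if no page is left beyond i_size at the end. */
static bool replay_race(void)
{
	struct toy_file f = { .i_size = NPAGES };

	bool began = fault_begin(&f, 5);        /* check passes, i_size is 8 */
	truncate_set_size(&f, 4);               /* i_size shrinks after check */
	bool early = truncate_remove_pages(&f); /* fails: fault holds lock */
	fault_finish(&f, 5);                    /* page 5 lands past new EOF */
	bool late = truncate_remove_pages(&f);  /* succeeds, removes page 5 */

	return began && !early && late && !f.present[5];
}
```

The point of the sketch is that the fault side needs only the single
early check: a page added after i_size shrinks is cleaned up by the
truncation side, which cannot get the lock until the fault has dropped
it.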
Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c | 31 ++++++++++++-------------------
 mm/hugetlb.c         | 27 ++++++---------------------
 2 files changed, 18 insertions(+), 40 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 009ae539b9b2..ed57a029eab0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -568,26 +568,19 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode,
 
 	folio_lock(folio);
 	/*
-	 * After locking page, make sure mapping is the same.
-	 * We could have raced with page fault populate and
-	 * backout code.
+	 * We must remove the folio from page cache before removing
+	 * the region/ reserve map (hugetlb_unreserve_pages). In
+	 * rare out of memory conditions, removal of the region/reserve
+	 * map could fail. Correspondingly, the subpool and global
+	 * reserve usage count can need to be adjusted.
 	 */
-	if (folio_mapping(folio) == mapping) {
-		/*
-		 * We must remove the folio from page cache before removing
-		 * the region/ reserve map (hugetlb_unreserve_pages). In
-		 * rare out of memory conditions, removal of the region/reserve
-		 * map could fail. Correspondingly, the subpool and global
-		 * reserve usage count can need to be adjusted.
-		 */
-		VM_BUG_ON(HPageRestoreReserve(&folio->page));
-		hugetlb_delete_from_page_cache(&folio->page);
-		ret = true;
-		if (!truncate_op) {
-			if (unlikely(hugetlb_unreserve_pages(inode, index,
-								index + 1, 1)))
-				hugetlb_fix_reserve_counts(inode);
-		}
+	VM_BUG_ON(HPageRestoreReserve(&folio->page));
+	hugetlb_delete_from_page_cache(&folio->page);
+	ret = true;
+	if (!truncate_op) {
+		if (unlikely(hugetlb_unreserve_pages(inode, index,
+							index + 1, 1)))
+			hugetlb_fix_reserve_counts(inode);
 	}
 
 	folio_unlock(folio);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e8cbc0f7cdaa..2207300791e5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5561,6 +5561,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
 	bool new_page, new_pagecache_page = false;
+	bool reserve_alloc = false;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -5616,6 +5617,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		clear_huge_page(page, address, pages_per_huge_page(h));
 		__SetPageUptodate(page);
 		new_page = true;
+		if (HPageRestoreReserve(page))
+			reserve_alloc = true;
 
 		if (vma->vm_flags & VM_MAYSHARE) {
 			int err = hugetlb_add_to_page_cache(page, mapping, idx);
@@ -5679,10 +5682,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}
 
 	ptl = huge_pte_lock(h, mm, ptep);
-	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
-		goto backout;
-
 	ret = 0;
 	/* If pte changed from under us, retry */
 	if (!pte_same(huge_ptep_get(ptep), old_pte))
@@ -5726,10 +5725,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 backout:
 	spin_unlock(ptl);
 backout_unlocked:
-	unlock_page(page);
-	/* restore reserve for newly allocated pages not in page cache */
 	if (new_page && !new_pagecache_page)
 		restore_reserve_on_error(h, vma, haddr, page);
+
+	unlock_page(page);
 	put_page(page);
 	goto out;
 }
@@ -6061,26 +6060,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 
 	ptl = huge_pte_lock(h, dst_mm, dst_pte);
 
-	/*
-	 * Recheck the i_size after holding PT lock to make sure not
-	 * to leave any page mapped (as page_mapped()) beyond the end
-	 * of the i_size (remove_inode_hugepages() is strict about
-	 * enforcing that). If we bail out here, we'll also leave a
-	 * page in the radix tree in the vm_shared case beyond the end
-	 * of the i_size, but remove_inode_hugepages() will take care
-	 * of it as soon as we drop the hugetlb_fault_mutex_table.
-	 */
-	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	ret = -EFAULT;
-	if (idx >= size)
-		goto out_release_unlock;
-
-	ret = -EEXIST;
 	/*
 	 * We allow to overwrite a pte marker: consider when both MISSING|WP
 	 * registered, we firstly wr-protect a none pte which has no page cache
 	 * page backing it, then access the page.
 	 */
+	ret = -EEXIST;
 	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
 		goto out_release_unlock;
 
-- 
2.37.2