From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35AC3204C36 for ; Wed, 8 Jan 2025 23:33:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379191; cv=none; b=R1MxeILscLg/T9vIW1BI8Y6zMvjvdOQg13SZj/63UhmXlCRHl/f5+uJ0H0Z6EfFDX3TqmLe2Trkc6SkpZBlnHZfJuu8xkd4gNa6i0gq9v9o16zkrFO8ClPtm+JWsq+nWQvaqVy4mUD1CPqzXfbhzvni80ONhGLj9LmI95kJPhUw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379191; c=relaxed/simple; bh=XTzAqLJrSXCTRCf8A/BA86mwOeFhzb57OfrE2FQmSYQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=d2PIYLkgWen8fqMvZXQ8H+J/H3Z1ECxJw0TYMgAzmGjNNfFbFM40qewAyr0nv43wLHtEG2GC+2wRvbUz/cdsnVbjJHneJ/Al01Sf8K08THodwWElNXOfNvYshCb5BRN65iOeybUSH4SJFUnf5M3L1/dJSx4QesD0q5tpWWBJvcM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=K86bdoEB; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="K86bdoEB" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736379189; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=x+jqVU9ZBjbAss1pYtlcKhjJs4UO6QLqbD/SlLsG+XQ=; b=K86bdoEBJj0HidciY4VdVuADdqLNVu4iziM5sGEaE7kOFT6DCFzY9/SnmRjhxhmmOiLrI+ UrAx2AgredOHejiok/TYEEdnSvhWAum6+OQ9L3FOnNpsFnFQ2to8RU+Ul5bpdffOe7cIax FeDPZFq0c2Idgi3g9HCAHDAHSfVuQN4= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-653-xPwzlNi_O7akFagKf4tyXA-1; Wed, 08 Jan 2025 18:33:05 -0500 X-MC-Unique: xPwzlNi_O7akFagKf4tyXA-1 X-Mimecast-MFC-AGG-ID: xPwzlNi_O7akFagKf4tyXA Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1CFC519560B8; Wed, 8 Jan 2025 23:33:01 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 85E1419560B7; Wed, 8 Jan 2025 23:32:52 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, 
david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 01/11] introduce khugepaged_collapse_single_pmd to collapse a single pmd Date: Wed, 8 Jan 2025 16:31:17 -0700 Message-ID: <20250108233128.14484-2-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" The khugepaged daemon and madvise_collapse have two different implementations that do almost the same thing. Create khugepaged_collapse_single_pmd to increase code reuse and create an entry point for future khugepaged changes. Signed-off-by: Nico Pache --- mm/khugepaged.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 653dbb1ff05c..4d932839ff1d 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2348,6 +2348,52 @@ static int hpage_collapse_scan_file(struct mm_struct= *mm, unsigned long addr, } #endif =20 +/* + * Try to collapse a single PMD starting at a PMD aligned addr, and return + * the results. + */ +static int khugepaged_collapse_single_pmd(unsigned long addr, struct mm_st= ruct *mm, + struct vm_area_struct *vma, bool *mmap_locked, + struct collapse_control *cc) +{ + int result =3D SCAN_FAIL; + unsigned long tva_flags =3D cc->is_khugepaged ? 
TVA_ENFORCE_SYSFS : 0; + + if (!*mmap_locked) { + mmap_read_lock(mm); + *mmap_locked =3D true; + } + + if (thp_vma_allowable_order(vma, vma->vm_flags, + tva_flags, PMD_ORDER)) { + if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) { + struct file *file =3D get_file(vma->vm_file); + pgoff_t pgoff =3D linear_page_index(vma, addr); + + mmap_read_unlock(mm); + *mmap_locked =3D false; + result =3D hpage_collapse_scan_file(mm, addr, file, pgoff, + cc); + fput(file); + if (result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { + mmap_read_lock(mm); + if (hpage_collapse_test_exit_or_disable(mm)) + goto end; + result =3D collapse_pte_mapped_thp(mm, addr, + !cc->is_khugepaged); + mmap_read_unlock(mm); + } + } else { + result =3D hpage_collapse_scan_pmd(mm, vma, addr, + mmap_locked, cc); + } + if (result =3D=3D SCAN_SUCCEED || result =3D=3D SCAN_PMD_MAPPED) + ++khugepaged_pages_collapsed; + } +end: + return result; +} + static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *resul= t, struct collapse_control *cc) __releases(&khugepaged_mm_lock) --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C74692054EC for ; Wed, 8 Jan 2025 23:33:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379203; cv=none; b=YfRUZr4V1c73/sIIcov0srtzUfvWP4tNEu7e3SCnhIPgaL5b4uIoJ+paqqIU2cuDetQSv6WK/xnq1dLYFuWvKVK0d6fSIrvL3toWRrA5wahRac/neWYzj/3z6/95ZiLnYS9S/Qgq1RmXSDItVsqESmJ5PaMMXznQLkUs1Vet0sM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379203; c=relaxed/simple; bh=2r93zcfaJyD9LFnvvqL1RpW0W0tytoOpvgCbUoFZd7E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: 
MIME-Version; b=dew5Jk7fuOohXBba+Jx812JVTAMe03DACY41m5NjXz1Nxfzv0q3nFexcjHhCGGDoHZTnaeFoPJJh6CCyCvJ4PUcl5JIm7kW11Unp2Yrl+/V5r3coe3yDsuJQYDUg+hHZwQBA8jvv+4f6OEyurBFm6+fnICEjZDOHrXiH9Gh0BSI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ZjhxzD/b; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ZjhxzD/b" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736379200; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IAvDdT6ZFWIOmFCYflWY6gP5N4T+u2KYNhwxXC129qU=; b=ZjhxzD/bWCzJDxELYWd/QyTgknQhH7tNQEJTf3as2PypBbIEXStg9Eb4y+OwhgbCVOVPdX QS48w1ibcFNBWzjG2HDzQUQ0Mpti6ttvQZ8WcHHiWHrB9peMzGdUQxc2dxHQsSz20iwcjA /4P1hO44SOHhlJQeoQi557vxKg+h8/A= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-573-3YXL2rIHONerU6TIQLXORA-1; Wed, 08 Jan 2025 18:33:16 -0500 X-MC-Unique: 3YXL2rIHONerU6TIQLXORA-1 X-Mimecast-MFC-AGG-ID: 3YXL2rIHONerU6TIQLXORA Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No 
client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D22C41956083; Wed, 8 Jan 2025 23:33:11 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 796C119560AE; Wed, 8 Jan 2025 23:33:01 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 02/11] khugepaged: refactor madvise_collapse and khugepaged_scan_mm_slot Date: Wed, 8 Jan 2025 16:31:18 -0700 Message-ID: <20250108233128.14484-3-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" Now that we have khugepaged_collapse_single_pmd, let's use it in madvise_collapse and khugepaged_scan_mm_slot to create a single entry point. 
Signed-off-by: Nico Pache --- mm/khugepaged.c | 50 ++++--------------------------------------------- 1 file changed, 4 insertions(+), 46 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 4d932839ff1d..ba85a8fcee88 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2468,33 +2468,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned= int pages, int *result, VM_BUG_ON(khugepaged_scan.address < hstart || khugepaged_scan.address + HPAGE_PMD_SIZE > hend); - if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) { - struct file *file =3D get_file(vma->vm_file); - pgoff_t pgoff =3D linear_page_index(vma, - khugepaged_scan.address); =20 - mmap_read_unlock(mm); - mmap_locked =3D false; - *result =3D hpage_collapse_scan_file(mm, - khugepaged_scan.address, file, pgoff, cc); - fput(file); - if (*result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { - mmap_read_lock(mm); - if (hpage_collapse_test_exit_or_disable(mm)) - goto breakouterloop; - *result =3D collapse_pte_mapped_thp(mm, - khugepaged_scan.address, false); - if (*result =3D=3D SCAN_PMD_MAPPED) - *result =3D SCAN_SUCCEED; - mmap_read_unlock(mm); - } - } else { - *result =3D hpage_collapse_scan_pmd(mm, vma, - khugepaged_scan.address, &mmap_locked, cc); - } - - if (*result =3D=3D SCAN_SUCCEED) - ++khugepaged_pages_collapsed; + *result =3D khugepaged_collapse_single_pmd(khugepaged_scan.address, + mm, vma, &mmap_locked, cc); =20 /* move to next address */ khugepaged_scan.address +=3D HPAGE_PMD_SIZE; @@ -2814,36 +2790,18 @@ int madvise_collapse(struct vm_area_struct *vma, st= ruct vm_area_struct **prev, mmap_assert_locked(mm); memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); - if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) { - struct file *file =3D get_file(vma->vm_file); - pgoff_t pgoff =3D linear_page_index(vma, addr); =20 - mmap_read_unlock(mm); - mmap_locked =3D false; - result =3D hpage_collapse_scan_file(mm, addr, file, pgoff, - cc); - fput(file); - } else { - result =3D 
hpage_collapse_scan_pmd(mm, vma, addr, - &mmap_locked, cc); - } + result =3D khugepaged_collapse_single_pmd(addr, mm, vma, &mmap_locked, c= c); + if (!mmap_locked) *prev =3D NULL; /* Tell caller we dropped mmap_lock */ =20 -handle_result: switch (result) { case SCAN_SUCCEED: case SCAN_PMD_MAPPED: ++thps; break; case SCAN_PTE_MAPPED_HUGEPAGE: - BUG_ON(mmap_locked); - BUG_ON(*prev); - mmap_read_lock(mm); - result =3D collapse_pte_mapped_thp(mm, addr, true); - mmap_read_unlock(mm); - goto handle_result; - /* Whitelisted set of results where continuing OK */ case SCAN_PMD_NULL: case SCAN_PTE_NON_PRESENT: case SCAN_PTE_UFFD_WP: --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1E04A204F80 for ; Wed, 8 Jan 2025 23:33:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379213; cv=none; b=GpxtJj8Rmik60HRmw2eWTg3VDyXTw3ZUnigz3/+Q8oF/W/YZV6Lzg3Mc6rUN/UpsDYa3LLmsacXm5eEMSkbarTkZKACj0Eid+KCDZ5Bj58pGCQUZZpR6TgoX3QvNgCsNQm4vt4fRXbSgThs6g3zac/KCrYyHzHNluK1yhV2qb7U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379213; c=relaxed/simple; bh=fNkGBIqJ9qIORN1Lf1JT8E7r97Bj8+Kl47v9jfAFSts=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dtg0UtSn412+8ukzRCCV0n9liUDyKG7NpnyyH5qvOVeshAaid+l/xiF/K4saCYCcPlIio1dHwnLqEcJxBOH+F6WP061m9pIsEiaytmr5VCTdpR6iGy8N5Xq6mEWpQKRfQiXtUbHLJDvEUGsUHtJNu8XRcPuPajVtADaxQeNo9RY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Aglp1BMU; 
arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Aglp1BMU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736379211; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=T3A+lzu2HXugNjY9ipPnv1YBHYwcAbMPAnaFgxhpBsc=; b=Aglp1BMUCzMS1BZxbEfFZHVB1x3S+se5JaKLbL2ig1Uhrwn4Mrv+3JskqARx1kO+jn5QpZ mGBNSlVRk1+pMzM6jc1vbxM8WCH0h7Sul495elD2QbpbXeQ+sND63/lzl50jKAW5lQ4AsF fm1yKxUlnRRF12pTH36K+lQQOXg9TGw= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-639-lr8fwzThMiy_-N9z-nI7IQ-1; Wed, 08 Jan 2025 18:33:25 -0500 X-MC-Unique: lr8fwzThMiy_-N9z-nI7IQ-1 X-Mimecast-MFC-AGG-ID: lr8fwzThMiy_-N9z-nI7IQ Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 60E741944B2E; Wed, 8 Jan 2025 23:33:21 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 2789F19560B7; Wed, 8 Jan 2025 23:33:11 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: 
ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 03/11] khugepaged: Don't allocate khugepaged mm_slot early Date: Wed, 8 Jan 2025 16:31:19 -0700 Message-ID: <20250108233128.14484-4-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" We should only "enter"/allocate the khugepaged mm_slot if we succeed at allocating the PMD-sized folio. Move the khugepaged_enter_vma call to after we know that vma_alloc_folio was successful. 
Signed-off-by: Nico Pache --- mm/huge_memory.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e53d83b3e5cf..635c65e7ef63 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1323,7 +1323,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault= *vmf) ret =3D vmf_anon_prepare(vmf); if (ret) return ret; - khugepaged_enter_vma(vma, vma->vm_flags); =20 if (!(vmf->flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(vma->vm_mm) && @@ -1365,7 +1364,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault= *vmf) } return ret; } - + khugepaged_enter_vma(vma, vma->vm_flags); return __do_huge_pmd_anonymous_page(vmf); } =20 --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A1AB3204F6E for ; Wed, 8 Jan 2025 23:33:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379222; cv=none; b=B4KMv0+zRY3TYG27tlKS396FGkpDAzYnJiqCSX5oUgcFzTHUZ512Um9maFsLMG87vptknlcPEjNUnRDDz8tnOHFOx6Ctq+O4dvFxecNAhGnkLZtNDpWcJHMuhg+Az7Q3Z53cCaxvPu4nYzm31yYv9l1KggvdjZsleH5eW/0M8Ds= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379222; c=relaxed/simple; bh=ak4M+LeI3D+PE9DkO5Kq7QlaoJx/lWuwMgOlpTvx4sU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OffP+5KOmudXt1oqTr9zOt1OBIQfqq8YjqDCJjPFBCRSZyFqAK+zsg4QyeFmSi2hJoBwIzYZk1uHpBJbPjKJbWodBOGpDKS23z/+O6qm6yAbfukxsmD/jiW4k0KtJy42Js0s2kvnwo8U9ttMCOCPrGLS8lfaDxQzOHD8Owxmz1Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) 
header.d=redhat.com header.i=@redhat.com header.b=KxGFuvD3; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="KxGFuvD3" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736379219; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AcvndsoaXEpPgsEl+w8HgcKpOFH1iO8o+01JlrF6xsM=; b=KxGFuvD3+RMNBXslx384yxY2G90Q4CTjMjI3v0AvGA8T7ouGI4tbr05R7rFfwtzM4P1SH8 apSqjxWIX7+2B+Fe/DxhhooFfe1UdOIFOD14411XfbAQS+b9qLVnsbSFEYuJzmEo7Zr6V3 Z31nKulxsl5yAPtVaP4/MU/FRtkylxs= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-346-Ayib-qL_N4OIqVTG75R-HQ-1; Wed, 08 Jan 2025 18:33:35 -0500 X-MC-Unique: Ayib-qL_N4OIqVTG75R-HQ-1 X-Mimecast-MFC-AGG-ID: Ayib-qL_N4OIqVTG75R-HQ Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7B34919560BB; Wed, 8 Jan 2025 23:33:30 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id BCF7019560AE; Wed, 8 Jan 2025 23:33:21 +0000 (UTC) From: Nico Pache 
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 04/11] khugepaged: rename hpage_collapse_* to khugepaged_* Date: Wed, 8 Jan 2025 16:31:20 -0700 Message-ID: <20250108233128.14484-5-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" Functions in khugepaged.c use a mix of hpage_collapse and khugepaged as the function prefix. Rename all of them to khugepaged to keep things consistent and slightly shorten the function names. 
Signed-off-by: Nico Pache --- mm/khugepaged.c | 52 ++++++++++++++++++++++++------------------------- 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index ba85a8fcee88..90de49d11a98 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -402,14 +402,14 @@ void __init khugepaged_destroy(void) kmem_cache_destroy(mm_slot_cache); } =20 -static inline int hpage_collapse_test_exit(struct mm_struct *mm) +static inline int khugepaged_test_exit(struct mm_struct *mm) { return atomic_read(&mm->mm_users) =3D=3D 0; } =20 -static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm) +static inline int khugepaged_test_exit_or_disable(struct mm_struct *mm) { - return hpage_collapse_test_exit(mm) || + return khugepaged_test_exit(mm) || test_bit(MMF_DISABLE_THP, &mm->flags); } =20 @@ -444,7 +444,7 @@ void __khugepaged_enter(struct mm_struct *mm) int wakeup; =20 /* __khugepaged_exit() must not run from under us */ - VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm); + VM_BUG_ON_MM(khugepaged_test_exit(mm), mm); if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) return; =20 @@ -503,7 +503,7 @@ void __khugepaged_exit(struct mm_struct *mm) } else if (mm_slot) { /* * This is required to serialize against - * hpage_collapse_test_exit() (which is guaranteed to run + * khugepaged_test_exit() (which is guaranteed to run * under mmap sem read mode). Stop here (after we return all * pagetables will be destroyed) until khugepaged has finished * working on the pagetables under the mmap_lock. @@ -606,7 +606,7 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, folio =3D page_folio(page); VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio); =20 - /* See hpage_collapse_scan_pmd(). */ + /* See khugepaged_scan_pmd(). 
*/ if (folio_likely_mapped_shared(folio)) { ++shared; if (cc->is_khugepaged && @@ -851,7 +851,7 @@ struct collapse_control khugepaged_collapse_control =3D= { .is_khugepaged =3D true, }; =20 -static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc) +static bool khugepaged_scan_abort(int nid, struct collapse_control *cc) { int i; =20 @@ -886,7 +886,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(v= oid) } =20 #ifdef CONFIG_NUMA -static int hpage_collapse_find_target_node(struct collapse_control *cc) +static int khugepaged_find_target_node(struct collapse_control *cc) { int nid, target_node =3D 0, max_value =3D 0; =20 @@ -905,7 +905,7 @@ static int hpage_collapse_find_target_node(struct colla= pse_control *cc) return target_node; } #else -static int hpage_collapse_find_target_node(struct collapse_control *cc) +static int khugepaged_find_target_node(struct collapse_control *cc) { return 0; } @@ -925,7 +925,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm= , unsigned long address, struct vm_area_struct *vma; unsigned long tva_flags =3D cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0; =20 - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(khugepaged_test_exit_or_disable(mm))) return SCAN_ANY_PROCESS; =20 *vmap =3D vma =3D find_vma(mm, address); @@ -988,7 +988,7 @@ static int check_pmd_still_valid(struct mm_struct *mm, =20 /* * Bring missing pages in from swap, to complete THP collapse. - * Only done if hpage_collapse_scan_pmd believes it is worthwhile. + * Only done if khugepaged_scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held. * Returns result: if not SCAN_SUCCEED, mmap_lock has been released. @@ -1074,7 +1074,7 @@ static int alloc_charge_folio(struct folio **foliop, = struct mm_struct *mm, { gfp_t gfp =3D (cc->is_khugepaged ? 
alloc_hugepage_khugepaged_gfpmask() : GFP_TRANSHUGE); - int node =3D hpage_collapse_find_target_node(cc); + int node =3D khugepaged_find_target_node(cc); struct folio *folio; =20 folio =3D __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask); @@ -1260,7 +1260,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, return result; } =20 -static int hpage_collapse_scan_pmd(struct mm_struct *mm, +static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, bool *mmap_locked, struct collapse_control *cc) @@ -1376,7 +1376,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *= mm, * hit record. */ node =3D folio_nid(folio); - if (hpage_collapse_scan_abort(node, cc)) { + if (khugepaged_scan_abort(node, cc)) { result =3D SCAN_SCAN_ABORT; goto out_unmap; } @@ -1445,7 +1445,7 @@ static void collect_mm_slot(struct khugepaged_mm_slot= *mm_slot) =20 lockdep_assert_held(&khugepaged_mm_lock); =20 - if (hpage_collapse_test_exit(mm)) { + if (khugepaged_test_exit(mm)) { /* free mm_slot */ hash_del(&slot->hash); list_del(&slot->mm_node); @@ -1740,7 +1740,7 @@ static void retract_page_tables(struct address_space = *mapping, pgoff_t pgoff) if (find_pmd_or_thp_or_none(mm, addr, &pmd) !=3D SCAN_SUCCEED) continue; =20 - if (hpage_collapse_test_exit(mm)) + if (khugepaged_test_exit(mm)) continue; /* * When a vma is registered with uffd-wp, we cannot recycle @@ -2249,7 +2249,7 @@ static int collapse_file(struct mm_struct *mm, unsign= ed long addr, return result; } =20 -static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long ad= dr, +static int khugepaged_scan_file(struct mm_struct *mm, unsigned long addr, struct file *file, pgoff_t start, struct collapse_control *cc) { @@ -2294,7 +2294,7 @@ static int hpage_collapse_scan_file(struct mm_struct = *mm, unsigned long addr, } =20 node =3D folio_nid(folio); - if (hpage_collapse_scan_abort(node, cc)) { + if (khugepaged_scan_abort(node, cc)) { result =3D 
SCAN_SCAN_ABORT; break; } @@ -2340,7 +2340,7 @@ static int hpage_collapse_scan_file(struct mm_struct = *mm, unsigned long addr, return result; } #else -static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long ad= dr, +static int khugepaged_scan_file(struct mm_struct *mm, unsigned long addr, struct file *file, pgoff_t start, struct collapse_control *cc) { @@ -2372,19 +2372,19 @@ static int khugepaged_collapse_single_pmd(unsigned = long addr, struct mm_struct * =20 mmap_read_unlock(mm); *mmap_locked =3D false; - result =3D hpage_collapse_scan_file(mm, addr, file, pgoff, + result =3D khugepaged_scan_file(mm, addr, file, pgoff, cc); fput(file); if (result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { mmap_read_lock(mm); - if (hpage_collapse_test_exit_or_disable(mm)) + if (khugepaged_test_exit_or_disable(mm)) goto end; result =3D collapse_pte_mapped_thp(mm, addr, !cc->is_khugepaged); mmap_read_unlock(mm); } } else { - result =3D hpage_collapse_scan_pmd(mm, vma, addr, + result =3D khugepaged_scan_pmd(mm, vma, addr, mmap_locked, cc); } if (result =3D=3D SCAN_SUCCEED || result =3D=3D SCAN_PMD_MAPPED) @@ -2432,7 +2432,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, goto breakouterloop_mmap_lock; =20 progress++; - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(khugepaged_test_exit_or_disable(mm))) goto breakouterloop; =20 vma_iter_init(&vmi, mm, khugepaged_scan.address); @@ -2440,7 +2440,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, unsigned long hstart, hend; =20 cond_resched(); - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) { + if (unlikely(khugepaged_test_exit_or_disable(mm))) { progress++; break; } @@ -2462,7 +2462,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, bool mmap_locked =3D true; =20 cond_resched(); - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(khugepaged_test_exit_or_disable(mm))) goto 
breakouterloop; =20 VM_BUG_ON(khugepaged_scan.address < hstart || @@ -2498,7 +2498,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, * Release the current mm_slot if this mm is about to die, or * if we scanned all vmas of this mm. */ - if (hpage_collapse_test_exit(mm) || !vma) { + if (khugepaged_test_exit(mm) || !vma) { /* * Make sure that if mm_users is reaching zero while * khugepaged runs here, khugepaged_exit will find --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 16BD42054EB for ; Wed, 8 Jan 2025 23:33:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379227; cv=none; b=CdkFgiFhnQNo+ta/ysv/ekEs3KFZ8kgvFKIXXQe4oEe5MMnTxCcROd+j8FNd6jZEm/69xuCPU7rmVq2BwgyMWY2+GzurW2af+FGQFTa5kEj8HMwulkxGvvmeKoINmipOGeLwPwx8gsWxjHbEydXeyPjIAb3wvJL/ZgFJHemK8Jw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379227; c=relaxed/simple; bh=68YIkYe5bX1sX2aFWBvuWXqH92YNNpFnTwpWVzwR4LY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VXU5PyQEi9RkKFLZ95YBCXO6lez3R8jpwPeTWwdjLETqHtm0+rTXya3ouZD+gAadi8HNOK9Ls9RMI+IsbPIu6HVQwrLqCk46P3QDiwJDNPt+t6WXnYFdn52v7tn/JtJqtKrOE4BwKDgG9+L2kIvJDZ/PejUCCxz9bh2wQC0+hu0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=PYqjZGPa; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PYqjZGPa" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736379225; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LYWZaPy1kd0tcw2Lw/njF3yTpmJS5BgOSyQhXLKd6xQ=; b=PYqjZGPaFjyaomqdYKZLQr/KmCz4eqVT2KEKtoW1qXj+RclAXvcYZdQ5SHR1nT6AvKnIpB OSPsG2nOUVvZfRUykhFDuUw+kBz6Ms4l96PJF9V0emEHqWbxeolvk12sD/BOyzBFKa4LD4 tXdWca9MajUH3QtmKVO2JQilAYCX6G4= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-425-jTdYKgYgNhSn6U_dgUqwTQ-1; Wed, 08 Jan 2025 18:33:42 -0500 X-MC-Unique: jTdYKgYgNhSn6U_dgUqwTQ-1 X-Mimecast-MFC-AGG-ID: jTdYKgYgNhSn6U_dgUqwTQ Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 916F619560AF; Wed, 8 Jan 2025 23:33:38 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id D82DC19560AE; Wed, 8 Jan 2025 23:33:30 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, 
dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 05/11] khugepaged: generalize hugepage_vma_revalidate for mTHP support Date: Wed, 8 Jan 2025 16:31:21 -0700 Message-ID: <20250108233128.14484-6-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" For khugepaged to support different mTHP orders, we must generalize this function for arbitrary orders. No functional change in this patch. Signed-off-by: Nico Pache --- mm/khugepaged.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 90de49d11a98..e2e6ca9265ab 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -920,7 +920,7 @@ static int khugepaged_find_target_node(struct collapse_= control *cc) static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long add= ress, bool expect_anon, struct vm_area_struct **vmap, - struct collapse_control *cc) + struct collapse_control *cc, int order) { struct vm_area_struct *vma; unsigned long tva_flags =3D cc->is_khugepaged ? 
TVA_ENFORCE_SYSFS : 0; @@ -932,9 +932,9 @@ static int hugepage_vma_revalidate(struct mm_struct *mm= , unsigned long address, if (!vma) return SCAN_VMA_NULL; =20 - if (!thp_vma_suitable_order(vma, address, PMD_ORDER)) + if (!thp_vma_suitable_order(vma, address, order)) return SCAN_ADDRESS_RANGE; - if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, PMD_ORDER)) + if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, order)) return SCAN_VMA_CHECK; /* * Anon VMA expected, the address may be unmapped then @@ -1126,7 +1126,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, goto out_nolock; =20 mmap_read_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD= _ORDER); if (result !=3D SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; @@ -1160,7 +1160,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * mmap_lock. */ mmap_write_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD= _ORDER); if (result !=3D SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ @@ -2779,7 +2779,7 @@ int madvise_collapse(struct vm_area_struct *vma, stru= ct vm_area_struct **prev, mmap_read_lock(mm); mmap_locked =3D true; result =3D hugepage_vma_revalidate(mm, addr, false, &vma, - cc); + cc, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) { last_fail =3D result; goto out_nolock; --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 11A4722611 for ; Wed, 8 Jan 2025 23:33:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-418-1PB79giaOF6GmKQgm3tLVg-1; Wed, 08 Jan 2025 18:33:52 -0500 X-MC-Unique: 1PB79giaOF6GmKQgm3tLVg-1 X-Mimecast-MFC-AGG-ID: 1PB79giaOF6GmKQgm3tLVg Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 376AE19560A1; Wed, 8 Jan 2025 23:33:48 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id E4AE419560AE; Wed, 8 Jan 2025 23:33:38 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 06/11] khugepaged: generalize alloc_charge_folio for mTHP support Date: Wed, 8 Jan 2025 16:31:22 -0700 Message-ID: <20250108233128.14484-7-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: 
<20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" alloc_charge_folio allocates the new folio for the khugepaged collapse. Generalize the order of the folio allocations to support future mTHP collapsing. No functional changes in this patch. Signed-off-by: Nico Pache --- mm/khugepaged.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index e2e6ca9265ab..6daf3a943a1a 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1070,14 +1070,14 @@ static int __collapse_huge_page_swapin(struct mm_st= ruct *mm, } =20 static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm, - struct collapse_control *cc) + struct collapse_control *cc, int order) { gfp_t gfp =3D (cc->is_khugepaged ? 
alloc_hugepage_khugepaged_gfpmask() : GFP_TRANSHUGE); int node =3D khugepaged_find_target_node(cc); struct folio *folio; =20 - folio =3D __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask); + folio =3D __folio_alloc(gfp, order, node, &cc->alloc_nmask); if (!folio) { *foliop =3D NULL; count_vm_event(THP_COLLAPSE_ALLOC_FAILED); @@ -1121,7 +1121,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, */ mmap_read_unlock(mm); =20 - result =3D alloc_charge_folio(&folio, mm, cc); + result =3D alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) goto out_nolock; =20 @@ -1834,7 +1834,7 @@ static int collapse_file(struct mm_struct *mm, unsign= ed long addr, VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); =20 - result =3D alloc_charge_folio(&new_folio, mm, cc); + result =3D alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) goto out; =20 --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EDA3C2046AA for ; Wed, 8 Jan 2025 23:34:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379248; cv=none; b=ig7kix7/hVnmTLWUijIHh3YTMDVaelkN44Wtaw0k0f1k948wNivuL+Alikw5+ToOZ6VssLu5+2hgiFQxSVwRzEL4U4oYMQQ4XHtY93HJnwJELEkfNhLH6uCvC5u3vmKUVG8Z52djlu+pviFoyUQarjfYTqfzpqjLtbeJZK/KJLE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379248; c=relaxed/simple; bh=T+A7EDCCtFsA6mR2AJvLwkkIuPfFsPQ9QIiOqlu+nXc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id BCE641956083; Wed, 8 Jan 2025 23:33:56 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 92A9019560AE; Wed, 8 Jan 2025 23:33:48 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 07/11] khugepaged: generalize __collapse_huge_page_* for mTHP support Date: Wed, 8 Jan 2025 16:31:23 -0700 Message-ID: <20250108233128.14484-8-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" Generalize the __collapse_huge_page_* functions to take an order argument to support future mTHP collapse. No functional changes in this patch. 
Signed-off-by: Nico Pache --- mm/khugepaged.c | 36 +++++++++++++++++++----------------- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 6daf3a943a1a..9eb161b04ee4 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -565,7 +565,8 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, unsigned long address, pte_t *pte, struct collapse_control *cc, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, + u8 order) { struct page *page =3D NULL; struct folio *folio =3D NULL; @@ -573,7 +574,7 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, int none_or_zero =3D 0, shared =3D 0, result =3D SCAN_FAIL, referenced = =3D 0; bool writable =3D false; =20 - for (_pte =3D pte; _pte < pte + HPAGE_PMD_NR; + for (_pte =3D pte; _pte < pte + (1 << order); _pte++, address +=3D PAGE_SIZE) { pte_t pteval =3D ptep_get(_pte); if (pte_none(pteval) || (pte_present(pteval) && @@ -711,14 +712,15 @@ static void __collapse_huge_page_copy_succeeded(pte_t= *pte, struct vm_area_struct *vma, unsigned long address, spinlock_t *ptl, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, + u8 order) { struct folio *src, *tmp; pte_t *_pte; pte_t pteval; =20 - for (_pte =3D pte; _pte < pte + HPAGE_PMD_NR; - _pte++, address +=3D PAGE_SIZE) { + for (_pte =3D pte; _pte < pte + (1 << order); + _pte++, address +=3D PAGE_SIZE) { pteval =3D ptep_get(_pte); if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1); @@ -764,7 +766,8 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, + u8 order) { spinlock_t *pmd_ptl; =20 @@ -781,7 +784,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, * Release both raw and compound pages isolated * in __collapse_huge_page_isolate. 
*/ - release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist); + release_pte_pages(pte, pte + (1 << order), compound_pagelist); } =20 /* @@ -802,7 +805,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio, pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma, unsigned long address, spinlock_t *ptl, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, u8 order) { unsigned int i; int result =3D SCAN_SUCCEED; @@ -810,7 +813,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct= folio *folio, /* * Copying pages' contents is subject to memory poison at any iteration. */ - for (i =3D 0; i < HPAGE_PMD_NR; i++) { + for (i =3D 0; i < (1 << order); i++) { pte_t pteval =3D ptep_get(pte + i); struct page *page =3D folio_page(folio, i); unsigned long src_addr =3D address + i * PAGE_SIZE; @@ -829,10 +832,10 @@ static int __collapse_huge_page_copy(pte_t *pte, stru= ct folio *folio, =20 if (likely(result =3D=3D SCAN_SUCCEED)) __collapse_huge_page_copy_succeeded(pte, vma, address, ptl, - compound_pagelist); + compound_pagelist, order); else __collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma, - compound_pagelist); + compound_pagelist, order); =20 return result; } @@ -996,11 +999,11 @@ static int check_pmd_still_valid(struct mm_struct *mm, static int __collapse_huge_page_swapin(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd, - int referenced) + int referenced, u8 order) { int swapped_in =3D 0; vm_fault_t ret =3D 0; - unsigned long address, end =3D haddr + (HPAGE_PMD_NR * PAGE_SIZE); + unsigned long address, end =3D haddr + ((1 << order) * PAGE_SIZE); int result; pte_t *pte =3D NULL; spinlock_t *ptl; @@ -1110,7 +1113,6 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, int result =3D SCAN_FAIL; struct vm_area_struct *vma; struct mmu_notifier_range range; - VM_BUG_ON(address & ~HPAGE_PMD_MASK); =20 /* 
@@ -1145,7 +1147,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * that case. Continuing to collapse causes inconsistency. */ result =3D __collapse_huge_page_swapin(mm, vma, address, pmd, - referenced); + referenced, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) goto out_nolock; } @@ -1192,7 +1194,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, pte =3D pte_offset_map_lock(mm, &_pmd, address, &pte_ptl); if (pte) { result =3D __collapse_huge_page_isolate(vma, address, pte, cc, - &compound_pagelist); + &compound_pagelist, HPAGE_PMD_ORDER); spin_unlock(pte_ptl); } else { result =3D SCAN_PMD_NULL; @@ -1222,7 +1224,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, =20 result =3D __collapse_huge_page_copy(pte, folio, pmd, _pmd, vma, address, pte_ptl, - &compound_pagelist); + &compound_pagelist, HPAGE_PMD_ORDER); pte_unmap(pte); if (unlikely(result !=3D SCAN_SUCCEED)) goto out_up_write; --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9CAC3204F70 for ; Wed, 8 Jan 2025 23:34:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379257; cv=none; b=nVH+RfZAEKVExZ2FlZaAbBcdL2P9UFnxNqpLorZ4CNd7/Td3wbqWtBbA8gnKR0k1CLaipzIiR+8YFTx3iE3w3MarXI6IN+5VDi/WX0jrBkK+2QypQr6/DCuisQtKljPUoHQ7G5+ISZizwb6ldcqT1FU8Dw7IZx0P+Ri3UMTmYdQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379257; c=relaxed/simple; bh=onJbqRGIwbsxztFIp6xG0EJikaf01qGsC91fw7O8XHM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8C0DF19560B1; Wed, 8 Jan 2025 23:34:05 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 2718F19560AE; Wed, 8 Jan 2025 23:33:57 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 08/11] khugepaged: introduce khugepaged_scan_bitmap for mTHP support Date: Wed, 8 Jan 2025 16:31:24 -0700 Message-ID: <20250108233128.14484-9-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" khugepaged scans PMD ranges for potential collapse to a hugepage. To add mTHP support, we instead use this scan to record chunks of fully utilized sections of the PMD. Create a bitmap to represent a PMD in chunks of order MIN_MTHP_ORDER. 
By default we set this to order 3. The reasoning is that with 4K pages and 512 PTEs per PMD this results in a 64-bit bitmap, which allows some optimizations. For other arches like ARM64 64K, we can set a larger order if needed. khugepaged_scan_bitmap uses an explicit stack of struct scan_bit_state entries, in place of recursion, to scan a bitmap that represents chunks of fully utilized regions. We can then determine what mTHP size fits best. In the following patch, we set this bitmap while scanning the PMD. max_ptes_none is used as a scale to determine how "full" an order must be before being considered for collapse. Signed-off-by: Nico Pache --- include/linux/khugepaged.h | 4 +- mm/khugepaged.c | 129 +++++++++++++++++++++++++++++++++++-- 2 files changed, 126 insertions(+), 7 deletions(-) diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index 1f46046080f5..31cff8aeec4a 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -1,7 +1,9 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_KHUGEPAGED_H #define _LINUX_KHUGEPAGED_H - +#define MIN_MTHP_ORDER 3 +#define MIN_MTHP_NR (1<mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 }; + + while (top >=3D 0) { + state =3D cc->mthp_bitmap_stack[top--]; + order =3D state.order; + offset =3D state.offset; + num_chunks =3D 1 << order; + // Skip mTHP orders that are not enabled + if (!((enabled_orders >> (order + MIN_MTHP_ORDER)) & 1)) + goto next; + + // copy the relevant section to a new bitmap + bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset, + MTHP_BITMAP_SIZE); + + bits_set =3D bitmap_weight(cc->mthp_bitmap_temp, num_chunks); + + // Check if the region is "almost full" based on the threshold + max_percent =3D ((HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) * 100) + / (HPAGE_PMD_NR - 1); + threshold_bits =3D (max_percent * num_chunks) / 100; + + if (bits_set >=3D threshold_bits) { + ret =3D collapse_huge_page(mm, address, referenced, unmapped, cc, + mmap_locked, order + 
MIN_MTHP_ORDER, offset * MIN_MTHP_NR); + if (ret =3D=3D SCAN_SUCCEED) + collapsed +=3D (1 << (order + MIN_MTHP_ORDER)); + continue; + } + +next: + if (order > 0) { + next_order =3D order - 1; + mid_offset =3D offset + (num_chunks / 2); + cc->mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { next_order, mid_offset }; + cc->mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { next_order, offset }; + } + } + return collapsed; +} + static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, bool *mmap_locked, @@ -1430,7 +1528,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, pte_unmap_unlock(pte, ptl); if (result =3D=3D SCAN_SUCCEED) { result =3D collapse_huge_page(mm, address, referenced, - unmapped, cc); + unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0); /* collapse_huge_page will return with the mmap_lock released */ *mmap_locked =3D false; } @@ -2767,6 +2865,21 @@ int madvise_collapse(struct vm_area_struct *vma, str= uct vm_area_struct **prev, return -ENOMEM; cc->is_khugepaged =3D false; =20 + cc->mthp_bitmap =3D kmalloc_array( + BITS_TO_LONGS(MTHP_BITMAP_SIZE), sizeof(unsigned long), GFP_KERNEL); + if (!cc->mthp_bitmap) + return -ENOMEM; + + cc->mthp_bitmap_temp =3D kmalloc_array( + BITS_TO_LONGS(MTHP_BITMAP_SIZE), sizeof(unsigned long), GFP_KERNEL); + if (!cc->mthp_bitmap_temp) + return -ENOMEM; + + cc->mthp_bitmap_stack =3D kmalloc_array( + MTHP_BITMAP_SIZE, sizeof(struct scan_bit_state), GFP_KERNEL); + if (!cc->mthp_bitmap_stack) + return -ENOMEM; + mmgrab(mm); lru_add_drain_all(); =20 @@ -2831,8 +2944,12 @@ int madvise_collapse(struct vm_area_struct *vma, str= uct vm_area_struct **prev, out_nolock: mmap_assert_locked(mm); mmdrop(mm); + kfree(cc->mthp_bitmap); + kfree(cc->mthp_bitmap_temp); + kfree(cc->mthp_bitmap_stack); kfree(cc); =20 + return thps =3D=3D ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 
0 : madvise_collapse_errno(last_fail); } --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3251A204F61 for ; Wed, 8 Jan 2025 23:34:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379265; cv=none; b=REcE7BMeCd/rEZZXEUKlWqA7yIFCBvCIcnpLE7Knx9vsvBZ0y9h8prQ5RBvQTcfa5Fa3Jmiu83gGfAIkD56+JA0tThI7EAGVUhZNCdaGTZ4qj5M16NkiKfj+1fa+xnk3SyjEWsh+qLm0ESrTGt/FotbBVSjSDbeIA2FdMUAmJDE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379265; c=relaxed/simple; bh=Tb38I3kg/rOpPNtL2/mocyhh7Rd3V+wsnt60jIaR2Xs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=l8N+9u/AA8ej3vW/2OQyGtVJRFE/mgTDKz3WNB+F7i3WgbmfxhhSz9XQBH07+ZhGqb4nf19mGtlcn2Ugyd5fSgUrzCota2BQ4u+G5Wz0hzUu8fwFQfoev+3Whsx/JdMtL03tHcTSNFI1RiuLEtFU60ZGQOFHx0m/NK12e85gnM0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ND4aWY/w; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ND4aWY/w" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736379262; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=90X6GfV2bcC688lGkgxcg3viPnFdaoWJIxc9fBgBRMY=; b=ND4aWY/wEkmNKWDmpgg1Y/KAQheuR+jZgVjwfnz6bTcnWr1x7GMm7U3T+s54jJ84V4XKfv 3WiAqm/mBtHsp/PszO38fZ7dKPU63FSLnpyhIK1i0AwJfeqt8iTGM5oZkUi03iFuZStsde +rT/XooRUFYjaa1iTT77Uyfei8Keetk= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-480-6NSDnLaZPwSRWloZEf12ug-1; Wed, 08 Jan 2025 18:34:18 -0500 X-MC-Unique: 6NSDnLaZPwSRWloZEf12ug-1 X-Mimecast-MFC-AGG-ID: 6NSDnLaZPwSRWloZEf12ug Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E318A19560B1; Wed, 8 Jan 2025 23:34:13 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id DD72519560AE; Wed, 8 Jan 2025 23:34:05 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, 
willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 09/11] khugepaged: add mTHP support Date: Wed, 8 Jan 2025 16:31:25 -0700 Message-ID: <20250108233128.14484-10-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" Introduce the ability for khugepaged to collapse to different mTHP sizes. While scanning a PMD range for potential hugepage collapse, track pages in MIN_MTHP_ORDER chunks. Each bit represents a fully utilized region of order MIN_MTHP_ORDER ptes. With this bitmap we can determine which mTHP sizes would be the most efficient to collapse to if the PMD collapse is not suitable. 
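As a rough user-space illustration of the tracking described above (not the kernel code itself), the sketch below builds one bit per fully populated chunk and computes how many set bits a region would need before it is considered for collapse. The helper names (build_chunk_bitmap, chunks_set, threshold_chunks) and the constants PMD_NR, MIN_ORDER and NR_CHUNKS are hypothetical stand-ins that assume 4K pages with 512 PTEs per PMD; the threshold arithmetic follows the formula used by khugepaged_scan_bitmap in the earlier patch of this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed geometry: 4K pages, 512 PTEs per PMD, 8-PTE chunks (order 3). */
#define PMD_ORDER  9
#define PMD_NR     (1 << PMD_ORDER)
#define MIN_ORDER  3                    /* stands in for MIN_MTHP_ORDER */
#define MIN_NR     (1 << MIN_ORDER)
#define NR_CHUNKS  (PMD_NR / MIN_NR)    /* 64 chunks, so 64 bits */

/*
 * Hypothetical helper: set bit N when every PTE of chunk N is present.
 * This mirrors the all_valid bookkeeping in the scan loop.
 */
static unsigned long long build_chunk_bitmap(const bool present[PMD_NR])
{
	unsigned long long bitmap = 0;
	bool all_valid = true;

	for (int i = 0; i < PMD_NR; i++) {
		if (i % MIN_NR == 0)
			all_valid = true;       /* start of a new chunk */
		if (!present[i])
			all_valid = false;      /* hole in this chunk */
		if (all_valid && (i + 1) % MIN_NR == 0)
			bitmap |= 1ULL << (i / MIN_NR);
	}
	return bitmap;
}

/* Hypothetical helper: count set bits in chunks [offset, offset + n). */
static int chunks_set(unsigned long long bitmap, int offset, int n)
{
	int count = 0;

	assert(offset >= 0 && n >= 0 && offset + n <= NR_CHUNKS);
	for (int i = 0; i < n; i++)
		count += (bitmap >> (offset + i)) & 1;
	return count;
}

/* Scale max_ptes_none to a minimum number of full chunks for a region. */
static int threshold_chunks(int max_ptes_none, int num_chunks)
{
	int max_percent = ((PMD_NR - max_ptes_none - 1) * 100) / (PMD_NR - 1);

	return (max_percent * num_chunks) / 100;
}
```

Under these assumptions, max_ptes_none = 0 requires every chunk of a region to be full, max_ptes_none = 511 gives a threshold of zero chunks, and max_ptes_none = 255 requires about half of the chunks (32 of 64 for the whole PMD range).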
Signed-off-by: Nico Pache --- mm/khugepaged.c | 111 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 77 insertions(+), 34 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index de1dc6ea3c71..4d3c560f20b4 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1139,13 +1139,14 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, { LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; - pte_t *pte; + pte_t *pte, mthp_pte; pgtable_t pgtable; struct folio *folio; spinlock_t *pmd_ptl, *pte_ptl; int result =3D SCAN_FAIL; struct vm_area_struct *vma; struct mmu_notifier_range range; + unsigned long _address =3D address + offset * PAGE_SIZE; VM_BUG_ON(address & ~HPAGE_PMD_MASK); =20 /* if collapsing mTHPs we may have already released the read_lock, and @@ -1162,12 +1163,13 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, mmap_read_unlock(mm); *mmap_locked =3D false; =20 - result =3D alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER); + result =3D alloc_charge_folio(&folio, mm, cc, order); if (result !=3D SCAN_SUCCEED) goto out_nolock; =20 mmap_read_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD= _ORDER); + *mmap_locked =3D true; + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, order); if (result !=3D SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; @@ -1185,13 +1187,14 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, * released when it fails. So we jump out_nolock directly in * that case. Continuing to collapse causes inconsistency. 
*/ - result =3D __collapse_huge_page_swapin(mm, vma, address, pmd, - referenced, HPAGE_PMD_ORDER); + result =3D __collapse_huge_page_swapin(mm, vma, _address, pmd, + referenced, order); if (result !=3D SCAN_SUCCEED) goto out_nolock; } =20 mmap_read_unlock(mm); + *mmap_locked =3D false; /* * Prevent all access to pagetables with the exception of * gup_fast later handled by the ptep_clear_flush and the VM @@ -1201,7 +1204,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * mmap_lock. */ mmap_write_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD= _ORDER); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, order); if (result !=3D SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ @@ -1212,11 +1215,12 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, vma_start_write(vma); anon_vma_lock_write(vma->anon_vma); =20 - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address, - address + HPAGE_PMD_SIZE); + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address, + _address + (PAGE_SIZE << order)); mmu_notifier_invalidate_range_start(&range); =20 pmd_ptl =3D pmd_lock(mm, pmd); /* probably unnecessary */ + /* * This removes any huge TLB entry from the CPU so we won't allow * huge and small TLB entries for the same virtual address to @@ -1230,10 +1234,10 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, mmu_notifier_invalidate_range_end(&range); tlb_remove_table_sync_one(); =20 - pte =3D pte_offset_map_lock(mm, &_pmd, address, &pte_ptl); + pte =3D pte_offset_map_lock(mm, &_pmd, _address, &pte_ptl); if (pte) { - result =3D __collapse_huge_page_isolate(vma, address, pte, cc, - &compound_pagelist, HPAGE_PMD_ORDER); + result =3D __collapse_huge_page_isolate(vma, _address, pte, cc, + &compound_pagelist, order); spin_unlock(pte_ptl); } else { result =3D SCAN_PMD_NULL; @@ -1262,8 +1266,8 @@ static int 
collapse_huge_page(struct mm_struct *mm, u= nsigned long address, anon_vma_unlock_write(vma->anon_vma); =20 result =3D __collapse_huge_page_copy(pte, folio, pmd, _pmd, - vma, address, pte_ptl, - &compound_pagelist, HPAGE_PMD_ORDER); + vma, _address, pte_ptl, + &compound_pagelist, order); pte_unmap(pte); if (unlikely(result !=3D SCAN_SUCCEED)) goto out_up_write; @@ -1274,20 +1278,37 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, * write. */ __folio_mark_uptodate(folio); - pgtable =3D pmd_pgtable(_pmd); - - _pmd =3D mk_huge_pmd(&folio->page, vma->vm_page_prot); - _pmd =3D maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); - - spin_lock(pmd_ptl); - BUG_ON(!pmd_none(*pmd)); - folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE); - folio_add_lru_vma(folio, vma); - pgtable_trans_huge_deposit(mm, pmd, pgtable); - set_pmd_at(mm, address, pmd, _pmd); - update_mmu_cache_pmd(vma, address, pmd); - deferred_split_folio(folio, false); - spin_unlock(pmd_ptl); + if (order =3D=3D HPAGE_PMD_ORDER) { + pgtable =3D pmd_pgtable(_pmd); + _pmd =3D mk_huge_pmd(&folio->page, vma->vm_page_prot); + _pmd =3D maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); + + spin_lock(pmd_ptl); + BUG_ON(!pmd_none(*pmd)); + folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE); + folio_add_lru_vma(folio, vma); + pgtable_trans_huge_deposit(mm, pmd, pgtable); + set_pmd_at(mm, address, pmd, _pmd); + update_mmu_cache_pmd(vma, address, pmd); + deferred_split_folio(folio, false); + spin_unlock(pmd_ptl); + } else { //mTHP + mthp_pte =3D mk_pte(&folio->page, vma->vm_page_prot); + mthp_pte =3D maybe_mkwrite(pte_mkdirty(mthp_pte), vma); + + spin_lock(pmd_ptl); + folio_ref_add(folio, (1 << order) - 1); + folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE); + folio_add_lru_vma(folio, vma); + spin_lock(pte_ptl); + set_ptes(vma->vm_mm, _address, pte, mthp_pte, (1 << order)); + update_mmu_cache_range(NULL, vma, _address, pte, (1 << order)); + spin_unlock(pte_ptl); + smp_wmb(); /* 
make pte visible before pmd */ + pmd_populate(mm, pmd, pmd_pgtable(_pmd)); + deferred_split_folio(folio, false); + spin_unlock(pmd_ptl); + } =20 folio =3D NULL; =20 @@ -1367,21 +1388,26 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, { pmd_t *pmd; pte_t *pte, *_pte; + int i; int result =3D SCAN_FAIL, referenced =3D 0; int none_or_zero =3D 0, shared =3D 0; struct page *page =3D NULL; struct folio *folio =3D NULL; unsigned long _address; + unsigned long enabled_orders; spinlock_t *ptl; int node =3D NUMA_NO_NODE, unmapped =3D 0; bool writable =3D false; - + bool all_valid =3D true; + unsigned long tva_flags =3D cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0; VM_BUG_ON(address & ~HPAGE_PMD_MASK); =20 result =3D find_pmd_or_thp_or_none(mm, address, &pmd); if (result !=3D SCAN_SUCCEED) goto out; =20 + bitmap_zero(cc->mthp_bitmap, 1 << (HPAGE_PMD_ORDER - MIN_MTHP_ORDER)); + bitmap_zero(cc->mthp_bitmap_temp, 1 << (HPAGE_PMD_ORDER - MIN_MTHP_ORDER)= ); memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); pte =3D pte_offset_map_lock(mm, pmd, address, &ptl); @@ -1390,8 +1416,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, goto out; } =20 - for (_address =3D address, _pte =3D pte; _pte < pte + HPAGE_PMD_NR; - _pte++, _address +=3D PAGE_SIZE) { + for (i =3D 0; i < HPAGE_PMD_NR; i++) { + if (i % MIN_MTHP_NR =3D=3D 0) + all_valid =3D true; + + _pte =3D pte + i; + _address =3D address + i * PAGE_SIZE; pte_t pteval =3D ptep_get(_pte); if (is_swap_pte(pteval)) { ++unmapped; @@ -1414,6 +1444,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, } } if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { + all_valid =3D false; ++none_or_zero; if (!userfaultfd_armed(vma) && (!cc->is_khugepaged || @@ -1514,7 +1545,15 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, folio_test_referenced(folio) || mmu_notifier_test_young(vma->vm_mm, address))) referenced++; + + /* + * we are reading in MIN_MTHP_NR page chunks. 
if there are no empty + * pages keep track of it in the bitmap for mTHP collapsing. + */ + if (all_valid && (i + 1) % MIN_MTHP_NR =3D=3D 0) + bitmap_set(cc->mthp_bitmap, i / MIN_MTHP_NR, 1); } + if (!writable) { result =3D SCAN_PAGE_RO; } else if (cc->is_khugepaged && @@ -1527,10 +1566,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, out_unmap: pte_unmap_unlock(pte, ptl); if (result =3D=3D SCAN_SUCCEED) { - result =3D collapse_huge_page(mm, address, referenced, - unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0); - /* collapse_huge_page will return with the mmap_lock released */ - *mmap_locked =3D false; + enabled_orders =3D thp_vma_allowable_orders(vma, vma->vm_flags, + tva_flags, THP_ORDERS_ALL_ANON); + result =3D khugepaged_scan_bitmap(mm, address, referenced, unmapped, cc, + mmap_locked, enabled_orders); + if (result > 0) + result =3D SCAN_SUCCEED; } out: trace_mm_khugepaged_scan_pmd(mm, &folio->page, writable, referenced, @@ -2477,11 +2518,13 @@ static int khugepaged_collapse_single_pmd(unsigned = long addr, struct mm_struct * fput(file); if (result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { mmap_read_lock(mm); + *mmap_locked =3D true; if (khugepaged_test_exit_or_disable(mm)) goto end; result =3D collapse_pte_mapped_thp(mm, addr, !cc->is_khugepaged); mmap_read_unlock(mm); + *mmap_locked =3D false; } } else { result =3D khugepaged_scan_pmd(mm, vma, addr, --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A490204C2B for ; Wed, 8 Jan 2025 23:34:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379275; cv=none; 
b=mN/6LaPiLp+hWCL5dXU9XXy0NE11ScXPSiANLTmTmAwtcZZOgDRE/tDRSQZ5/EQqpfXyy62EEG5oIQsOcpLme8I2qau1rZ15d4tXLWvX2CZqro++n6QD0ANnzkQzmQ0gtGwimrVh6XduEkOM4L0XHqwDzwNorfPWpRF0FZA6fS8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379275; c=relaxed/simple; bh=aGA3Cg3acrjmWfvKMaOzQWtX5r/SLsw3Ic0wyCbOOhY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tAVaEslmPVIKk4gZimyv4PFi8Kmu/zAQDhoTuZdMoQga6wkk5SrC6kKcgHXedtLLr+2IOIbRbGXEUDHExEjMXn6VHuH2O6jr0IpOZhdkbpivaKc24qYaZVPRbgoewyNMsjO9UWH2xTC4UXvWOylSLq+RAgLNFd0zQ70c5EkV08w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Lg3kC8q1; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Lg3kC8q1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736379273; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=j5EOVM0v0gflap4pKtAxxmUaALnCNv8cRd9oJSElYf8=; b=Lg3kC8q1v2MBNsj/lUmZi+FTmlFCFExfNt0uKLAyymjo322enanOAC5Fk0gIIikVIqHP7m h2xtTakft42J3pnlkmQeVhzpZXDUrqM5QdCX9Zq4PgsG+wSWW3Goa5/8VnMi0GLn5qcEQB Nsr0rDHy+ZaoDEP4fChEewZmQxLmaF0= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-584--pEwM4wDPgm_w6pskDoyhA-1; Wed, 08 
Jan 2025 18:34:27 -0500 X-MC-Unique: -pEwM4wDPgm_w6pskDoyhA-1 X-Mimecast-MFC-AGG-ID: -pEwM4wDPgm_w6pskDoyhA Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E58CE1979053; Wed, 8 Jan 2025 23:34:22 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 4B9B419560AE; Wed, 8 Jan 2025 23:34:14 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 10/11] khugepaged: remove max_ptes_none restriction on the pmd scan Date: Wed, 8 Jan 2025 16:31:26 -0700 Message-ID: <20250108233128.14484-11-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" Now that we have mTHP support, which uses max_ptes_none to determine how "full" an mTHP-sized region must be before it is collapsed, remove the restriction during the scan phase so we don't bail out early and miss potential mTHP candidates. Signed-off-by: Nico Pache --- mm/khugepaged.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 4d3c560f20b4..61a349eb3cf4 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1446,15 +1446,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { all_valid =3D false; ++none_or_zero; - if (!userfaultfd_armed(vma) && - (!cc->is_khugepaged || - none_or_zero <=3D khugepaged_max_ptes_none)) { - continue; - } else { + if (userfaultfd_armed(vma)) { result =3D SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); goto out_unmap; } + continue; } if (pte_uffd_wp(pteval)) { /* --=20 2.47.1 From nobody Fri Dec 19 19:15:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BACF5204C22 for ; Wed, 8 Jan 2025 23:34:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379283; cv=none; b=q+jXDJhc8zcenqpSQLZuuuG4jLYGSlCUZUeGB3/fSe5poSeAQfU8cSLtJHc8NXrvtIaGO+eQJVLpHRwNyoPXrHHiMmRVTFpxtyytPVhmOuCW0/yqBDOP/rib89nT6rMUpr0/a3x25Pu8xLCXkGWvvajiphmar4XJokDnAfn/boc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736379283; c=relaxed/simple; bh=m3+2yXppMzq0hNQ2Q0aMTWhl6U+LQebbSkKEXmVMB3I=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Cg720nJMWUptBZKo3MiedaLczMVOYGcOxqdIMopNd4LaYygWPhuq3R+htMXxnHfROqKDceIio9bm0r77LbCoKYIciTCTRH0XABd+gzINAn1ckcSBWDD6Va0sir1BcGAvCgBXJ1R/bEDcIN+0Ysbdlo8H/zLrmR5X4DfiZ/alYes= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=eZ7b5Vax; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="eZ7b5Vax" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736379280; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Dlo40r5jZ4R+FC4eYe0RwrYQN6FD9g5gDaMstpJqM5E=; b=eZ7b5Vax988ALNZSi1PfoHI7YPIjKsDFAOiCZlWh+AEnsHBfJdp4F9yAWgX+57mk3Ayg4O AeuYrtkUWRuGyysVOG9qGmF/Yxykj9Vf6/pF5qZgoMQ95BJGqRnAPi/9OWEHT/NG2vsiyz Ib5lQIuPTl/fgTVehsCwplssoE/dWWE= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-370-l4fI3HlEPWGMA1brAjZ3eA-1; Wed, 08 Jan 2025 18:34:35 -0500 X-MC-Unique: l4fI3HlEPWGMA1brAjZ3eA-1 X-Mimecast-MFC-AGG-ID: l4fI3HlEPWGMA1brAjZ3eA Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 
server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C8D1219560B0; Wed, 8 Jan 2025 23:34:30 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.41]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 4D5FE19560AE; Wed, 8 Jan 2025 23:34:23 +0000 (UTC) From: Nico Pache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org Subject: [RFC 11/11] khugepaged: skip collapsing mTHP to smaller orders Date: Wed, 8 Jan 2025 16:31:27 -0700 Message-ID: <20250108233128.14484-12-npache@redhat.com> In-Reply-To: <20250108233128.14484-1-npache@redhat.com> References: <20250108233128.14484-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 Content-Type: text/plain; charset="utf-8" khugepaged may try to collapse a mTHP to a smaller mTHP, resulting in some pages being unmapped. 
Skip these cases until we have a way to check if it is OK to collapse to a smaller mTHP size (like in the case of a partially mapped folio). This patch is inspired by Dev Jain's work on khugepaged mTHP support [1]. [1] https://lore.kernel.org/lkml/20241216165105.56185-11-dev.jain@arm.com/ Signed-off-by: Nico Pache --- mm/khugepaged.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 61a349eb3cf4..046843a0d632 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -643,6 +643,11 @@ static int __collapse_huge_page_isolate(struct vm_area= _struct *vma, folio =3D page_folio(page); VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio); =20 + if (order !=3D HPAGE_PMD_ORDER && folio_order(folio) >=3D order) { + result =3D SCAN_PTE_MAPPED_HUGEPAGE; + goto out; + } + /* See khugepaged_scan_pmd(). */ if (folio_likely_mapped_shared(folio)) { ++shared; --=20 2.47.1