From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Shuah Khan,
    Hugh Dickins, Vlastimil Babka, Peter Xu, Andrea Arcangeli,
    "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 1/9] selftests/vm: add test to measure MADV_UNMERGEABLE performance
Date: Fri, 21 Oct 2022 12:11:33 +0200
Message-Id: <20221021101141.84170-2-david@redhat.com>
In-Reply-To: <20221021101141.84170-1-david@redhat.com>
References: <20221021101141.84170-1-david@redhat.com>

Let's add a test to measure the performance of KSM breaking that is not
triggered via COW, but by disabling KSM on an area filled with KSM pages
via MADV_UNMERGEABLE.
Acked-by: Peter Xu
Signed-off-by: David Hildenbrand
---
 tools/testing/selftests/vm/ksm_tests.c | 76 +++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/ksm_tests.c b/tools/testing/selftests/vm/ksm_tests.c
index 0d85be2350fa..f9eb4d67e0dd 100644
--- a/tools/testing/selftests/vm/ksm_tests.c
+++ b/tools/testing/selftests/vm/ksm_tests.c
@@ -40,6 +40,7 @@ enum ksm_test_name {
 	CHECK_KSM_NUMA_MERGE,
 	KSM_MERGE_TIME,
 	KSM_MERGE_TIME_HUGE_PAGES,
+	KSM_UNMERGE_TIME,
 	KSM_COW_TIME
 };

@@ -108,7 +109,10 @@ static void print_help(void)
 	       " -P evaluate merging time and speed.\n"
 	       "    For this test, the size of duplicated memory area (in MiB)\n"
 	       "    must be provided using -s option\n"
-	       " -H evaluate merging time and speed of area allocated mostly with huge pages\n"
+	       " -H evaluate merging time and speed of area allocated mostly with huge pages\n"
+	       "    For this test, the size of duplicated memory area (in MiB)\n"
+	       "    must be provided using -s option\n"
+	       " -D evaluate unmerging time and speed when disabling KSM.\n"
 	       "    For this test, the size of duplicated memory area (in MiB)\n"
 	       "    must be provided using -s option\n"
 	       " -C evaluate the time required to break COW of merged pages.\n\n");
@@ -188,6 +192,16 @@ static int ksm_merge_pages(void *addr, size_t size, struct timespec start_time,
 	return 0;
 }

+static int ksm_unmerge_pages(void *addr, size_t size,
+			     struct timespec start_time, int timeout)
+{
+	if (madvise(addr, size, MADV_UNMERGEABLE)) {
+		perror("madvise");
+		return 1;
+	}
+	return 0;
+}
+
 static bool assert_ksm_pages_count(long dupl_page_count)
 {
 	unsigned long max_page_sharing, pages_sharing, pages_shared;
@@ -560,6 +574,53 @@ static int ksm_merge_time(int mapping, int prot, int timeout, size_t map_size)
 	return KSFT_FAIL;
 }

+static int ksm_unmerge_time(int mapping, int prot, int timeout, size_t map_size)
+{
+	void *map_ptr;
+	struct timespec start_time, end_time;
+	unsigned long scan_time_ns;
+
+	map_size *= MB;
+
+	map_ptr = allocate_memory(NULL, prot, mapping, '*', map_size);
+	if (!map_ptr)
+		return KSFT_FAIL;
+	if (clock_gettime(CLOCK_MONOTONIC_RAW, &start_time)) {
+		perror("clock_gettime");
+		goto err_out;
+	}
+	if (ksm_merge_pages(map_ptr, map_size, start_time, timeout))
+		goto err_out;
+
+	if (clock_gettime(CLOCK_MONOTONIC_RAW, &start_time)) {
+		perror("clock_gettime");
+		goto err_out;
+	}
+	if (ksm_unmerge_pages(map_ptr, map_size, start_time, timeout))
+		goto err_out;
+	if (clock_gettime(CLOCK_MONOTONIC_RAW, &end_time)) {
+		perror("clock_gettime");
+		goto err_out;
+	}
+
+	scan_time_ns = (end_time.tv_sec - start_time.tv_sec) * NSEC_PER_SEC +
+		       (end_time.tv_nsec - start_time.tv_nsec);
+
+	printf("Total size:    %lu MiB\n", map_size / MB);
+	printf("Total time:    %ld.%09ld s\n", scan_time_ns / NSEC_PER_SEC,
+	       scan_time_ns % NSEC_PER_SEC);
+	printf("Average speed: %.3f MiB/s\n", (map_size / MB) /
+	       ((double)scan_time_ns / NSEC_PER_SEC));
+
+	munmap(map_ptr, map_size);
+	return KSFT_PASS;
+
+err_out:
+	printf("Not OK\n");
+	munmap(map_ptr, map_size);
+	return KSFT_FAIL;
+}
+
 static int ksm_cow_time(int mapping, int prot, int timeout, size_t page_size)
 {
 	void *map_ptr;
@@ -644,7 +705,7 @@ int main(int argc, char *argv[])
 	bool merge_across_nodes = KSM_MERGE_ACROSS_NODES_DEFAULT;
 	long size_MB = 0;

-	while ((opt = getopt(argc, argv, "ha:p:l:z:m:s:MUZNPCH")) != -1) {
+	while ((opt = getopt(argc, argv, "ha:p:l:z:m:s:MUZNPCHD")) != -1) {
 		switch (opt) {
 		case 'a':
 			prot = str_to_prot(optarg);
@@ -701,6 +762,9 @@ int main(int argc, char *argv[])
 		case 'H':
 			test_name = KSM_MERGE_TIME_HUGE_PAGES;
 			break;
+		case 'D':
+			test_name = KSM_UNMERGE_TIME;
+			break;
 		case 'C':
 			test_name = KSM_COW_TIME;
 			break;
@@ -762,6 +826,14 @@ int main(int argc, char *argv[])
 		ret = ksm_merge_hugepages_time(MAP_PRIVATE | MAP_ANONYMOUS, prot,
 					       ksm_scan_limit_sec, size_MB);
 		break;
+	case KSM_UNMERGE_TIME:
+		if (size_MB == 0) {
+			printf("Option '-s' is required.\n");
+			return KSFT_FAIL;
+		}
+		ret = ksm_unmerge_time(MAP_PRIVATE | MAP_ANONYMOUS, prot,
+				       ksm_scan_limit_sec, size_MB);
+		break;
	case KSM_COW_TIME:
		ret = ksm_cow_time(MAP_PRIVATE | MAP_ANONYMOUS, prot,
				   ksm_scan_limit_sec, page_size);
-- 
2.37.3
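For readers who want to reproduce the measurement outside the selftest
harness, the core of what the new -D mode times boils down to a few
syscalls. The following is a minimal standalone sketch, not part of the
patch; the fixed sleep() standing in for the selftest's full-scans
polling is a deliberate simplification, and it assumes KSM is running
(echo 1 > /sys/kernel/mm/ksm/run) on a CONFIG_KSM=y kernel:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	const size_t size = 256ul * 1024 * 1024;	/* 256 MiB */
	struct timespec t0, t1;
	char *map;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return 1;
	memset(map, '*', size);		/* identical content on every page */
	if (madvise(map, size, MADV_MERGEABLE))
		return 1;
	sleep(5);	/* crude stand-in for waiting on ksm full_scans */

	clock_gettime(CLOCK_MONOTONIC_RAW, &t0);
	if (madvise(map, size, MADV_UNMERGEABLE))	/* breaks all KSM pages */
		return 1;
	clock_gettime(CLOCK_MONOTONIC_RAW, &t1);

	printf("unmerge took %.3f s\n", (t1.tv_sec - t0.tv_sec) +
	       (t1.tv_nsec - t0.tv_nsec) / 1e9);
	return 0;
}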
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Shuah Khan,
    Hugh Dickins, Vlastimil Babka, Peter Xu, Andrea Arcangeli,
    "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 2/9] mm/ksm: simplify break_ksm() to not rely on VM_FAULT_WRITE
Date: Fri, 21 Oct 2022 12:11:34 +0200
Message-Id: <20221021101141.84170-3-david@redhat.com>
In-Reply-To: <20221021101141.84170-1-david@redhat.com>
References: <20221021101141.84170-1-david@redhat.com>

Now that GUP no longer requires VM_FAULT_WRITE, break_ksm() is the sole
remaining user of VM_FAULT_WRITE. As we also want to stop triggering a
fake write fault and instead use FAULT_FLAG_UNSHARE -- similar to
GUP-triggered unsharing when taking an R/O pin on a shared anonymous page
(including KSM pages) -- let's stop relying on VM_FAULT_WRITE.

Let's rework break_ksm() to no longer rely on the return value of
handle_mm_fault() to figure out whether COW-breaking was successful.
Simply perform another follow_page() lookup to verify the result.

While this makes break_ksm() slightly less efficient, we can simplify
handle_mm_fault() a little and easily switch to FAULT_FLAG_UNSHARE
without introducing similar KSM-specific behavior for
FAULT_FLAG_UNSHARE.

In my setup (AMD Ryzen 9 3900X), running the KSM selftest to test
unmerge performance on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048), this
results in a performance degradation of ~4--5% (old: ~5250 MiB/s,
new: ~5010 MiB/s). I don't think that we particularly care about that
performance drop when unmerging. If it ever turns out to be an actual
performance issue, we can think about a better alternative for
FAULT_FLAG_UNSHARE -- let's just keep it simple for now.

Acked-by: Peter Xu
Signed-off-by: David Hildenbrand
---
 mm/ksm.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index c19fcca9bc03..b884a22f3c3c 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -440,26 +440,27 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 	vm_fault_t ret = 0;

 	do {
+		bool ksm_page = false;
+
 		cond_resched();
 		page = follow_page(vma, addr,
 				   FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
 		if (IS_ERR_OR_NULL(page))
 			break;
 		if (PageKsm(page))
-			ret = handle_mm_fault(vma, addr,
-					      FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
-					      NULL);
-		else
-			ret = VM_FAULT_WRITE;
+			ksm_page = true;
 		put_page(page);
-	} while (!(ret & (VM_FAULT_WRITE | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | VM_FAULT_OOM)));
+
+		if (!ksm_page)
+			return 0;
+		ret = handle_mm_fault(vma, addr,
+				      FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
+				      NULL);
+	} while (!(ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | VM_FAULT_OOM)));
 	/*
-	 * We must loop because handle_mm_fault() may back out if there's
-	 * any difficulty e.g. if pte accessed bit gets updated concurrently.
-	 *
-	 * VM_FAULT_WRITE is what we have been hoping for: it indicates that
-	 * COW has been broken, even if the vma does not permit VM_WRITE;
-	 * but note that a concurrent fault might break PageKsm for us.
+	 * We must loop until we no longer find a KSM page because
+	 * handle_mm_fault() may back out if there's any difficulty e.g. if
+	 * pte accessed bit gets updated concurrently.
 	 *
 	 * VM_FAULT_SIGBUS could occur if we race with truncation of the
 	 * backing file, which also invalidates anonymous pages: that's
-- 
2.37.3
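To make the control-flow change easier to follow, the reworked loop can
be restated in simplified form. This is a sketch of the logic in the
diff above, with locking and error details trimmed, not the literal
kernel code:

/* Simplified restatement of the reworked break_ksm() loop. */
static int break_ksm_sketch(struct vm_area_struct *vma, unsigned long addr)
{
	vm_fault_t ret;

	do {
		cond_resched();
		/* 1. Look up the mapped page; done if it isn't a KSM page. */
		struct page *page = follow_page(vma, addr,
				FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
		if (IS_ERR_OR_NULL(page))
			break;
		bool ksm_page = PageKsm(page);
		put_page(page);
		if (!ksm_page)
			return 0;
		/*
		 * 2. Fault to replace the KSM page, then loop: the next
		 *    follow_page() lookup -- not VM_FAULT_WRITE -- now tells
		 *    us whether COW breaking actually succeeded.
		 */
		ret = handle_mm_fault(vma, addr,
				      FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
				      NULL);
	} while (!(ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | VM_FAULT_OOM)));
	return 0;
}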
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Shuah Khan,
    Hugh Dickins, Vlastimil Babka, Peter Xu, Andrea Arcangeli,
    "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 3/9] mm: remove VM_FAULT_WRITE
Date: Fri, 21 Oct 2022 12:11:35 +0200
Message-Id: <20221021101141.84170-4-david@redhat.com>
In-Reply-To: <20221021101141.84170-1-david@redhat.com>
References: <20221021101141.84170-1-david@redhat.com>

All users -- GUP and KSM -- are gone, let's just remove it.

Acked-by: Peter Xu
Signed-off-by: David Hildenbrand
---
 include/linux/mm_types.h | 3 ---
 mm/huge_memory.c         | 2 +-
 mm/memory.c              | 9 ++++-----
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 500e536796ca..6bc3baced3e3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -847,7 +847,6 @@ typedef __bitwise unsigned int vm_fault_t;
  * @VM_FAULT_OOM:		Out Of Memory
  * @VM_FAULT_SIGBUS:		Bad access
  * @VM_FAULT_MAJOR:		Page read from storage
- * @VM_FAULT_WRITE:		Special case for get_user_pages
  * @VM_FAULT_HWPOISON:		Hit poisoned small page
  * @VM_FAULT_HWPOISON_LARGE:	Hit poisoned large page. Index encoded
  *				in upper bits
@@ -868,7 +867,6 @@ enum vm_fault_reason {
 	VM_FAULT_OOM            = (__force vm_fault_t)0x000001,
 	VM_FAULT_SIGBUS         = (__force vm_fault_t)0x000002,
 	VM_FAULT_MAJOR          = (__force vm_fault_t)0x000004,
-	VM_FAULT_WRITE          = (__force vm_fault_t)0x000008,
 	VM_FAULT_HWPOISON       = (__force vm_fault_t)0x000010,
 	VM_FAULT_HWPOISON_LARGE = (__force vm_fault_t)0x000020,
 	VM_FAULT_SIGSEGV        = (__force vm_fault_t)0x000040,
@@ -894,7 +892,6 @@ enum vm_fault_reason {
 	{ VM_FAULT_OOM,                 "OOM" },	\
 	{ VM_FAULT_SIGBUS,              "SIGBUS" },	\
 	{ VM_FAULT_MAJOR,               "MAJOR" },	\
-	{ VM_FAULT_WRITE,               "WRITE" },	\
 	{ VM_FAULT_HWPOISON,            "HWPOISON" },	\
 	{ VM_FAULT_HWPOISON_LARGE,      "HWPOISON_LARGE" },	\
 	{ VM_FAULT_SIGSEGV,             "SIGSEGV" },	\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1cc4a5f4791e..be13fe55b798 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1379,7 +1379,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 		if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1))
 			update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 		spin_unlock(vmf->ptl);
-		return VM_FAULT_WRITE;
+		return 0;
 	}

 unlock_fallback:
diff --git a/mm/memory.c b/mm/memory.c
index f88c351aecd4..8e72f703ed99 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3242,7 +3242,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	}

 	delayacct_wpcopy_end();
-	return (page_copied && !unshare) ? VM_FAULT_WRITE : 0;
+	return 0;
 oom_free_new:
 	put_page(new_page);
 oom:
@@ -3306,14 +3306,14 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf)
 			return finish_mkwrite_fault(vmf);
 	}
 	wp_page_reuse(vmf);
-	return VM_FAULT_WRITE;
+	return 0;
 }

 static vm_fault_t wp_page_shared(struct vm_fault *vmf)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	vm_fault_t ret = VM_FAULT_WRITE;
+	vm_fault_t ret = 0;

 	get_page(vmf->page);

@@ -3464,7 +3464,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 			return 0;
 		}
 		wp_page_reuse(vmf);
-		return VM_FAULT_WRITE;
+		return 0;
 	} else if (unshare) {
 		/* No anonymous page -> nothing to do. */
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -3983,7 +3983,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		if (vmf->flags & FAULT_FLAG_WRITE) {
 			pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 			vmf->flags &= ~FAULT_FLAG_WRITE;
-			ret |= VM_FAULT_WRITE;
 		}
 		rmap_flags |= RMAP_EXCLUSIVE;
 	}
-- 
2.37.3
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Shuah Khan,
    Hugh Dickins, Vlastimil Babka, Peter Xu, Andrea Arcangeli,
    "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 4/9] selftests/vm: add KSM unmerge tests
Date: Fri, 21 Oct 2022 12:11:36 +0200
Message-Id: <20221021101141.84170-5-david@redhat.com>
In-Reply-To: <20221021101141.84170-1-david@redhat.com>
References: <20221021101141.84170-1-david@redhat.com>

Let's add three unmerge tests (MADV_UNMERGEABLE unmerging all pages in
the range):

	test_unmerge(): basic unmerge test
	test_unmerge_discarded(): have some pte_none() entries in the range
	test_unmerge_uffd_wp(): protect the merged pages using uffd-wp

ksm_tests.c currently contains a mixture of benchmarks and tests,
whereby each test is carried out by executing the ksm_tests binary with
specific parameters. Let's add a new ksm_functional_tests.c that
performs multiple, smaller functional tests all at once.

Signed-off-by: David Hildenbrand
---
 tools/testing/selftests/vm/Makefile           |   2 +
 .../selftests/vm/ksm_functional_tests.c       | 279 ++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests.sh     |   2 +
 tools/testing/selftests/vm/vm_util.c          |  10 +
 tools/testing/selftests/vm/vm_util.h          |   1 +
 5 files changed, 294 insertions(+)
 create mode 100644 tools/testing/selftests/vm/ksm_functional_tests.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 163c2fde3cb3..2d640a48255c 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -52,6 +52,7 @@ TEST_GEN_FILES += userfaultfd
 TEST_GEN_PROGS += soft-dirty
 TEST_GEN_PROGS += split_huge_page_test
 TEST_GEN_FILES += ksm_tests
+TEST_GEN_PROGS += ksm_functional_tests

 ifeq ($(MACHINE),x86_64)
 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
@@ -96,6 +97,7 @@ TEST_FILES += va_128TBswitch.sh
 include ../lib.mk

 $(OUTPUT)/khugepaged: vm_util.c
+$(OUTPUT)/ksm_functional_tests: vm_util.c
 $(OUTPUT)/madv_populate: vm_util.c
 $(OUTPUT)/soft-dirty: vm_util.c
 $(OUTPUT)/split_huge_page_test: vm_util.c
diff --git a/tools/testing/selftests/vm/ksm_functional_tests.c b/tools/testing/selftests/vm/ksm_functional_tests.c
new file mode 100644
index 000000000000..96644be68962
--- /dev/null
+++ b/tools/testing/selftests/vm/ksm_functional_tests.c
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KSM functional tests
+ *
+ * Copyright 2022, Red Hat, Inc.
+ *
+ * Author(s): David Hildenbrand
+ */
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
+#include <linux/userfaultfd.h>
+
+#include "../kselftest.h"
+#include "vm_util.h"
+
+#define KiB 1024u
+#define MiB (1024 * KiB)
+
+static int ksm_fd;
+static int ksm_full_scans_fd;
+static int pagemap_fd;
+static size_t pagesize;
+
+static bool range_maps_duplicates(char *addr, unsigned long size)
+{
+	unsigned long offs_a, offs_b, pfn_a, pfn_b;
+
+	/*
+	 * There is no easy way to check if there are KSM pages mapped into
+	 * this range. We only check that the range does not map the same PFN
+	 * twice by comparing each pair of mapped pages.
+	 */
+	for (offs_a = 0; offs_a < size; offs_a += pagesize) {
+		pfn_a = pagemap_get_pfn(pagemap_fd, addr + offs_a);
+		/* Page not present or PFN not exposed by the kernel. */
+		if (pfn_a == -1ull || !pfn_a)
+			continue;
+
+		for (offs_b = offs_a + pagesize; offs_b < size;
+		     offs_b += pagesize) {
+			pfn_b = pagemap_get_pfn(pagemap_fd, addr + offs_b);
+			if (pfn_b == -1ull || !pfn_b)
+				continue;
+			if (pfn_a == pfn_b)
+				return true;
+		}
+	}
+	return false;
+}
+
+static long ksm_get_full_scans(void)
+{
+	char buf[10];
+	ssize_t ret;
+
+	ret = pread(ksm_full_scans_fd, buf, sizeof(buf) - 1, 0);
+	if (ret <= 0)
+		return -errno;
+	buf[ret] = 0;
+
+	return strtol(buf, NULL, 10);
+}
+
+static int ksm_merge(void)
+{
+	long start_scans, end_scans;
+
+	/* Wait for two full scans such that any possible merging happened. */
+	start_scans = ksm_get_full_scans();
+	if (start_scans < 0)
+		return start_scans;
+	if (write(ksm_fd, "1", 1) != 1)
+		return -errno;
+	do {
+		end_scans = ksm_get_full_scans();
+		if (end_scans < 0)
+			return end_scans;
+	} while (end_scans < start_scans + 2);
+
+	return 0;
+}
+
+static char *mmap_and_merge_range(char val, unsigned long size)
+{
+	char *map;
+
+	map = mmap(NULL, size, PROT_READ|PROT_WRITE,
+		   MAP_PRIVATE|MAP_ANON, -1, 0);
+	if (map == MAP_FAILED) {
+		ksft_test_result_fail("mmap() failed\n");
+		return MAP_FAILED;
+	}
+
+	/* Don't use THP. Ignore if THP are not around on a kernel. */
+	if (madvise(map, size, MADV_NOHUGEPAGE) && errno != EINVAL) {
+		ksft_test_result_fail("MADV_NOHUGEPAGE failed\n");
+		goto unmap;
+	}
+
+	/* Make sure each page contains the same values to merge them. */
+	memset(map, val, size);
+	if (madvise(map, size, MADV_MERGEABLE)) {
+		ksft_test_result_fail("MADV_MERGEABLE failed\n");
+		goto unmap;
+	}
+
+	/* Run KSM to trigger merging and wait. */
+	if (ksm_merge()) {
+		ksft_test_result_fail("Running KSM failed\n");
+		goto unmap;
+	}
+	return map;
+unmap:
+	munmap(map, size);
+	return MAP_FAILED;
+}
+
+static void test_unmerge(void)
+{
+	const unsigned int size = 2 * MiB;
+	char *map;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	map = mmap_and_merge_range(0xcf, size);
+	if (map == MAP_FAILED)
+		return;
+
+	if (madvise(map, size, MADV_UNMERGEABLE)) {
+		ksft_test_result_fail("MADV_UNMERGEABLE failed\n");
+		goto unmap;
+	}
+
+	ksft_test_result(!range_maps_duplicates(map, size),
+			 "Pages were unmerged\n");
+unmap:
+	munmap(map, size);
+}
+
+static void test_unmerge_discarded(void)
+{
+	const unsigned int size = 2 * MiB;
+	char *map;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	map = mmap_and_merge_range(0xcf, size);
+	if (map == MAP_FAILED)
+		return;
+
+	/* Discard half of all mapped pages so we have pte_none() entries. */
+	if (madvise(map, size / 2, MADV_DONTNEED)) {
+		ksft_test_result_fail("MADV_DONTNEED failed\n");
+		goto unmap;
+	}
+
+	if (madvise(map, size, MADV_UNMERGEABLE)) {
+		ksft_test_result_fail("MADV_UNMERGEABLE failed\n");
+		goto unmap;
+	}
+
+	ksft_test_result(!range_maps_duplicates(map, size),
+			 "Pages were unmerged\n");
+unmap:
+	munmap(map, size);
+}
+
+#ifdef __NR_userfaultfd
+static void test_unmerge_uffd_wp(void)
+{
+	struct uffdio_writeprotect uffd_writeprotect;
+	struct uffdio_register uffdio_register;
+	const unsigned int size = 2 * MiB;
+	struct uffdio_api uffdio_api;
+	char *map;
+	int uffd;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	map = mmap_and_merge_range(0xcf, size);
+	if (map == MAP_FAILED)
+		return;
+
+	/* See if UFFD is around. */
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+	if (uffd < 0) {
+		ksft_test_result_skip("__NR_userfaultfd failed\n");
+		goto unmap;
+	}
+
+	/* See if UFFD-WP is around. */
+	uffdio_api.api = UFFD_API;
+	uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+	if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
+		ksft_test_result_fail("UFFDIO_API failed\n");
+		goto close_uffd;
+	}
+	if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
+		ksft_test_result_skip("UFFD_FEATURE_PAGEFAULT_FLAG_WP not available\n");
+		goto close_uffd;
+	}
+
+	/* Register UFFD-WP, no need for an actual handler. */
+	uffdio_register.range.start = (unsigned long) map;
+	uffdio_register.range.len = size;
+	uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
+	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
+		ksft_test_result_fail("UFFDIO_REGISTER_MODE_WP failed\n");
+		goto close_uffd;
+	}
+
+	/* Write-protect the range using UFFD-WP. */
+	uffd_writeprotect.range.start = (unsigned long) map;
+	uffd_writeprotect.range.len = size;
+	uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
+	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
+		ksft_test_result_fail("UFFDIO_WRITEPROTECT failed\n");
+		goto close_uffd;
+	}
+
+	if (madvise(map, size, MADV_UNMERGEABLE)) {
+		ksft_test_result_fail("MADV_UNMERGEABLE failed\n");
+		goto close_uffd;
+	}
+
+	ksft_test_result(!range_maps_duplicates(map, size),
+			 "Pages were unmerged\n");
+close_uffd:
+	close(uffd);
+unmap:
+	munmap(map, size);
+}
+#endif
+
+int main(int argc, char **argv)
+{
+	unsigned int tests = 2;
+	int err;
+
+#ifdef __NR_userfaultfd
+	tests++;
+#endif
+
+	ksft_print_header();
+	ksft_set_plan(tests);
+
+	pagesize = getpagesize();
+
+	ksm_fd = open("/sys/kernel/mm/ksm/run", O_RDWR);
+	if (ksm_fd < 0)
+		ksft_exit_skip("open(\"/sys/kernel/mm/ksm/run\") failed\n");
+	ksm_full_scans_fd = open("/sys/kernel/mm/ksm/full_scans", O_RDONLY);
+	if (ksm_full_scans_fd < 0)
+		ksft_exit_skip("open(\"/sys/kernel/mm/ksm/full_scans\") failed\n");
+	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+	if (pagemap_fd < 0)
+		ksft_exit_skip("open(\"/proc/self/pagemap\") failed\n");
+
+	test_unmerge();
+	test_unmerge_discarded();
+#ifdef __NR_userfaultfd
+	test_unmerge_uffd_wp();
+#endif
+
+	err = ksft_get_fail_cnt();
+	if (err)
+		ksft_exit_fail_msg("%d out of %d tests failed\n",
+				   err, ksft_test_num());
+	return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index e780e76c26b8..b8950891259b 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -184,6 +184,8 @@ run_test ./ksm_tests -N -m 1
 # KSM test with 2 NUMA nodes and merge_across_nodes = 0
 run_test ./ksm_tests -N -m 0

+run_test ./ksm_functional_tests
+
 # protection_keys tests
 if [ -x ./protection_keys_32 ]
 then
diff --git a/tools/testing/selftests/vm/vm_util.c b/tools/testing/selftests/vm/vm_util.c
index f11f8adda521..dbd8889324e6 100644
--- a/tools/testing/selftests/vm/vm_util.c
+++ b/tools/testing/selftests/vm/vm_util.c
@@ -28,6 +28,16 @@ bool pagemap_is_softdirty(int fd, char *start)
 	return entry & 0x0080000000000000ull;
 }

+unsigned long pagemap_get_pfn(int fd, char *start)
+{
+	uint64_t entry = pagemap_get_entry(fd, start);
+
+	/* If present (bit 63), the PFN sits in bits 0--54. */
+	if (entry & 0x8000000000000000ull)
+		return entry & 0x007fffffffffffffull;
+	return -1ull;
+}
+
 void clear_softdirty(void)
 {
 	int ret;
diff --git a/tools/testing/selftests/vm/vm_util.h b/tools/testing/selftests/vm/vm_util.h
index 5c35de454e08..acecb5b6f8ca 100644
--- a/tools/testing/selftests/vm/vm_util.h
+++ b/tools/testing/selftests/vm/vm_util.h
@@ -4,6 +4,7 @@

 uint64_t pagemap_get_entry(int fd, char *start);
 bool pagemap_is_softdirty(int fd, char *start);
+unsigned long pagemap_get_pfn(int fd, char *start);
 void clear_softdirty(void);
 bool check_for_pattern(FILE *fp, const char *pattern, char *buf, size_t len);
 uint64_t read_pmd_pagesize(void);
-- 
2.37.3
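The pagemap format this helper relies on is worth spelling out: each
64-bit entry in /proc/PID/pagemap describes one virtual page, bit 63
says the page is present, and for present pages the low 55 bits hold
the PFN (reported as zero when the kernel hides PFNs from unprivileged
readers). A minimal standalone reader under those documented semantics
-- a sketch, not part of the patch:

#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT	(1ull << 63)
#define PM_PFN_MASK	((1ull << 55) - 1)	/* bits 0..54 */

/* Returns the PFN mapped at vaddr, or -1 if not present / not exposed. */
static uint64_t pfn_of(int pagemap_fd, void *vaddr)
{
	size_t pagesize = getpagesize();
	uint64_t entry;
	off_t off = ((uintptr_t)vaddr / pagesize) * sizeof(entry);

	if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry))
		return -1ull;
	if (!(entry & PM_PRESENT))
		return -1ull;
	return entry & PM_PFN_MASK;	/* 0 if the kernel hides PFNs */
}

range_maps_duplicates() builds on exactly this: if two virtual pages in
the range resolve to the same PFN, KSM still has them merged.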
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Shuah Khan,
    Hugh Dickins, Vlastimil Babka, Peter Xu, Andrea Arcangeli,
    "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 5/9] mm/ksm: fix KSM COW breaking with userfaultfd-wp via FAULT_FLAG_UNSHARE
Date: Fri, 21 Oct 2022 12:11:37 +0200
Message-Id: <20221021101141.84170-6-david@redhat.com>
In-Reply-To: <20221021101141.84170-1-david@redhat.com>
References: <20221021101141.84170-1-david@redhat.com>

Let's stop breaking COW via a fake write fault and use
FAULT_FLAG_UNSHARE instead. This avoids any wrong side effects of the
fake write fault, such as mapping the PTE writable and marking the pte
dirty/softdirty.

Also, this fixes KSM interaction with userfaultfd-wp: when we have a
KSM page that's write-protected by userfaultfd,
break_ksm()->handle_mm_fault() will fail with VM_FAULT_SIGBUS and will
simply return in break_ksm() with 0 instead of actually breaking COW.

For now, the KSM unmerge tests can trigger that:

  $ sudo ./ksm_functional_tests
  TAP version 13
  1..3
  # [RUN] test_unmerge
  ok 1 Pages were unmerged
  # [RUN] test_unmerge_discarded
  ok 2 Pages were unmerged
  # [RUN] test_unmerge_uffd_wp
  not ok 3 Pages were unmerged
  Bail out! 1 out of 3 tests failed
  # Planned tests != run tests (2 != 3)
  # Totals: pass:2 fail:1 xfail:0 xpass:0 skip:0 error:0

The warning in dmesg also indicates this wrong handling:

  [  230.096368] FAULT_FLAG_ALLOW_RETRY missing 881
  [  230.100822] CPU: 1 PID: 1643 Comm: ksm-uffd-wp [...]
  [  230.110124] Hardware name: [...]
  [  230.117775] Call Trace:
  [  230.120227]  <TASK>
  [  230.122334]  dump_stack_lvl+0x44/0x5c
  [  230.126010]  handle_userfault.cold+0x14/0x19
  [  230.130281]  ? tlb_finish_mmu+0x65/0x170
  [  230.134207]  ? uffd_wp_range+0x65/0xa0
  [  230.137959]  ? _raw_spin_unlock+0x15/0x30
  [  230.141972]  ? do_wp_page+0x50/0x590
  [  230.145551]  __handle_mm_fault+0x9f5/0xf50
  [  230.149652]  ? mmput+0x1f/0x40
  [  230.152712]  handle_mm_fault+0xb9/0x2a0
  [  230.156550]  break_ksm+0x141/0x180
  [  230.159964]  unmerge_ksm_pages+0x60/0x90
  [  230.163890]  ksm_madvise+0x3c/0xb0
  [  230.167295]  do_madvise.part.0+0x10c/0xeb0
  [  230.171396]  ? do_syscall_64+0x67/0x80
  [  230.175157]  __x64_sys_madvise+0x5a/0x70
  [  230.179082]  do_syscall_64+0x58/0x80
  [  230.182661]  ? do_syscall_64+0x67/0x80
  [  230.186413]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

This is primarily a fix for KSM+userfaultfd-wp; however, the fake write
fault was always questionable. As this fix is not easy to backport and
it's not very critical, let's not cc stable.

Fixes: 529b930b87d9 ("userfaultfd: wp: hook userfault handler to write protection fault")
Acked-by: Peter Xu
Signed-off-by: David Hildenbrand
---
 mm/ksm.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index b884a22f3c3c..c6f58aa6e731 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -420,17 +420,15 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
 }

 /*
- * We use break_ksm to break COW on a ksm page: it's a stripped down
+ * We use break_ksm to break COW on a ksm page by triggering unsharing,
+ * such that the ksm page will get replaced by an exclusive anonymous page.
  *
- *	if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
- *		put_page(page);
- *
- * but taking great care only to touch a ksm page, in a VM_MERGEABLE vma,
+ * We take great care only to touch a ksm page, in a VM_MERGEABLE vma,
  * in case the application has unmapped and remapped mm,addr meanwhile.
  * Could a ksm page appear anywhere else? Actually yes, in a VM_PFNMAP
  * mmap of /dev/mem, where we would not want to touch it.
  *
- * FAULT_FLAG/FOLL_REMOTE are because we do this outside the context
+ * FAULT_FLAG_REMOTE/FOLL_REMOTE are because we do this outside the context
  * of the process that owns 'vma'. We also do not want to enforce
  * protection keys here anyway.
  */
@@ -454,7 +452,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 		if (!ksm_page)
 			return 0;
 		ret = handle_mm_fault(vma, addr,
-				      FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
+				      FAULT_FLAG_UNSHARE | FAULT_FLAG_REMOTE,
 				      NULL);
 	} while (!(ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | VM_FAULT_OOM)));
 	/*
-- 
2.37.3
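Condensed from test_unmerge_uffd_wp() in patch 4/9, the failing sequence
is small enough to show end to end. A hedged sketch with error handling
trimmed; it assumes a kernel offering UFFD_FEATURE_PAGEFAULT_FLAG_WP and
a `map`/`size` range already merged by KSM, as produced by
mmap_and_merge_range() in the selftest:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Sketch: the sequence that left pages merged before this fix. */
static int wp_then_unmerge(char *map, unsigned long size)
{
	struct uffdio_api api = { .api = UFFD_API,
				  .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP };
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)map, .len = size },
		.mode = UFFDIO_REGISTER_MODE_WP,
	};
	struct uffdio_writeprotect wp = {
		.range = { .start = (unsigned long)map, .len = size },
		.mode = UFFDIO_WRITEPROTECT_MODE_WP,
	};
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) ||
	    ioctl(uffd, UFFDIO_REGISTER, &reg) ||
	    ioctl(uffd, UFFDIO_WRITEPROTECT, &wp))
		return -1;

	/*
	 * Before this patch: handle_mm_fault(FAULT_FLAG_WRITE) hits the
	 * uffd-wp handler, fails with VM_FAULT_SIGBUS, and break_ksm()
	 * bails out with the pages still merged. With FAULT_FLAG_UNSHARE,
	 * unsharing succeeds without consulting the write-protect handler.
	 */
	return madvise(map, size, MADV_UNMERGEABLE);
}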
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Shuah Khan,
    Hugh Dickins, Vlastimil Babka, Peter Xu, Andrea Arcangeli,
    "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 6/9] mm/pagewalk: don't trigger test_walk() in walk_page_vma()
Date: Fri, 21 Oct 2022 12:11:38 +0200
Message-Id: <20221021101141.84170-7-david@redhat.com>
In-Reply-To: <20221021101141.84170-1-david@redhat.com>
References: <20221021101141.84170-1-david@redhat.com>

As Peter points out, the caller passes a single VMA and can just do
that check itself. And in fact, no existing users rely on test_walk()
getting called. So let's just remove it and make the implementation
slightly more efficient.

Signed-off-by: David Hildenbrand
---
 include/linux/pagewalk.h | 2 ++
 mm/pagewalk.c            | 7 -------
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index f3fafb731ffd..37dc0208862d 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -27,6 +27,8 @@ struct mm_walk;
  *			"do page table walk over the current vma", returning
  *			a negative value means "abort current page table walk
  *			right now" and returning 1 means "skip the current vma"
+ *			Note that this callback is not called when the caller
+ *			passes in a single VMA as for walk_page_vma().
  * @pre_vma:            if set, called before starting walk on a non-null vma.
  * @post_vma:           if set, called after a walk on a non-null vma, provided
  *                      that @pre_vma and the vma walk succeeded.
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 2ff3a5bebceb..0a5d71aaf9c7 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -526,18 +526,11 @@ int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 		.vma		= vma,
 		.private	= private,
 	};
-	int err;

 	if (!walk.mm)
 		return -EINVAL;

 	mmap_assert_locked(walk.mm);
-
-	err = walk_page_test(vma->vm_start, vma->vm_end, &walk);
-	if (err > 0)
-		return 0;
-	if (err < 0)
-		return err;
 	return __walk_page_range(vma->vm_start, vma->vm_end, &walk);
 }
-- 
2.37.3
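For callers, the practical consequence is that an mm_walk_ops passed to
walk_page_vma() should no longer expect test_walk() to filter anything;
per-VMA checks move to the caller. A hypothetical in-kernel caller
(function and variable names invented for illustration) would now look
like:

/* Hypothetical caller sketch: .test_walk is ignored by walk_page_vma()
 * after this patch, so VMA filtering happens inline at the call site. */
static int count_present_pmd(pmd_t *pmd, unsigned long addr,
			     unsigned long next, struct mm_walk *walk)
{
	unsigned long *count = walk->private;

	if (pmd_present(*pmd))
		(*count)++;
	return 0;
}

static const struct mm_walk_ops count_ops = {
	.pmd_entry	= count_present_pmd,
	/* .test_walk intentionally unset: walk_page_vma() won't call it. */
};

static unsigned long count_present_pmds(struct vm_area_struct *vma)
{
	unsigned long count = 0;

	/* The caller checks VMA suitability itself before walking. */
	if (!(vma->vm_flags & VM_PFNMAP))
		walk_page_vma(vma, &count_ops, &count);
	return count;
}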
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Shuah Khan,
    Hugh Dickins, Vlastimil Babka, Peter Xu, Andrea Arcangeli,
    "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 7/9] mm/pagewalk: add walk_page_range_vma()
Date: Fri, 21 Oct 2022 12:11:39 +0200
Message-Id: <20221021101141.84170-8-david@redhat.com>
In-Reply-To: <20221021101141.84170-1-david@redhat.com>
References: <20221021101141.84170-1-david@redhat.com>

Let's add walk_page_range_vma(), which is similar to walk_page_vma()
but restricted to a subset of the VMA's range.

To be used in KSM code to stop using follow_page() next.

Signed-off-by: David Hildenbrand
---
 include/linux/pagewalk.h |  3 +++
 mm/pagewalk.c            | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 37dc0208862d..959f52e5867d 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -101,6 +101,9 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
 			  unsigned long end, const struct mm_walk_ops *ops,
 			  pgd_t *pgd,
 			  void *private);
+int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, const struct mm_walk_ops *ops,
+			void *private);
 int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 		  void *private);
 int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 0a5d71aaf9c7..7f1c9b274906 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -517,6 +517,26 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
 	return walk_pgd_range(start, end, &walk);
 }

+int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, const struct mm_walk_ops *ops,
+			void *private)
+{
+	struct mm_walk walk = {
+		.ops		= ops,
+		.mm		= vma->vm_mm,
+		.vma		= vma,
+		.private	= private,
+	};
+
+	if (start >= end || !walk.mm)
+		return -EINVAL;
+	if (start < vma->vm_start || end > vma->vm_end)
+		return -EINVAL;
+
+	mmap_assert_locked(walk.mm);
+	return __walk_page_range(start, end, &walk);
+}
+
 int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 		  void *private)
 {
-- 
2.37.3
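The intended call pattern appears verbatim in patch 8/9 below: walking a
single address and letting the pmd_entry callback report, via its
return value, whether a KSM page is mapped there:

	ksm_page = walk_page_range_vma(vma, addr, addr + 1,
				       &break_ksm_ops, NULL);
	if (WARN_ON_ONCE(ksm_page < 0))
		return ksm_page;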
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Shuah Khan,
    Hugh Dickins, Vlastimil Babka, Peter Xu, Andrea Arcangeli,
    "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 8/9] mm/ksm: convert break_ksm() to use walk_page_range_vma()
Date: Fri, 21 Oct 2022 12:11:40 +0200
Message-Id: <20221021101141.84170-9-david@redhat.com>
In-Reply-To: <20221021101141.84170-1-david@redhat.com>
References: <20221021101141.84170-1-david@redhat.com>

FOLL_MIGRATION exists only for the purpose of break_ksm(), and actually
there is not even the need to wait for the migration to finish: we only
want to know whether we're dealing with a KSM page.

Using follow_page() just to identify a KSM page overcomplicates GUP
code. Let's use walk_page_range_vma() instead, because we don't
actually care about the page itself, we only need to know a single
property -- no need to even grab a reference.

So, get rid of follow_page() usage such that we can get rid of
FOLL_MIGRATION now and eventually be able to get rid of follow_page()
in the future.

In my setup (AMD Ryzen 9 3900X), running the KSM selftest to test
unmerge performance on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048),
this results in a performance degradation of ~2% (old: ~5010 MiB/s,
new: ~4900 MiB/s). I don't think we particularly care for now.

Interestingly, the benchmark reduction is due to the single callback.
Adding a second callback (e.g., pud_entry()) reduces the benchmark by
another 100-200 MiB/s.
Signed-off-by: David Hildenbrand
---
 mm/ksm.c | 49 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index c6f58aa6e731..5cdb852ff132 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -39,6 +39,7 @@
 #include <linux/freezer.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/pagewalk.h>

 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -419,6 +420,39 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
 	return atomic_read(&mm->mm_users) == 0;
 }

+static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
+			struct mm_walk *walk)
+{
+	struct page *page = NULL;
+	spinlock_t *ptl;
+	pte_t *pte;
+	int ret;
+
+	if (pmd_leaf(*pmd) || !pmd_present(*pmd))
+		return 0;
+
+	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	if (pte_present(*pte)) {
+		page = vm_normal_page(walk->vma, addr, *pte);
+	} else if (!pte_none(*pte)) {
+		swp_entry_t entry = pte_to_swp_entry(*pte);
+
+		/*
+		 * As KSM pages remain KSM pages until freed, no need to wait
+		 * here for migration to end.
+		 */
+		if (is_migration_entry(entry))
+			page = pfn_swap_entry_to_page(entry);
+	}
+	ret = page && PageKsm(page);
+	pte_unmap_unlock(pte, ptl);
+	return ret;
+}
+
+static const struct mm_walk_ops break_ksm_ops = {
+	.pmd_entry = break_ksm_pmd_entry,
+};
+
 /*
  * We use break_ksm to break COW on a ksm page by triggering unsharing,
  * such that the ksm page will get replaced by an exclusive anonymous page.
@@ -434,21 +468,16 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
  */
 static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 {
-	struct page *page;
 	vm_fault_t ret = 0;

 	do {
-		bool ksm_page = false;
+		int ksm_page;

 		cond_resched();
-		page = follow_page(vma, addr,
-				   FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
-		if (IS_ERR_OR_NULL(page))
-			break;
-		if (PageKsm(page))
-			ksm_page = true;
-		put_page(page);
-
+		ksm_page = walk_page_range_vma(vma, addr, addr + 1,
+					       &break_ksm_ops, NULL);
+		if (WARN_ON_ONCE(ksm_page < 0))
+			return ksm_page;
 		if (!ksm_page)
 			return 0;
 		ret = handle_mm_fault(vma, addr,
-- 
2.37.3
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Shuah Khan,
    Hugh Dickins, Vlastimil Babka, Peter Xu, Andrea Arcangeli,
    "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 9/9] mm/gup: remove FOLL_MIGRATION
Date: Fri, 21 Oct 2022 12:11:41 +0200
Message-Id: <20221021101141.84170-10-david@redhat.com>
In-Reply-To: <20221021101141.84170-1-david@redhat.com>
References: <20221021101141.84170-1-david@redhat.com>

Fortunately, the last user (KSM) is gone, so let's just remove this
rather special code from generic GUP handling -- especially because KSM
never required the PMD handling, as KSM only deals with individual base
pages.

Signed-off-by: David Hildenbrand
---
 include/linux/mm.h |  1 -
 mm/gup.c           | 55 +++++-----------------------------------------
 2 files changed, 5 insertions(+), 51 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8bbcccbc5565..a63415ac9dc2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2950,7 +2950,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 				 * and return without waiting upon it */
 #define FOLL_NOFAULT	0x80	/* do not fault in pages */
 #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
-#define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
 #define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
 #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
 #define FOLL_ANON	0x8000	/* don't do file mappings */
diff --git a/mm/gup.c b/mm/gup.c
index fe195d47de74..bcb46e9d496e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -549,30 +549,13 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		return no_page_table(vma, flags);
 	}

-retry:
 	if (unlikely(pmd_bad(*pmd)))
 		return no_page_table(vma, flags);

 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	pte = *ptep;
-	if (!pte_present(pte)) {
-		swp_entry_t entry;
-		/*
-		 * KSM's break_ksm() relies upon recognizing a ksm page
-		 * even while it is being migrated, so for that case we
-		 * need migration_entry_wait().
-		 */
-		if (likely(!(flags & FOLL_MIGRATION)))
-			goto no_page;
-		if (pte_none(pte))
-			goto no_page;
-		entry = pte_to_swp_entry(pte);
-		if (!is_migration_entry(entry))
-			goto no_page;
-		pte_unmap_unlock(ptep, ptl);
-		migration_entry_wait(mm, pmd, address);
-		goto retry;
-	}
+	if (!pte_present(pte))
+		goto no_page;
 	if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
 		goto no_page;

@@ -694,28 +677,8 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 			return page;
 		return no_page_table(vma, flags);
 	}
-retry:
-	if (!pmd_present(pmdval)) {
-		/*
-		 * Should never reach here, if thp migration is not supported;
-		 * Otherwise, it must be a thp migration entry.
-		 */
-		VM_BUG_ON(!thp_migration_supported() ||
-			  !is_pmd_migration_entry(pmdval));
-
-		if (likely(!(flags & FOLL_MIGRATION)))
-			return no_page_table(vma, flags);
-
-		pmd_migration_entry_wait(mm, pmd);
-		pmdval = READ_ONCE(*pmd);
-		/*
-		 * MADV_DONTNEED may convert the pmd to null because
-		 * mmap_lock is held in read mode
-		 */
-		if (pmd_none(pmdval))
-			return no_page_table(vma, flags);
-		goto retry;
-	}
+	if (!pmd_present(pmdval))
+		return no_page_table(vma, flags);
 	if (pmd_devmap(pmdval)) {
 		ptl = pmd_lock(mm, pmd);
 		page = follow_devmap_pmd(vma, address, pmd, flags, &ctx->pgmap);
@@ -729,18 +692,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags))
 		return no_page_table(vma, flags);

-retry_locked:
 	ptl = pmd_lock(mm, pmd);
-	if (unlikely(pmd_none(*pmd))) {
-		spin_unlock(ptl);
-		return no_page_table(vma, flags);
-	}
 	if (unlikely(!pmd_present(*pmd))) {
 		spin_unlock(ptl);
-		if (likely(!(flags & FOLL_MIGRATION)))
-			return no_page_table(vma, flags);
-		pmd_migration_entry_wait(mm, pmd);
-		goto retry_locked;
+		return no_page_table(vma, flags);
 	}
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);
-- 
2.37.3