From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kselftest@vger.kernel.org, David Hildenbrand,
 Andrew Morton, Shuah Khan, Hugh Dickins, Vlastimil Babka, Peter Xu,
 Andrea Arcangeli, "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v1 1/7] selftests/vm: add test to measure MADV_UNMERGEABLE performance
Date: Fri, 30 Sep 2022 16:19:25 +0200
Message-Id: <20220930141931.174362-2-david@redhat.com>
In-Reply-To: <20220930141931.174362-1-david@redhat.com>
References: <20220930141931.174362-1-david@redhat.com>

Let's add a test to measure performance of KSM breaking not triggered
via COW, but triggered by disabling KSM on an area filled with KSM
pages via MADV_UNMERGEABLE.
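The new mode is driven like the existing timing modes; for example,
pinned to one CPU on a 2 GiB area (the same invocation used for the
measurements quoted later in this series):

	taskset 0x8 ./ksm_tests -D -s 2048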
Signed-off-by: David Hildenbrand
Acked-by: Peter Xu
---
 tools/testing/selftests/vm/ksm_tests.c | 76 +++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/ksm_tests.c b/tools/testing/selftests/vm/ksm_tests.c
index f5e4e0bbd081..353eee96aeba 100644
--- a/tools/testing/selftests/vm/ksm_tests.c
+++ b/tools/testing/selftests/vm/ksm_tests.c
@@ -40,6 +40,7 @@ enum ksm_test_name {
 	CHECK_KSM_NUMA_MERGE,
 	KSM_MERGE_TIME,
 	KSM_MERGE_TIME_HUGE_PAGES,
+	KSM_UNMERGE_TIME,
 	KSM_COW_TIME
 };
 
@@ -108,7 +109,10 @@
 	       " -P evaluate merging time and speed.\n"
 	       "    For this test, the size of duplicated memory area (in MiB)\n"
 	       "    must be provided using -s option\n"
-	       " -H evaluate merging time and speed of area allocated mostly with huge pages\n"
+	       " -H evaluate merging time and speed of area allocated mostly with huge pages\n"
+	       "    For this test, the size of duplicated memory area (in MiB)\n"
+	       "    must be provided using -s option\n"
+	       " -D evaluate unmerging time and speed when disabling KSM.\n"
 	       "    For this test, the size of duplicated memory area (in MiB)\n"
 	       "    must be provided using -s option\n"
 	       " -C evaluate the time required to break COW of merged pages.\n\n");
@@ -188,6 +192,16 @@ static int ksm_merge_pages(void *addr, size_t size, struct timespec start_time,
 	return 0;
 }
 
+static int ksm_unmerge_pages(void *addr, size_t size,
+			     struct timespec start_time, int timeout)
+{
+	if (madvise(addr, size, MADV_UNMERGEABLE)) {
+		perror("madvise");
+		return 1;
+	}
+	return 0;
+}
+
 static bool assert_ksm_pages_count(long dupl_page_count)
 {
 	unsigned long max_page_sharing, pages_sharing, pages_shared;
@@ -560,6 +574,53 @@ static int ksm_merge_time(int mapping, int prot, int timeout, size_t map_size)
 	return KSFT_FAIL;
 }
 
+static int ksm_unmerge_time(int mapping, int prot, int timeout, size_t map_size)
+{
+	void *map_ptr;
+	struct timespec start_time, end_time;
+	unsigned long scan_time_ns;
+
+	map_size *= MB;
+
+	map_ptr = allocate_memory(NULL, prot, mapping, '*', map_size);
+	if (!map_ptr)
+		return KSFT_FAIL;
+	if (clock_gettime(CLOCK_MONOTONIC_RAW, &start_time)) {
+		perror("clock_gettime");
+		goto err_out;
+	}
+	if (ksm_merge_pages(map_ptr, map_size, start_time, timeout))
+		goto err_out;
+
+	if (clock_gettime(CLOCK_MONOTONIC_RAW, &start_time)) {
+		perror("clock_gettime");
+		goto err_out;
+	}
+	if (ksm_unmerge_pages(map_ptr, map_size, start_time, timeout))
+		goto err_out;
+	if (clock_gettime(CLOCK_MONOTONIC_RAW, &end_time)) {
+		perror("clock_gettime");
+		goto err_out;
+	}
+
+	scan_time_ns = (end_time.tv_sec - start_time.tv_sec) * NSEC_PER_SEC +
+		       (end_time.tv_nsec - start_time.tv_nsec);
+
+	printf("Total size:    %lu MiB\n", map_size / MB);
+	printf("Total time:    %ld.%09ld s\n", scan_time_ns / NSEC_PER_SEC,
+	       scan_time_ns % NSEC_PER_SEC);
+	printf("Average speed: %.3f MiB/s\n", (map_size / MB) /
+	       ((double)scan_time_ns / NSEC_PER_SEC));
+
+	munmap(map_ptr, map_size);
+	return KSFT_PASS;
+
+err_out:
+	printf("Not OK\n");
+	munmap(map_ptr, map_size);
+	return KSFT_FAIL;
+}
+
 static int ksm_cow_time(int mapping, int prot, int timeout, size_t page_size)
 {
 	void *map_ptr;
@@ -644,7 +705,7 @@ int main(int argc, char *argv[])
 	bool merge_across_nodes = KSM_MERGE_ACROSS_NODES_DEFAULT;
 	long size_MB = 0;
 
-	while ((opt = getopt(argc, argv, "ha:p:l:z:m:s:MUZNPCH")) != -1) {
+	while ((opt = getopt(argc, argv, "ha:p:l:z:m:s:MUZNPCHD")) != -1) {
 		switch (opt) {
 		case 'a':
 			prot = str_to_prot(optarg);
@@ -701,6 +762,9 @@ int main(int argc, char *argv[])
 		case 'H':
 			test_name = KSM_MERGE_TIME_HUGE_PAGES;
 			break;
+		case 'D':
+			test_name = KSM_UNMERGE_TIME;
+			break;
 		case 'C':
 			test_name = KSM_COW_TIME;
 			break;
@@ -762,6 +826,14 @@ int main(int argc, char *argv[])
 		ret = ksm_merge_hugepages_time(MAP_PRIVATE | MAP_ANONYMOUS, prot,
 					       ksm_scan_limit_sec, size_MB);
 		break;
+	case KSM_UNMERGE_TIME:
+		if (size_MB == 0) {
+			printf("Option '-s' is required.\n");
+			return KSFT_FAIL;
+		}
+		ret = ksm_unmerge_time(MAP_PRIVATE | MAP_ANONYMOUS, prot,
+				       ksm_scan_limit_sec, size_MB);
+		break;
 	case KSM_COW_TIME:
 		ret = ksm_cow_time(MAP_PRIVATE | MAP_ANONYMOUS, prot, ksm_scan_limit_sec,
 				   page_size);
-- 
2.37.3

From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kselftest@vger.kernel.org, David Hildenbrand,
 Andrew Morton, Shuah Khan, Hugh Dickins, Vlastimil Babka, Peter Xu,
 Andrea Arcangeli, "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v1 2/7] mm/ksm: simplify break_ksm() to not rely on VM_FAULT_WRITE
Date: Fri, 30 Sep 2022 16:19:26 +0200
Message-Id: <20220930141931.174362-3-david@redhat.com>
In-Reply-To: <20220930141931.174362-1-david@redhat.com>
References: <20220930141931.174362-1-david@redhat.com>
Now that GUP no longer requires VM_FAULT_WRITE, break_ksm() is the sole
remaining user of VM_FAULT_WRITE. As we also want to stop triggering a
fake write fault and instead use FAULT_FLAG_UNSHARE -- similar to
GUP-triggered unsharing when taking a R/O pin on a shared anonymous
page (including KSM pages) -- let's stop relying on VM_FAULT_WRITE.

Let's rework break_ksm() to not rely on the return value of
handle_mm_fault() anymore to figure out whether COW-breaking was
successful. Simply perform another follow_page() lookup to verify the
result.

While this makes break_ksm() slightly less efficient, we can simplify
handle_mm_fault() a little and easily switch to FAULT_FLAG_UNSHARE
without introducing similar KSM-specific behavior for
FAULT_FLAG_UNSHARE.

In my setup (AMD Ryzen 9 3900X), running the KSM selftest to test
unmerge performance on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048),
this results in a performance degradation of ~4% -- 5% (old: ~5250
MiB/s, new: ~5010 MiB/s). I don't think that we particularly care about
that performance drop when unmerging. If it ever turns out to be an
actual performance issue, we can think about a better alternative for
FAULT_FLAG_UNSHARE -- let's just keep it simple for now.

Signed-off-by: David Hildenbrand
Acked-by: Peter Xu
---
 mm/ksm.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 0cd2f4b62334..e8d987fb379e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -473,26 +473,27 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 	vm_fault_t ret = 0;
 
 	do {
+		bool ksm_page = false;
+
 		cond_resched();
 		page = follow_page(vma, addr,
 				FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
 		if (IS_ERR_OR_NULL(page))
 			break;
 		if (PageKsm(page))
-			ret = handle_mm_fault(vma, addr,
-					      FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
-					      NULL);
-		else
-			ret = VM_FAULT_WRITE;
+			ksm_page = true;
 		put_page(page);
-	} while (!(ret & (VM_FAULT_WRITE | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | VM_FAULT_OOM)));
+
+		if (!ksm_page)
+			return 0;
+		ret = handle_mm_fault(vma, addr,
+				      FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
+				      NULL);
+	} while (!(ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | VM_FAULT_OOM)));
 	/*
-	 * We must loop because handle_mm_fault() may back out if there's
-	 * any difficulty e.g. if pte accessed bit gets updated concurrently.
-	 *
-	 * VM_FAULT_WRITE is what we have been hoping for: it indicates that
-	 * COW has been broken, even if the vma does not permit VM_WRITE;
-	 * but note that a concurrent fault might break PageKsm for us.
+	 * We must loop until we no longer find a KSM page because
+	 * handle_mm_fault() may back out if there's any difficulty e.g. if
+	 * pte accessed bit gets updated concurrently.
 	 *
 	 * VM_FAULT_SIGBUS could occur if we race with truncation of the
 	 * backing file, which also invalidates anonymous pages: that's
-- 
2.37.3

From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kselftest@vger.kernel.org, David Hildenbrand,
 Andrew Morton, Shuah Khan, Hugh Dickins, Vlastimil Babka, Peter Xu,
 Andrea Arcangeli, "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v1 3/7] mm: remove VM_FAULT_WRITE
Date: Fri, 30 Sep 2022 16:19:27 +0200
Message-Id: <20220930141931.174362-4-david@redhat.com>
In-Reply-To: <20220930141931.174362-1-david@redhat.com>
References: <20220930141931.174362-1-david@redhat.com>

All users -- GUP and KSM -- are gone, let's just remove it.
Signed-off-by: David Hildenbrand
Acked-by: Peter Xu
---
 include/linux/mm_types.h | 3 ---
 mm/huge_memory.c         | 2 +-
 mm/memory.c              | 9 ++++-----
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8f30f262431c..6a1375dcb4ac 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -807,7 +807,6 @@ typedef __bitwise unsigned int vm_fault_t;
  * @VM_FAULT_OOM:		Out Of Memory
  * @VM_FAULT_SIGBUS:		Bad access
  * @VM_FAULT_MAJOR:		Page read from storage
- * @VM_FAULT_WRITE:		Special case for get_user_pages
  * @VM_FAULT_HWPOISON:		Hit poisoned small page
  * @VM_FAULT_HWPOISON_LARGE:	Hit poisoned large page. Index encoded
  *				in upper bits
@@ -828,7 +827,6 @@ enum vm_fault_reason {
 	VM_FAULT_OOM            = (__force vm_fault_t)0x000001,
 	VM_FAULT_SIGBUS         = (__force vm_fault_t)0x000002,
 	VM_FAULT_MAJOR          = (__force vm_fault_t)0x000004,
-	VM_FAULT_WRITE          = (__force vm_fault_t)0x000008,
 	VM_FAULT_HWPOISON       = (__force vm_fault_t)0x000010,
 	VM_FAULT_HWPOISON_LARGE = (__force vm_fault_t)0x000020,
 	VM_FAULT_SIGSEGV        = (__force vm_fault_t)0x000040,
@@ -854,7 +852,6 @@ enum vm_fault_reason {
 	{ VM_FAULT_OOM,                 "OOM" },	\
 	{ VM_FAULT_SIGBUS,              "SIGBUS" },	\
 	{ VM_FAULT_MAJOR,               "MAJOR" },	\
-	{ VM_FAULT_WRITE,               "WRITE" },	\
 	{ VM_FAULT_HWPOISON,            "HWPOISON" },	\
 	{ VM_FAULT_HWPOISON_LARGE,      "HWPOISON_LARGE" },	\
 	{ VM_FAULT_SIGSEGV,             "SIGSEGV" },	\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 84bf1d5f6b7e..b351c1d4f858 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1376,7 +1376,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 		if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1))
 			update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 		spin_unlock(vmf->ptl);
-		return VM_FAULT_WRITE;
+		return 0;
 	}
 
 unlock_fallback:
diff --git a/mm/memory.c b/mm/memory.c
index e49faa0a1f9a..6e2f47d05f2b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3240,7 +3240,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	}
 
 	delayacct_wpcopy_end();
-	return (page_copied && !unshare) ? VM_FAULT_WRITE : 0;
+	return 0;
 oom_free_new:
 	put_page(new_page);
 oom:
@@ -3304,14 +3304,14 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf)
 			return finish_mkwrite_fault(vmf);
 	}
 	wp_page_reuse(vmf);
-	return VM_FAULT_WRITE;
+	return 0;
 }
 
 static vm_fault_t wp_page_shared(struct vm_fault *vmf)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	vm_fault_t ret = VM_FAULT_WRITE;
+	vm_fault_t ret = 0;
 
 	get_page(vmf->page);
 
@@ -3462,7 +3462,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 			return 0;
 		}
 		wp_page_reuse(vmf);
-		return VM_FAULT_WRITE;
+		return 0;
 	} else if (unshare) {
 		/* No anonymous page -> nothing to do. */
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -3960,7 +3960,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		if (vmf->flags & FAULT_FLAG_WRITE) {
 			pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 			vmf->flags &= ~FAULT_FLAG_WRITE;
-			ret |= VM_FAULT_WRITE;
 		}
 		rmap_flags |= RMAP_EXCLUSIVE;
 	}
-- 
2.37.3

From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kselftest@vger.kernel.org, David Hildenbrand,
 Andrew Morton, Shuah Khan, Hugh Dickins, Vlastimil Babka, Peter Xu,
 Andrea Arcangeli, "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v1 4/7] mm/ksm: fix KSM COW breaking with userfaultfd-wp via FAULT_FLAG_UNSHARE
Date: Fri, 30 Sep 2022 16:19:28 +0200
Message-Id: <20220930141931.174362-5-david@redhat.com>
In-Reply-To: <20220930141931.174362-1-david@redhat.com>
References: <20220930141931.174362-1-david@redhat.com>

Let's stop breaking COW via a fake write fault and let's use
FAULT_FLAG_UNSHARE instead. This avoids any wrong side effects of the
fake write fault, such as mapping the PTE writable and marking the pte
dirty/softdirty.
Also, this fixes KSM interaction with userfaultfd-wp: when we have a
KSM page that's write-protected by userfaultfd,
break_ksm()->handle_mm_fault() will fail with VM_FAULT_SIGBUS and
break_ksm() will simply return 0. The warning in dmesg indicates this
wrong handling:

[  230.096368] FAULT_FLAG_ALLOW_RETRY missing 881
[  230.100822] CPU: 1 PID: 1643 Comm: ksm-uffd-wp [...]
[  230.110124] Hardware name: [...]
[  230.117775] Call Trace:
[  230.120227]  <TASK>
[  230.122334]  dump_stack_lvl+0x44/0x5c
[  230.126010]  handle_userfault.cold+0x14/0x19
[  230.130281]  ? tlb_finish_mmu+0x65/0x170
[  230.134207]  ? uffd_wp_range+0x65/0xa0
[  230.137959]  ? _raw_spin_unlock+0x15/0x30
[  230.141972]  ? do_wp_page+0x50/0x590
[  230.145551]  __handle_mm_fault+0x9f5/0xf50
[  230.149652]  ? mmput+0x1f/0x40
[  230.152712]  handle_mm_fault+0xb9/0x2a0
[  230.156550]  break_ksm+0x141/0x180
[  230.159964]  unmerge_ksm_pages+0x60/0x90
[  230.163890]  ksm_madvise+0x3c/0xb0
[  230.167295]  do_madvise.part.0+0x10c/0xeb0
[  230.171396]  ? do_syscall_64+0x67/0x80
[  230.175157]  __x64_sys_madvise+0x5a/0x70
[  230.179082]  do_syscall_64+0x58/0x80
[  230.182661]  ? do_syscall_64+0x67/0x80
[  230.186413]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Acked-by: Peter Xu

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

#define MMAP_SIZE (2 * 1024 * 1024u)

static char *map;
int uffd;

static int setup_uffd(void)
{
	struct uffdio_api uffdio_api;
	struct uffdio_register uffdio_register;
	struct uffdio_writeprotect uffd_writeprotect;
	struct uffdio_range uffd_range;

	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0) {
		fprintf(stderr, "syscall() failed: %d\n", errno);
		return -errno;
	}

	uffdio_api.api = UFFD_API;
	uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
	if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
		fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
		return -errno;
	}

	if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
		fprintf(stderr, "UFFD_FEATURE_WRITEPROTECT missing\n");
		return -ENOSYS;
	}

	/* Register UFFD-WP */
	uffdio_register.range.start = (unsigned long) map;
	uffdio_register.range.len = MMAP_SIZE;
	uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
		fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
		return -errno;
	}

	/* Writeprotect the range. */
	uffd_writeprotect.range.start = (unsigned long) map;
	uffd_writeprotect.range.len = MMAP_SIZE;
	uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
		fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
		return -errno;
	}

	return 0;
}

int main(int argc, char **argv)
{
	int ksm_fd, ret;

	ksm_fd = open("/sys/kernel/mm/ksm/run", O_RDWR);
	if (ksm_fd < 0) {
		fprintf(stderr, "KSM not available\n");
		return -errno;
	}

	map = mmap(NULL, MMAP_SIZE, PROT_READ|PROT_WRITE,
		   MAP_PRIVATE|MAP_ANON, -1, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		return -errno;
	}
	ret = madvise(map, MMAP_SIZE, MADV_NOHUGEPAGE);
	if (ret) {
		fprintf(stderr, "MADV_NOHUGEPAGE failed\n");
		return -errno;
	}

	/* Fill with same value and trigger merging. */
	memset(map, 0xff, MMAP_SIZE);
	ret = madvise(map, MMAP_SIZE, MADV_MERGEABLE);
	if (ret) {
		fprintf(stderr, "MADV_MERGEABLE failed\n");
		return -errno;
	}

	/*
	 * Run KSM to trigger merging and wait a bit until merging should be
	 * done.
	 */
	if (write(ksm_fd, "1", 1) != 1) {
		fprintf(stderr, "Running KSM failed\n");
	}
	sleep(10);

	/* Write-protect the range with UFFD. */
	if (setup_uffd())
		return 1;

	/* Trigger unsharing. */
	ret = madvise(map, MMAP_SIZE, MADV_UNMERGEABLE);
	if (ret) {
		fprintf(stderr, "MADV_UNMERGEABLE failed\n");
		return -errno;
	}

	return 0;
}
--------------------------------------------------------------------------

Consequently, we will no longer trigger a fake write fault, and will
break COW without any such side effects.

This is primarily a fix for KSM+userfaultfd-wp; however, the fake write
fault was always questionable. As this fix is not easy to backport and
it's not very critical, let's not cc stable.

Fixes: 529b930b87d9 ("userfaultfd: wp: hook userfault handler to write protection fault")
Signed-off-by: David Hildenbrand
---
 mm/ksm.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index e8d987fb379e..4d7bcf7da7c3 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -453,17 +453,15 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
 }
 
 /*
- * We use break_ksm to break COW on a ksm page: it's a stripped down
+ * We use break_ksm to break COW on a ksm page by triggering unsharing,
+ * such that the ksm page will get replaced by an exclusive anonymous page.
  *
- *	if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
- *		put_page(page);
- *
- * but taking great care only to touch a ksm page, in a VM_MERGEABLE vma,
+ * We take great care only to touch a ksm page, in a VM_MERGEABLE vma,
  * in case the application has unmapped and remapped mm,addr meanwhile.
  * Could a ksm page appear anywhere else? Actually yes, in a VM_PFNMAP
  * mmap of /dev/mem, where we would not want to touch it.
  *
- * FAULT_FLAG/FOLL_REMOTE are because we do this outside the context
+ * FAULT_FLAG_REMOTE/FOLL_REMOTE are because we do this outside the context
  * of the process that owns 'vma'. We also do not want to enforce
  * protection keys here anyway.
  */
@@ -487,7 +485,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 		if (!ksm_page)
 			return 0;
 		ret = handle_mm_fault(vma, addr,
-				      FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
+				      FAULT_FLAG_UNSHARE | FAULT_FLAG_REMOTE,
 				      NULL);
 	} while (!(ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | VM_FAULT_OOM)));
 	/*
-- 
2.37.3

From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kselftest@vger.kernel.org, David Hildenbrand,
 Andrew Morton, Shuah Khan, Hugh Dickins, Vlastimil Babka, Peter Xu,
 Andrea Arcangeli, "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v1 5/7] mm/pagewalk: add walk_page_range_vma()
Date: Fri, 30 Sep 2022 16:19:29 +0200
Message-Id: <20220930141931.174362-6-david@redhat.com>
In-Reply-To: <20220930141931.174362-1-david@redhat.com>
References: <20220930141931.174362-1-david@redhat.com>

Let's add walk_page_range_vma(), which is similar to walk_page_vma();
however, it is only interested in a subset of the VMA range.

To be used in KSM code to stop using follow_page() next.
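A minimal usage sketch (with hypothetical callback and helper names;
the actual KSM conversion follows later in this series): inspect the
page table entry covering a single page of a VMA and report one
property back through walk->private, without grabbing a page reference:

	static int my_pmd_entry(pmd_t *pmd, unsigned long addr,
				unsigned long next, struct mm_walk *walk)
	{
		bool *present = walk->private;

		/* Record a single property of the entry covering @addr. */
		*present = pmd_present(*pmd);
		/* Return 1 to stop the walk after this single lookup. */
		return 1;
	}

	static const struct mm_walk_ops my_walk_ops = {
		.pmd_entry = my_pmd_entry,
	};

	/*
	 * Caller must hold the mmap_lock; [addr, addr + PAGE_SIZE) must
	 * lie within @vma.
	 */
	static bool my_single_page_test(struct vm_area_struct *vma,
					unsigned long addr)
	{
		bool present = false;

		walk_page_range_vma(vma, addr, addr + PAGE_SIZE,
				    &my_walk_ops, &present);
		return present;
	}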
Signed-off-by: David Hildenbrand
---
 include/linux/pagewalk.h |  3 +++
 mm/pagewalk.c            | 27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index f3fafb731ffd..2f8f6cc980b4 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -99,6 +99,9 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
 			  unsigned long end, const struct mm_walk_ops *ops,
 			  pgd_t *pgd,
 			  void *private);
+int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, const struct mm_walk_ops *ops,
+			void *private);
 int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 		  void *private);
 int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 131b2b335b2c..757c075da231 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -514,6 +514,33 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
 	return __walk_page_range(start, end, &walk);
 }
 
+int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, const struct mm_walk_ops *ops,
+			void *private)
+{
+	struct mm_walk walk = {
+		.ops		= ops,
+		.mm		= vma->vm_mm,
+		.vma		= vma,
+		.private	= private,
+	};
+	int err;
+
+	if (start >= end || !walk.mm)
+		return -EINVAL;
+	if (start < vma->vm_start || end > vma->vm_end)
+		return -EINVAL;
+
+	mmap_assert_locked(walk.mm);
+
+	err = walk_page_test(start, end, &walk);
+	if (err > 0)
+		return 0;
+	if (err < 0)
+		return err;
+	return __walk_page_range(start, end, &walk);
+}
+
 int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 		  void *private)
 {
-- 
2.37.3
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kselftest@vger.kernel.org, David Hildenbrand,
 Andrew Morton, Shuah Khan, Hugh Dickins, Vlastimil Babka, Peter Xu,
 Andrea Arcangeli, "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v1 6/7] mm/ksm: convert break_ksm() to use walk_page_range_vma()
Date: Fri, 30 Sep 2022 16:19:30 +0200
Message-Id: <20220930141931.174362-7-david@redhat.com>
In-Reply-To: <20220930141931.174362-1-david@redhat.com>
References: <20220930141931.174362-1-david@redhat.com>

FOLL_MIGRATION exists only for the purpose of break_ksm(), and
actually, there is no need to even wait for the migration to finish;
we only want to know whether we're dealing with a KSM page.

Using follow_page() just to identify a KSM page overcomplicates GUP
code. Let's use walk_page_range_vma() instead, because we don't
actually care about the page itself; we only need to know a single
property -- no need to even grab a reference on the page.

In my setup (AMD Ryzen 9 3900X), running the KSM selftest to test
unmerge performance on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048),
this results in a performance degradation of ~4% (old: ~5010 MiB/s,
new: ~4800 MiB/s). I don't think we particularly care for now.

Signed-off-by: David Hildenbrand
---
 mm/ksm.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 4d7bcf7da7c3..814c1a37c323 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -39,6 +39,7 @@
 #include <linux/freezer.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/pagewalk.h>
 
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -452,6 +453,60 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
 	return atomic_read(&mm->mm_users) == 0;
 }
 
+int break_ksm_pud_entry(pud_t *pud, unsigned long addr, unsigned long next,
+			struct mm_walk *walk)
+{
+	/* We only care about page tables to walk to a single base page. */
+	if (pud_leaf(*pud) || !pud_present(*pud))
+		return 1;
+	return 0;
+}
+
+int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
+			struct mm_walk *walk)
+{
+	bool *ksm_page = walk->private;
+	struct page *page = NULL;
+	pte_t *pte, ptent;
+	spinlock_t *ptl;
+
+	/* We only care about page tables to walk to a single base page. */
+	if (pmd_leaf(*pmd) || !pmd_present(*pmd))
+		return 1;
+
+	/*
+	 * We only lookup a single page (a) no need to iterate; and (b)
+	 * always return 1 to exit immediately and not iterate in the caller.
+	 */
+	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	ptent = *pte;
+
+	if (pte_none(ptent))
+		return 1;
+	if (!pte_present(ptent)) {
+		swp_entry_t entry = pte_to_swp_entry(ptent);
+
+		/*
+		 * We only care about migration of KSM pages. As KSM pages
+		 * remain KSM pages until freed, no need to wait here for
+		 * migration to end to identify such.
+		 */
+		if (is_migration_entry(entry))
+			page = pfn_swap_entry_to_page(entry);
+	} else {
+		page = vm_normal_page(walk->vma, addr, ptent);
+	}
+	if (page && PageKsm(page))
+		*ksm_page = true;
+	pte_unmap_unlock(pte, ptl);
+	return 1;
+}
+
+static const struct mm_walk_ops break_ksm_ops = {
+	.pud_entry = break_ksm_pud_entry,
+	.pmd_entry = break_ksm_pmd_entry,
+};
+
 /*
  * We use break_ksm to break COW on a ksm page by triggering unsharing,
  * such that the ksm page will get replaced by an exclusive anonymous page.
@@ -467,20 +522,19 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
  */
 static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 {
-	struct page *page;
 	vm_fault_t ret = 0;
 
+	if (WARN_ON_ONCE(!IS_ALIGNED(addr, PAGE_SIZE)))
+		return -EINVAL;
+
 	do {
 		bool ksm_page = false;
 
 		cond_resched();
-		page = follow_page(vma, addr,
-				FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
-		if (IS_ERR_OR_NULL(page))
-			break;
-		if (PageKsm(page))
-			ksm_page = true;
-		put_page(page);
+		ret = walk_page_range_vma(vma, addr, addr + PAGE_SIZE,
+					  &break_ksm_ops, &ksm_page);
+		if (WARN_ON_ONCE(ret < 0))
+			return ret;
 
 		if (!ksm_page)
 			return 0;
-- 
2.37.3
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kselftest@vger.kernel.org, David Hildenbrand,
 Andrew Morton, Shuah Khan, Hugh Dickins, Vlastimil Babka, Peter Xu,
 Andrea Arcangeli, "Matthew Wilcox (Oracle)", Jason Gunthorpe, John Hubbard
Subject: [PATCH v1 7/7] mm/gup: remove FOLL_MIGRATION
Date: Fri, 30 Sep 2022 16:19:31 +0200
Message-Id: <20220930141931.174362-8-david@redhat.com>
In-Reply-To: <20220930141931.174362-1-david@redhat.com>
References: <20220930141931.174362-1-david@redhat.com>

Fortunately, the last user (KSM) is gone, so let's just remove this
rather special code from generic GUP handling -- especially because KSM
never required the PMD handling as KSM only deals with individual base
pages.

Signed-off-by: David Hildenbrand
---
 include/linux/mm.h |  1 -
 mm/gup.c           | 55 +++++-----------------------------------------
 2 files changed, 5 insertions(+), 51 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e56dd8f7eae1..4c176e308ead 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2942,7 +2942,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 				 * and return without waiting upon it */
 #define FOLL_NOFAULT	0x80	/* do not fault in pages */
 #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
-#define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
 #define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
 #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
 #define FOLL_ANON	0x8000	/* don't do file mappings */
diff --git a/mm/gup.c b/mm/gup.c
index ce00a4c40da8..37195c549f68 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -537,30 +537,13 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
 			 (FOLL_PIN | FOLL_GET)))
 		return ERR_PTR(-EINVAL);
-retry:
 	if (unlikely(pmd_bad(*pmd)))
 		return no_page_table(vma, flags);
 
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	pte = *ptep;
-	if (!pte_present(pte)) {
-		swp_entry_t entry;
-		/*
-		 * KSM's break_ksm() relies upon recognizing a ksm page
-		 * even while it is being migrated, so for that case we
-		 * need migration_entry_wait().
-		 */
-		if (likely(!(flags & FOLL_MIGRATION)))
-			goto no_page;
-		if (pte_none(pte))
-			goto no_page;
-		entry = pte_to_swp_entry(pte);
-		if (!is_migration_entry(entry))
-			goto no_page;
-		pte_unmap_unlock(ptep, ptl);
-		migration_entry_wait(mm, pmd, address);
-		goto retry;
-	}
+	if (!pte_present(pte))
+		goto no_page;
 	if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
 		goto no_page;
 
@@ -682,28 +665,8 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 			return page;
 		return no_page_table(vma, flags);
 	}
-retry:
-	if (!pmd_present(pmdval)) {
-		/*
-		 * Should never reach here, if thp migration is not supported;
-		 * Otherwise, it must be a thp migration entry.
-		 */
-		VM_BUG_ON(!thp_migration_supported() ||
-			  !is_pmd_migration_entry(pmdval));
-
-		if (likely(!(flags & FOLL_MIGRATION)))
-			return no_page_table(vma, flags);
-
-		pmd_migration_entry_wait(mm, pmd);
-		pmdval = READ_ONCE(*pmd);
-		/*
-		 * MADV_DONTNEED may convert the pmd to null because
-		 * mmap_lock is held in read mode
-		 */
-		if (pmd_none(pmdval))
-			return no_page_table(vma, flags);
-		goto retry;
-	}
+	if (!pmd_present(pmdval))
+		return no_page_table(vma, flags);
 	if (pmd_devmap(pmdval)) {
 		ptl = pmd_lock(mm, pmd);
 		page = follow_devmap_pmd(vma, address, pmd, flags, &ctx->pgmap);
@@ -717,18 +680,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags))
 		return no_page_table(vma, flags);
 
-retry_locked:
 	ptl = pmd_lock(mm, pmd);
-	if (unlikely(pmd_none(*pmd))) {
-		spin_unlock(ptl);
-		return no_page_table(vma, flags);
-	}
 	if (unlikely(!pmd_present(*pmd))) {
 		spin_unlock(ptl);
-		if (likely(!(flags & FOLL_MIGRATION)))
-			return no_page_table(vma, flags);
-		pmd_migration_entry_wait(mm, pmd);
-		goto retry_locked;
+		return no_page_table(vma, flags);
 	}
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);
-- 
2.37.3