From: Qi Zheng
To: akpm@linux-foundation.org, david@redhat.com, kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com, jgg@nvidia.com, tglx@linutronix.de, willy@infradead.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, muchun.song@linux.dev, Qi Zheng
Subject: [RFC PATCH 1/7] mm: use ptep_clear() in non-present cases
Date: Thu, 25 Aug 2022 18:10:31 +0800
Message-Id: <20220825101037.96517-2-zhengqi.arch@bytedance.com>
In-Reply-To: <20220825101037.96517-1-zhengqi.arch@bytedance.com>
References: <20220825101037.96517-1-zhengqi.arch@bytedance.com>

After commit 08d5b29eac7d ("mm: ptep_clear() page table helper"), ptep_clear() can be used to track the clearing of PTE entries, but it skips some call sites, since the page table check does not care about non-present PTE entries.
Subsequent patches need ptep_clear() to track all clearing of PTE entries, so make ptep_clear() cover all cases, including the clearing of non-present PTE entries.

Signed-off-by: Qi Zheng
---
 include/linux/pgtable.h | 2 +-
 mm/memory.c             | 2 +-
 mm/mprotect.c           | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 3cdc16cfd867..9745684b0cdb 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -428,7 +428,7 @@ static inline void pte_clear_not_present_full(struct mm_struct *mm,
					      pte_t *ptep,
					      int full)
 {
-	pte_clear(mm, address, ptep);
+	ptep_clear(mm, address, ptep);
 }
 #endif

diff --git a/mm/memory.c b/mm/memory.c
index 1c6027adc542..207e0ee657e9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3655,7 +3655,7 @@ static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
	 * none pte.  Otherwise it means the pte could have changed, so retry.
	 */
	if (is_pte_marker(*vmf->pte))
-		pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte);
+		ptep_clear(vmf->vma->vm_mm, vmf->address, vmf->pte);
	pte_unmap_unlock(vmf->pte, vmf->ptl);
	return 0;
 }

diff --git a/mm/mprotect.c b/mm/mprotect.c
index ba5592655ee3..1a01bd22a4ed 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -201,7 +201,7 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
				 * fault will trigger without uffd trapping.
				 */
				if (uffd_wp_resolve) {
-					pte_clear(vma->vm_mm, addr, pte);
+					ptep_clear(vma->vm_mm, addr, pte);
					pages++;
				}
				continue;
-- 
2.20.1
From: Qi Zheng
Subject: [RFC PATCH 2/7] mm: introduce CONFIG_FREE_USER_PTE
Date: Thu, 25 Aug 2022 18:10:32 +0800
Message-Id: <20220825101037.96517-3-zhengqi.arch@bytedance.com>
In-Reply-To: <20220825101037.96517-1-zhengqi.arch@bytedance.com>

This configuration variable will be used to build the code needed to free user PTE page table pages.
The PTE page table setting and clearing functions (such as set_pte_at()) live in each architecture's code, and these functions will be hooked to implement FREE_USER_PTE, so architecture support is needed.

Signed-off-by: Qi Zheng
---
 mm/Kconfig | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 169e64192e48..d2a5a24cee2d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1130,6 +1130,17 @@ config PTE_MARKER_UFFD_WP
	  purposes.  It is required to enable userfaultfd write protection on
	  file-backed memory types like shmem and hugetlbfs.

+config ARCH_SUPPORTS_FREE_USER_PTE
+	def_bool n
+
+config FREE_USER_PTE
+	bool "Free user PTE page table pages"
+	default y
+	depends on ARCH_SUPPORTS_FREE_USER_PTE && MMU && SMP
+	help
+	  Try to free a user PTE page table page when all of its entries
+	  are none or map the shared zero page.
+
 source "mm/damon/Kconfig"

 endmenu
-- 
2.20.1
From: Qi Zheng
Subject: [RFC PATCH 3/7] mm: add pte_to_page() helper
Date: Thu, 25 Aug 2022 18:10:33 +0800
Message-Id: <20220825101037.96517-4-zhengqi.arch@bytedance.com>
In-Reply-To: <20220825101037.96517-1-zhengqi.arch@bytedance.com>

Add a pte_to_page() helper, similar to pmd_to_page(), which will be used to get the struct page of a PTE page table.

Signed-off-by: Qi Zheng
---
 include/linux/pgtable.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 9745684b0cdb..c4a6bda6e965 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -86,6 +86,14 @@ static inline unsigned long pud_index(unsigned long address)
 #define pgd_index(a)  (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
 #endif

+#ifdef CONFIG_FREE_USER_PTE
+static inline struct page *pte_to_page(pte_t *pte)
+{
+	unsigned long mask = ~(PTRS_PER_PTE * sizeof(pte_t) - 1);
+	return virt_to_page((void *)((unsigned long)pte & mask));
+}
+#endif
+
 #ifndef pte_offset_kernel
 static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 {
-- 
2.20.1
From: Qi Zheng
Subject: [RFC PATCH 4/7] mm: introduce pte_refcount for user PTE page table page
Date: Thu, 25 Aug 2022 18:10:34 +0800
Message-Id: <20220825101037.96517-5-zhengqi.arch@bytedance.com>
In-Reply-To: <20220825101037.96517-1-zhengqi.arch@bytedance.com>

The following is the largest amount of user PTE page table memory that can be allocated by a single user process on a 32-bit and a 64-bit system (assuming a 4K page size):

+---------------------------+--------+---------+
|                           | 32-bit | 64-bit  |
+===========================+========+=========+
| user PTE page table pages | 3 MiB  | 512 GiB |
+---------------------------+--------+---------+
| user PMD page table pages | 3 KiB  | 1 GiB   |
+---------------------------+--------+---------+

(For 32-bit, a 3G user address space is taken as an example; for 64-bit, a 48-bit address width is taken as an example.)
Today, 64-bit servers generally have only a few terabytes of physical memory, and mapping that memory does not require nearly as many PTE page tables as the table above suggests. In the following scenarios, however, page table memory usage can still become huge.

1. In pursuit of high performance, applications mostly use high-performance user-mode memory allocators such as jemalloc or tcmalloc. These allocators release physical memory with madvise(MADV_DONTNEED or MADV_FREE), but neither MADV_DONTNEED nor MADV_FREE releases page table memory, which can lead to huge page table usage such as:

   VIRT:  55t
   RES:  590g
   VmPTE: 110g

In this case, most of the page table entries are empty. We call a PTE page whose entries are all empty an empty PTE page.

2. The shared zero page scenario mentioned by David Hildenbrand:

   Especially the shared zeropage is nasty, because there are sane use cases that can trigger it. Assume you have a VM (e.g., QEMU) that inflated the balloon to return free memory to the hypervisor. Simply migrating that VM will populate the shared zeropage to all inflated pages, because migration code ends up reading all VM memory. Similarly, the guest can just read that memory as well, for example, when the guest issues kdump itself.

In this case, most of the page table entries are mapped to the shared zero page. We call a PTE page whose entries all map the shared zero page a zero PTE page.

The page table entries of both types of PTE pages record no "meaningful" information, so we can try to free these PTE pages at some point (such as when memory pressure is high) to reclaim more memory.

To quickly identify these two types of pages, we introduce a pte_refcount for each PTE page. We put the mapped and zero PTE entry counters into the pte_refcount of the PTE page.
The bitmask has the following meaning:

 - bits 0-9 are the mapped PTE entry count
 - bits 10-19 are the zero PTE entry count

Because the mapping and unmapping of PTE entries happen under the pte_lock, no concurrent thread can modify pte_refcount, so pte_refcount can be a non-atomic variable with little performance overhead.

Signed-off-by: Qi Zheng
---
 include/linux/mm.h       |  2 ++
 include/linux/mm_types.h |  1 +
 include/linux/pte_ref.h  | 23 +++++++++++++
 mm/Makefile              |  2 +-
 mm/pte_ref.c             | 72 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 99 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/pte_ref.h
 create mode 100644 mm/pte_ref.c

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7898e29bcfb5..23e2f1e75b4b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include

 struct mempolicy;
 struct anon_vma;

@@ -2336,6 +2337,7 @@ static inline bool pgtable_pte_page_ctor(struct page *page)
		return false;
	__SetPageTable(page);
	inc_lruvec_page_state(page, NR_PAGETABLE);
+	pte_ref_init(page);
	return true;
 }

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c29ab4c0cd5c..da2738f87737 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -153,6 +153,7 @@ struct page {
			union {
				struct mm_struct *pt_mm; /* x86 pgds only */
				atomic_t pt_frag_refcount; /* powerpc */
+				unsigned long pte_refcount; /* only for PTE page */
			};
 #if ALLOC_SPLIT_PTLOCKS
			spinlock_t *ptl;

diff --git a/include/linux/pte_ref.h b/include/linux/pte_ref.h
new file mode 100644
index 000000000000..db14e03e1dff
--- /dev/null
+++ b/include/linux/pte_ref.h
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022, ByteDance. All rights reserved.
+ *
+ * Author: Qi Zheng
+ */
+
+#ifndef _LINUX_PTE_REF_H
+#define _LINUX_PTE_REF_H
+
+#ifdef CONFIG_FREE_USER_PTE
+
+void pte_ref_init(pgtable_t pte);
+
+#else /* !CONFIG_FREE_USER_PTE */
+
+static inline void pte_ref_init(pgtable_t pte)
+{
+}
+
+#endif /* CONFIG_FREE_USER_PTE */
+
+#endif /* _LINUX_PTE_REF_H */

diff --git a/mm/Makefile b/mm/Makefile
index 6f9ffa968a1a..f8fa5078a13d 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -54,7 +54,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
			   mm_init.o percpu.o slab_common.o \
			   compaction.o vmacache.o \
			   interval_tree.o list_lru.o workingset.o \
-			   debug.o gup.o mmap_lock.o $(mmu-y)
+			   debug.o gup.o mmap_lock.o $(mmu-y) pte_ref.o

 # Give 'page_alloc' its own module-parameter namespace
 page-alloc-y := page_alloc.o

diff --git a/mm/pte_ref.c b/mm/pte_ref.c
new file mode 100644
index 000000000000..12b27646e88c
--- /dev/null
+++ b/mm/pte_ref.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022, ByteDance. All rights reserved.
+ *
+ * Author: Qi Zheng
+ */
+#include
+#include
+
+#ifdef CONFIG_FREE_USER_PTE
+
+/*
+ * For a PTE page where all entries are empty, we call it empty PTE page. For a
+ * PTE page where all page table entries are mapped to zero pages, we call it
+ * zero PTE page.
+ *
+ * The page table entries for both types of PTE pages do not record "meaningful"
+ * information, so we can try to free these PTE pages at some point (such as
+ * when memory pressure is high) to reclaim more memory.
+ *
+ * We put the mapped and zero PTE entry counter into the pte_refcount of the
+ * PTE page. The bitmask has the following meaning:
+ *
+ * - bits 0-9 are mapped PTE entry count
+ * - bits 10-19 are zero PTE entry count
+ *
+ * Because the mapping and unmapping of PTE entries are under pte_lock, there is
+ * no concurrent thread to modify pte_refcount, so pte_refcount can be a
+ * non-atomic variable with little performance overhead.
+ */
+#define PTE_MAPPED_BITS		10
+#define PTE_ZERO_BITS		10
+
+#define PTE_MAPPED_SHIFT	0
+#define PTE_ZERO_SHIFT		(PTE_MAPPED_SHIFT + PTE_MAPPED_BITS)
+
+#define __PTE_REF_MASK(x)	((1UL << (x)) - 1)
+
+#define PTE_MAPPED_MASK	(__PTE_REF_MASK(PTE_MAPPED_BITS) << PTE_MAPPED_SHIFT)
+#define PTE_ZERO_MASK	(__PTE_REF_MASK(PTE_ZERO_BITS) << PTE_ZERO_SHIFT)
+
+#define PTE_MAPPED_OFFSET	(1UL << PTE_MAPPED_SHIFT)
+#define PTE_ZERO_OFFSET		(1UL << PTE_ZERO_SHIFT)
+
+static inline unsigned long pte_refcount(pgtable_t pte)
+{
+	return pte->pte_refcount;
+}
+
+#define pte_mapped_count(pte) \
+	((pte_refcount(pte) & PTE_MAPPED_MASK) >> PTE_MAPPED_SHIFT)
+#define pte_zero_count(pte) \
+	((pte_refcount(pte) & PTE_ZERO_MASK) >> PTE_ZERO_SHIFT)
+
+static __always_inline void pte_refcount_add(struct mm_struct *mm,
					     pgtable_t pte, int val)
+{
+	pte->pte_refcount += val;
+}
+
+static __always_inline void pte_refcount_sub(struct mm_struct *mm,
					     pgtable_t pte, int val)
+{
+	pte->pte_refcount -= val;
+}
+
+void pte_ref_init(pgtable_t pte)
+{
+	pte->pte_refcount = 0;
+}
+
+#endif /* CONFIG_FREE_USER_PTE */
-- 
2.20.1
From: Qi Zheng
Subject: [RFC PATCH 5/7] pte_ref: add track_pte_{set, clear}() helper
Date: Thu, 25 Aug 2022 18:10:35 +0800
Message-Id: <20220825101037.96517-6-zhengqi.arch@bytedance.com>
In-Reply-To: <20220825101037.96517-1-zhengqi.arch@bytedance.com>

track_pte_set() tracks the setting of a PTE page table entry, and track_pte_clear() tracks its clearing; we update the pte_refcount of the PTE page in these two functions. In this way, the usage of a PTE page table page can be tracked through its pte_refcount.
Signed-off-by: Qi Zheng
---
 include/linux/pte_ref.h | 13 +++++++++++++
 mm/pte_ref.c            | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/linux/pte_ref.h b/include/linux/pte_ref.h
index db14e03e1dff..ab49c7fac120 100644
--- a/include/linux/pte_ref.h
+++ b/include/linux/pte_ref.h
@@ -12,12 +12,25 @@

 void pte_ref_init(pgtable_t pte);

+void track_pte_set(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		   pte_t pte);
+void track_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		     pte_t pte);
 #else /* !CONFIG_FREE_USER_PTE */

 static inline void pte_ref_init(pgtable_t pte)
 {
 }

+static inline void track_pte_set(struct mm_struct *mm, unsigned long addr,
+				 pte_t *ptep, pte_t pte)
+{
+}
+
+static inline void track_pte_clear(struct mm_struct *mm, unsigned long addr,
+				   pte_t *ptep, pte_t pte)
+{
+}
 #endif /* CONFIG_FREE_USER_PTE */

 #endif /* _LINUX_PTE_REF_H */

diff --git a/mm/pte_ref.c b/mm/pte_ref.c
index 12b27646e88c..818821d068af 100644
--- a/mm/pte_ref.c
+++ b/mm/pte_ref.c
@@ -69,4 +69,40 @@ void pte_ref_init(pgtable_t pte)
	pte->pte_refcount = 0;
 }

+void track_pte_set(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		   pte_t pte)
+{
+	pgtable_t page;
+
+	if (&init_mm == mm || pte_huge(pte))
+		return;
+
+	page = pte_to_page(ptep);
+	if (pte_none(*ptep) && !pte_none(pte)) {
+		pte_refcount_add(mm, page, PTE_MAPPED_OFFSET);
+		if (is_zero_pfn(pte_pfn(pte)))
+			pte_refcount_add(mm, page, PTE_ZERO_OFFSET);
+	} else if (is_zero_pfn(pte_pfn(*ptep)) && !is_zero_pfn(pte_pfn(pte))) {
+		pte_refcount_sub(mm, page, PTE_ZERO_OFFSET);
+	}
+}
+EXPORT_SYMBOL(track_pte_set);
+
+void track_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		     pte_t pte)
+{
+	pgtable_t page;
+
+	if (&init_mm == mm || pte_huge(pte))
+		return;
+
+	page = pte_to_page(ptep);
+	if (!pte_none(pte)) {
+		pte_refcount_sub(mm, page, PTE_MAPPED_OFFSET);
+		if (is_zero_pfn(pte_pfn(pte)))
+			pte_refcount_sub(mm, page, PTE_ZERO_OFFSET);
+	}
+}
+EXPORT_SYMBOL(track_pte_clear);
+
 #endif /* CONFIG_FREE_USER_PTE */
-- 
2.20.1
From: Qi Zheng
To: akpm@linux-foundation.org, david@redhat.com, kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com, jgg@nvidia.com, tglx@linutronix.de, willy@infradead.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, muchun.song@linux.dev, Qi Zheng
Subject: [RFC PATCH 6/7] x86/mm: add x86_64 support for pte_ref
Date: Thu, 25 Aug 2022 18:10:36 +0800
Message-Id: <20220825101037.96517-7-zhengqi.arch@bytedance.com>
In-Reply-To: <20220825101037.96517-1-zhengqi.arch@bytedance.com>
References: <20220825101037.96517-1-zhengqi.arch@bytedance.com>

Add pte_ref hooks into the routines that modify user PTE page tables, and
select ARCH_SUPPORTS_FREE_USER_PTE, so that the pte_ref code can be
compiled and will work on this architecture.

Signed-off-by: Qi Zheng
---
 arch/x86/Kconfig               | 1 +
 arch/x86/include/asm/pgtable.h | 4 ++++
 include/linux/pgtable.h        | 1 +
 3 files changed, 6 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 52a7f91527fe..50215b05723e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86_64
 	select SWIOTLB
 	select ARCH_HAS_ELFCORE_COMPAT
 	select ZONE_DMA32
+	select ARCH_SUPPORTS_FREE_USER_PTE
 
 config FORCE_DYNAMIC_FTRACE
 	def_bool y
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 44e2d6f1dbaa..cbfcfa497fb9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include <linux/pte_ref.h>
 
 extern pgd_t early_top_pgt[PTRS_PER_PGD];
 bool __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
@@ -1005,6 +1006,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pte)
 {
 	page_table_check_pte_set(mm, addr, ptep, pte);
+	track_pte_set(mm, addr, ptep, pte);
 	set_pte(ptep, pte);
 }
 
@@ -1050,6 +1052,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 {
 	pte_t pte = native_ptep_get_and_clear(ptep);
 	page_table_check_pte_clear(mm, addr, pte);
+	track_pte_clear(mm, addr, ptep, pte);
 	return pte;
 }
 
@@ -1066,6 +1069,7 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 		 */
 		pte = native_local_ptep_get_and_clear(ptep);
 		page_table_check_pte_clear(mm, addr, pte);
+		track_pte_clear(mm, addr, ptep, pte);
 	} else {
 		pte = ptep_get_and_clear(mm, addr, ptep);
 	}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c4a6bda6e965..908636f48c95 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -276,6 +276,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 	pte_t pte = *ptep;
 	pte_clear(mm, address, ptep);
 	page_table_check_pte_clear(mm, address, pte);
+	track_pte_clear(mm, address, ptep, pte);
 	return
pte;
 }
 #endif
-- 
2.20.1

From: Qi Zheng
To: akpm@linux-foundation.org, david@redhat.com, kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com, jgg@nvidia.com, tglx@linutronix.de, willy@infradead.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, muchun.song@linux.dev, Qi Zheng
Subject: [RFC PATCH 7/7] mm: add proc interface to free user PTE page table pages
Date: Thu, 25 Aug 2022 18:10:37 +0800
Message-Id: <20220825101037.96517-8-zhengqi.arch@bytedance.com>
In-Reply-To: <20220825101037.96517-1-zhengqi.arch@bytedance.com>
References: <20220825101037.96517-1-zhengqi.arch@bytedance.com>

Add a /proc/sys/vm/free_ptes file to procfs. When a pid is written to the
file, we traverse that process's address space and find and free its empty
PTE pages and zero PTE pages.
Signed-off-by: Qi Zheng
---
 include/linux/pte_ref.h |   5 ++
 kernel/sysctl.c         |  12 ++++
 mm/pte_ref.c            | 126 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 143 insertions(+)

diff --git a/include/linux/pte_ref.h b/include/linux/pte_ref.h
index ab49c7fac120..f7e244129291 100644
--- a/include/linux/pte_ref.h
+++ b/include/linux/pte_ref.h
@@ -16,6 +16,11 @@ void track_pte_set(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		   pte_t pte);
 void track_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		     pte_t pte);
+
+int free_ptes_sysctl_handler(struct ctl_table *table, int write,
+			     void *buffer, size_t *length, loff_t *ppos);
+extern int sysctl_free_ptes_pid;
+
 #else /* !CONFIG_FREE_USER_PTE */
 
 static inline void pte_ref_init(pgtable_t pte)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 35d034219513..14e1a9841cb8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -64,6 +64,7 @@
 #include
 #include
 #include
+#include <linux/pte_ref.h>
 
 #include "../lib/kstrtox.h"
 
@@ -2153,6 +2154,17 @@ static struct ctl_table vm_table[] = {
 		.extra1		= SYSCTL_ONE,
 		.extra2		= SYSCTL_FOUR,
 	},
+#ifdef CONFIG_FREE_USER_PTE
+	{
+		.procname	= "free_ptes",
+		.data		= &sysctl_free_ptes_pid,
+		.maxlen		= sizeof(int),
+		.mode		= 0200,
+		.proc_handler	= free_ptes_sysctl_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_INT_MAX,
+	},
+#endif
 #ifdef CONFIG_COMPACTION
 	{
 		.procname	= "compact_memory",
diff --git a/mm/pte_ref.c b/mm/pte_ref.c
index 818821d068af..e7080a3100a6 100644
--- a/mm/pte_ref.c
+++ b/mm/pte_ref.c
@@ -6,6 +6,14 @@
  */
 #include
 #include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "internal.h"
 
 #ifdef CONFIG_FREE_USER_PTE
 
@@ -105,4 +113,122 @@ void track_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 }
 EXPORT_SYMBOL(track_pte_clear);
 
+#ifdef CONFIG_DEBUG_VM
+void pte_free_debug(pmd_t pmd)
+{
+	pte_t *ptep = (pte_t *)pmd_page_vaddr(pmd);
+	int i = 0;
+
+	for (i = 0; i < PTRS_PER_PTE; i++, ptep++) {
+		pte_t pte = *ptep;
+		BUG_ON(!(pte_none(pte) || is_zero_pfn(pte_pfn(pte))));
+	}
+}
+#else
+static inline void pte_free_debug(pmd_t pmd)
+{
+}
+#endif
+
+static int kfreeptd_pmd_entry(pmd_t *pmd, unsigned long addr,
+			      unsigned long next, struct mm_walk *walk)
+{
+	pmd_t pmdval;
+	pgtable_t page;
+	struct mm_struct *mm = walk->mm;
+	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
+	spinlock_t *ptl;
+	bool free = false;
+	unsigned long haddr = addr & PMD_MASK;
+
+	if (pmd_trans_unstable(pmd))
+		goto out;
+
+	mmap_read_unlock(mm);
+	mmap_write_lock(mm);
+
+	if (mm_find_pmd(mm, addr) != pmd)
+		goto unlock_out;
+
+	ptl = pmd_lock(mm, pmd);
+	pmdval = *pmd;
+	if (pmd_none(pmdval) || pmd_leaf(pmdval)) {
+		spin_unlock(ptl);
+		goto unlock_out;
+	}
+	page = pmd_pgtable(pmdval);
+	if (!pte_mapped_count(page) || pte_zero_count(page) == PTRS_PER_PTE) {
+		pmd_clear(pmd);
+		flush_tlb_range(&vma, haddr, haddr + PMD_SIZE);
+		free = true;
+	}
+	spin_unlock(ptl);
+
+unlock_out:
+	mmap_write_unlock(mm);
+	mmap_read_lock(mm);
+
+	if (free) {
+		pte_free_debug(pmdval);
+		mm_dec_nr_ptes(mm);
+		pgtable_pte_page_dtor(page);
+		__free_page(page);
+	}
+
+out:
+	cond_resched();
+	return 0;
+}
+
+static const struct mm_walk_ops kfreeptd_walk_ops = {
+	.pmd_entry	= kfreeptd_pmd_entry,
+};
+
+int sysctl_free_ptes_pid;
+int free_ptes_sysctl_handler(struct ctl_table *table, int write,
+			     void *buffer, size_t *length, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+	if (ret)
+		return ret;
+	if (write) {
+		struct task_struct *task;
+		struct mm_struct *mm;
+
+		rcu_read_lock();
+		task = find_task_by_vpid(sysctl_free_ptes_pid);
+		if (!task) {
+			rcu_read_unlock();
+			return -ESRCH;
+		}
+		mm = get_task_mm(task);
+		rcu_read_unlock();
+
+		if (!mm)
+			return -ESRCH;
+
+		do {
+			ret = -EBUSY;
+
+			if (mmap_read_trylock(mm)) {
+				ret = walk_page_range(mm, FIRST_USER_ADDRESS,
+						      ULONG_MAX,
+						      &kfreeptd_walk_ops, NULL);
+
+				mmap_read_unlock(mm);
+			}
+
+			cond_resched();
+		} while (ret == -EAGAIN);
+
+		mmput(mm);
+	}
+	return ret;
+}
+
 #endif /* CONFIG_FREE_USER_PTE */
-- 
2.20.1