From nobody Sat Dec 13 22:52:59 2025 Received: from mail-pf1-f177.google.com (mail-pf1-f177.google.com [209.85.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0E361B9831 for ; Wed, 4 Dec 2024 11:10:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733310625; cv=none; b=fBx+vX5cuypEKHOULn18/x0wdLoP+N4UDxfQ1QSExihrDgq9veM/sCMn5BTgEvXAvvf5GQUGY8D3QG8g+HPcKOUtl7sOVokmfhp9PJTHPN11tBsqSuQP6+oZ+57BVGNiPkH+a94qDKo51ehCASwAHVXCNl3Ph6C4ItWj5DWRoQo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733310625; c=relaxed/simple; bh=FEBZWUR9v7E8KgYvnYm6uhEr4btIUVDaZhQxoD4rWgQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=p4QN4pHPlqjOBWuvGyFErNAe0ti4OX5gbQaZoEgwh0x2rDvz3NYWnyxd6xXzCNjKxmrR31oFL9jQuWycK48KgvDVw4Gl+jbLbaSjlzhxEAZ5kpg3vKhsudp2nznZBHTqwdBdHfJ4IgktB/Fi5KGpKVJZpATUgr1cnQjrLTJ05y4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=gDYDwL7d; arc=none smtp.client-ip=209.85.210.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="gDYDwL7d" Received: by mail-pf1-f177.google.com with SMTP id d2e1a72fcca58-724f383c5bfso5157394b3a.1 for ; Wed, 04 Dec 2024 03:10:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1733310623; x=1733915423; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=J1cxqrRNLtVgEINqhNBjpEtuU6ycwPEQ1YtKh4XaP64=; b=gDYDwL7dDOyyaTtP2iqHdw2xPvs5anu7EvlvMTiROCn/xGqU/etcElPNRIzv/hyHNP +y+Sxw1nL/D8aZQ+oL+ZxDHhSEe7Y1RjfDPHqkNNf7lTFAJt2j/TYZ3PKj3cyjnTycFM fy4fB0Wuz5wFNs2t0in+R+q3m0UasISCGPFrfzAeLkHB0XRo58kxzgNHIG/fRsHjIcNq 6VT2pzALGDHnf/JTrngCNplaMPDNWOgZF1aJQPDwX3De+4+JntPA3Xu4umTeDXYQeLq4 6cR/yPezaBkF66XQi5BfqCvnBSDuguTOossd+MX6IabHGcME5qhHpZDbBEh2w3ymY3tg EcEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733310623; x=1733915423; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=J1cxqrRNLtVgEINqhNBjpEtuU6ycwPEQ1YtKh4XaP64=; b=nxUlhi50J7lMi5t2bs5wsE46GQOmbIxOCvTNJfmcbdRqM+6+PwrbY0gV7CX/XpeZq0 VLDKfR70l4u56jeNCv+YT/01zUyf+FxFRaMbapFeFYM+gxkUQuDHytzwQaUHGIkbSjom qW17zPXL2iYdS/+ny9yoKct621UIbXRo4gwNOk95/GDXKqhuVPdMkAfuf6ndCBleYS5w 0SlpV8yd7MjqlE+usFFgljbrJ0JrPJH3HYY3KskQOzoJidWv7qwTBMHviM40RmT+8yRT Vp/p87cu5LsqaagYJ8G7w9jvXF4o5uHdDqFTIAdrlK8OV2+s4y1lCASy+ga6xQ51Mzni duBA== X-Forwarded-Encrypted: i=1; AJvYcCUMQwz1wVUvxhvJK8MRIlw9Icj7566AWOAPGXzBCaYcIeKY5uZdyyz7aiIePepZaNU3Ga+ULXYoLGGmfWs=@vger.kernel.org X-Gm-Message-State: AOJu0YzSV7phGb25vYOG2LGk4/3cxheQg+GacqQCve7qDge4RsKauzuI 4NVchA+bYudJIXZlWzSY3BIrvGLQ4jygeBvtXkpLjxXLfaw8cbSCdh700jD9ZoY= X-Gm-Gg: ASbGncs6KzW6LRAATsWsXjPIat1n7+SPoPTkPY3QFZXFMoDO9MqM+yNKrnhYl5CsVrl Kkh8idhU1tIrWeeldc4I5+TewoqSdIFUfrKzWvHXovwrSu7f9bX8H326M7wz8fy+jegry2ypoiV cQBDOBWzQeOeHb2Nx5lb9j33WWD+LsoyBoCogLl6ldrcWw52oAJ3K4SqdPaiGK1HmrUVfxMaI6H bB6YL8IQxo7yxlvRSLmXIhZVsleKyZHnkq199NYKCiYsH1Tpyj4XWkTr3EPls3N8VSGga7dZx8S wemO377i4U03Pwo= X-Google-Smtp-Source: AGHT+IF1f5oofbaWvETvYEiHLO/zdFbjQACTN4HhvvjpL8HNWx6S5PsEo00LYzRIYs4wKfqYTCWGBg== X-Received: by 2002:a17:902:e549:b0:215:6e28:827c with SMTP id 
d9443c01a7336-215d00f6d59mr48414015ad.56.1733310623257; Wed, 04 Dec 2024 03:10:23 -0800 (PST) Received: from C02DW0BEMD6R.bytedance.net ([203.208.167.148]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-21527515731sm107447495ad.192.2024.12.04.03.10.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 Dec 2024 03:10:22 -0800 (PST) From: Qi Zheng To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com, akpm@linux-foundation.org Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [PATCH v4 01/11] mm: khugepaged: recheck pmd state in retract_page_tables() Date: Wed, 4 Dec 2024 19:09:41 +0800 Message-Id: <70a51804cd19d44ccaf031825d9fb6eaf92f2bad.1733305182.git.zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In retract_page_tables(), the lock of new_folio is still held, so the page fault path will block on it, which prevents the pte entries from being set again. Therefore, even though the old empty PTE page may be concurrently freed and a new PTE page filled into the pmd entry, the new page is still empty and can be removed. So just refactor retract_page_tables() a little and recheck the pmd state after taking the pmd lock.
Suggested-by: Jann Horn Signed-off-by: Qi Zheng --- Documentation/mm/process_addrs.rst | 4 +++ mm/khugepaged.c | 45 ++++++++++++++++++++---------- 2 files changed, 35 insertions(+), 14 deletions(-) diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst index 1d416658d7f59..81417fa2ed20b 100644 --- a/Documentation/mm/process_addrs.rst +++ b/Documentation/mm/process_addrs.rst @@ -531,6 +531,10 @@ are extra requirements for accessing them: new page table has been installed in the same location and filled with entries. Writers normally need to take the PTE lock and revalidate that the PMD entry still refers to the same PTE-level page table. + If the writer does not care whether it is the same PTE-level page table, it + can take the PMD lock and revalidate that the contents of the pmd entry still meet + the requirements. In particular, this also happens in :c:func:`!retract_page_tables` + when handling :c:macro:`!MADV_COLLAPSE`. To access PTE-level page tables, a helper like :c:func:`!pte_offset_map_lock` or :c:func:`!pte_offset_map` can be used depending on stability requirements.
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 6f8d46d107b4b..99dc995aac110 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -947,17 +947,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, return SCAN_SUCCEED; } -static int find_pmd_or_thp_or_none(struct mm_struct *mm, - unsigned long address, - pmd_t **pmd) +static inline int check_pmd_state(pmd_t *pmd) { - pmd_t pmde; + pmd_t pmde = pmdp_get_lockless(pmd); - *pmd = mm_find_pmd(mm, address); - if (!*pmd) - return SCAN_PMD_NULL; - - pmde = pmdp_get_lockless(*pmd); if (pmd_none(pmde)) return SCAN_PMD_NONE; if (!pmd_present(pmde)) @@ -971,6 +964,17 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm, return SCAN_SUCCEED; } +static int find_pmd_or_thp_or_none(struct mm_struct *mm, + unsigned long address, + pmd_t **pmd) +{ + *pmd = mm_find_pmd(mm, address); + if (!*pmd) + return SCAN_PMD_NULL; + + return check_pmd_state(*pmd); +} + static int check_pmd_still_valid(struct mm_struct *mm, unsigned long address, pmd_t *pmd) @@ -1720,7 +1724,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) pmd_t *pmd, pgt_pmd; spinlock_t *pml; spinlock_t *ptl; - bool skipped_uffd = false; + bool success = false; /* * Check vma->anon_vma to exclude MAP_PRIVATE mappings that @@ -1757,6 +1761,19 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) mmu_notifier_invalidate_range_start(&range); pml = pmd_lock(mm, pmd); + /* + * The lock of new_folio is still held, so the page fault path + * will block on it, which prevents the pte entries from being + * set again. So even though the old empty PTE page may be + * concurrently freed and a new PTE page filled into the pmd + * entry, the new page is still empty and can be removed. + * + * So here we only need to recheck if the state of the pmd entry + * still meets our requirements, rather than checking pmd_same() + * like elsewhere.
+ */ + if (check_pmd_state(pmd) != SCAN_SUCCEED) + goto drop_pml; ptl = pte_lockptr(mm, pmd); if (ptl != pml) spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); @@ -1770,20 +1787,20 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * repeating the anon_vma check protects from one category, * and repeating the userfaultfd_wp() check from another. */ - if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) { - skipped_uffd = true; - } else { + if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) { pgt_pmd = pmdp_collapse_flush(vma, addr, pmd); pmdp_get_lockless_sync(); + success = true; } if (ptl != pml) spin_unlock(ptl); +drop_pml: spin_unlock(pml); mmu_notifier_invalidate_range_end(&range); - if (!skipped_uffd) { + if (success) { mm_dec_nr_ptes(mm); page_table_check_pte_clear_range(mm, addr, pgt_pmd); pte_free_defer(mm, pmd_pgtable(pgt_pmd)); -- 2.20.1 From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com, akpm@linux-foundation.org Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [PATCH v4 02/11] mm: userfaultfd: recheck dst_pmd entry in
move_pages_pte() Date: Wed, 4 Dec 2024 19:09:42 +0800 Message-Id: <8108c262757fc492626f3a2ffc44b775f2710e16.1733305182.git.zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In move_pages_pte(), since dst_pte needs to be none, the subsequent pte_same() check cannot prevent the dst_pte page from being freed concurrently, so we also need to obtain dst_pmdval and recheck pmd_same(). Otherwise, once we support empty PTE page reclamation for anonymous pages, it may result in moving the src_pte page into the dst_pte page that is about to be freed by RCU. Signed-off-by: Qi Zheng --- mm/userfaultfd.c | 51 +++++++++++++++++++++++++++++++----------------- 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 60a0be33766ff..8e16dc290ddf1 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1020,6 +1020,14 @@ void double_pt_unlock(spinlock_t *ptl1, __release(ptl2); } +static inline bool is_pte_pages_stable(pte_t *dst_pte, pte_t *src_pte, + pte_t orig_dst_pte, pte_t orig_src_pte, + pmd_t *dst_pmd, pmd_t dst_pmdval) +{ + return pte_same(ptep_get(src_pte), orig_src_pte) && + pte_same(ptep_get(dst_pte), orig_dst_pte) && + pmd_same(dst_pmdval, pmdp_get_lockless(dst_pmd)); +} static int move_present_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma, @@ -1027,6 +1035,7 @@ static int move_present_pte(struct mm_struct *mm, unsigned long dst_addr, unsigned long src_addr, pte_t *dst_pte, pte_t *src_pte, pte_t orig_dst_pte, pte_t orig_src_pte, + pmd_t *dst_pmd, pmd_t dst_pmdval, spinlock_t *dst_ptl, spinlock_t *src_ptl, struct folio *src_folio) { @@ -1034,8 +1043,8 @@ static int move_present_pte(struct mm_struct *mm, double_pt_lock(dst_ptl, src_ptl); - if
(!pte_same(ptep_get(src_pte), orig_src_pte) || - !pte_same(ptep_get(dst_pte), orig_dst_pte)) { + if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte, + dst_pmd, dst_pmdval)) { err = -EAGAIN; goto out; } @@ -1071,6 +1080,7 @@ static int move_swap_pte(struct mm_struct *mm, unsigned long dst_addr, unsigned long src_addr, pte_t *dst_pte, pte_t *src_pte, pte_t orig_dst_pte, pte_t orig_src_pte, + pmd_t *dst_pmd, pmd_t dst_pmdval, spinlock_t *dst_ptl, spinlock_t *src_ptl) { if (!pte_swp_exclusive(orig_src_pte)) @@ -1078,8 +1088,8 @@ static int move_swap_pte(struct mm_struct *mm, double_pt_lock(dst_ptl, src_ptl); - if (!pte_same(ptep_get(src_pte), orig_src_pte) || - !pte_same(ptep_get(dst_pte), orig_dst_pte)) { + if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte, + dst_pmd, dst_pmdval)) { double_pt_unlock(dst_ptl, src_ptl); return -EAGAIN; } @@ -1097,13 +1107,14 @@ static int move_zeropage_pte(struct mm_struct *mm, unsigned long dst_addr, unsigned long src_addr, pte_t *dst_pte, pte_t *src_pte, pte_t orig_dst_pte, pte_t orig_src_pte, + pmd_t *dst_pmd, pmd_t dst_pmdval, spinlock_t *dst_ptl, spinlock_t *src_ptl) { pte_t zero_pte; double_pt_lock(dst_ptl, src_ptl); - if (!pte_same(ptep_get(src_pte), orig_src_pte) || - !pte_same(ptep_get(dst_pte), orig_dst_pte)) { + if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte, + dst_pmd, dst_pmdval)) { double_pt_unlock(dst_ptl, src_ptl); return -EAGAIN; } @@ -1136,6 +1147,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pte_t *src_pte = NULL; pte_t *dst_pte = NULL; pmd_t dummy_pmdval; + pmd_t dst_pmdval; struct folio *src_folio = NULL; struct anon_vma *src_anon_vma = NULL; struct mmu_notifier_range range; @@ -1148,11 +1160,11 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, retry: /* * Use the maywrite version to indicate that dst_pte will be modified, - * but since we will use pte_same()
to detect the change of the pte - * entry, there is no need to get pmdval, so just pass a dummy variable - * to it. + * since dst_pte needs to be none, the subsequent pte_same() check + * cannot prevent the dst_pte page from being freed concurrently, so we + * also need to obtain dst_pmdval and recheck pmd_same() later. */ - dst_pte = pte_offset_map_rw_nolock(mm, dst_pmd, dst_addr, &dummy_pmdval, + dst_pte = pte_offset_map_rw_nolock(mm, dst_pmd, dst_addr, &dst_pmdval, &dst_ptl); /* Retry if a huge pmd materialized from under us */ @@ -1161,7 +1173,11 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, goto out; } - /* same as dst_pte */ + /* + * Unlike dst_pte, the subsequent pte_same() check can ensure the + * stability of the src_pte page, so there is no need to get pmdval, + * just pass a dummy variable to it. + */ src_pte = pte_offset_map_rw_nolock(mm, src_pmd, src_addr, &dummy_pmdval, &src_ptl); @@ -1213,7 +1229,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, err = move_zeropage_pte(mm, dst_vma, src_vma, dst_addr, src_addr, dst_pte, src_pte, orig_dst_pte, orig_src_pte, - dst_ptl, src_ptl); + dst_pmd, dst_pmdval, dst_ptl, src_ptl); goto out; } @@ -1303,8 +1319,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, err = move_present_pte(mm, dst_vma, src_vma, dst_addr, src_addr, dst_pte, src_pte, - orig_dst_pte, orig_src_pte, - dst_ptl, src_ptl, src_folio); + orig_dst_pte, orig_src_pte, dst_pmd, + dst_pmdval, dst_ptl, src_ptl, src_folio); } else { entry = pte_to_swp_entry(orig_src_pte); if (non_swap_entry(entry)) { @@ -1319,10 +1335,9 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, goto out; } - err = move_swap_pte(mm, dst_addr, src_addr, - dst_pte, src_pte, - orig_dst_pte, orig_src_pte, - dst_ptl, src_ptl); + err = move_swap_pte(mm, dst_addr, src_addr, dst_pte, src_pte, + orig_dst_pte,
orig_src_pte, dst_pmd, + dst_pmdval, dst_ptl, src_ptl); } out: -- 2.20.1 From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com, akpm@linux-foundation.org Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [PATCH v4 03/11] mm: introduce zap_nonpresent_ptes() Date: Wed, 4 Dec 2024 19:09:43 +0800 Message-Id: <009ca882036d9c7a9f815489cfeafe0bdb79d62d.1733305182.git.zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Similar to zap_present_ptes(), let's introduce zap_nonpresent_ptes() to handle non-present ptes, which can improve code readability. No functional change.
Signed-off-by: Qi Zheng Reviewed-by: Jann Horn Acked-by: David Hildenbrand --- mm/memory.c | 136 ++++++++++++++++++++++++++++------------------------ 1 file changed, 73 insertions(+), 63 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index d5a1b0a6bf1fa..5624c22bb03cf 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1587,6 +1587,76 @@ static inline int zap_present_ptes(struct mmu_gather *tlb, return 1; } +static inline int zap_nonpresent_ptes(struct mmu_gather *tlb, + struct vm_area_struct *vma, pte_t *pte, pte_t ptent, + unsigned int max_nr, unsigned long addr, + struct zap_details *details, int *rss) +{ + swp_entry_t entry; + int nr = 1; + entry = pte_to_swp_entry(ptent); + if (is_device_private_entry(entry) || + is_device_exclusive_entry(entry)) { + struct page *page = pfn_swap_entry_to_page(entry); + struct folio *folio = page_folio(page); + if (unlikely(!should_zap_folio(details, folio))) + return 1; + /* + * Both device private/exclusive mappings should only + * work with anonymous page so far, so we don't need to + * consider uffd-wp bit when zap. For more information, + * see zap_install_uffd_wp_if_needed(). + */ + WARN_ON_ONCE(!vma_is_anonymous(vma)); + rss[mm_counter(folio)]--; + if (is_device_private_entry(entry)) + folio_remove_rmap_pte(folio, page, vma); + folio_put(folio); + } else if (!non_swap_entry(entry)) { + /* Genuine swap entries, hence a private anon pages */ + if (!should_zap_cows(details)) + return 1; + nr = swap_pte_batch(pte, max_nr, ptent); + rss[MM_SWAPENTS] -= nr; + free_swap_and_cache_nr(entry, nr); + } else if (is_migration_entry(entry)) { + struct folio *folio = pfn_swap_entry_folio(entry); + if (!should_zap_folio(details, folio)) + return 1; + rss[mm_counter(folio)]--; + } else if (pte_marker_entry_uffd_wp(entry)) { + /* + * For anon: always drop the marker; for file: only + * drop the marker if explicitly requested.
+ */ + if (!vma_is_anonymous(vma) && !zap_drop_markers(details)) + return 1; + } else if (is_guard_swp_entry(entry)) { + /* + * Ordinary zapping should not remove guard PTE + * markers. Only do so if we should remove PTE markers + * in general. + */ + if (!zap_drop_markers(details)) + return 1; + } else if (is_hwpoison_entry(entry) || is_poisoned_swp_entry(entry)) { + if (!should_zap_cows(details)) + return 1; + } else { + /* We should have covered all the swap entry types */ + pr_alert("unrecognized swap entry 0x%lx\n", entry.val); + WARN_ON_ONCE(1); + } + clear_not_present_full_ptes(vma->vm_mm, addr, pte, nr, tlb->fullmm); + zap_install_uffd_wp_if_needed(vma, addr, pte, nr, details, ptent); + return nr; +} + static unsigned long zap_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, @@ -1598,7 +1668,6 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, spinlock_t *ptl; pte_t *start_pte; pte_t *pte; - swp_entry_t entry; int nr; tlb_change_page_size(tlb, PAGE_SIZE); @@ -1611,8 +1680,6 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, arch_enter_lazy_mmu_mode(); do { pte_t ptent = ptep_get(pte); - struct folio *folio; - struct page *page; int max_nr; nr = 1; @@ -1622,8 +1689,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, if (need_resched()) break; + max_nr = (end - addr) / PAGE_SIZE; if (pte_present(ptent)) { - max_nr = (end - addr) / PAGE_SIZE; nr = zap_present_ptes(tlb, vma, pte, ptent, max_nr, addr, details, rss, &force_flush, &force_break); @@ -1631,67 +1698,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, addr += nr * PAGE_SIZE; break; } - continue; - } - - entry = pte_to_swp_entry(ptent); - if (is_device_private_entry(entry) || - is_device_exclusive_entry(entry)) { - page = pfn_swap_entry_to_page(entry); - folio = page_folio(page); - if (unlikely(!should_zap_folio(details, folio))) - continue; - /* - *
Both device private/exclusive mappings should only - * work with anonymous page so far, so we don't need to - * consider uffd-wp bit when zap. For more information, - * see zap_install_uffd_wp_if_needed(). - */ - WARN_ON_ONCE(!vma_is_anonymous(vma)); - rss[mm_counter(folio)]--; - if (is_device_private_entry(entry)) - folio_remove_rmap_pte(folio, page, vma); - folio_put(folio); - } else if (!non_swap_entry(entry)) { - max_nr = (end - addr) / PAGE_SIZE; - nr = swap_pte_batch(pte, max_nr, ptent); - /* Genuine swap entries, hence a private anon pages */ - if (!should_zap_cows(details)) - continue; - rss[MM_SWAPENTS] -= nr; - free_swap_and_cache_nr(entry, nr); - } else if (is_migration_entry(entry)) { - folio = pfn_swap_entry_folio(entry); - if (!should_zap_folio(details, folio)) - continue; - rss[mm_counter(folio)]--; - } else if (pte_marker_entry_uffd_wp(entry)) { - /* - * For anon: always drop the marker; for file: only - * drop the marker if explicitly requested. - */ - if (!vma_is_anonymous(vma) && - !zap_drop_markers(details)) - continue; - } else if (is_guard_swp_entry(entry)) { - /* - * Ordinary zapping should not remove guard PTE - * markers. Only do so if we should remove PTE markers - * in general.
- */ - if (!zap_drop_markers(details)) - continue; - } else if (is_hwpoison_entry(entry) || - is_poisoned_swp_entry(entry)) { - if (!should_zap_cows(details)) - continue; } else { - /* We should have covered all the swap entry types */ - pr_alert("unrecognized swap entry 0x%lx\n", entry.val); - WARN_ON_ONCE(1); + nr = zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr, + addr, details, rss); } - clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm); - zap_install_uffd_wp_if_needed(vma, addr, pte, nr, details, ptent); } while (pte += nr, addr += PAGE_SIZE * nr, addr != end); add_mm_rss_vec(mm, rss); -- 2.20.1 From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com, akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [PATCH v4 04/11] mm: introduce do_zap_pte_range()
Date: Wed, 4 Dec 2024 19:09:44 +0800

This commit introduces do_zap_pte_range() to actually zap the PTEs, which
will help improve code readability and facilitate secondary checking of
the processed PTEs in the
future. No functional change.

Signed-off-by: Qi Zheng
Reviewed-by: Jann Horn
Acked-by: David Hildenbrand
---
 mm/memory.c | 45 ++++++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 5624c22bb03cf..abe07e6bdd1bb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1657,6 +1657,27 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb,
 	return nr;
 }
 
+static inline int do_zap_pte_range(struct mmu_gather *tlb,
+				   struct vm_area_struct *vma, pte_t *pte,
+				   unsigned long addr, unsigned long end,
+				   struct zap_details *details, int *rss,
+				   bool *force_flush, bool *force_break)
+{
+	pte_t ptent = ptep_get(pte);
+	int max_nr = (end - addr) / PAGE_SIZE;
+
+	if (pte_none(ptent))
+		return 1;
+
+	if (pte_present(ptent))
+		return zap_present_ptes(tlb, vma, pte, ptent, max_nr,
+					addr, details, rss, force_flush,
+					force_break);
+
+	return zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr, addr,
+				   details, rss);
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
@@ -1679,28 +1700,14 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	flush_tlb_batched_pending(mm);
 	arch_enter_lazy_mmu_mode();
 	do {
-		pte_t ptent = ptep_get(pte);
-		int max_nr;
-
-		nr = 1;
-		if (pte_none(ptent))
-			continue;
-
 		if (need_resched())
 			break;
 
-		max_nr = (end - addr) / PAGE_SIZE;
-		if (pte_present(ptent)) {
-			nr = zap_present_ptes(tlb, vma, pte, ptent, max_nr,
-					      addr, details, rss, &force_flush,
-					      &force_break);
-			if (unlikely(force_break)) {
-				addr += nr * PAGE_SIZE;
-				break;
-			}
-		} else {
-			nr = zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr,
-						 addr, details, rss);
+		nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
+				      &force_flush, &force_break);
+		if (unlikely(force_break)) {
+			addr += nr * PAGE_SIZE;
+			break;
 		}
 	} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
-- 
2.20.1

From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com, akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [PATCH v4 05/11] mm: skip over all consecutive none ptes in do_zap_pte_range()
Date: Wed, 4 Dec 2024 19:09:45 +0800
Message-Id: <8ecffbf990afd1c8ccc195a2ec321d55f0923908.1733305182.git.zhengqi.arch@bytedance.com>

Skip over all consecutive none ptes in do_zap_pte_range(), which helps
optimize away need_resched() + force_break + incremental pte/addr
increments etc.
Suggested-by: David Hildenbrand
Signed-off-by: Qi Zheng
---
 mm/memory.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index abe07e6bdd1bb..7f8869a22b57c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1665,17 +1665,30 @@ static inline int do_zap_pte_range(struct mmu_gather *tlb,
 {
 	pte_t ptent = ptep_get(pte);
 	int max_nr = (end - addr) / PAGE_SIZE;
+	int nr = 0;
 
-	if (pte_none(ptent))
-		return 1;
+	/* Skip all consecutive none ptes */
+	if (pte_none(ptent)) {
+		for (nr = 1; nr < max_nr; nr++) {
+			ptent = ptep_get(pte + nr);
+			if (!pte_none(ptent))
+				break;
+		}
+		max_nr -= nr;
+		if (!max_nr)
+			return nr;
+		pte += nr;
+		addr += nr * PAGE_SIZE;
+	}
 
 	if (pte_present(ptent))
-		return zap_present_ptes(tlb, vma, pte, ptent, max_nr,
-					addr, details, rss, force_flush,
-					force_break);
+		nr += zap_present_ptes(tlb, vma, pte, ptent, max_nr, addr,
+				       details, rss, force_flush, force_break);
+	else
+		nr += zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr, addr,
+					  details, rss);
 
-	return zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr, addr,
-				   details, rss);
+	return nr;
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
-- 
2.20.1

From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com, akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com,
rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [PATCH v4 06/11] mm: zap_install_uffd_wp_if_needed: return whether uffd-wp pte has been re-installed
Date: Wed, 4 Dec 2024 19:09:46 +0800
Message-Id: <9d4516554724eda87d6576468042a1741c475413.1733305182.git.zhengqi.arch@bytedance.com>

In some cases, we'll replace the none pte with an uffd-wp swap special
pte marker when necessary. Let's expose this information to the caller
through the return value, so that subsequent commits can use this
information to detect whether the PTE page is empty.

Signed-off-by: Qi Zheng
---
 include/linux/mm_inline.h | 11 +++++++----
 mm/memory.c               | 16 ++++++++++++----
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 1b6a917fffa4b..34e5097182a02 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -564,9 +564,9 @@ static inline pte_marker copy_pte_marker(
  * Must be called with pgtable lock held so that no thread will see the none
  * pte, and if they see it, they'll fault and serialize at the pgtable lock.
  *
- * This function is a no-op if PTE_MARKER_UFFD_WP is not enabled.
+ * Returns true if an uffd-wp pte was installed, false otherwise.
  */
-static inline void
+static inline bool
 pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
 			      pte_t *pte, pte_t pteval)
 {
@@ -583,7 +583,7 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
 	 * with a swap pte. There's no way of leaking the bit.
 	 */
 	if (vma_is_anonymous(vma) || !userfaultfd_wp(vma))
-		return;
+		return false;
 
 	/* A uffd-wp wr-protected normal pte */
 	if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval)))
@@ -596,10 +596,13 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
 	if (unlikely(pte_swp_uffd_wp_any(pteval)))
 		arm_uffd_pte = true;
 
-	if (unlikely(arm_uffd_pte))
+	if (unlikely(arm_uffd_pte)) {
 		set_pte_at(vma->vm_mm, addr, pte,
 			   make_pte_marker(PTE_MARKER_UFFD_WP));
+		return true;
+	}
 #endif
+	return false;
 }
 
 static inline bool vma_has_recency(struct vm_area_struct *vma)
diff --git a/mm/memory.c b/mm/memory.c
index 7f8869a22b57c..1f149bc2c0586 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1466,27 +1466,35 @@ static inline bool zap_drop_markers(struct zap_details *details)
 /*
  * This function makes sure that we'll replace the none pte with an uffd-wp
  * swap special pte marker when necessary. Must be with the pgtable lock held.
+ *
+ * Returns true if uffd-wp ptes was installed, false otherwise.
  */
-static inline void
+static inline bool
 zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
 			      unsigned long addr, pte_t *pte, int nr,
 			      struct zap_details *details, pte_t pteval)
 {
+	bool was_installed = false;
+
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
 	/* Zap on anonymous always means dropping everything */
 	if (vma_is_anonymous(vma))
-		return;
+		return false;
 
 	if (zap_drop_markers(details))
-		return;
+		return false;
 
 	for (;;) {
 		/* the PFN in the PTE is irrelevant. */
-		pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
+		if (pte_install_uffd_wp_if_needed(vma, addr, pte, pteval))
+			was_installed = true;
 		if (--nr == 0)
 			break;
 		pte++;
 		addr += PAGE_SIZE;
 	}
+#endif
+	return was_installed;
 }
 
 static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
-- 
2.20.1

From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com, akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [PATCH v4 07/11] mm: do_zap_pte_range: return any_skipped information to the caller
Date: Wed, 4 Dec 2024 19:09:47 +0800
Message-Id: <59f33ec9f74e9f058ed319b0bfadd76b0f7adf9b.1733305182.git.zhengqi.arch@bytedance.com>

Let the caller of do_zap_pte_range() know whether we skipped zapping
ptes or reinstalled uffd-wp ptes through the any_skipped parameter, so
that subsequent commits can use this information in zap_pte_range() to
detect whether the PTE page can be reclaimed.
Signed-off-by: Qi Zheng
---
 mm/memory.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 1f149bc2c0586..fdefa551d1250 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1501,7 +1501,7 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, struct folio *folio,
 		struct page *page, pte_t *pte, pte_t ptent, unsigned int nr,
 		unsigned long addr, struct zap_details *details, int *rss,
-		bool *force_flush, bool *force_break)
+		bool *force_flush, bool *force_break, bool *any_skipped)
 {
 	struct mm_struct *mm = tlb->mm;
 	bool delay_rmap = false;
@@ -1527,8 +1527,8 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
 	arch_check_zapped_pte(vma, ptent);
 	tlb_remove_tlb_entries(tlb, pte, nr, addr);
 	if (unlikely(userfaultfd_pte_wp(vma, ptent)))
-		zap_install_uffd_wp_if_needed(vma, addr, pte, nr, details,
-					      ptent);
+		*any_skipped = zap_install_uffd_wp_if_needed(vma, addr, pte,
+							     nr, details, ptent);
 
 	if (!delay_rmap) {
 		folio_remove_rmap_ptes(folio, page, nr, vma);
@@ -1552,7 +1552,7 @@ static inline int zap_present_ptes(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, pte_t *pte, pte_t ptent,
 		unsigned int max_nr, unsigned long addr,
 		struct zap_details *details, int *rss, bool *force_flush,
-		bool *force_break)
+		bool *force_break, bool *any_skipped)
 {
 	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	struct mm_struct *mm = tlb->mm;
@@ -1567,15 +1567,17 @@ static inline int zap_present_ptes(struct mmu_gather *tlb,
 		arch_check_zapped_pte(vma, ptent);
 		tlb_remove_tlb_entry(tlb, pte, addr);
 		if (userfaultfd_pte_wp(vma, ptent))
-			zap_install_uffd_wp_if_needed(vma, addr, pte, 1,
-						      details, ptent);
+			*any_skipped = zap_install_uffd_wp_if_needed(vma, addr,
+						pte, 1, details, ptent);
 		ksm_might_unmap_zero_page(mm, ptent);
 		return 1;
 	}
 
 	folio = page_folio(page);
-	if (unlikely(!should_zap_folio(details, folio)))
+	if (unlikely(!should_zap_folio(details, folio))) {
+		*any_skipped = true;
 		return 1;
+	}
 
 	/*
	 * Make sure that the common "small folio" case is as fast as possible
@@ -1587,22 +1589,23 @@ static inline int zap_present_ptes(struct mmu_gather *tlb,
 
 		zap_present_folio_ptes(tlb, vma, folio, page, pte, ptent, nr,
 				       addr, details, rss, force_flush,
-				       force_break);
+				       force_break, any_skipped);
 		return nr;
 	}
 	zap_present_folio_ptes(tlb, vma, folio, page, pte, ptent, 1, addr,
-			       details, rss, force_flush, force_break);
+			       details, rss, force_flush, force_break, any_skipped);
 	return 1;
 }
 
 static inline int zap_nonpresent_ptes(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, pte_t *pte, pte_t ptent,
 		unsigned int max_nr, unsigned long addr,
-		struct zap_details *details, int *rss)
+		struct zap_details *details, int *rss, bool *any_skipped)
 {
 	swp_entry_t entry;
 	int nr = 1;
 
+	*any_skipped = true;
 	entry = pte_to_swp_entry(ptent);
 	if (is_device_private_entry(entry) ||
 	    is_device_exclusive_entry(entry)) {
@@ -1660,7 +1663,7 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb,
 		WARN_ON_ONCE(1);
 	}
 	clear_not_present_full_ptes(vma->vm_mm, addr, pte, nr, tlb->fullmm);
-	zap_install_uffd_wp_if_needed(vma, addr, pte, nr, details, ptent);
+	*any_skipped = zap_install_uffd_wp_if_needed(vma, addr, pte, nr, details, ptent);
 
 	return nr;
 }
@@ -1669,7 +1672,8 @@ static inline int do_zap_pte_range(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, pte_t *pte,
 		unsigned long addr, unsigned long end,
 		struct zap_details *details, int *rss,
-		bool *force_flush, bool *force_break)
+		bool *force_flush, bool *force_break,
+		bool *any_skipped)
 {
 	pte_t ptent = ptep_get(pte);
 	int max_nr = (end - addr) / PAGE_SIZE;
@@ -1691,10 +1695,11 @@ static inline int do_zap_pte_range(struct mmu_gather *tlb,
 
 	if (pte_present(ptent))
 		nr += zap_present_ptes(tlb, vma, pte, ptent, max_nr, addr,
-				       details, rss, force_flush, force_break);
+				       details, rss, force_flush, force_break,
+				       any_skipped);
 	else
 		nr += zap_nonpresent_ptes(tlb, vma, pte, ptent, max_nr, addr,
-					  details, rss);
+					  details, rss, any_skipped);
 
 	return nr;
 }
@@ -1705,6 +1710,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		struct zap_details *details)
 {
 	bool force_flush = false, force_break = false;
+	bool any_skipped = false;
 	struct mm_struct *mm = tlb->mm;
 	int rss[NR_MM_COUNTERS];
 	spinlock_t *ptl;
@@ -1725,7 +1731,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			break;
 
 		nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
-				      &force_flush, &force_break);
+				      &force_flush, &force_break, &any_skipped);
 		if (unlikely(force_break)) {
 			addr += nr * PAGE_SIZE;
 			break;
-- 
2.20.1

From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com,
    willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org,
    peterx@redhat.com, akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org,
    dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
    x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com,
    rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Qi Zheng
Subject: [PATCH v4 08/11] mm: make zap_pte_range() handle full within-PMD range
Date: Wed, 4 Dec 2024 19:09:48 +0800
Message-Id: <76c95ee641da7808cd66d642ab95841df4048295.1733305182.git.zhengqi.arch@bytedance.com>
In preparation for reclaiming empty PTE pages, this commit first makes
zap_pte_range() handle the full within-PMD range, so that we can more
easily detect and free PTE pages in this function in subsequent commits.

Signed-off-by: Qi Zheng
Reviewed-by: Jann Horn
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index fdefa551d1250..36a59bea289d1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1718,6 +1718,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
        pte_t *pte;
        int nr;
 
+retry:
        tlb_change_page_size(tlb, PAGE_SIZE);
        init_rss_vec(rss);
        start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
@@ -1757,6 +1758,13 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
        if (force_flush)
                tlb_flush_mmu(tlb);
 
+       if (addr != end) {
+               cond_resched();
+               force_flush = false;
+               force_break = false;
+               goto retry;
+       }
+
        return addr;
 }
 
-- 
2.20.1

From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com,
    willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org,
    peterx@redhat.com, akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org,
    dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
    x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com,
    rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Qi Zheng
Subject: [PATCH v4 09/11]
 mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)
Date: Wed, 4 Dec 2024 19:09:49 +0800
Message-Id: <92aba2b319a734913f18ba41e7d86a265f0b84e2.1733305182.git.zhengqi.arch@bytedance.com>

In pursuit of high performance, applications mostly use high-performance
user-mode memory allocators such as jemalloc or tcmalloc. These
allocators release physical memory with madvise(MADV_DONTNEED or
MADV_FREE), but neither MADV_DONTNEED nor MADV_FREE releases page table
memory, which can lead to huge page table memory usage.

The following is a memory usage snapshot of one such process, taken on
one of our servers:

        VIRT:  55t
        RES:   590g
        VmPTE: 110g

In this case, most of the page table entries are empty. A PTE page whose
entries are all empty can be freed back to the system for others to use.

As a first step, this commit aims to synchronously free such empty PTE
pages in the madvise(MADV_DONTNEED) case. We detect and free empty PTE
pages in zap_pte_range(), and add zap_details.reclaim_pt to exclude
cases other than madvise(MADV_DONTNEED).

Once an empty PTE page is detected, we first try to take the pmd lock
while holding the pte lock. If that succeeds, we clear the pmd entry
directly (fast path). Otherwise, we wait until the pte lock is released,
then re-take the pmd and pte locks and loop PTRS_PER_PTE times checking
pte_none() to re-verify that the PTE page is empty before freeing it
(slow path).

For other cases such as madvise(MADV_FREE), consider scanning and
freeing empty PTE pages asynchronously in the future.
The following code snippet shows the effect of the optimization:

        mmap 50G
        while (1) {
                for (; i < 1024 * 25; i++) {
                        touch 2M memory
                        madvise MADV_DONTNEED 2M
                }
        }

As we can see, the memory usage of VmPTE is reduced:

                        before          after
        VIRT           50.0 GB         50.0 GB
        RES             3.1 MB          3.1 MB
        VmPTE       102640 KB          240 KB

Signed-off-by: Qi Zheng
---
 include/linux/mm.h |  1 +
 mm/Kconfig         | 15 ++++++++++
 mm/Makefile        |  1 +
 mm/internal.h      | 19 +++++++++++++
 mm/madvise.c       |  7 ++++-
 mm/memory.c        | 21 ++++++++++++--
 mm/pt_reclaim.c    | 71 ++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 132 insertions(+), 3 deletions(-)
 create mode 100644 mm/pt_reclaim.c

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 12fb3b9334269..8f3c824ee5a77 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2319,6 +2319,7 @@ extern void pagefault_out_of_memory(void);
 struct zap_details {
        struct folio *single_folio;     /* Locked folio to be unmapped */
        bool even_cows;                 /* Zap COWed private pages too? */
+       bool reclaim_pt;                /* Need reclaim page tables? */
        zap_flags_t zap_flags;          /* Extra flags for zapping */
 };
 
diff --git a/mm/Kconfig b/mm/Kconfig
index 84000b0168086..7949ab121070f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1301,6 +1301,21 @@ config ARCH_HAS_USER_SHADOW_STACK
          The architecture has hardware support for userspace shadow call
          stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
 
+config ARCH_SUPPORTS_PT_RECLAIM
+       def_bool n
+
+config PT_RECLAIM
+       bool "reclaim empty user page table pages"
+       default y
+       depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
+       select MMU_GATHER_RCU_TABLE_FREE
+       help
+         Try to reclaim empty user page table pages in paths other than munmap
+         and exit_mmap path.
+
+         Note: now only empty user PTE page table pages will be reclaimed.
+
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index dba52bb0da8ab..850386a67b3e0 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -146,3 +146,4 @@ obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
 obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
 obj-$(CONFIG_EXECMEM) += execmem.o
 obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
+obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o
diff --git a/mm/internal.h b/mm/internal.h
index 74713b44bedb6..3958a965e56e1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1545,4 +1545,23 @@ int walk_page_range_mm(struct mm_struct *mm, unsigned long start,
                       unsigned long end, const struct mm_walk_ops *ops,
                       void *private);
 
+/* pt_reclaim.c */
+bool try_get_and_clear_pmd(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdval);
+void free_pte(struct mm_struct *mm, unsigned long addr, struct mmu_gather *tlb,
+             pmd_t pmdval);
+void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
+                    struct mmu_gather *tlb);
+
+#ifdef CONFIG_PT_RECLAIM
+bool reclaim_pt_is_enabled(unsigned long start, unsigned long end,
+                          struct zap_details *details);
+#else
+static inline bool reclaim_pt_is_enabled(unsigned long start, unsigned long end,
+                                        struct zap_details *details)
+{
+       return false;
+}
+#endif /* CONFIG_PT_RECLAIM */
+
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/madvise.c b/mm/madvise.c
index 0ceae57da7dad..49f3a75046f63 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -851,7 +851,12 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
 static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
                                        unsigned long start, unsigned long end)
 {
-       zap_page_range_single(vma, start, end - start, NULL);
+       struct zap_details details = {
+               .reclaim_pt = true,
+               .even_cows = true,
+       };
+
+       zap_page_range_single(vma, start, end - start, &details);
        return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 36a59bea289d1..1fc1f14839916 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1436,7 +1436,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 static inline bool should_zap_cows(struct zap_details *details)
 {
        /* By default, zap all pages */
-       if (!details)
+       if (!details || details->reclaim_pt)
                return true;
 
        /* Or, we zap COWed pages only if the caller wants to */
@@ -1710,12 +1710,15 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
                struct zap_details *details)
 {
        bool force_flush = false, force_break = false;
-       bool any_skipped = false;
        struct mm_struct *mm = tlb->mm;
        int rss[NR_MM_COUNTERS];
        spinlock_t *ptl;
        pte_t *start_pte;
        pte_t *pte;
+       pmd_t pmdval;
+       unsigned long start = addr;
+       bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details);
+       bool direct_reclaim = false;
        int nr;
 
 retry:
@@ -1728,17 +1731,24 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
        flush_tlb_batched_pending(mm);
        arch_enter_lazy_mmu_mode();
        do {
+               bool any_skipped = false;
+
                if (need_resched())
                        break;
 
                nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
                                      &force_flush, &force_break, &any_skipped);
+               if (any_skipped)
+                       can_reclaim_pt = false;
                if (unlikely(force_break)) {
                        addr += nr * PAGE_SIZE;
                        break;
                }
        } while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
 
+       if (can_reclaim_pt && addr == end)
+               direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
+
        add_mm_rss_vec(mm, rss);
        arch_leave_lazy_mmu_mode();
 
@@ -1765,6 +1775,13 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
                goto retry;
        }
 
+       if (can_reclaim_pt) {
+               if (direct_reclaim)
+                       free_pte(mm, start, tlb, pmdval);
+               else
+                       try_to_free_pte(mm, pmd, start, tlb);
+       }
+
        return addr;
 }
 
diff --git a/mm/pt_reclaim.c b/mm/pt_reclaim.c
new file mode 100644
index 0000000000000..6540a3115dde8
--- /dev/null
+++ b/mm/pt_reclaim.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/hugetlb.h>
+#include <asm-generic/tlb.h>
+#include <asm/pgalloc.h>
+
+#include "internal.h"
+
+bool reclaim_pt_is_enabled(unsigned long start, unsigned long end,
+                          struct zap_details *details)
+{
+       return details && details->reclaim_pt && (end - start >= PMD_SIZE);
+}
+
+bool try_get_and_clear_pmd(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdval)
+{
+       spinlock_t *pml = pmd_lockptr(mm, pmd);
+
+       if (!spin_trylock(pml))
+               return false;
+
+       *pmdval = pmdp_get_lockless(pmd);
+       pmd_clear(pmd);
+       spin_unlock(pml);
+
+       return true;
+}
+
+void free_pte(struct mm_struct *mm, unsigned long addr, struct mmu_gather *tlb,
+             pmd_t pmdval)
+{
+       pte_free_tlb(tlb, pmd_pgtable(pmdval), addr);
+       mm_dec_nr_ptes(mm);
+}
+
+void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
+                    struct mmu_gather *tlb)
+{
+       pmd_t pmdval;
+       spinlock_t *pml, *ptl;
+       pte_t *start_pte, *pte;
+       int i;
+
+       pml = pmd_lock(mm, pmd);
+       start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
+       if (!start_pte)
+               goto out_ptl;
+       if (ptl != pml)
+               spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+
+       /* Check if it is empty PTE page */
+       for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) {
+               if (!pte_none(ptep_get(pte)))
+                       goto out_ptl;
+       }
+       pte_unmap(start_pte);
+
+       pmd_clear(pmd);
+
+       if (ptl != pml)
+               spin_unlock(ptl);
+       spin_unlock(pml);
+
+       free_pte(mm, addr, tlb, pmdval);
+
+       return;
+out_ptl:
+       if (start_pte)
+               pte_unmap_unlock(start_pte, ptl);
+       if (ptl != pml)
+               spin_unlock(pml);
+}
-- 
2.20.1

From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com,
    willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org,
    peterx@redhat.com, akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org,
    dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
    x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com,
    rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Qi Zheng
Subject: [PATCH v4 10/11] x86: mm: free page table pages by RCU instead of semi RCU
Date: Wed, 4 Dec 2024 19:09:50 +0800
Message-Id: <0287d442a973150b0e1019cc406e6322d148277a.1733305182.git.zhengqi.arch@bytedance.com>

Currently, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, page table
pages are freed by semi-RCU, that is:

 - batch table freeing: asynchronous free by RCU
 - single table freeing: IPI + synchronous free

In this way, page tables can be traversed locklessly by disabling IRQs
in paths such as fast GUP. But this is not enough to free empty PTE
page table pages in paths other than the munmap and exit_mmap paths,
because the IPI cannot be synchronized with rcu_read_lock() in
pte_offset_map{_lock}().

In preparation for supporting empty PTE page table page reclamation,
let single-table freeing also go through RCU like batch-table freeing.
Then we can also use pte_offset_map() etc. to prevent a PTE page from
being freed.

Like pte_free_defer(), we can safely use ptdesc->pt_rcu_head to free
the page table pages:

 - The pt_rcu_head is unioned with pt_list and pmd_huge_pte.

 - For pt_list, it is used to manage the PGD page in x86. Fortunately,
   tlb_remove_table() will not be used to free PGD pages, so it is safe
   to use pt_rcu_head.

 - For pmd_huge_pte, it is only used for THPs, so it is safe.
After applying this patch, if CONFIG_PT_RECLAIM is enabled, the call
chain of free_pte() is as follows:

 free_pte
   pte_free_tlb
     __pte_free_tlb
       ___pte_free_tlb
         paravirt_tlb_remove_table
           tlb_remove_table [!CONFIG_PARAVIRT, Xen PV, Hyper-V, KVM]
             [no-free-memory slowpath:]
               tlb_table_invalidate
               tlb_remove_table_one
                 __tlb_remove_table_one [frees via RCU]
             [fastpath:]
               tlb_table_flush
                 tlb_remove_table_free [frees via RCU]
           native_tlb_remove_table [CONFIG_PARAVIRT on native]
             tlb_remove_table [see above]

Signed-off-by: Qi Zheng
Cc: x86@kernel.org
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
---
 arch/x86/include/asm/tlb.h | 20 ++++++++++++++++++++
 arch/x86/kernel/paravirt.c |  7 +++++++
 arch/x86/mm/pgtable.c      | 10 +++++++++-
 include/linux/mm_types.h   |  4 +++-
 mm/mmu_gather.c            |  9 ++++++++-
 5 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 4d3c9d00d6b6b..73f0786181cc9 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -34,8 +34,28 @@ static inline void __tlb_remove_table(void *table)
        free_page_and_swap_cache(table);
 }
 
+#ifdef CONFIG_PT_RECLAIM
+static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
+{
+       struct page *page;
+
+       page = container_of(head, struct page, rcu_head);
+       put_page(page);
+}
+
+static inline void __tlb_remove_table_one(void *table)
+{
+       struct page *page;
+
+       page = table;
+       call_rcu(&page->rcu_head, __tlb_remove_table_one_rcu);
+}
+#define __tlb_remove_table_one __tlb_remove_table_one
+#endif /* CONFIG_PT_RECLAIM */
+
 static inline void invlpg(unsigned long addr)
 {
        asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
 }
+
 #endif /* _ASM_X86_TLB_H */
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index fec3815335558..89688921ea62e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -59,10 +59,17 @@ void __init native_pv_lock_init(void)
                static_branch_enable(&virt_spin_lock_key);
 }
 
+#ifndef CONFIG_PT_RECLAIM
 static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
        tlb_remove_page(tlb, table);
 }
+#else
+static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+       tlb_remove_table(tlb, table);
+}
+#endif
 
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5745a354a241c..69a357b15974a 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -19,12 +19,20 @@ EXPORT_SYMBOL(physical_mask);
 #endif
 
 #ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PT_RECLAIM
 static inline
 void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
        tlb_remove_page(tlb, table);
 }
-#endif
+#else
+static inline
+void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+       tlb_remove_table(tlb, table);
+}
+#endif /* !CONFIG_PT_RECLAIM */
+#endif /* !CONFIG_PARAVIRT */
 
 gfp_t __userpte_alloc_gfp = GFP_PGTABLE_USER | PGTABLE_HIGHMEM;
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3a35546bac944..706b3c926a089 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -438,7 +438,9 @@ FOLIO_MATCH(compound_head, _head_2a);
  * struct ptdesc - Memory descriptor for page tables.
  * @__page_flags:     Same as page flags. Powerpc only.
  * @pt_rcu_head:      For freeing page table pages.
- * @pt_list:          List of used page tables. Used for s390 and x86.
+ * @pt_list:          List of used page tables. Used for s390 gmap shadow pages
+ *                    (which are not linked into the user page tables) and x86
+ *                    pgds.
  * @_pt_pad_1:        Padding that aliases with page's compound head.
  * @pmd_huge_pte:     Protected by ptdesc->ptl, used for THPs.
  * @__page_mapping:   Aliases with page->mapping. Unused for page tables.
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 99b3e9408aa0f..1e21022bcf339 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -311,11 +311,18 @@ static inline void tlb_table_invalidate(struct mmu_gather *tlb)
        }
 }
 
-static void tlb_remove_table_one(void *table)
+#ifndef __tlb_remove_table_one
+static inline void __tlb_remove_table_one(void *table)
 {
        tlb_remove_table_sync_one();
        __tlb_remove_table(table);
 }
+#endif
+
+static void tlb_remove_table_one(void *table)
+{
+       __tlb_remove_table_one(table);
+}
 
 static void tlb_table_flush(struct mmu_gather *tlb)
 {
-- 
2.20.1

From nobody Sat Dec 13 22:52:59 2025
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com, akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH v4 11/11] x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64
Date: Wed, 4 Dec 2024 19:09:51 +0800
Message-Id: <841c1f35478d5354872d307888979c9e20de9c09.1733305182.git.zhengqi.arch@bytedance.com>

Now that x86 fully supports the CONFIG_PT_RECLAIM feature, and reclaiming
PTE pages is only profitable on 64-bit systems, select
ARCH_SUPPORTS_PT_RECLAIM if X86_64.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 65f8478fe7a96..77f001c6a5679 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -324,6 +324,7 @@ config X86
 	select FUNCTION_ALIGNMENT_4B
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT	if EFI
 	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
+	select ARCH_SUPPORTS_PT_RECLAIM		if X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
-- 
2.20.1
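For context, `select ARCH_SUPPORTS_PT_RECLAIM` is the architecture's opt-in half of the feature; the consuming half (not shown in this patch) would typically live in mm/Kconfig, where the user-visible option depends on the arch symbol. A hedged sketch of how such a pairing is usually wired up (the exact prompt text, `default`, and extra dependencies of the real PT_RECLAIM option may differ):

```kconfig
# Arch opt-in symbol: off unless an architecture selects it,
# as arch/x86/Kconfig does above for X86_64.
config ARCH_SUPPORTS_PT_RECLAIM
	def_bool n

# User-visible feature, only offered where the arch has opted in.
config PT_RECLAIM
	bool "reclaim empty user page table pages"
	depends on ARCH_SUPPORTS_PT_RECLAIM && MMU
```

This `select`-the-capability / `depends on`-the-capability split keeps per-arch knowledge out of mm/Kconfig: adding support on a new architecture is a one-line `select` in its own Kconfig.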