From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
To: Linux-MM, linux-kernel@vger.kernel.org, Andrew Morton
Cc: kernel-team@fb.com, Laurent Dufour, Jerome Glisse, Peter Zijlstra,
 Michal Hocko, Vlastimil Babka, Davidlohr Bueso, Matthew Wilcox,
 Liam Howlett, Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan,
 Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen,
 Andy Lutomirski, Michel Lespinasse
Subject: [PATCH v2 01/35] mm: export dump_mm
Date: Fri, 28 Jan 2022 05:09:32 -0800
Message-Id: <20220128131006.67712-2-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

This is necessary in order to allow VM_BUG_ON_MM to be used in modules
(I encountered the issue when adding VM_BUG_ON_MM in mmap locking functions).

Signed-off-by: Michel Lespinasse
---
 mm/debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/debug.c b/mm/debug.c
index bc9ac87f0e08..40d3f358b75c 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -220,6 +220,7 @@ void dump_mm(const struct mm_struct *mm)
 		mm->def_flags, &mm->def_flags
 	);
 }
+EXPORT_SYMBOL(dump_mm);
 
 static bool page_init_poisoning __read_mostly = true;
 
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
Subject: [PATCH v2 02/35] mmap locking API: mmap_lock_is_contended returns a bool
Date: Fri, 28 Jan 2022 05:09:33 -0800
Message-Id: <20220128131006.67712-3-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

Change mmap_lock_is_contended to return a bool value, rather than an
int which the callers are then supposed to interpret as a bool. This
is to ensure consistency with other mmap lock API functions (such as
the trylock functions).
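
For illustration only (a userspace sketch, not part of the patch): fake_is_contended() below is a hypothetical stand-in for rwsem_is_contended(), assumed to return some nonzero int, not necessarily 1, when the lock has waiters. The wrapper shows the int being normalized so that callers only ever see true or false.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for rwsem_is_contended(): reports contention as
 * "some nonzero int", here simply the number of waiters.
 */
static inline int fake_is_contended(int waiters)
{
	return waiters;
}

/* Same shape as the patched helper: the int is normalized to a bool. */
static inline bool lock_is_contended(int waiters)
{
	return fake_is_contended(waiters) != 0;
}
```

Converting an int to bool already yields 0 or 1 in C, so the explicit comparison mostly documents intent; the visible change for callers is the return type.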

Signed-off-by: Michel Lespinasse
---
 include/linux/mmap_lock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 96e113e23d04..db9785e11274 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -162,9 +162,9 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm)
 	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
 }
 
-static inline int mmap_lock_is_contended(struct mm_struct *mm)
+static inline bool mmap_lock_is_contended(struct mm_struct *mm)
 {
-	return rwsem_is_contended(&mm->mmap_lock);
+	return rwsem_is_contended(&mm->mmap_lock) != 0;
 }
 
 #endif /* _LINUX_MMAP_LOCK_H */
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
Subject: [PATCH v2 03/35] mmap locking API: name the return values
Date: Fri, 28 Jan 2022 05:09:34 -0800
Message-Id: <20220128131006.67712-4-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

In the mmap locking API, the *_killable() functions return an error
(or 0 on success), and the *_trylock() functions return a boolean
(true on success).
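
The two conventions can be sketched in userspace as follows. This is an illustration under stated assumptions, not kernel code: struct toy_lock, toy_lock_killable() and toy_trylock() are hypothetical names, and the "interrupted" field merely simulates a fatal signal arriving while waiting.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy lock; "interrupted" simulates a pending fatal signal. */
struct toy_lock {
	bool held;
	bool interrupted;
};

/* *_killable() convention: 0 on success, negative errno on failure. */
static inline int toy_lock_killable(struct toy_lock *lock)
{
	int error = lock->interrupted ? -EINTR : 0;

	if (!error)
		lock->held = true;
	return error;
}

/* *_trylock() convention: true on success, false if the lock is taken. */
static inline bool toy_trylock(struct toy_lock *lock)
{
	bool ok = !lock->held;

	if (ok)
		lock->held = true;
	return ok;
}
```

The failure checks have opposite polarity, "if (toy_lock_killable(&l))" versus "if (!toy_trylock(&l))", which is why a single name such as "ret" for both is easy to misread.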

Rename the return values "int error" and "bool ok", respectively,
rather than using "ret" for both cases which I find less readable.

Signed-off-by: Michel Lespinasse
---
 include/linux/mmap_lock.h | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index db9785e11274..1b14468183d7 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -81,22 +81,22 @@ static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
 
 static inline int mmap_write_lock_killable(struct mm_struct *mm)
 {
-	int ret;
+	int error;
 
 	__mmap_lock_trace_start_locking(mm, true);
-	ret = down_write_killable(&mm->mmap_lock);
-	__mmap_lock_trace_acquire_returned(mm, true, ret == 0);
-	return ret;
+	error = down_write_killable(&mm->mmap_lock);
+	__mmap_lock_trace_acquire_returned(mm, true, !error);
+	return error;
 }
 
 static inline bool mmap_write_trylock(struct mm_struct *mm)
 {
-	bool ret;
+	bool ok;
 
 	__mmap_lock_trace_start_locking(mm, true);
-	ret = down_write_trylock(&mm->mmap_lock) != 0;
-	__mmap_lock_trace_acquire_returned(mm, true, ret);
-	return ret;
+	ok = down_write_trylock(&mm->mmap_lock) != 0;
+	__mmap_lock_trace_acquire_returned(mm, true, ok);
+	return ok;
 }
 
 static inline void mmap_write_unlock(struct mm_struct *mm)
@@ -120,22 +120,22 @@ static inline void mmap_read_lock(struct mm_struct *mm)
 
 static inline int mmap_read_lock_killable(struct mm_struct *mm)
 {
-	int ret;
+	int error;
 
 	__mmap_lock_trace_start_locking(mm, false);
-	ret = down_read_killable(&mm->mmap_lock);
-	__mmap_lock_trace_acquire_returned(mm, false, ret == 0);
-	return ret;
+	error = down_read_killable(&mm->mmap_lock);
+	__mmap_lock_trace_acquire_returned(mm, false, !error);
+	return error;
 }
 
 static inline bool mmap_read_trylock(struct mm_struct *mm)
 {
-	bool ret;
+	bool ok;
 
 	__mmap_lock_trace_start_locking(mm, false);
-	ret = down_read_trylock(&mm->mmap_lock) != 0;
-	__mmap_lock_trace_acquire_returned(mm, false, ret);
-	return ret;
+	ok = down_read_trylock(&mm->mmap_lock) != 0;
+	__mmap_lock_trace_acquire_returned(mm, false, ok);
+	return ok;
 }
 
 static inline void mmap_read_unlock(struct mm_struct *mm)
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
Subject: [PATCH v2 04/35] do_anonymous_page: use update_mmu_tlb()
Date: Fri, 28 Jan 2022 05:09:35 -0800
Message-Id: <20220128131006.67712-5-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

update_mmu_tlb() can be used instead of update_mmu_cache() when the
page fault handler detects that it lost the race to another page fault.

It looks like this one call was missed in
https://patchwork.kernel.org/project/linux-mips/patch/1590375160-6997-2-git-send-email-maobibo@loongson.cn
after Andrew asked to replace all update_mmu_cache() calls with an alias
in the previous version of this patch here:
https://patchwork.kernel.org/project/linux-mips/patch/1590031837-9582-2-git-send-email-maobibo@loongson.cn/#23374625

Signed-off-by: Michel Lespinasse
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..cd9432df3a27 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3799,7 +3799,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
 	if (!pte_none(*vmf->pte)) {
-		update_mmu_cache(vma, vmf->address, vmf->pte);
+		update_mmu_tlb(vma, vmf->address, vmf->pte);
 		goto release;
 	}
 
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
Subject: [PATCH v2 05/35] do_anonymous_page: reduce code duplication
Date: Fri, 28 Jan 2022 05:09:36 -0800
Message-Id: <20220128131006.67712-6-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

In do_anonymous_page(), we have separate cases for the zero page vs
allocating new anonymous pages.
However, once the pte entry has been
computed, the rest of the handling (mapping and locking the page table,
checking that we didn't lose a race with another page fault handler, etc)
is identical between the two cases.

This change reduces the code duplication between the two cases.

Signed-off-by: Michel Lespinasse
---
 mm/memory.c | 87 +++++++++++++++++++++++------------------------------
 1 file changed, 38 insertions(+), 49 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index cd9432df3a27..f83e06b1dafb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3726,7 +3726,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct page *page;
+	struct page *page = NULL;
 	vm_fault_t ret = 0;
 	pte_t entry;
 
@@ -3756,78 +3756,67 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 			!mm_forbids_zeropage(vma->vm_mm)) {
 		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
 						vma->vm_page_prot));
-		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-				vmf->address, &vmf->ptl);
-		if (!pte_none(*vmf->pte)) {
-			update_mmu_tlb(vma, vmf->address, vmf->pte);
-			goto unlock;
-		}
-		ret = check_stable_address_space(vma->vm_mm);
-		if (ret)
-			goto unlock;
-		/* Deliver the page fault to userland, check inside PT lock */
-		if (userfaultfd_missing(vma)) {
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			return handle_userfault(vmf, VM_UFFD_MISSING);
-		}
-		goto setpte;
+	} else {
+		/* Allocate our own private page. */
+		if (unlikely(anon_vma_prepare(vma)))
+			goto oom;
+		page = alloc_zeroed_user_highpage_movable(vma, vmf->address);
+		if (!page)
+			goto oom;
+
+		if (mem_cgroup_charge(page_folio(page), vma->vm_mm, GFP_KERNEL))
+			goto oom_free_page;
+		cgroup_throttle_swaprate(page, GFP_KERNEL);
+
+		/*
+		 * The memory barrier inside __SetPageUptodate makes sure that
+		 * preceding stores to the page contents become visible before
+		 * the set_pte_at() write.
+		 */
+		__SetPageUptodate(page);
+
+		entry = mk_pte(page, vma->vm_page_prot);
+		entry = pte_sw_mkyoung(entry);
+		if (vma->vm_flags & VM_WRITE)
+			entry = pte_mkwrite(pte_mkdirty(entry));
 	}
 
-	/* Allocate our own private page. */
-	if (unlikely(anon_vma_prepare(vma)))
-		goto oom;
-	page = alloc_zeroed_user_highpage_movable(vma, vmf->address);
-	if (!page)
-		goto oom;
-
-	if (mem_cgroup_charge(page_folio(page), vma->vm_mm, GFP_KERNEL))
-		goto oom_free_page;
-	cgroup_throttle_swaprate(page, GFP_KERNEL);
-
-	/*
-	 * The memory barrier inside __SetPageUptodate makes sure that
-	 * preceding stores to the page contents become visible before
-	 * the set_pte_at() write.
-	 */
-	__SetPageUptodate(page);
-
-	entry = mk_pte(page, vma->vm_page_prot);
-	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
-		entry = pte_mkwrite(pte_mkdirty(entry));
-
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
 	if (!pte_none(*vmf->pte)) {
 		update_mmu_tlb(vma, vmf->address, vmf->pte);
-		goto release;
+		goto unlock;
 	}
 
 	ret = check_stable_address_space(vma->vm_mm);
 	if (ret)
-		goto release;
+		goto unlock;
 
 	/* Deliver the page fault to userland, check inside PT lock */
 	if (userfaultfd_missing(vma)) {
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		put_page(page);
+		if (page)
+			put_page(page);
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
 
-	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, vma, vmf->address, false);
-	lru_cache_add_inactive_or_unevictable(page, vma);
-setpte:
+	if (page) {
+		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
+		page_add_new_anon_rmap(page, vma, vmf->address, false);
+		lru_cache_add_inactive_or_unevictable(page, vma);
+	}
+
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
 
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, vmf->address, vmf->pte);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	return 0;
 unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	if (page)
+		put_page(page);
 	return ret;
-release:
-	put_page(page);
-	goto unlock;
 oom_free_page:
 	put_page(page);
 oom:
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
Subject: [PATCH v2 06/35] mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT
Date: Fri, 28 Jan 2022 05:09:37 -0800
Message-Id: <20220128131006.67712-7-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

This configuration variable will be used to build the code needed to
handle speculative page fault.

This is enabled by default on supported architectures with SMP and MMU set.

The architecture support is needed since the speculative page fault handler
is called from the architecture's page faulting code, and some code has to
be added there to try speculative fault handling first.

Signed-off-by: Michel Lespinasse
---
 mm/Kconfig | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 3326ee3903f3..d304fca0f293 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -894,4 +894,26 @@ config ANON_VMA_NAME
 
 source "mm/damon/Kconfig"
 
+config ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
+	def_bool n
+
+config SPECULATIVE_PAGE_FAULT
+	bool "Speculative page faults"
+	default y
+	depends on ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT && MMU && SMP
+	help
+	  Try to handle user space page faults without holding the mmap lock.
+
+	  Instead of blocking writers through the use of mmap lock,
+	  the page fault handler merely verifies, at the end of the page
+	  fault, that no writers have been running concurrently with it.
+
+	  In high concurrency situations, the speculative fault handler
+	  gains a throughput advantage by avoiding having to update the
+	  mmap lock reader count.
+
+	  If the check fails due to a concurrent writer, or due to hitting
+	  an unsupported case, the fault handler falls back to classical
+	  processing using the mmap read lock.
+
 endmenu
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
Subject: [PATCH v2 07/35] x86/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
Date: Fri, 28 Jan 2022 05:09:38 -0800
Message-Id: <20220128131006.67712-8-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT so that the speculative fault
handling code can be compiled on this architecture.
Signed-off-by: Michel Lespinasse --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ebe8fc76949a..378bc33bac54 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -34,6 +34,7 @@ config X86_64 select SWIOTLB select ARCH_HAS_ELFCORE_COMPAT select ZONE_DMA32 + select ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT =20 config FORCE_DYNAMIC_FTRACE def_bool y --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 408ECC433EF for ; Fri, 28 Jan 2022 13:19:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348827AbiA1NTa (ORCPT ); Fri, 28 Jan 2022 08:19:30 -0500 Received: from server.lespinasse.org ([63.205.204.226]:37829 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348517AbiA1NTJ (ORCPT ); Fri, 28 Jan 2022 08:19:09 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=PMJHeJsdyCtcxmhvT1hkJdcSx4UXS3jkGHIYYcslNmc=; b=DE7JPUDN+5KEWQTiOjalJIcy92ZNfeEc8SEWgHr0RVYH5eatrZM7nrSmUqyMCL/n2mmYI SK140SxEgrzfZLtAA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375406; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=PMJHeJsdyCtcxmhvT1hkJdcSx4UXS3jkGHIYYcslNmc=; b=M5ntAzna6rTFurGazsaqcxoyi+L0bXjLH2rIvjqudwJrJwL9WMXjOY0I7laiC7fX5MTFl 4/JUz0rzpcvK2f1Qk9XffCcqKNRdjGE1yIZK80CA/13QQHFQst3yIdWO5xixiFhEuHaqC9C 
fNRUPAANLj5n6Jge03J2/6fCAr99RzpHiq/AgiQIpwTkLg3RXi8xes0CczjsIfwuGA0GC6f 8bQBaTbAZEhkvAUANfnaSMig99mgjzQCRYKJB9KxgAWRDKHbH7J7OtIugqfa2NaoNaCONmC HJwVnzfD9CFvkxrtcthuve1OoGhywt0zPAzhIIUs5sF4WNhbkHFuaxFr7gPw== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id EC614160964; Fri, 28 Jan 2022 05:10:06 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id D6C6320472; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 08/35] mm: add FAULT_FLAG_SPECULATIVE flag Date: Fri, 28 Jan 2022 05:09:39 -0800 Message-Id: <20220128131006.67712-9-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Define the new FAULT_FLAG_SPECULATIVE flag, which indicates when we are attempting speculative fault handling (without holding the mmap lock). 
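For readers who want a concrete picture of what "speculative" handling without the mmap lock looks like, here is a rough user-space model of the sequence-counter protocol that patch 10/35 of this series describes. It is only a sketch under stated assumptions: struct mm_model and every model_* name are hypothetical, and C11 sequentially consistent atomics stand in for the kernel's READ_ONCE() and smp_rmb()/smp_wmb() pairs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for an mm; not kernel code. */
struct mm_model {
	atomic_ulong mmap_seq;	/* even: no writer, odd: writer active */
	atomic_ulong data;	/* state that only a writer may change */
};

/* Writer entry: the counter moves to the next (odd) value. */
static inline void model_write_lock(struct mm_model *mm)
{
	atomic_fetch_add(&mm->mmap_seq, 1);
}

/* Writer exit: the counter moves to the next (even) value. */
static inline void model_write_unlock(struct mm_model *mm)
{
	atomic_fetch_add(&mm->mmap_seq, 1);
}

static inline unsigned long model_read_start(struct mm_model *mm)
{
	return atomic_load(&mm->mmap_seq);
}

static inline bool model_read_check(struct mm_model *mm, unsigned long seq)
{
	return atomic_load(&mm->mmap_seq) == seq;
}

/*
 * Speculative read. Returns false when the caller must fall back to
 * the locked path; it never spins waiting for the writer.
 */
static inline bool model_speculative_read(struct mm_model *mm,
					  unsigned long *out)
{
	unsigned long seq = model_read_start(mm);
	unsigned long val;

	if (seq & 1)
		return false;		/* writer in progress */
	val = atomic_load(&mm->data);	/* speculative work */
	if (!model_read_check(mm, seq))
		return false;		/* a writer ran concurrently */
	*out = val;
	return true;
}
```

A caller would try model_speculative_read() first and take the ordinary read lock only when it returns false, which is the shape the later x86 patch gives to the fault path.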
Signed-off-by: Michel Lespinasse --- include/linux/mm.h | 3 ++- include/linux/mm_types.h | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index e1a84b1e6787..7f7aa3f0a396 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -461,7 +461,8 @@ static inline bool fault_flag_allow_retry_first(enum fa= ult_flag flags) { FAULT_FLAG_USER, "USER" }, \ { FAULT_FLAG_REMOTE, "REMOTE" }, \ { FAULT_FLAG_INSTRUCTION, "INSTRUCTION" }, \ - { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" } + { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \ + { FAULT_FLAG_SPECULATIVE, "SPECULATIVE" } =20 /* * vm_fault is filled by the pagefault handler and passed to the vma's diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 9db36dc5d4cf..0ae3bf854aad 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -790,6 +790,7 @@ typedef struct { * @FAULT_FLAG_REMOTE: The fault is not for current task/mm. * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch. * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal si= gnals. + * @FAULT_FLAG_SPECULATIVE: The fault is handled without holding the mmap = lock. 
* * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify * whether we would allow page faults to retry by specifying these two @@ -821,6 +822,7 @@ enum fault_flag { FAULT_FLAG_REMOTE =3D 1 << 7, FAULT_FLAG_INSTRUCTION =3D 1 << 8, FAULT_FLAG_INTERRUPTIBLE =3D 1 << 9, + FAULT_FLAG_SPECULATIVE =3D 1 << 10, }; =20 #endif /* _LINUX_MM_TYPES_H */ --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50573C433F5 for ; Fri, 28 Jan 2022 13:19:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348888AbiA1NT1 (ORCPT ); Fri, 28 Jan 2022 08:19:27 -0500 Received: from server.lespinasse.org ([63.205.204.226]:49217 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244929AbiA1NTI (ORCPT ); Fri, 28 Jan 2022 08:19:08 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=Y8IXLxDDo2uMYY4ajZ7hmxbwiY+i3sSuKQUk/LNtie0=; b=JulGTkO46MGgZky++qtpQvRETNq3Kvq78SzsTBHZsrHaUN9jXkTfXps3VAXviSm18hICX LVSWndI82XWWqqdCA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=Y8IXLxDDo2uMYY4ajZ7hmxbwiY+i3sSuKQUk/LNtie0=; b=4enGUh1ZmMgQDoOfcwGDX6So/GAUJzChFZrj0x0JBbHZKScmZ+TBT55iyG5zBKYpXL28h NrBvAXE6QHVK2mUNGqxc36/lRElOKGSv9R9kgQ5OOVI+rFSImBPsqh9NywYl2FCI/ffiYGx UzIFaW8nKVjB3WZPa8XIhhvGglpAEqQDdRJB2ph2OPAbpF3/sudBMhAemmqgc9zLmKuNGpb 
kT3ltigDwFTH5EZ/6mnLDzWUzu72aqKHQ9rUQudTPDcl7vl31w9KO+cOET3hhaQJbzUnFNK KcmghUoMfRWQkZzgtyVaCqaRfI7XKb9rtQsyClQb6Vwf5vMkCbyCnefxf7fA== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 0031F160966; Fri, 28 Jan 2022 05:10:06 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id D9AAA20132; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 09/35] mm: add do_handle_mm_fault() Date: Fri, 28 Jan 2022 05:09:40 -0800 Message-Id: <20220128131006.67712-10-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Add a new do_handle_mm_fault function, which extends the existing handle_mm_fault() API by adding an mmap sequence count, to be used in the FAULT_FLAG_SPECULATIVE case. In the initial implementation, FAULT_FLAG_SPECULATIVE always fails (by returning VM_FAULT_RETRY). The existing handle_mm_fault() API is kept as a wrapper around do_handle_mm_fault() so that we do not have to immediately update every handle_mm_fault() call site. 
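The wrapper arrangement described above can be sketched in plain C as follows. This is an illustration only: the model_* names, the vm_fault_model_t type and the MODEL_FAULT_RETRY value are assumptions made up for the sketch. Only the bit position of the speculative flag (1 << 10) is taken from patch 08/35.

```c
#include <assert.h>

/* Stand-in definitions for this sketch; the kernel's own types differ. */
typedef unsigned int vm_fault_model_t;
#define MODEL_FAULT_RETRY	0x1u		/* arbitrary value */
#define MODEL_FLAG_SPECULATIVE	(1u << 10)	/* bit used in patch 08/35 */

/* Extended entry point: takes the mmap sequence count as an extra argument. */
static inline vm_fault_model_t model_do_handle_fault(unsigned long address,
						     unsigned int flags,
						     unsigned long seq)
{
	(void)address;
	(void)seq;

	/* Initial implementation: speculative attempts always ask for a retry. */
	if (flags & MODEL_FLAG_SPECULATIVE)
		return MODEL_FAULT_RETRY;

	return 0;	/* the regular fault handling would run here */
}

/* Old entry point kept as a wrapper, so existing callers need no change. */
static inline vm_fault_model_t model_handle_fault(unsigned long address,
						  unsigned int flags)
{
	return model_do_handle_fault(address, flags, 0);
}
```

Passing 0 as the sequence count is harmless in the wrapper because non-speculative callers hold the mmap lock and never consult the counter.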
Signed-off-by: Michel Lespinasse --- include/linux/mm.h | 12 +++++++++--- mm/memory.c | 10 +++++++--- 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 7f7aa3f0a396..4600dbb98cef 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1851,9 +1851,15 @@ int generic_error_remove_page(struct address_space *= mapping, struct page *page); int invalidate_inode_page(struct page *page); =20 #ifdef CONFIG_MMU -extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma, - unsigned long address, unsigned int flags, - struct pt_regs *regs); +extern vm_fault_t do_handle_mm_fault(struct vm_area_struct *vma, + unsigned long address, unsigned int flags, + unsigned long seq, struct pt_regs *regs); +static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma, + unsigned long address, unsigned int flags, + struct pt_regs *regs) +{ + return do_handle_mm_fault(vma, address, flags, 0, regs); +} extern int fixup_user_fault(struct mm_struct *mm, unsigned long address, unsigned int fault_flags, bool *unlocked); diff --git a/mm/memory.c b/mm/memory.c index f83e06b1dafb..aa24cd8c06e9 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4761,11 +4761,15 @@ static inline void mm_account_fault(struct pt_regs = *regs, * The mmap_lock may have been released depending on flags and our * return value. See filemap_fault() and __folio_lock_or_retry(). 
*/ -vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long addre= ss, - unsigned int flags, struct pt_regs *regs) +vm_fault_t do_handle_mm_fault(struct vm_area_struct *vma, + unsigned long address, unsigned int flags, + unsigned long seq, struct pt_regs *regs) { vm_fault_t ret; =20 + if (flags & FAULT_FLAG_SPECULATIVE) + return VM_FAULT_RETRY; + __set_current_state(TASK_RUNNING); =20 count_vm_event(PGFAULT); @@ -4807,7 +4811,7 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma= , unsigned long address, =20 return ret; } -EXPORT_SYMBOL_GPL(handle_mm_fault); +EXPORT_SYMBOL_GPL(do_handle_mm_fault); =20 #ifndef __PAGETABLE_P4D_FOLDED /* --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB87CC433FE for ; Fri, 28 Jan 2022 13:19:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348790AbiA1NTO (ORCPT ); Fri, 28 Jan 2022 08:19:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54982 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244893AbiA1NTI (ORCPT ); Fri, 28 Jan 2022 08:19:08 -0500 Received: from server.lespinasse.org (server.lespinasse.org [IPv6:2001:470:82ab::100:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB293C06173B for ; Fri, 28 Jan 2022 05:19:07 -0800 (PST) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=oZA2R37ZdQsPKRya4evD/1AxU5M2JserIU8FqrANBD0=; b=qW+sQ7HyABXKBsze8DrBwRseqXMce9RCfxlhrUvp9AMkFrpRRy4fHhhD3Kt+/3yVzvqTr yudQB2bVFoLPLJEBA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=oZA2R37ZdQsPKRya4evD/1AxU5M2JserIU8FqrANBD0=; b=CTWnbr/BOodTzck8x1znoxdaW3l5DVCyKQbkGJ8mGFqrhPK4thb9kYgNu/C+uBXfuICLr WB6qqJHNs96os/5WLhVkJGSRdtV8229d8QKw3uzPFMxnADFVZ1yziiZTrVi18UYqSryWkSa OV0ZaV7NTu7wdGC9yxpIx7lPT1rMwu6hc+cRdXSy9x9rSRuU4CAQh+oYgx4k3e7d3AoGtPU MgSSDjF6szOCkRHrmG69ho3BWQFZUDfi/TX4AUolM5NFpi5kPy00kdGGkMpj/1WfIJ9e7s4 EodZMIXxQC1pAukAcCBnXNhuzahxpjbkJbbWVbP85BvMtgdKmGCMyhKNmd5w== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 04071160968; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id DC56A2023B; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 10/35] mm: add per-mm mmap sequence counter for speculative page fault handling. 
Date: Fri, 28 Jan 2022 05:09:41 -0800 Message-Id: <20220128131006.67712-11-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The counter's write side is hooked into the existing mmap locking API: mmap_write_lock() increments the counter to the next (odd) value, and mmap_write_unlock() increments it again to the next (even) value. The counter's speculative read side is supposed to be used as follows: seq =3D mmap_seq_read_start(mm); if (seq & 1) goto fail; .... speculative handling here .... if (!mmap_seq_read_check(mm, seq)) goto fail; This API guarantees that, if none of the "fail" tests abort speculative execution, the speculative code section did not run concurrently with any mmap writer. This is very similar to a seqlock, but both the writer and speculative readers are allowed to block. In the fail case, the speculative reader does not spin on the sequence counter; instead it should fall back to a different mechanism such as grabbing the mmap lock read side. Signed-off-by: Michel Lespinasse --- include/linux/mm_types.h | 4 +++ include/linux/mmap_lock.h | 58 +++++++++++++++++++++++++++++++++++++-- 2 files changed, 60 insertions(+), 2 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 0ae3bf854aad..e4965a6f34f2 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -523,6 +523,10 @@ struct mm_struct { * cacheline. */ struct rw_semaphore mmap_lock; +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + unsigned long mmap_seq; +#endif + =20 struct list_head mmlist; /* List of maybe swapped mm's. 
These * are globally strung together off diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index 1b14468183d7..a2459eb15a33 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -8,8 +8,16 @@ #include #include =20 -#define MMAP_LOCK_INITIALIZER(name) \ - .mmap_lock =3D __RWSEM_INITIALIZER((name).mmap_lock), +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT +#define MMAP_LOCK_SEQ_INITIALIZER(name) \ + .mmap_seq =3D 0, +#else +#define MMAP_LOCK_SEQ_INITIALIZER(name) +#endif + +#define MMAP_LOCK_INITIALIZER(name) \ + .mmap_lock =3D __RWSEM_INITIALIZER((name).mmap_lock), \ + MMAP_LOCK_SEQ_INITIALIZER(name) =20 DECLARE_TRACEPOINT(mmap_lock_start_locking); DECLARE_TRACEPOINT(mmap_lock_acquire_returned); @@ -63,13 +71,52 @@ static inline void __mmap_lock_trace_released(struct mm= _struct *mm, bool write) static inline void mmap_init_lock(struct mm_struct *mm) { init_rwsem(&mm->mmap_lock); +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + mm->mmap_seq =3D 0; +#endif } =20 +static inline void __mmap_seq_write_lock(struct mm_struct *mm) +{ +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + VM_BUG_ON_MM(mm->mmap_seq & 1, mm); + mm->mmap_seq++; + smp_wmb(); +#endif +} + +static inline void __mmap_seq_write_unlock(struct mm_struct *mm) +{ +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + smp_wmb(); + mm->mmap_seq++; + VM_BUG_ON_MM(mm->mmap_seq & 1, mm); +#endif +} + +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT +static inline unsigned long mmap_seq_read_start(struct mm_struct *mm) +{ + unsigned long seq; + + seq =3D READ_ONCE(mm->mmap_seq); + smp_rmb(); + return seq; +} + +static inline bool mmap_seq_read_check(struct mm_struct *mm, unsigned long= seq) +{ + smp_rmb(); + return seq =3D=3D READ_ONCE(mm->mmap_seq); +} +#endif + static inline void mmap_write_lock(struct mm_struct *mm) { __mmap_lock_trace_start_locking(mm, true); down_write(&mm->mmap_lock); __mmap_lock_trace_acquire_returned(mm, true, true); + __mmap_seq_write_lock(mm); } =20 static inline void mmap_write_lock_nested(struct 
mm_struct *mm, int subcla= ss) @@ -77,6 +124,7 @@ static inline void mmap_write_lock_nested(struct mm_stru= ct *mm, int subclass) __mmap_lock_trace_start_locking(mm, true); down_write_nested(&mm->mmap_lock, subclass); __mmap_lock_trace_acquire_returned(mm, true, true); + __mmap_seq_write_lock(mm); } =20 static inline int mmap_write_lock_killable(struct mm_struct *mm) @@ -86,6 +134,8 @@ static inline int mmap_write_lock_killable(struct mm_str= uct *mm) __mmap_lock_trace_start_locking(mm, true); error =3D down_write_killable(&mm->mmap_lock); __mmap_lock_trace_acquire_returned(mm, true, !error); + if (likely(!error)) + __mmap_seq_write_lock(mm); return error; } =20 @@ -96,18 +146,22 @@ static inline bool mmap_write_trylock(struct mm_struct= *mm) __mmap_lock_trace_start_locking(mm, true); ok =3D down_write_trylock(&mm->mmap_lock) !=3D 0; __mmap_lock_trace_acquire_returned(mm, true, ok); + if (likely(ok)) + __mmap_seq_write_lock(mm); return ok; } =20 static inline void mmap_write_unlock(struct mm_struct *mm) { __mmap_lock_trace_released(mm, true); + __mmap_seq_write_unlock(mm); up_write(&mm->mmap_lock); } =20 static inline void mmap_write_downgrade(struct mm_struct *mm) { __mmap_lock_trace_acquire_returned(mm, false, true); + __mmap_seq_write_unlock(mm); downgrade_write(&mm->mmap_lock); } =20 --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8538C433EF for ; Fri, 28 Jan 2022 13:19:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348820AbiA1NTS (ORCPT ); Fri, 28 Jan 2022 08:19:18 -0500 Received: from server.lespinasse.org ([63.205.204.226]:40973 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244456AbiA1NTI (ORCPT ); Fri, 28 Jan 2022 08:19:08 -0500 DKIM-Signature: 
v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=IpgfV+Eq7IPQY0KnkKUe0KkKJvFgOUOWn3NBKGqibKI=; b=IctmLJgF3nwhCdB6/nN2BKCbg/+ri5TtsREb7+zxwUF2hZZ/LIaVfPhxghmetQ7a/3M3q Q9VY54et8G/l7GVBg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=IpgfV+Eq7IPQY0KnkKUe0KkKJvFgOUOWn3NBKGqibKI=; b=FUNzHOA9sCQEnD5fVX1ed8sbazmtJonEYYM3h4apZTLILl70sj90JbvB3WRW8tXEGINiv IzWe9oPGLIvbeoIBpr9CdyIGkNvw0FYKFNdqK61xuYDoKOkBCEVO0ytOejTTj5Eu2J2FeQV SzNS0w/kdVod39/vbZHvGCL1X6e9d3nbT8J96MVYI96Lr29e+Mg2IDmWtjnSTVWX883uarj n5hllQbi6wTh0FF/uhaA2q+Ct6pIwn/7yizD86pU4Px2VTdvxBK3QaA4JKP5yjI3PylzCqJ gMcCeqKorC8BG/Wp7pQbWsbL/vTP+8SdVKrYWbX00ASCHnKCyqKiI+vLP7XA== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 02248160967; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id DF25420473; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 11/35] mm: rcu safe vma freeing Date: Fri, 28 Jan 2022 05:09:42 -0800 Message-Id: <20220128131006.67712-12-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: 
<20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This prepares for speculative page faults looking up and copying vmas under protection of an rcu read lock, instead of the usual mmap read lock. Note - it might also be feasible to just use SLAB_TYPESAFE_BY_RCU when creating the vm_area_cachep, but that's probably too subtle to consider her= e. Signed-off-by: Michel Lespinasse --- include/linux/mm_types.h | 16 +++++++++++----- kernel/fork.c | 13 +++++++++++++ 2 files changed, 24 insertions(+), 5 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index e4965a6f34f2..b6678578a729 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -374,12 +374,18 @@ struct anon_vma_name { struct vm_area_struct { /* The first cache line has the info for VMA tree walking. */ =20 - unsigned long vm_start; /* Our start address within vm_mm. */ - unsigned long vm_end; /* The first byte after our end address - within vm_mm. */ + union { + struct { + /* VMA covers [vm_start; vm_end) addresses within mm */ + unsigned long vm_start, vm_end; =20 - /* linked list of VM areas per task, sorted by address */ - struct vm_area_struct *vm_next, *vm_prev; + /* linked list of VMAs per task, sorted by address */ + struct vm_area_struct *vm_next, *vm_prev; + }; +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + struct rcu_head vm_rcu; /* Used for deferred freeing. 
*/ +#endif + }; =20 struct rb_node vm_rb; =20 diff --git a/kernel/fork.c b/kernel/fork.c index d75a528f7b21..2e5f2e8de31a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -371,10 +371,23 @@ struct vm_area_struct *vm_area_dup(struct vm_area_str= uct *orig) return new; } =20 +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT +static void __vm_area_free(struct rcu_head *head) +{ + struct vm_area_struct *vma =3D container_of(head, struct vm_area_struct, + vm_rcu); + kmem_cache_free(vm_area_cachep, vma); +} +#endif + void vm_area_free(struct vm_area_struct *vma) { free_vma_anon_name(vma); +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + call_rcu(&vma->vm_rcu, __vm_area_free); +#else kmem_cache_free(vm_area_cachep, vma); +#endif } =20 static void account_kernel_stack(struct task_struct *tsk, int account) --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE192C433F5 for ; Fri, 28 Jan 2022 13:19:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348771AbiA1NTJ (ORCPT ); Fri, 28 Jan 2022 08:19:09 -0500 Received: from server.lespinasse.org ([63.205.204.226]:47913 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244259AbiA1NTI (ORCPT ); Fri, 28 Jan 2022 08:19:08 -0500 X-Greylist: delayed 540 seconds by postgrey-1.27 at vger.kernel.org; Fri, 28 Jan 2022 08:19:08 EST DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=hCAjH/3pJlWdfRDX/fRs7omFq2p1vEsDo9ifolTeLag=; b=HD0KmvGGgLZfvgB0K6C7PZ2rjWPZ8D6sLYkWyssN+ArRq1kjZzSZeXmySfGfqBn5e2UgD l7WYntcAlUF3hoDCg== DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=hCAjH/3pJlWdfRDX/fRs7omFq2p1vEsDo9ifolTeLag=; b=6rMpX8XrXHJaYfiULXAeV9/onNQ2iau6A3/0eyy0Ajx9IBierM3eTmEDSZnryrm8u8vOK zRgYb2OlZ0xmYkay8H2Xw2P+q2ZqxT2iT6wKbvP0K486GTNCDKkrTJygUZM364JXX6Vjs2q Sj5/WUtqMpz1J4KFYhCAaIV/fkttQSKOWpRLZOnJ79QT2k6Ku2COSOGbnNa3iiyrwgXzQsV kv9V8h3LE4zesq4+lw/VV+nCZWv0cnxN6kivDSKZXxm/93k+qVR/ZM7uBvrRtO6q9l4xqRw M1zrnzijpM7EP0TFMtLPZo8xECv1M1Odjsbe1Q5Tlo60BCX6L/rSKHLUwDGg== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 08D59160969; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id E1C6320476; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 12/35] mm: separate mmap locked assertion from find_vma Date: Fri, 28 Jan 2022 05:09:43 -0800 Message-Id: <20220128131006.67712-13-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This adds a new __find_vma() function, which implements find_vma minus the mmap_assert_locked() assertion. find_vma() is then implemented as an inline wrapper around __find_vma(). 
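The split between an asserting wrapper and an unchecked lookup can be modelled in user space roughly as below. Everything here is hypothetical (struct region, struct space_model and the model_* functions), and a linear scan over a sorted array stands in for the kernel's vmacache and rbtree walk. Only the lookup rule, "first area with addr < end, NULL if none", comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: covers [start, end). */
struct region {
	unsigned long start, end;
};

/* Hypothetical stand-in for an mm with its lock state. */
struct space_model {
	bool locked;			/* models "mmap lock is held" */
	const struct region *regions;	/* sorted by address, no overlap */
	size_t count;
};

/* Unchecked lookup: first region with addr < end, NULL if none. */
static inline const struct region *
model_find_unchecked(const struct space_model *s, unsigned long addr)
{
	for (size_t i = 0; i < s->count; i++)
		if (addr < s->regions[i].end)
			return &s->regions[i];
	return NULL;
}

/* Checked wrapper: same lookup, but the caller must hold the lock. */
static inline const struct region *
model_find(const struct space_model *s, unsigned long addr)
{
	assert(s->locked);
	return model_find_unchecked(s, addr);
}
```

Note that the returned region may begin above addr, so a caller that needs containment still has to compare addr against start, as the speculative x86 path later in the series does with vm_start.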
Signed-off-by: Michel Lespinasse Reported-by: kernel test robot --- drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++-- include/linux/mm.h | 9 ++++++++- mm/mmap.c | 5 ++--- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i= 915_gpu_error.c index 5ae812d60abe..94ab71a9b493 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -515,7 +515,7 @@ static void error_print_context(struct drm_i915_error_s= tate_buf *m, } =20 static struct i915_vma_coredump * -__find_vma(struct i915_vma_coredump *vma, const char *name) +__i915_find_vma(struct i915_vma_coredump *vma, const char *name) { while (vma) { if (strcmp(vma->name, name) =3D=3D 0) @@ -529,7 +529,7 @@ __find_vma(struct i915_vma_coredump *vma, const char *n= ame) static struct i915_vma_coredump * find_batch(const struct intel_engine_coredump *ee) { - return __find_vma(ee->vma, "batch"); + return __i915_find_vma(ee->vma, "batch"); } =20 static void error_print_engine(struct drm_i915_error_state_buf *m, diff --git a/include/linux/mm.h b/include/linux/mm.h index 4600dbb98cef..6f7712179503 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2751,10 +2751,17 @@ extern int expand_upwards(struct vm_area_struct *vm= a, unsigned long address); #endif =20 /* Look up the first VMA which satisfies addr < vm_end, NULL if none. */ -extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned lo= ng addr); +extern struct vm_area_struct * __find_vma(struct mm_struct * mm, unsigned = long addr); extern struct vm_area_struct * find_vma_prev(struct mm_struct * mm, unsign= ed long addr, struct vm_area_struct **pprev); =20 +static inline +struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr) +{ + mmap_assert_locked(mm); + return __find_vma(mm, addr); +} + /** * find_vma_intersection() - Look up the first VMA which intersects the in= terval * @mm: The process address space. 
diff --git a/mm/mmap.c b/mm/mmap.c index 1e8fdb0b51ed..b09a2c875507 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2276,12 +2276,11 @@ get_unmapped_area(struct file *file, unsigned long = addr, unsigned long len, EXPORT_SYMBOL(get_unmapped_area); =20 /* Look up the first VMA which satisfies addr < vm_end, NULL if none. */ -struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr) +struct vm_area_struct *__find_vma(struct mm_struct *mm, unsigned long addr) { struct rb_node *rb_node; struct vm_area_struct *vma; =20 - mmap_assert_locked(mm); /* Check the cache first. */ vma =3D vmacache_find(mm, addr); if (likely(vma)) @@ -2308,7 +2307,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm,= unsigned long addr) return vma; } =20 -EXPORT_SYMBOL(find_vma); +EXPORT_SYMBOL(__find_vma); =20 /* * Same as find_vma, but also return a pointer to the previous VMA in *ppr= ev. --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B9ACC433FE for ; Fri, 28 Jan 2022 13:19:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348812AbiA1NT3 (ORCPT ); Fri, 28 Jan 2022 08:19:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54984 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343921AbiA1NTJ (ORCPT ); Fri, 28 Jan 2022 08:19:09 -0500 Received: from server.lespinasse.org (server.lespinasse.org [IPv6:2001:470:82ab::100:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 09E76C061748 for ; Fri, 28 Jan 2022 05:19:08 -0800 (PST) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : 
content-transfer-encoding : from; bh=wJYZVoGBeqX86Eqhj6uJBnmDNtLP3Y+lAn8JeAXF+pA=; b=N1Op9TLaSFCt4mNfV9WN12B/y/XV9KdPR31gHO7zQOJxTkmYVquAISEr99VplqYXcbsxo a8jPDgMsCbvLN+FDA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=wJYZVoGBeqX86Eqhj6uJBnmDNtLP3Y+lAn8JeAXF+pA=; b=a/HWmRNlZfInCeg5CFkoGnLKyBm9LZzL0mkQXB4G3DE8RCm2Hrw9gJPQGMI4I82hcZQmj m7Y2BwukOgFpswfKym8uyfr9W+CJw9hM6S1FLUqCy7DZnG9n3VA0GrkKMp0KKC4z5GOrCYx kO5b0o/d7sxVVtRTIWZxj77H2C0BMj2unRyWmSjdh6Sq9APLe9lRz00uaqD4WNSw/ElPdmk jXMtQ/moy5slLpZ3Asq43eoE2u1DH0iggYWx7iF7QOZrpjM+/96NGmJLj22KGqn4C+rRFjm TsQpRRTwCZw/1sQxPQ2iBz2SPCpg+e936J2SgmYY7SBssJL4z46io1zscSgQ== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 09E85160976; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id E469420477; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 13/35] x86/mm: attempt speculative mm faults first Date: Fri, 28 Jan 2022 05:09:44 -0800 Message-Id: <20220128131006.67712-14-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: 
text/plain; charset="utf-8" Attempt speculative mm fault handling first, and fall back to the existing (non-speculative) code if that fails. The speculative handling closely mirrors the non-speculative logic. This includes some x86 specific bits such as the access_error() call. This is why we chose to implement the speculative handling in arch/x86 rather than in common code. The vma is first looked up and copied, under protection of the rcu read lock. The mmap lock sequence count is used to verify the integrity of the copied vma, and passed to do_handle_mm_fault() to allow checking against races with mmap writers when finalizing the fault. Signed-off-by: Michel Lespinasse --- arch/x86/mm/fault.c | 44 +++++++++++++++++++++++++++++++++++ include/linux/mm_types.h | 5 ++++ include/linux/vm_event_item.h | 4 ++++ mm/vmstat.c | 4 ++++ 4 files changed, 57 insertions(+) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index d0074c6ed31a..99b0a358154e 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1226,6 +1226,10 @@ void do_user_addr_fault(struct pt_regs *regs, struct mm_struct *mm; vm_fault_t fault; unsigned int flags =3D FAULT_FLAG_DEFAULT; +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + struct vm_area_struct pvma; + unsigned long seq; +#endif =20 tsk =3D current; mm =3D tsk->mm; @@ -1323,6 +1327,43 @@ void do_user_addr_fault(struct pt_regs *regs, } #endif =20 +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + count_vm_event(SPF_ATTEMPT); + seq =3D mmap_seq_read_start(mm); + if (seq & 1) + goto spf_abort; + rcu_read_lock(); + vma =3D __find_vma(mm, address); + if (!vma || vma->vm_start > address) { + rcu_read_unlock(); + goto spf_abort; + } + pvma =3D *vma; + rcu_read_unlock(); + if (!mmap_seq_read_check(mm, seq)) + goto spf_abort; + vma =3D &pvma; + if (unlikely(access_error(error_code, vma))) + goto spf_abort; + fault =3D do_handle_mm_fault(vma, address, + flags | FAULT_FLAG_SPECULATIVE, seq, regs); + + if (!(fault & VM_FAULT_RETRY)) + goto done; + + /* Quick path to 
respond to signals */ + if (fault_signal_pending(fault, regs)) { + if (!user_mode(regs)) + kernelmode_fixup_or_oops(regs, error_code, address, + SIGBUS, BUS_ADRERR, + ARCH_DEFAULT_PKEY); + return; + } + +spf_abort: + count_vm_event(SPF_ABORT); +#endif + /* * Kernel-mode access to the user address space should only occur * on well-defined single instructions listed in the exception @@ -1419,6 +1460,9 @@ void do_user_addr_fault(struct pt_regs *regs, } =20 mmap_read_unlock(mm); +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT +done: +#endif if (likely(!(fault & VM_FAULT_ERROR))) return; =20 diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index b6678578a729..305f05d2a4bc 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -370,6 +370,11 @@ struct anon_vma_name { * per VM-area/task. A VM area is any part of the process virtual memory * space that has a special rule for the page-fault handlers (ie a shared * library, the executable area etc). + * + * Note that speculative page faults make an on-stack copy of the VMA, + * so the structure size matters. + * (TODO - it would be preferable to copy only the required vma attributes + * rather than the entire vma). */ struct vm_area_struct { /* The first cache line has the info for VMA tree walking. 
*/ diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 7b2363388bfa..f00b3e36ff39 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -133,6 +133,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #ifdef CONFIG_X86 DIRECT_MAP_LEVEL2_SPLIT, DIRECT_MAP_LEVEL3_SPLIT, +#endif +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + SPF_ATTEMPT, + SPF_ABORT, #endif NR_VM_EVENT_ITEMS }; diff --git a/mm/vmstat.c b/mm/vmstat.c index 4057372745d0..dbb0160e5558 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1390,6 +1390,10 @@ const char * const vmstat_text[] =3D { "direct_map_level2_splits", "direct_map_level3_splits", #endif +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + "spf_attempt", + "spf_abort", +#endif #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */ }; #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */ --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 717F9C433F5 for ; Fri, 28 Jan 2022 13:19:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348803AbiA1NTP (ORCPT ); Fri, 28 Jan 2022 08:19:15 -0500 Received: from server.lespinasse.org ([63.205.204.226]:58853 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244878AbiA1NTI (ORCPT ); Fri, 28 Jan 2022 08:19:08 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=2AfpzyewhmOlo0fUb0SXI6JxtNUdQC+AYomvKlpL5Ds=; b=Ib2Yh44H5XDzNjpRtIRI02jAIe/KU1DMungZIVzI88t8j0wqwdIWvvyZGtOBMiXiwJPBl 1pvNr0iYL9/q4z0Ag== DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=2AfpzyewhmOlo0fUb0SXI6JxtNUdQC+AYomvKlpL5Ds=; b=5vh96h1Q6g4N+Rf+N+L3AUPDrGbYhjVO2jJ3zEgFGhjHvXxcXZ2UelKDr0orcD9V/RPx0 p5Q0nD4RoGulcGvYr4T2j2KS6Y8Nmo1yyVjTSxyPV5ukSEa6Z0WqgfRx/fGV5YLjT2YHiGY MYn4cHvOeaN43jwZLJOzr1/VUkCiNfYsbcIkX2U4U2RnXIiR+uiILVQ5NlydE7BZN1lKDiu Vk1qSP3Q4CcqDst4aKcWBqHyIecdT9aM67g6clkLZv7FImiBwcJSEwCiAr2We5zJHeLqlY3 s/efKuCQOuQ9YI1KuFZR0xC9y+bkr2mnw2hIltaEe6+eCUsrWrmb9JXQqRbg== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 0D973160977; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id E6DC020328; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 14/35] mm: add speculative_page_walk_begin() and speculative_page_walk_end() Date: Fri, 28 Jan 2022 05:09:45 -0800 Message-Id: <20220128131006.67712-15-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Speculative page faults will use these to protect against races with page table reclamation. 
This could always be handled by disabling local IRQs as the fast GUP code does; however speculative page faults do not need to protect against races with THP page splitting, so a weaker rcu read lock is sufficient in the MMU_GATHER_RCU_TABLE_FREE case. Signed-off-by: Michel Lespinasse --- mm/memory.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index aa24cd8c06e9..663952d14bad 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2725,6 +2725,28 @@ int apply_to_existing_page_range(struct mm_struct *m= m, unsigned long addr, } EXPORT_SYMBOL_GPL(apply_to_existing_page_range); =20 +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + +/* + * speculative_page_walk_begin() ... speculative_page_walk_end() protects + * against races with page table reclamation. + * + * This is similar to what fast GUP does, but fast GUP also needs to + * protect against races with THP page splitting, so it always needs + * to disable interrupts. + * Speculative page faults only need to protect against page table reclama= tion, + * so rcu_read_lock() is sufficient in the MMU_GATHER_RCU_TABLE_FREE case. + */ +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE +#define speculative_page_walk_begin() rcu_read_lock() +#define speculative_page_walk_end() rcu_read_unlock() +#else +#define speculative_page_walk_begin() local_irq_disable() +#define speculative_page_walk_end() local_irq_enable() +#endif + +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ + /* * handle_pte_fault chooses page fault handler according to an entry which= was * read non-atomically. 
Before making any commitment, on those architectu= res --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7537BC433F5 for ; Fri, 28 Jan 2022 13:19:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348788AbiA1NTM (ORCPT ); Fri, 28 Jan 2022 08:19:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54980 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244752AbiA1NTI (ORCPT ); Fri, 28 Jan 2022 08:19:08 -0500 Received: from server.lespinasse.org (server.lespinasse.org [IPv6:2001:470:82ab::100:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB370C061747 for ; Fri, 28 Jan 2022 05:19:07 -0800 (PST) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=N1S1NesGU6Y8f8q9IeGf50c8McugHmVCalVHP9Er6AE=; b=G3mnG6iArJ3iUlnv5jmCwGBZSi4ZmZNvZum1Io0mtaiwsYz/3xaJREVHKG9TcE9hCnOxk eVwcUhjl24gy00cDA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=N1S1NesGU6Y8f8q9IeGf50c8McugHmVCalVHP9Er6AE=; b=NdoeJxId3UHckIXu3hAFoxtvDqiwDvgJJSGGpoR05QXlMP89KsXW0W+ABD8b/gYA0BhuF Mz98ld9u1J576DPaXDTCzvCNgYfR16gNyveBozyLnxl41FS5UKO/RMnGzdj/+9sMR94QUI/ /M5pshUxt3Oy4+uAL7EUNI195zLs+k05K4+MxeagX7bMTPd41C9gmqMf300MOIpeyNtBJai 3Ll/zt6OghuoLMkBvzux0MFzo8De6dNhImoV4VxrB+Zc1ap1bi4fpFoqsn6xXahNmU4U9fb vCcVuMqB4jx8MrwgeiRxeLbkVXdEzv1b7akE/LEXnrI1ONHHlFu9tlWe7kRQ== Received: 
from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 0F8FE160985; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id E9CD020330; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 15/35] mm: refactor __handle_mm_fault() / handle_pte_fault() Date: Fri, 28 Jan 2022 05:09:46 -0800 Message-Id: <20220128131006.67712-16-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Move the code that initializes vmf->pte and vmf->orig_pte from handle_pte_fault() to its single call site in __handle_mm_fault(). This ensures vmf->pte is now initialized together with the higher levels of the page table hierarchy. This also prepares for speculative page fault handling, where the entire page table walk (higher levels down to ptes) needs special care in the speculative case. 
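For illustration only, the decision that moves into __handle_mm_fault() can be modeled in plain userspace C. This is a sketch under stated assumptions, not kernel code: the model_* names are hypothetical, a NULL table stands in for pmd_none(), a zero entry stands in for pte_none(), and the huge-pmd checks and barrier() of the real code are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PTRS_PER_PTE 4

/* Hypothetical stand-in for a pmd entry: NULL table models pmd_none(). */
struct model_pmd {
	unsigned long *pte_table;
};

/* Hypothetical stand-in for the pte-related fields of struct vm_fault. */
struct model_vmf {
	struct model_pmd *pmd;
	size_t index;            /* slot selected by the faulting address */
	unsigned long *pte;      /* NULL when no pte is mapped */
	unsigned long orig_pte;  /* snapshot taken when the pte was mapped */
};

enum model_path { MODEL_MISSING_PAGE, MODEL_EXISTING_PTE };

/* Models the block that now runs in the caller, next to the upper levels. */
static void model_init_pte(struct model_vmf *vmf)
{
	assert(vmf->index < MODEL_PTRS_PER_PTE);

	if (vmf->pmd->pte_table == NULL) {
		/* No page table yet: leave allocation for later. */
		vmf->pte = NULL;
		return;
	}

	vmf->pte = &vmf->pmd->pte_table[vmf->index];
	vmf->orig_pte = *vmf->pte;

	/* An empty entry is treated like a missing pte. */
	if (vmf->orig_pte == 0)
		vmf->pte = NULL;
}

/* Models the callee, which now only consumes the prepared fields. */
static enum model_path model_handle_pte_fault(const struct model_vmf *vmf)
{
	return vmf->pte ? MODEL_EXISTING_PTE : MODEL_MISSING_PAGE;
}
```

With this split, the callee makes the same choice whether the fields were filled in by the regular walk or, later in the series, by a speculative walk.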
Signed-off-by: Michel Lespinasse --- mm/memory.c | 98 ++++++++++++++++++++++++++--------------------------- 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 663952d14bad..37a4b92bd4bf 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3769,7 +3769,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) if (pte_alloc(vma->vm_mm, vmf->pmd)) return VM_FAULT_OOM; =20 - /* See comment in handle_pte_fault() */ + /* See comment in __handle_mm_fault() */ if (unlikely(pmd_trans_unstable(vmf->pmd))) return 0; =20 @@ -4062,7 +4062,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf) return VM_FAULT_OOM; } =20 - /* See comment in handle_pte_fault() */ + /* See comment in __handle_mm_fault() */ if (pmd_devmap_trans_unstable(vmf->pmd)) return 0; =20 @@ -4527,53 +4527,6 @@ static vm_fault_t handle_pte_fault(struct vm_fault *= vmf) { pte_t entry; =20 - if (unlikely(pmd_none(*vmf->pmd))) { - /* - * Leave __pte_alloc() until later: because vm_ops->fault may - * want to allocate huge page, and if we expose page table - * for an instant, it will be difficult to retract from - * concurrent faults and from rmap lookups. - */ - vmf->pte =3D NULL; - } else { - /* - * If a huge pmd materialized under us just retry later. Use - * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead - * of pmd_trans_huge() to ensure the pmd didn't become - * pmd_trans_huge under us and then back to pmd_none, as a - * result of MADV_DONTNEED running immediately after a huge pmd - * fault in a different thread of this mm, in turn leading to a - * misleading pmd_trans_huge() retval. All we have to ensure is - * that it is a regular pmd that we can walk with - * pte_offset_map() and we can do that through an atomic read - * in C, which is what pmd_trans_unstable() provides. 
- */ - if (pmd_devmap_trans_unstable(vmf->pmd)) - return 0; - /* - * A regular pmd is established and it can't morph into a huge - * pmd from under us anymore at this point because we hold the - * mmap_lock read mode and khugepaged takes it in write mode. - * So now it's safe to run pte_offset_map(). - */ - vmf->pte =3D pte_offset_map(vmf->pmd, vmf->address); - vmf->orig_pte =3D *vmf->pte; - - /* - * some architectures can have larger ptes than wordsize, - * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=3Dy and - * CONFIG_32BIT=3Dy, so READ_ONCE cannot guarantee atomic - * accesses. The code below just needs a consistent view - * for the ifs and we later double check anyway with the - * ptl lock held. So here a barrier will do. - */ - barrier(); - if (pte_none(vmf->orig_pte)) { - pte_unmap(vmf->pte); - vmf->pte =3D NULL; - } - } - if (!vmf->pte) { if (vma_is_anonymous(vmf->vma)) return do_anonymous_page(vmf); @@ -4713,6 +4666,53 @@ static vm_fault_t __handle_mm_fault(struct vm_area_s= truct *vma, } } =20 + if (unlikely(pmd_none(*vmf.pmd))) { + /* + * Leave __pte_alloc() until later: because vm_ops->fault may + * want to allocate huge page, and if we expose page table + * for an instant, it will be difficult to retract from + * concurrent faults and from rmap lookups. + */ + vmf.pte =3D NULL; + } else { + /* + * If a huge pmd materialized under us just retry later. Use + * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead + * of pmd_trans_huge() to ensure the pmd didn't become + * pmd_trans_huge under us and then back to pmd_none, as a + * result of MADV_DONTNEED running immediately after a huge pmd + * fault in a different thread of this mm, in turn leading to a + * misleading pmd_trans_huge() retval. All we have to ensure is + * that it is a regular pmd that we can walk with + * pte_offset_map() and we can do that through an atomic read + * in C, which is what pmd_trans_unstable() provides. 
+ */ + if (pmd_devmap_trans_unstable(vmf.pmd)) + return 0; + /* + * A regular pmd is established and it can't morph into a huge + * pmd from under us anymore at this point because we hold the + * mmap_lock read mode and khugepaged takes it in write mode. + * So now it's safe to run pte_offset_map(). + */ + vmf.pte =3D pte_offset_map(vmf.pmd, vmf.address); + vmf.orig_pte =3D *vmf.pte; + + /* + * some architectures can have larger ptes than wordsize, + * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=3Dy and + * CONFIG_32BIT=3Dy, so READ_ONCE cannot guarantee atomic + * accesses. The code below just needs a consistent view + * for the ifs and we later double check anyway with the + * ptl lock held. So here a barrier will do. + */ + barrier(); + if (pte_none(vmf.orig_pte)) { + pte_unmap(vmf.pte); + vmf.pte =3D NULL; + } + } + return handle_pte_fault(&vmf); } =20 --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB4F8C433F5 for ; Fri, 28 Jan 2022 13:19:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349020AbiA1NTw (ORCPT ); Fri, 28 Jan 2022 08:19:52 -0500 Received: from server.lespinasse.org ([63.205.204.226]:48179 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348649AbiA1NTJ (ORCPT ); Fri, 28 Jan 2022 08:19:09 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=RYRG+RuiDLezST/iL383W/LGMqkq0+0jM8HYaFAclmo=; b=BuXQmxj+3qKNjh5SRNWcL7LLU2E6uJfm8wDD99iZdIdaSW/+achmhWRJVgL+JEBSZlC3j OnYzFu0fiC/xuMuCw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=RYRG+RuiDLezST/iL383W/LGMqkq0+0jM8HYaFAclmo=; b=qovUC5kJOR5SWthTEplixMntG75b2ioozUwSL1SQG2/I0QJfwaCZaveagDAT3SvSW0OSo 6X0lpayd1t0B+kOvU58pqxi9AQbq1ypQ+MZGGH5jHAQ4/gHtj2w/h3oJKA6hx9FfutTN6iN ElE7OO3KTJ2Pt7uN9B7GQWgevhaeA7H6hD5CK/Du826mUrGPxsz8Y4Gdy/k4jllcQptecR0 MHxHVeMRIICOAatbkGkWnCUhM2foKHhMqtb6BvJyajBZ2un2PPReaDo5NWn04xCJ3YS6POX bZFxXLLebTvnUTbza7nsPMPSu2xdDGldD3WgCUazwyK3bAS3tC1QN90SxKbg== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 14FC516098E; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id ECAE620478; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 16/35] mm: implement speculative handling in __handle_mm_fault(). Date: Fri, 28 Jan 2022 05:09:47 -0800 Message-Id: <20220128131006.67712-17-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The speculative path calls speculative_page_walk_begin() before walking the page table tree to prevent page table reclamation. 
The logic is otherwise similar to the non-speculative path, but with additional restrictions: in the speculative path, we do not handle huge pages or wire new page tables. Signed-off-by: Michel Lespinasse --- include/linux/mm.h | 6 ++++ mm/memory.c | 77 ++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 81 insertions(+), 2 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 6f7712179503..2e2122bd3da3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -483,6 +483,10 @@ struct vm_fault { }; enum fault_flag flags; /* FAULT_FLAG_xxx flags * XXX: should really be 'const' */ +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + unsigned long seq; + pmd_t orig_pmd; +#endif pmd_t *pmd; /* Pointer to pmd entry matching * the 'address' */ pud_t *pud; /* Pointer to pud entry matching @@ -490,9 +494,11 @@ struct vm_fault { */ union { pte_t orig_pte; /* Value of PTE at the time of fault */ +#ifndef CONFIG_SPECULATIVE_PAGE_FAULT pmd_t orig_pmd; /* Value of PMD at the time of fault, * used by PMD fault only. */ +#endif }; =20 struct page *cow_page; /* Page handler may use for COW fault */ diff --git a/mm/memory.c b/mm/memory.c index 37a4b92bd4bf..d0db10bd5bee 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4581,7 +4581,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *v= mf) * return value. See filemap_fault() and __folio_lock_or_retry().
*/ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, - unsigned long address, unsigned int flags) + unsigned long address, unsigned int flags, unsigned long seq) { struct vm_fault vmf =3D { .vma =3D vma, @@ -4596,6 +4596,79 @@ static vm_fault_t __handle_mm_fault(struct vm_area_s= truct *vma, p4d_t *p4d; vm_fault_t ret; =20 +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + if (flags & FAULT_FLAG_SPECULATIVE) { + pgd_t pgdval; + p4d_t p4dval; + pud_t pudval; + + vmf.seq =3D seq; + + speculative_page_walk_begin(); + pgd =3D pgd_offset(mm, address); + pgdval =3D READ_ONCE(*pgd); + if (pgd_none(pgdval) || unlikely(pgd_bad(pgdval))) + goto spf_fail; + + p4d =3D p4d_offset(pgd, address); + p4dval =3D READ_ONCE(*p4d); + if (p4d_none(p4dval) || unlikely(p4d_bad(p4dval))) + goto spf_fail; + + vmf.pud =3D pud_offset(p4d, address); + pudval =3D READ_ONCE(*vmf.pud); + if (pud_none(pudval) || unlikely(pud_bad(pudval)) || + unlikely(pud_trans_huge(pudval)) || + unlikely(pud_devmap(pudval))) + goto spf_fail; + + vmf.pmd =3D pmd_offset(vmf.pud, address); + vmf.orig_pmd =3D READ_ONCE(*vmf.pmd); + + /* + * pmd_none could mean that a hugepage collapse is in + * progress behind our back, as collapse_huge_page() marks + * it before invalidating the pte (which is done once + * the IPI is caught by all CPUs and we have interrupts + * disabled). For this reason we cannot handle THP in + * a speculative way since we can't safely identify an + * in-progress collapse operation done behind our back on + * that PMD. + */ + if (unlikely(pmd_none(vmf.orig_pmd) || + is_swap_pmd(vmf.orig_pmd) || + pmd_trans_huge(vmf.orig_pmd) || + pmd_devmap(vmf.orig_pmd))) + goto spf_fail; + + /* + * The above does not allocate/instantiate page-tables because + * doing so would lead to the possibility of instantiating + * page-tables after free_pgtables() -- and consequently + * leaking them. + * + * The result is that we take at least one non-speculative + * fault per PMD in order to instantiate it.
+ */ + + vmf.pte =3D pte_offset_map(vmf.pmd, address); + vmf.orig_pte =3D READ_ONCE(*vmf.pte); + barrier(); + if (pte_none(vmf.orig_pte)) { + pte_unmap(vmf.pte); + vmf.pte =3D NULL; + } + + speculative_page_walk_end(); + + return handle_pte_fault(&vmf); + + spf_fail: + speculative_page_walk_end(); + return VM_FAULT_RETRY; + } +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ + pgd =3D pgd_offset(mm, address); p4d =3D p4d_alloc(mm, pgd, address); if (!p4d) @@ -4815,7 +4888,7 @@ vm_fault_t do_handle_mm_fault(struct vm_area_struct *= vma, if (unlikely(is_vm_hugetlb_page(vma))) ret =3D hugetlb_fault(vma->vm_mm, vma, address, flags); else - ret =3D __handle_mm_fault(vma, address, flags); + ret =3D __handle_mm_fault(vma, address, flags, seq); =20 if (flags & FAULT_FLAG_USER) { mem_cgroup_exit_user_fault(); --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EDA4EC433EF for ; Fri, 28 Jan 2022 13:19:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348995AbiA1NTs (ORCPT ); Fri, 28 Jan 2022 08:19:48 -0500 Received: from server.lespinasse.org ([63.205.204.226]:55603 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348647AbiA1NTJ (ORCPT ); Fri, 28 Jan 2022 08:19:09 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=JSt5tP5xkVG5wLZyDHEk5pGq0dzCPYUPeDFK2yvWOFo=; b=UAVmTeYX7BoelafF+11e5Yij26EpOqoi2+SKLGoVB+sWhqpaTodl8r7ba2TXA4DlhBMj6 Gv3hKJu1eHEVn2zAQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; 
t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=JSt5tP5xkVG5wLZyDHEk5pGq0dzCPYUPeDFK2yvWOFo=; b=njsY8m4dJHMheGaNgReEskIosMqjhGUGKujjCERT5W6pORlzUbLcOBHgYDo85CwJTmyIv JrgspqaoGOZsXaacBVouzEF3HBr/vKRB7ltnhGzOhj2d7mrIUemc1oY1nIIXGg0Oljc7kYH l8EcRTacsvBuvhEIBsV86TNIm/A01F/vl4mFzqFCcJxefIRWt+5fcLYqNDKj4ijSi9mRFi8 +zn2VwH2wPleE3koirkvwut300CYH8N39cnnNWoEEgXpbkaRgIGJ0zqWvMr7Y9BPsYRzytO n1DDFM/2eTFeTp+lmWOSUICk1L67MCPLTHNx2d8wgj1Lr21vEB67fF3KRNSQ== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 1331C16098D; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id EF6CC2044B; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 17/35] mm: add pte_map_lock() and pte_spinlock() Date: Fri, 28 Jan 2022 05:09:48 -0800 Message-Id: <20220128131006.67712-18-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" pte_map_lock() and pte_spinlock() are used by fault handlers to ensure the pte is mapped and locked before they commit the faulted page to the mm's address space at the end of the fault. 
The functions differ in their preconditions; pte_map_lock() expects the pte to be unmapped prior to the call, while pte_spinlock() expects it to be already mapped. In the speculative fault case, the functions verify, after locking the pte, that the mmap sequence count has not changed since the start of the fault, and thus that no mmap lock writers have been running concurrently with the fault. After that point the page table lock serializes any further races with concurrent mmap lock writers. If the mmap sequence count check fails, both functions will return false with the pte being left unmapped and unlocked. Signed-off-by: Michel Lespinasse --- include/linux/mm.h | 38 ++++++++++++++++++++++++++ mm/memory.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 104 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2e2122bd3da3..7f1083fb94e0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3394,5 +3394,43 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned= long start, } #endif =20 +#ifdef CONFIG_MMU +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + +bool __pte_map_lock(struct vm_fault *vmf); + +static inline bool pte_map_lock(struct vm_fault *vmf) +{ + VM_BUG_ON(vmf->pte); + return __pte_map_lock(vmf); +} + +static inline bool pte_spinlock(struct vm_fault *vmf) +{ + VM_BUG_ON(!vmf->pte); + return __pte_map_lock(vmf); +} + +#else /* !CONFIG_SPECULATIVE_PAGE_FAULT */ + +#define pte_map_lock(__vmf) \ +({ \ + struct vm_fault *vmf =3D __vmf; \ + vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, \ + vmf->address, &vmf->ptl); \ + true; \ +}) + +#define pte_spinlock(__vmf) \ +({ \ + struct vm_fault *vmf =3D __vmf; \ + vmf->ptl =3D pte_lockptr(vmf->vma->vm_mm, vmf->pmd); \ + spin_lock(vmf->ptl); \ + true; \ +}) + +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ +#endif /* CONFIG_MMU */ + #endif /* __KERNEL__ */ #endif /* _LINUX_MM_H */ diff --git a/mm/memory.c b/mm/memory.c index d0db10bd5bee..1ce837e47395 100644 --- 
a/mm/memory.c +++ b/mm/memory.c @@ -2745,6 +2745,72 @@ EXPORT_SYMBOL_GPL(apply_to_existing_page_range); #define speculative_page_walk_end() local_irq_enable() #endif =20 +bool __pte_map_lock(struct vm_fault *vmf) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + pmd_t pmdval; +#endif + pte_t *pte =3D vmf->pte; + spinlock_t *ptl; + + if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) { + vmf->ptl =3D pte_lockptr(vmf->vma->vm_mm, vmf->pmd); + if (!pte) + vmf->pte =3D pte_offset_map(vmf->pmd, vmf->address); + spin_lock(vmf->ptl); + return true; + } + + speculative_page_walk_begin(); + if (!mmap_seq_read_check(vmf->vma->vm_mm, vmf->seq)) + goto fail; + /* + * The mmap sequence count check guarantees that the page + * tables are still valid at that point, and + * speculative_page_walk_begin() ensures that they stay around. + */ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + /* + * We check if the pmd value is still the same to ensure that there + * is not a huge collapse operation in progress in our back. + */ + pmdval =3D READ_ONCE(*vmf->pmd); + if (!pmd_same(pmdval, vmf->orig_pmd)) + goto fail; +#endif + ptl =3D pte_lockptr(vmf->vma->vm_mm, vmf->pmd); + if (!pte) + pte =3D pte_offset_map(vmf->pmd, vmf->address); + /* + * Try locking the page table. + * + * Note that we might race against zap_pte_range() which + * invalidates TLBs while holding the page table lock. + * We are still under the speculative_page_walk_begin() section, + * and zap_pte_range() could thus deadlock with us if we tried + * using spin_lock() here. + * + * We also don't want to retry until spin_trylock() succeeds, + * because of the starvation potential against a stream of lockers. 
+ */ + if (unlikely(!spin_trylock(ptl))) + goto fail; + if (!mmap_seq_read_check(vmf->vma->vm_mm, vmf->seq)) + goto unlock_fail; + speculative_page_walk_end(); + vmf->pte =3D pte; + vmf->ptl =3D ptl; + return true; + +unlock_fail: + spin_unlock(ptl); +fail: + if (pte) + pte_unmap(pte); + speculative_page_walk_end(); + return false; +} + #endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ =20 /* --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59C1BC433EF for ; Fri, 28 Jan 2022 13:19:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348909AbiA1NTl (ORCPT ); Fri, 28 Jan 2022 08:19:41 -0500 Received: from server.lespinasse.org ([63.205.204.226]:35959 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348643AbiA1NTJ (ORCPT ); Fri, 28 Jan 2022 08:19:09 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=UxJsNKcxVkmfEwrc0wLrYxktFQMDzJ2NvtD6CGXLcSg=; b=6r7aPZUUazHr0Yx+Qnh+976CFWlOQ4YNyXZG/9wDEuqKvF1jaH590LzBlXz8Je2wT6aJb sogFe4e9NylpbJwBQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=UxJsNKcxVkmfEwrc0wLrYxktFQMDzJ2NvtD6CGXLcSg=; b=cTAUtcMfHA1B3iEc5GrsP2U2HjGn6DIPIpn7HZ2gtLjdNDl8krMjya4tVaK+5OTEA+wWr Xnj0JsNTJP/v2UE6Qsd7hSwMV2FYZb/nMEA+ldwp10eTwW0rADGPdxnBB0m+O1JNdfdTyqv DWC7tylQgCiEGBa3oQfzqNx34zEWcKxrdjdG5t1zaryzw8AnVDwsekcgtaT7VAtf2Kd5SWG 
5j8zgxXTRAe0xi5ToCDyGNZTmpWWZqs9+6c9rc7+Iq/vJ0rEbK2KxJ4ohHyWwm+TXbktBQe jtI88AyNoZA1+73B7ej+1TJ+MKPzv+tBflBZwOqGNbR2DaZWuOFVvuP5R8TA== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 1758B16099D; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id F241320481; Fri, 28 Jan 2022 05:10:06 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 18/35] mm: implement speculative handling in do_anonymous_page() Date: Fri, 28 Jan 2022 05:09:49 -0800 Message-Id: <20220128131006.67712-19-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Change do_anonymous_page() to handle the speculative case. This involves aborting speculative faults if they have to allocate a new anon_vma, and using pte_map_lock() instead of pte_offset_map_lock() to complete the page fault. 
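For illustration only, the control flow described above can be sketched as a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: the model_* names are hypothetical, a C11 atomic counter stands in for the mmap sequence count, an atomic flag stands in for the page table lock, and the non-speculative anon_vma allocation is assumed to succeed. It shows the two abort points (a missing anon_vma, and a failed pte_map_lock()) and the check, trylock, re-check order used when locking speculatively.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct model_mm {
	atomic_ulong seq;      /* changes whenever a writer has run */
	atomic_bool ptl_held;  /* models the page table lock */
};

struct model_fault {
	struct model_mm *mm;
	unsigned long seq;     /* value sampled when the fault started */
	bool speculative;
	bool has_anon_vma;
};

enum model_result { MODEL_DONE, MODEL_RETRY };

static bool model_seq_check(struct model_mm *mm, unsigned long seq)
{
	return atomic_load(&mm->seq) == seq;
}

/* Returns true with the lock held, or false with nothing held. */
static bool model_pte_map_lock(struct model_fault *f)
{
	if (!f->speculative) {
		/* Non-speculative callers may simply wait for the lock. */
		while (atomic_exchange(&f->mm->ptl_held, true))
			;
		return true;
	}

	if (!model_seq_check(f->mm, f->seq))
		return false;
	/* Trylock only: waiting here could deadlock or starve. */
	if (atomic_exchange(&f->mm->ptl_held, true))
		return false;
	/* Re-check now that the lock excludes further races. */
	if (!model_seq_check(f->mm, f->seq)) {
		atomic_store(&f->mm->ptl_held, false);
		return false;
	}
	return true;
}

static enum model_result model_anon_fault(struct model_fault *f)
{
	assert(f && f->mm);

	/* Speculative faults do not allocate an anon_vma; they retry. */
	if (!f->has_anon_vma && f->speculative)
		return MODEL_RETRY;

	if (!model_pte_map_lock(f))
		return MODEL_RETRY;

	/* The new page would be committed here, under the lock. */
	atomic_store(&f->mm->ptl_held, false);
	return MODEL_DONE;
}
```

A MODEL_RETRY result corresponds to VM_FAULT_RETRY, after which the caller falls back to the regular path under the mmap lock.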
Signed-off-by: Michel Lespinasse Reported-by: kernel test robot Reported-by: kernel test robot --- mm/memory.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 1ce837e47395..8d036140634d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3846,8 +3846,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault = *vmf) vma->vm_page_prot)); } else { /* Allocate our own private page. */ - if (unlikely(anon_vma_prepare(vma))) - goto oom; + if (unlikely(!vma->anon_vma)) { + if (vmf->flags & FAULT_FLAG_SPECULATIVE) + return VM_FAULT_RETRY; + if (__anon_vma_prepare(vma)) + goto oom; + } page =3D alloc_zeroed_user_highpage_movable(vma, vmf->address); if (!page) goto oom; @@ -3869,8 +3873,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault = *vmf) entry =3D pte_mkwrite(pte_mkdirty(entry)); } =20 - vmf->pte =3D pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, - &vmf->ptl); + if (!pte_map_lock(vmf)) { + ret =3D VM_FAULT_RETRY; + goto release; + } if (!pte_none(*vmf->pte)) { update_mmu_tlb(vma, vmf->address, vmf->pte); goto unlock; @@ -3885,6 +3891,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); if (page) put_page(page); + if (vmf->flags & FAULT_FLAG_SPECULATIVE) + return VM_FAULT_RETRY; return handle_userfault(vmf, VM_UFFD_MISSING); } =20 @@ -3902,6 +3910,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) return 0; unlock: pte_unmap_unlock(vmf->pte, vmf->ptl); +release: if (page) put_page(page); return ret; --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11D8FC433EF for ; Fri, 28 Jan 2022 13:20:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348517AbiA1NUR (ORCPT ); 
Fri, 28 Jan 2022 08:20:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348693AbiA1NTK (ORCPT ); Fri, 28 Jan 2022 08:19:10 -0500 Received: from server.lespinasse.org (server.lespinasse.org [IPv6:2001:470:82ab::100:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B5E5C061751 for ; Fri, 28 Jan 2022 05:19:08 -0800 (PST) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=s8IABzRzP3ySJUm3FAWGVRKOhNvfxqtE36yoL1LaJ64=; b=5YyiNFsYpzIQby2cs2RNO6bfzcHVJCc1OHBveg/UD65LrGPfoSpZ46OpZtGa2uzUYJVj/ 0CwrThBTUJzA/lPCg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=s8IABzRzP3ySJUm3FAWGVRKOhNvfxqtE36yoL1LaJ64=; b=RF2Qg4JiQv/B2GTmiJJJgrrzl0zjMUIsaOF/jROil5LtN55Hp11eOn04jBQmSOL/orKqR NcRlQHquDKLgJDhikMo0h99LT9sdFuGHryzYZBQV3gj29cL4fKU1FoleONoLfphsNbzACNp +k8D1ucVhjB6SzuVMWlasEAGRwzN5/c7KWRSEbagrLzMxcnrW4utpUEfsgs/27LchjqNmCH uQx9u9MHz2A/KB0Zi5EcmkAJJpctcV8quh6MV402l66mgoAP+hsmfuQihBBbN4MbgSGT2tN IyWX9f8J7LN5Lfo5PIc24TCTKikYUogJPRb0nG7pdVGXMUoZwg/WyIS7ZBbQ== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 1D48E160AA0; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 00D4520557; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , 
Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 19/35] mm: enable speculative fault handling through do_anonymous_page() Date: Fri, 28 Jan 2022 05:09:50 -0800 Message-Id: <20220128131006.67712-20-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In the x86 fault handler, only attempt a speculative page fault (spf) if the vma is anonymous. In do_handle_mm_fault(), let speculative page faults proceed as long as they fall into anonymous vmas. This enables the speculative handling code in __handle_mm_fault() and do_anonymous_page(). In handle_pte_fault(), if vmf->pte is set (the original pte was not pte_none), catch speculative faults and return VM_FAULT_RETRY as those cases are not implemented yet. Also assert that do_fault() is not reached in the speculative case. 
Signed-off-by: Michel Lespinasse --- arch/x86/mm/fault.c | 2 +- mm/memory.c | 16 ++++++++++++---- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 99b0a358154e..6ba109413396 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1334,7 +1334,7 @@ void do_user_addr_fault(struct pt_regs *regs, goto spf_abort; rcu_read_lock(); vma =3D __find_vma(mm, address); - if (!vma || vma->vm_start > address) { + if (!vma || vma->vm_start > address || !vma_is_anonymous(vma)) { rcu_read_unlock(); goto spf_abort; } diff --git a/mm/memory.c b/mm/memory.c index 8d036140634d..74b51aae8166 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4365,6 +4365,8 @@ static vm_fault_t do_fault(struct vm_fault *vmf) struct mm_struct *vm_mm =3D vma->vm_mm; vm_fault_t ret; =20 + VM_BUG_ON(vmf->flags & FAULT_FLAG_SPECULATIVE); + /* * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */ @@ -4609,6 +4611,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *= vmf) return do_fault(vmf); } =20 + if (vmf->flags & FAULT_FLAG_SPECULATIVE) { + pte_unmap(vmf->pte); + return VM_FAULT_RETRY; + } + if (!pte_present(vmf->orig_pte)) return do_swap_page(vmf); =20 @@ -4937,8 +4944,7 @@ vm_fault_t do_handle_mm_fault(struct vm_area_struct *= vma, { vm_fault_t ret; =20 - if (flags & FAULT_FLAG_SPECULATIVE) - return VM_FAULT_RETRY; + VM_BUG_ON((flags & FAULT_FLAG_SPECULATIVE) && !vma_is_anonymous(vma)); =20 __set_current_state(TASK_RUNNING); =20 @@ -4960,10 +4966,12 @@ vm_fault_t do_handle_mm_fault(struct vm_area_struct= *vma, if (flags & FAULT_FLAG_USER) mem_cgroup_enter_user_fault(); =20 - if (unlikely(is_vm_hugetlb_page(vma))) + if (unlikely(is_vm_hugetlb_page(vma))) { + VM_BUG_ON(flags & FAULT_FLAG_SPECULATIVE); ret =3D hugetlb_fault(vma->vm_mm, vma, address, flags); - else + } else { ret =3D __handle_mm_fault(vma, address, flags, seq); + } =20 if (flags & FAULT_FLAG_USER) { mem_cgroup_exit_user_fault(); --=20 2.20.1 From nobody 
Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42C2BC433EF for ; Fri, 28 Jan 2022 13:20:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348920AbiA1NUV (ORCPT ); Fri, 28 Jan 2022 08:20:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54998 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348701AbiA1NTK (ORCPT ); Fri, 28 Jan 2022 08:19:10 -0500 Received: from server.lespinasse.org (server.lespinasse.org [IPv6:2001:470:82ab::100:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAAF0C061714 for ; Fri, 28 Jan 2022 05:19:08 -0800 (PST) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=7+5QvEdtmG46cyTSwpuhmgwjYp8hFfGi6s7BBfWPw2Y=; b=2+PTKlEkpcO5i4QYKxeQTRm7LeyQtVi478aVZzl/7sCj4kBJ4ua+ErhWugoAJkfOJZCFG bATPcA9HkHi9zlcBA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=7+5QvEdtmG46cyTSwpuhmgwjYp8hFfGi6s7BBfWPw2Y=; b=2mC/3CoATcjgomsxJlVsgkFCMEqHUGQDhcHQzse5CKmFMppDw6BuyRCMAXfHsJWlmr1jb +yDu9wTtJz1tYoxVd/WlKzGyMyh5PtbTcfg43ZBsi8TTCDZp5QPasrmq25NozO0Dt4rgkeI 5NfMx8pKTn3F9DvOH0XNL1whff2V4qvD7IZIaQ3PbrQnmat49eTt/6k7mLTkggOuyk9zMaQ YL0DOmYJ9skwukaR9KJm45ufu0yHmc3vDiSZbycXC6lpdBhpWoNPwXUZJIziqhEFuYQfJma t1erBL6fEU16lbtA3njilP/fzWhi/SBHvC+C3aOWVHo1QTWAxvGzKsIzGV1w== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by 
server.lespinasse.org (Postfix) with ESMTPS id 1A68F16099F; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 04C7A20337; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 20/35] mm: implement speculative handling in do_numa_page() Date: Fri, 28 Jan 2022 05:09:51 -0800 Message-Id: <20220128131006.67712-21-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Change do_numa_page() to use pte_spinlock() when locking the page table, so that the mmap sequence counter will be validated in the speculative case. Signed-off-by: Michel Lespinasse --- mm/memory.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 74b51aae8166..083e015ff194 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4441,8 +4441,8 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) * validation through pte_unmap_same(). It's of NUMA type but * the pfn may be screwed if the read is non atomic. 
*/ - vmf->ptl =3D pte_lockptr(vma->vm_mm, vmf->pmd); - spin_lock(vmf->ptl); + if (!pte_spinlock(vmf)) + return VM_FAULT_RETRY; if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { pte_unmap_unlock(vmf->pte, vmf->ptl); goto out; --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB51FC433EF for ; Fri, 28 Jan 2022 13:20:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349199AbiA1NUs (ORCPT ); Fri, 28 Jan 2022 08:20:48 -0500 Received: from server.lespinasse.org ([63.205.204.226]:35317 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348751AbiA1NTL (ORCPT ); Fri, 28 Jan 2022 08:19:11 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=uHoqHiEitsG2adH6xd+AzaN4440pSdu1fLLTZyBjHdc=; b=1eE7s25AKDC9KWLUsK1P+gDDXQS3CKuxa6HMqYPfvCfGKLZhIojDGscSOm/Ufv9mPLMnG XKbeaxCLmPX/TWaAA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=uHoqHiEitsG2adH6xd+AzaN4440pSdu1fLLTZyBjHdc=; b=djUTh9Y+zDGOrV/rzW37emAErhohtTwhjN/ABtXVYGKLWtLhSR5MR8Sq5CIrnvNlX6iSC PW8YmSUJzY/Qx/HxEnw3rrkJamXcI0gSEsJJzN2rAMjvaHi/dsFb0kWQNFN+NKtX+jl7WLq x2ftK834MzhH3bnYqmlymAIGNsG3yyDZf3rDELQGZ12Eo5i2WP+HyI9hWLRiWIw2vJlbDY/ IdBO/neq4UrCqmgYFRCG4lGSEK5TesRe+kWc/XrLlQRPU48c8a6kFaA2G/LswmRFkHszUzt PZgM/bnPyhFRiSsUyqeCMP3RLoLdB59R8VcKi5feKkRFSzc6MhQxJhC5FzXA== Received: from zeus.lespinasse.org 
(zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 21498160AA1; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 065B82044E; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 21/35] mm: enable speculative fault handling in do_numa_page() Date: Fri, 28 Jan 2022 05:09:52 -0800 Message-Id: <20220128131006.67712-22-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Change handle_pte_fault() to allow speculative fault execution to proceed through do_numa_page(). do_swap_page() does not implement speculative execution yet, so it needs to abort with VM_FAULT_RETRY in that case. 
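The resulting dispatch order in handle_pte_fault() can be summarized with a small model. This is a sketch under stated assumptions rather than the kernel function: the model_* names are hypothetical, the VMA is assumed to be anonymous and accessible (speculative faults are only attempted on anonymous VMAs at this point in the series), and each handler is reduced to an enum value.

```c
#include <stdbool.h>

/* Hypothetical model of the dispatch order; not kernel code. */
enum model_path {
	MODEL_ANON,	/* would call do_anonymous_page() */
	MODEL_SWAP,	/* would run the full do_swap_page() */
	MODEL_NUMA,	/* would call do_numa_page() */
	MODEL_OTHER,	/* remaining non-speculative pte handling */
	MODEL_RETRY,	/* would return VM_FAULT_RETRY */
};

/*
 * pte_mapped models "vmf->pte is set", present models pte_present() and
 * protnone models pte_protnone() on an accessible VMA.
 */
static enum model_path model_handle_pte_fault(bool speculative, bool pte_mapped,
					      bool present, bool protnone)
{
	if (!pte_mapped)
		return MODEL_ANON;
	if (!present)
		/* do_swap_page() itself bails out early when speculative. */
		return speculative ? MODEL_RETRY : MODEL_SWAP;
	if (protnone)
		return MODEL_NUMA;	/* allowed for speculative faults too */
	if (speculative)
		return MODEL_RETRY;	/* everything else is not implemented */
	return MODEL_OTHER;
}
```

The point of moving the FAULT_FLAG_SPECULATIVE check below the do_numa_page() call is visible in the model: a speculative fault on a NUMA hinting pte now reaches MODEL_NUMA instead of MODEL_RETRY.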
Signed-off-by: Michel Lespinasse --- mm/memory.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 083e015ff194..73b1a328b797 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3589,6 +3589,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) vm_fault_t ret =3D 0; void *shadow =3D NULL; =20 + if (vmf->flags & FAULT_FLAG_SPECULATIVE) { + pte_unmap(vmf->pte); + return VM_FAULT_RETRY; + } + if (!pte_unmap_same(vmf)) goto out; =20 @@ -4611,17 +4616,17 @@ static vm_fault_t handle_pte_fault(struct vm_fault = *vmf) return do_fault(vmf); } =20 - if (vmf->flags & FAULT_FLAG_SPECULATIVE) { - pte_unmap(vmf->pte); - return VM_FAULT_RETRY; - } - if (!pte_present(vmf->orig_pte)) return do_swap_page(vmf); =20 if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) return do_numa_page(vmf); =20 + if (vmf->flags & FAULT_FLAG_SPECULATIVE) { + pte_unmap(vmf->pte); + return VM_FAULT_RETRY; + } + vmf->ptl =3D pte_lockptr(vmf->vma->vm_mm, vmf->pmd); spin_lock(vmf->ptl); entry =3D vmf->orig_pte; --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB8A1C433FE for ; Fri, 28 Jan 2022 13:20:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348806AbiA1NUp (ORCPT ); Fri, 28 Jan 2022 08:20:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348763AbiA1NTL (ORCPT ); Fri, 28 Jan 2022 08:19:11 -0500 Received: from server.lespinasse.org (server.lespinasse.org [IPv6:2001:470:82ab::100:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2AA1CC061757 for ; Fri, 28 Jan 2022 05:19:09 -0800 (PST) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; 
d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=OOok3oAXLMDgG2pHbvztom5rL8XJmkTzwxFo3hW65dY=; b=apOsMK7sYOiHe+UmWef7PYiCm3i1aOipU8SikDOdqQIVQ92th1dViE6N0eom1nHudhL5n 4/szWqoaviTOG4LCg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=OOok3oAXLMDgG2pHbvztom5rL8XJmkTzwxFo3hW65dY=; b=7jHZVpmUHaV2hihoS/xDYZ7lIHZFiNosHjwrjMvHdF49E3nYRWUd6B0f1WZbzTOTJ3DbR KdoA+Tk76uCvQdhq/sopUjEvWhuCJjuvJQc8/G1yZ/OyLY6M4cq0fNS3bDS0MU2I06fz25j YiRmitgizvCq5ZyOFmdCiJVrVCui+9lSU2DRXUyV0dSwntrbLtIIY9VqGev1BV8uIRUsKsJ PXqDwUkAU6QwETRspdM3sJvxgXFDgnHHdQOdh4r2QQb2pZGwRMwbdT0yCFjPQ6WywdVwJoL kwdzpN8FW1tj5a9YAAtuAb61eSHPIKy+MDAGv2lLbceDf265pbntCbCh2Hrw== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 25423160AAA; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 093D420472; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 22/35] percpu-rwsem: enable percpu_sem destruction in atomic context Date: Fri, 28 Jan 2022 05:09:53 -0800 Message-Id: <20220128131006.67712-23-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: 
<20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Suren Baghdasaryan Calling percpu_free_rwsem() in atomic context triggers a "scheduling while atomic" bug: BUG: scheduling while atomic: klogd/158/0x00000002 ... __schedule_bug+0x191/0x290 schedule_debug+0x97/0x180 __schedule+0xdc/0xba0 schedule+0xda/0x250 schedule_timeout+0x92/0x2d0 __wait_for_common+0x25b/0x430 wait_for_completion+0x1f/0x30 rcu_barrier+0x440/0x4f0 rcu_sync_dtor+0xaa/0x190 percpu_free_rwsem+0x41/0x80 Introduce percpu_rwsem_async_destroy() to perform semaphore destruction in a worker thread. Signed-off-by: Suren Baghdasaryan Signed-off-by: Michel Lespinasse --- include/linux/percpu-rwsem.h | 13 ++++++++++++- kernel/locking/percpu-rwsem.c | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 44 insertions(+), 1 deletion(-) diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h index 5fda40f97fe9..bf1668fc9c5e 100644 --- a/include/linux/percpu-rwsem.h +++ b/include/linux/percpu-rwsem.h @@ -13,7 +13,14 @@ struct percpu_rw_semaphore { struct rcu_sync rss; unsigned int __percpu *read_count; struct rcuwait writer; - wait_queue_head_t waiters; + /* + * destroy_list_entry is used during object destruction when waiters + * can't be used, therefore reusing the same space. + */ + union { + wait_queue_head_t waiters; + struct list_head destroy_list_entry; + }; atomic_t block; #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; @@ -127,8 +134,12 @@ extern void percpu_up_write(struct percpu_rw_semaphore= *); extern int __percpu_init_rwsem(struct percpu_rw_semaphore *, const char *, struct lock_class_key *); =20 +/* Can't be called in atomic context. 
*/ extern void percpu_free_rwsem(struct percpu_rw_semaphore *); =20 +/* Invokes percpu_free_rwsem and frees the semaphore from a worker thread.= */ +extern void percpu_rwsem_async_destroy(struct percpu_rw_semaphore *sem); + #define percpu_init_rwsem(sem) \ ({ \ static struct lock_class_key rwsem_key; \ diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c index 70a32a576f3f..a3d37bf83c60 100644 --- a/kernel/locking/percpu-rwsem.c +++ b/kernel/locking/percpu-rwsem.c @@ -7,6 +7,7 @@ #include #include #include +#include #include =20 int __percpu_init_rwsem(struct percpu_rw_semaphore *sem, @@ -268,3 +269,34 @@ void percpu_up_write(struct percpu_rw_semaphore *sem) rcu_sync_exit(&sem->rss); } EXPORT_SYMBOL_GPL(percpu_up_write); + +static LIST_HEAD(destroy_list); +static DEFINE_SPINLOCK(destroy_list_lock); + +static void destroy_list_workfn(struct work_struct *work) +{ + struct percpu_rw_semaphore *sem, *sem2; + LIST_HEAD(to_destroy); + + spin_lock(&destroy_list_lock); + list_splice_init(&destroy_list, &to_destroy); + spin_unlock(&destroy_list_lock); + + if (list_empty(&to_destroy)) + return; + + list_for_each_entry_safe(sem, sem2, &to_destroy, destroy_list_entry) { + percpu_free_rwsem(sem); + kfree(sem); + } +} + +static DECLARE_WORK(destroy_list_work, destroy_list_workfn); + +void percpu_rwsem_async_destroy(struct percpu_rw_semaphore *sem) +{ + spin_lock(&destroy_list_lock); + list_add_tail(&sem->destroy_list_entry, &destroy_list); + spin_unlock(&destroy_list_lock); + schedule_work(&destroy_list_work); +} --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81BF6C433EF for ; Fri, 28 Jan 2022 13:20:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349174AbiA1NUY (ORCPT ); Fri, 28 Jan 
2022 08:20:24 -0500 Received: from server.lespinasse.org ([63.205.204.226]:58853 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348747AbiA1NTL (ORCPT ); Fri, 28 Jan 2022 08:19:11 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=P67mhsIt1ppjFTFW+x1YN2vEk/hxfE1OzpykzhPxla8=; b=j05HraC270uf9eSPytFr4+XRdIB2UMC4nP+hXv9KOHmP5Tpxd1I3xZVKhGwxrIWQxFito 6SX6oF6SPaYp8DyAA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=P67mhsIt1ppjFTFW+x1YN2vEk/hxfE1OzpykzhPxla8=; b=ATrqRJnge0cYDvYRn6k7PBbijeM0h2dmb6XlkF6kgZBYAzZTNWQhpOEyYdfd3ig3EAxlm XRJY7kymtFXBbDuHn20R/ysQ3y+T/UjXQH9ehvC7rWhtJ6mKKjSO4iDbz2j+PZjNMZ5oONj rt8Y0w7HuZxI6oRK0/a1DdPwp8U6Wpk2z6peNkcODaj8xWGYHJnozGnTJmWHzkNbzc9DTQB FGnIuAqR5K+Q++6P4Uz9di1TOldV0BQCB1qqF4PEYmbt1WY5n7VTjqrQr0i+VvEIhcG4heR 9ZzYLWjDurPxXV+wh3Zz/ADRzNspdvk3dD/t6Wz24QmuwistOpeXw6u6QHuA== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 29F9D160AAC; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 0BE3F2055C; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 23/35] mm: 
add mmu_notifier_lock Date: Fri, 28 Jan 2022 05:09:54 -0800 Message-Id: <20220128131006.67712-24-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Introduce mmu_notifier_lock as a per-mm percpu_rw_semaphore, as well as the code to initialize and destroy it together with the mm. This lock will be used to prevent races between mmu_notifier_register() and speculative fault handlers that need to fire MMU notifications without holding any of the mmap or rmap locks. Signed-off-by: Michel Lespinasse --- include/linux/mm_types.h | 6 +++++- include/linux/mmu_notifier.h | 27 +++++++++++++++++++++++++-- kernel/fork.c | 3 ++- 3 files changed, 32 insertions(+), 4 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 305f05d2a4bc..f77e2dec038d 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -462,6 +462,7 @@ struct vm_area_struct { } __randomize_layout; =20 struct kioctx_table; +struct percpu_rw_semaphore; struct mm_struct { struct { struct vm_area_struct *mmap; /* list of VMAs */ @@ -608,7 +609,10 @@ struct mm_struct { struct file __rcu *exe_file; #ifdef CONFIG_MMU_NOTIFIER struct mmu_notifier_subscriptions *notifier_subscriptions; -#endif +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + struct percpu_rw_semaphore *mmu_notifier_lock; +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ +#endif /* CONFIG_MMU_NOTIFIER */ #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS pgtable_t pmd_huge_pte; /* protected by page_table_lock */ #endif diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 45fc2c81e370..ace76fe91c0c 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -6,6 +6,8 @@ 
#include #include #include +#include +#include #include #include =20 @@ -499,15 +501,35 @@ static inline void mmu_notifier_invalidate_range(stru= ct mm_struct *mm, __mmu_notifier_invalidate_range(mm, start, end); } =20 -static inline void mmu_notifier_subscriptions_init(struct mm_struct *mm) +static inline bool mmu_notifier_subscriptions_init(struct mm_struct *mm) { +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + mm->mmu_notifier_lock =3D kzalloc(sizeof(struct percpu_rw_semaphore), GFP= _KERNEL); + if (!mm->mmu_notifier_lock) + return false; + if (percpu_init_rwsem(mm->mmu_notifier_lock)) { + kfree(mm->mmu_notifier_lock); + return false; + } +#endif + mm->notifier_subscriptions =3D NULL; + return true; } =20 static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm) { if (mm_has_notifiers(mm)) __mmu_notifier_subscriptions_destroy(mm); + +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT + if (!in_atomic()) { + percpu_free_rwsem(mm->mmu_notifier_lock); + kfree(mm->mmu_notifier_lock); + } else { + percpu_rwsem_async_destroy(mm->mmu_notifier_lock); + } +#endif } =20 =20 @@ -724,8 +746,9 @@ static inline void mmu_notifier_invalidate_range(struct= mm_struct *mm, { } =20 -static inline void mmu_notifier_subscriptions_init(struct mm_struct *mm) +static inline bool mmu_notifier_subscriptions_init(struct mm_struct *mm) { + return true; } =20 static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm) diff --git a/kernel/fork.c b/kernel/fork.c index 2e5f2e8de31a..db92e42d0087 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1069,7 +1069,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm= , struct task_struct *p, mm_init_owner(mm, p); mm_init_pasid(mm); RCU_INIT_POINTER(mm->exe_file, NULL); - mmu_notifier_subscriptions_init(mm); + if (!mmu_notifier_subscriptions_init(mm)) + goto fail_nopgd; init_tlb_flush_pending(mm); #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS mm->pmd_huge_pte =3D NULL; --=20 2.20.1 From nobody Tue Jun 30 00:50:12 
2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D2F9C433F5 for ; Fri, 28 Jan 2022 13:20:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349096AbiA1NUv (ORCPT ); Fri, 28 Jan 2022 08:20:51 -0500 Received: from server.lespinasse.org ([63.205.204.226]:45397 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348748AbiA1NTL (ORCPT ); Fri, 28 Jan 2022 08:19:11 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=brYV8LPA4M9MTtgPlp3UWTnacdv4IgtI3z+1Ewrumi8=; b=I1DZbEmzQI4BcXWoERJUPqVuo+xoDE/u5gIe/livDsttlyUfT+MZsR0eXQlVWW88IXojZ 0jaTameOtDQD5k0Cw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=brYV8LPA4M9MTtgPlp3UWTnacdv4IgtI3z+1Ewrumi8=; b=BnwIWxwakg1LSUdfMDZOXM07NSkkaRmm7gkd+wArC4muOcJ34MoFFiQ8zPm2Gv9Fvzeb6 Uvld4y0CFz/xgMOWzzPdrQS8SkI/FRPZS+avWFDq9lAW4LHsL1zFSMRRoDILiO22d27+8l4 h4wYnNj/8p/IH1qHJ2oEOMNuJpsAFjSLzHMu2eeXkTUpGHu3yjRJnPqUtFs0XGwkiuMcG/H voXrhVBC9PZEQuTti4ut+rreXnIDxCC/XYgPL3+TiU757qOBPUm39Ch+wXPw/DG0yIxAOTy 35hiWPkK/9hDr8NZ6eENLlEC6XtGLCsYgX477+xRrPBYMdb2h5hkkByDflOA== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 29450160AAB; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 0E8D620132; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: 
Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 24/35] mm: write lock mmu_notifier_lock when registering mmu notifiers Date: Fri, 28 Jan 2022 05:09:55 -0800 Message-Id: <20220128131006.67712-25-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Change mm_take_all_locks to also take the mmu_notifier_lock. Note that mm_take_all_locks is called from mmu_notifier_register() only. 
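The intended interaction between the registration path (writer) and speculative fault handlers (readers that only ever trylock) can be sketched with a toy reader/writer lock. This is a simplified userspace model, not the percpu_rw_semaphore implementation: all model_* names are hypothetical, a single atomic integer replaces the per-cpu counters, and the writer spins instead of sleeping.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of mmu_notifier_lock: state < 0 means a writer holds
 * the lock, otherwise state counts the readers currently inside.
 */
struct model_notifier_lock {
	atomic_int state;
};

static void model_lock_init(struct model_notifier_lock *l)
{
	atomic_init(&l->state, 0);
}

/* Reader side: never waits, fails as soon as a writer holds the lock. */
static bool model_read_trylock(struct model_notifier_lock *l)
{
	int s = atomic_load_explicit(&l->state, memory_order_relaxed);

	while (s >= 0) {
		/* On failure, s is reloaded with the current state. */
		if (atomic_compare_exchange_weak_explicit(&l->state, &s, s + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

static void model_read_unlock(struct model_notifier_lock *l)
{
	atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
}

/* Writer side (models the registration path): waits for readers to drain. */
static void model_write_lock(struct model_notifier_lock *l)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak_explicit(&l->state, &expected, -1,
						      memory_order_acquire,
						      memory_order_relaxed))
		expected = 0;
}

static void model_write_unlock(struct model_notifier_lock *l)
{
	atomic_store_explicit(&l->state, 0, memory_order_release);
}

/* A speculative handler fires notifications only under the read lock. */
static bool model_speculative_notify(struct model_notifier_lock *l, int *fired)
{
	if (!model_read_trylock(l))
		return false;	/* registration in progress: caller retries */
	(*fired)++;		/* stands in for firing the MMU notification */
	model_read_unlock(l);
	return true;
}
```

While the writer holds the lock, model_speculative_notify() returns false without firing anything, which corresponds to a speculative fault falling back to the regular path while a notifier is being registered.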
Signed-off-by: Michel Lespinasse --- mm/mmap.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/mmap.c b/mm/mmap.c index b09a2c875507..a67c3600d995 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3592,6 +3592,10 @@ int mm_take_all_locks(struct mm_struct *mm) =20 mutex_lock(&mm_all_locks_mutex); =20 +#if defined(CONFIG_MMU_NOTIFIER) && defined(CONFIG_SPECULATIVE_PAGE_FAULT) + percpu_down_write(mm->mmu_notifier_lock); +#endif + for (vma =3D mm->mmap; vma; vma =3D vma->vm_next) { if (signal_pending(current)) goto out_unlock; @@ -3679,6 +3683,10 @@ void mm_drop_all_locks(struct mm_struct *mm) vm_unlock_mapping(vma->vm_file->f_mapping); } =20 +#if defined(CONFIG_MMU_NOTIFIER) && defined(CONFIG_SPECULATIVE_PAGE_FAULT) + percpu_up_write(mm->mmu_notifier_lock); +#endif + mutex_unlock(&mm_all_locks_mutex); } =20 --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7A4F6C433F5 for ; Fri, 28 Jan 2022 13:20:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349149AbiA1NUT (ORCPT ); Fri, 28 Jan 2022 08:20:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55000 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348727AbiA1NTK (ORCPT ); Fri, 28 Jan 2022 08:19:10 -0500 Received: from server.lespinasse.org (server.lespinasse.org [IPv6:2001:470:82ab::100:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BAFB8C061755 for ; Fri, 28 Jan 2022 05:19:08 -0800 (PST) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; 
bh=RZET6KbJQW7VW1QW9ZpHlj8GvOwKrY4N8XGYpxkjsRk=; b=lg7mwIi1UUybUJbUCaCkKGLFeyjCzcgmEUUUtMqA2m435D54fPR9BuFNzrtW1IzZHziPP IZmxP4OfpkTNZIZBw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=RZET6KbJQW7VW1QW9ZpHlj8GvOwKrY4N8XGYpxkjsRk=; b=I5CUBm8DMAq9QhQZuDTlz8HPE2GIJe8Pl3qJLv9+aAuCuvlUoLHPmQcQy4o+btcgoFyJW 3Y7vobYfMEpLBYKSi+cUmMCTZmiDcDl9/0RST+kJVyF9iQstS//R4caHv/PgITt2zBu5Yqr Gbf03b8Kb9YrDIUKGUSbIFHu7CjsWATo32dn+aB5Kn4rvTcVkILtwbJNea7ub2Yc6ff0esG y0U2jbBv910miMShGu4zYhfJNj6ssvqdDUaNOZfYyHezMQlfu9JyzcjMlEooD06liuQ1w+G iP51OphoqkfoVswBHn3l8LMgiQgbkAt1EBXgCT9AJwbLwtb4u87lUpO7PAIw== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 34E4D160AAF; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 116CF20561; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 25/35] mm: add mmu_notifier_trylock() and mmu_notifier_unlock() Date: Fri, 28 Jan 2022 05:09:56 -0800 Message-Id: <20220128131006.67712-26-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" 
These new functions are to be used when firing MMU notifications without holding any of the mmap or rmap locks, as is the case with speculative page fault handlers. Signed-off-by: Michel Lespinasse --- include/linux/mmu_notifier.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index ace76fe91c0c..d0430410fdd8 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -772,4 +772,29 @@ static inline void mmu_notifier_synchronize(void) =20 #endif /* CONFIG_MMU_NOTIFIER */ =20 +#if defined(CONFIG_MMU_NOTIFIER) && defined(CONFIG_SPECULATIVE_PAGE_FAULT) + +static inline bool mmu_notifier_trylock(struct mm_struct *mm) +{ + return percpu_down_read_trylock(mm->mmu_notifier_lock); +} + +static inline void mmu_notifier_unlock(struct mm_struct *mm) +{ + percpu_up_read(mm->mmu_notifier_lock); +} + +#else + +static inline bool mmu_notifier_trylock(struct mm_struct *mm) +{ + return true; +} + +static inline void mmu_notifier_unlock(struct mm_struct *mm) +{ +} + +#endif + #endif /* _LINUX_MMU_NOTIFIER_H */ --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FF30C433F5 for ; Fri, 28 Jan 2022 13:20:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348928AbiA1NUC (ORCPT ); Fri, 28 Jan 2022 08:20:02 -0500 Received: from server.lespinasse.org ([63.205.204.226]:50107 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348678AbiA1NTK (ORCPT ); Fri, 28 Jan 2022 08:19:10 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : 
references : mime-version : content-transfer-encoding : from; bh=7ZwO3JSFTMu+I+SD7pVtiWOQ8TjLyD2OehGAIC7WXjM=; b=kXZaaQKnfFfdx1WcuV/gsR3sFHyjkkWzgVlVIxl61xzaSmlhPKCw8k9+0tXt42thV1hgh 76J7O3GfBptE7fSCg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=7ZwO3JSFTMu+I+SD7pVtiWOQ8TjLyD2OehGAIC7WXjM=; b=N+8pg6EwjJQ3+6Y7V4x16KZEvrtmwYS+Kkac+I8wPHurF9yISlUf9mrEqxGCFMpE8iKWo af//9oNNzfjUKvmlV90xDVOpnS9w6rIknF2VlARdpSg5gTndvdvI3korqttpbVfl6IUxQae O3VrDU2+4MMBPDgX1SU44DD1De1BaiokD4lkS+M6QKgl6CR1s/U5L7KYI8dBXx6sq7vxC4q 3pMrI0M/abW1VuAe5ZzAMd9o1YzyS9vkrsGGZhxx0gK2cmOrIUpP3cW3a+eNfZ9eN9Q9WZO hd1vFAxrPwIytcuycY+dTWvMIXOnOummJyesV0+o3FKbRWSh/AzaaJFY6OGw== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 34F95160AC3; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 1422920F30; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 26/35] mm: implement speculative handling in wp_page_copy() Date: Fri, 28 Jan 2022 05:09:57 -0800 Message-Id: <20220128131006.67712-27-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Change wp_page_copy() to handle the speculative case. This involves aborting speculative faults if they have to allocate an anon_vma, read-locking the mmu_notifier_lock to avoid races with mmu_notifier_register(), and using pte_map_lock() instead of pte_offset_map_lock() to complete the page fault. Also change call sites to clear vmf->pte after unmapping the page table, in order to satisfy pte_map_lock()'s preconditions. Signed-off-by: Michel Lespinasse --- mm/memory.c | 42 +++++++++++++++++++++++++++++++++--------- 1 file changed, 33 insertions(+), 9 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 73b1a328b797..fd8984d89109 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3087,20 +3087,27 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) pte_t entry; int page_copied =3D 0; struct mmu_notifier_range range; + vm_fault_t ret =3D VM_FAULT_OOM; =20 - if (unlikely(anon_vma_prepare(vma))) - goto oom; + if (unlikely(!vma->anon_vma)) { + if (vmf->flags & FAULT_FLAG_SPECULATIVE) { + ret =3D VM_FAULT_RETRY; + goto out; + } + if (__anon_vma_prepare(vma)) + goto out; + } =20 if (is_zero_pfn(pte_pfn(vmf->orig_pte))) { new_page =3D alloc_zeroed_user_highpage_movable(vma, vmf->address); if (!new_page) - goto oom; + goto out; } else { new_page =3D alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address); if (!new_page) - goto oom; + goto out; =20 if (!cow_user_page(new_page, old_page, vmf)) { /* @@ -3117,11 +3124,16 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) } =20 if (mem_cgroup_charge(page_folio(new_page), mm, GFP_KERNEL)) - goto oom_free_new; + goto out_free_new; cgroup_throttle_swaprate(new_page, GFP_KERNEL); =20 __SetPageUptodate(new_page); =20 + if ((vmf->flags & FAULT_FLAG_SPECULATIVE) && + !mmu_notifier_trylock(mm)) { + ret =3D VM_FAULT_RETRY; + goto out_free_new; + } mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, vmf->address & PAGE_MASK, (vmf->address & 
PAGE_MASK) + PAGE_SIZE); @@ -3130,7 +3142,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) /* * Re-check the pte - we dropped the lock */ - vmf->pte =3D pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!pte_map_lock(vmf)) { + ret =3D VM_FAULT_RETRY; + /* put_page() will uncharge the page */ + goto out_notify; + } if (likely(pte_same(*vmf->pte, vmf->orig_pte))) { if (old_page) { if (!PageAnon(old_page)) { @@ -3205,6 +3221,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * the above ptep_clear_flush_notify() did already call it. */ mmu_notifier_invalidate_range_only_end(&range); + if (vmf->flags & FAULT_FLAG_SPECULATIVE) + mmu_notifier_unlock(mm); if (old_page) { /* * Don't let another task, with possibly unlocked vma, @@ -3221,12 +3239,16 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) put_page(old_page); } return page_copied ? VM_FAULT_WRITE : 0; -oom_free_new: +out_notify: + mmu_notifier_invalidate_range_only_end(&range); + if (vmf->flags & FAULT_FLAG_SPECULATIVE) + mmu_notifier_unlock(mm); +out_free_new: put_page(new_page); -oom: +out: if (old_page) put_page(old_page); - return VM_FAULT_OOM; + return ret; } =20 /** @@ -3369,6 +3391,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) return wp_pfn_shared(vmf); =20 pte_unmap_unlock(vmf->pte, vmf->ptl); + vmf->pte =3D NULL; return wp_page_copy(vmf); } =20 @@ -3407,6 +3430,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) get_page(vmf->page); =20 pte_unmap_unlock(vmf->pte, vmf->ptl); + vmf->pte =3D NULL; return wp_page_copy(vmf); } =20 --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5785C433EF for ; Fri, 28 Jan 2022 13:20:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349095AbiA1NUM (ORCPT ); 
Fri, 28 Jan 2022 08:20:12 -0500 Received: from server.lespinasse.org ([63.205.204.226]:48725 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348681AbiA1NTK (ORCPT ); Fri, 28 Jan 2022 08:19:10 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=h0OB3itXHSsaegtYCAjglB0dKK8hw9ET+M16zSiFNyU=; b=xS5mCESfy6y0TI6ONSNvUVVbt4kZJ5FEuG4+8W9WsmR3WuvRkxdSZBFUA3lpIapuHDyRM TMjVq3VB/OcODo8Ag== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=h0OB3itXHSsaegtYCAjglB0dKK8hw9ET+M16zSiFNyU=; b=LpDorKuWKqjFoH3XY0OMq4Pa43KItEaXJXyu7mK08bI5Ji3kWxrbak6QdNfKkOzLKUbCq g/OAyfTR+1m765H905fFlfVQTEnIZ+zosJ9T/4AzVAt0gMs8SyYyFQw39rGQjnFLfRGi4F1 ctyEGWUGnRTs+tkjbfCYjUOWAy4OWcwwldTmyaAJblOakogBGqRnIWQgODeVVB2GUbgxd8y +f/1QjxSujsaarrkwFvlDtv+U5AY46icuP5CKcmDxBKHPmbJHL/5y4goGC/ec0/9Pk5wjO6 ZYZwyrtUbEv+9K8WSaeZLhXGIJ5xGAxye2z04RQAbIsNL7WeoXk5J/d+8WLA== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 33B8A160AAD; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 16FC920F8E; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: 
[PATCH v2 27/35] mm: implement and enable speculative fault handling in handle_pte_fault() Date: Fri, 28 Jan 2022 05:09:58 -0800 Message-Id: <20220128131006.67712-28-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In handle_pte_fault(), allow speculative execution to proceed. Use pte_spinlock() to validate the mmap sequence count when locking the page table. If speculative execution proceeds through do_wp_page(), ensure that we end up in the wp_page_reuse() or wp_page_copy() paths, rather than wp_pfn_shared() or wp_page_shared() (both unreachable as we only handle anon vmas so far) or handle_userfault() (needs an explicit abort to handle non-speculatively). Signed-off-by: Michel Lespinasse --- mm/memory.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index fd8984d89109..7f8dbd729dce 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3293,6 +3293,7 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf) { struct vm_area_struct *vma =3D vmf->vma; =20 + VM_BUG_ON(vmf->flags & FAULT_FLAG_SPECULATIVE); if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) { vm_fault_t ret; =20 @@ -3313,6 +3314,8 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf) struct vm_area_struct *vma =3D vmf->vma; vm_fault_t ret =3D VM_FAULT_WRITE; =20 + VM_BUG_ON(vmf->flags & FAULT_FLAG_SPECULATIVE); + get_page(vmf->page); =20 if (vma->vm_ops && vma->vm_ops->page_mkwrite) { @@ -3366,6 +3369,8 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) =20 if (userfaultfd_pte_wp(vma, *vmf->pte)) { pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->flags & FAULT_FLAG_SPECULATIVE) + return VM_FAULT_RETRY; return handle_userfault(vmf, VM_UFFD_WP); } =20 @@ 
-4646,13 +4651,8 @@ static vm_fault_t handle_pte_fault(struct vm_fault *= vmf) if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) return do_numa_page(vmf); =20 - if (vmf->flags & FAULT_FLAG_SPECULATIVE) { - pte_unmap(vmf->pte); + if (!pte_spinlock(vmf)) return VM_FAULT_RETRY; - } - - vmf->ptl =3D pte_lockptr(vmf->vma->vm_mm, vmf->pmd); - spin_lock(vmf->ptl); entry =3D vmf->orig_pte; if (unlikely(!pte_same(*vmf->pte, entry))) { update_mmu_tlb(vmf->vma, vmf->address, vmf->pte); --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 083D3C433F5 for ; Fri, 28 Jan 2022 13:20:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349036AbiA1NUA (ORCPT ); Fri, 28 Jan 2022 08:20:00 -0500 Received: from server.lespinasse.org ([63.205.204.226]:45043 "EHLO server.lespinasse.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348670AbiA1NTK (ORCPT ); Fri, 28 Jan 2022 08:19:10 -0500 DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=N1bCyCqlqCPmn3u4dr/qGIvPJEOlMd8KVKMJcaELgEw=; b=GcBReqmwDZ82aVnHS0hCCWXYruQ1qrdcFPqUv1QkU6UoqgPLM+nRmvNS0Y8AYDNay6K1/ WrF/45/SfCMN4X4Bw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=N1bCyCqlqCPmn3u4dr/qGIvPJEOlMd8KVKMJcaELgEw=; b=0fIaqi5/e6TyDkW5DpsQrirQBlbVLQ6vD67HM9rQxnG/ldkbMS/1aO1AiQET+/+D1Wxp3 
Vo3HWduaw/24aWGBGAOG7sRUAqxQa9gvY5TqSrmEIbhtiNXIFrxZzvVmTqly4Issy95x4JZ XpUoZZcE5SnkHhoHB10UfCAVkkVE2c9veNH+HcRiGCIJE7VAJr7oiNuou1QLzhuQ9Algvkf igQLboMXPIsg3bvd3gxPJfjO6ges3d0MKFpLCjlXZt+yUAlXlhwjVP8IkEAW9tPE7bRVYLM kP474IHw4IXNCpCJ02BnlHqMo47/ZINoDZlC8r5J6BYzN2BgvHhe5CoZbTdA== Received: from zeus.lespinasse.org (zeus.lespinasse.org [IPv6:fd00::150:0]) by server.lespinasse.org (Postfix) with ESMTPS id 33BE8160AAE; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 19D0820FB1; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 28/35] mm: disable speculative faults for single threaded user space Date: Fri, 28 Jan 2022 05:09:59 -0800 Message-Id: <20220128131006.67712-29-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Performance tuning: single threaded userspace does not benefit from speculative page faults, so we turn them off to avoid any related (small) extra overheads. 
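Editorial illustration, not part of the patch: the gating rule described above can be written as a small user-space predicate. This is a minimal sketch. MODEL_FAULT_FLAG_USER and model_should_try_spf() are hypothetical names, the flag value is arbitrary, and mm_users is passed as a plain int rather than read with atomic_read().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's FAULT_FLAG_USER bit. */
#define MODEL_FAULT_FLAG_USER 0x1u

/*
 * Returns true when a speculative attempt is worth making under the
 * rule in this patch: the fault comes from user mode and the mm is
 * shared by more than one user.
 */
static bool model_should_try_spf(unsigned int flags, int mm_users)
{
	if (!(flags & MODEL_FAULT_FLAG_USER))
		return false;	/* kernel-mode fault: skip speculation */
	if (mm_users == 1)
		return false;	/* single threaded: nothing to gain */
	return true;
}
```

When the predicate is false, the real code jumps past the speculative block (the no_spf label in the diff) and takes the regular mmap-locked path.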
Signed-off-by: Michel Lespinasse --- arch/x86/mm/fault.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 6ba109413396..d6f8d4967c49 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1328,6 +1328,13 @@ void do_user_addr_fault(struct pt_regs *regs, #endif =20 #ifdef CONFIG_SPECULATIVE_PAGE_FAULT + /* + * No need to try speculative faults for kernel or + * single threaded user space. + */ + if (!(flags & FAULT_FLAG_USER) || atomic_read(&mm->mm_users) =3D=3D 1) + goto no_spf; + count_vm_event(SPF_ATTEMPT); seq =3D mmap_seq_read_start(mm); if (seq & 1) @@ -1362,7 +1369,9 @@ void do_user_addr_fault(struct pt_regs *regs, =20 spf_abort: count_vm_event(SPF_ABORT); -#endif +no_spf: + +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ =20 /* * Kernel-mode access to the user address space should only occur --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 017CAC433EF for ; Fri, 28 Jan 2022 13:20:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348879AbiA1NU2 (ORCPT ); Fri, 28 Jan 2022 08:20:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55006 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348754AbiA1NTL (ORCPT ); Fri, 28 Jan 2022 08:19:11 -0500 Received: from server.lespinasse.org (server.lespinasse.org [IPv6:2001:470:82ab::100:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ED46DC061747 for ; Fri, 28 Jan 2022 05:19:08 -0800 (PST) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : 
content-transfer-encoding : from; bh=WKv5bmQRqPHdZUO4LXo+QEVU5+AgH+NzFasQ6BtkXFM=; b=/llg3A/maOZyNvvtAsqPMT8BSicLcbnbTOgrCs9XoOeH6gnm+d0cUa1QPJAqq4O8MYZk+ 73okmSGTVaB7uXOAw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=WKv5bmQRqPHdZUO4LXo+QEVU5+AgH+NzFasQ6BtkXFM=; b=isSEOn+qW7HslS9ExCPC4SWo6ixC3JaQtA/XDcgOW/HSjp235iI6dbuuhQWVnRuW3C4VC uRo57aSyhVd7UxYlHuifDAq7xC7OAJVnykW+5LpOPvsqHyO0fvzHtEyGaSJUUSc7CqFFm/Y f1UKBDtJAXBGHOuTmsMGu6qrs5b+pNzdA1bQ34JZYbkQLf44t13EMFyrga+f9AXa9GS7G53 enzvE6DnD84v9hHe1ky+dyUTlXSOY08Oer0XUwqyqHSkurysfooyCTMjsrS5KRj/PVI3i+v tIRp6oX2lYonjB00oQJKO6OFfmuobq1OxPPYGWKNhfaNDoAx3E7ifbr3/sew== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 36C7E160AC7; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 1CA4E2023B; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 29/35] mm: disable rcu safe vma freeing for single threaded user space Date: Fri, 28 Jan 2022 05:10:00 -0800 Message-Id: <20220128131006.67712-30-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
Content-Type: text/plain; charset="utf-8" Performance tuning: as single threaded userspace does not use speculative page faults, it does not require rcu safe vma freeing. Turn this off to avoid the related (small) extra overheads. For multi threaded userspace, we often see a performance benefit from the rcu safe vma freeing - even in tests that do not have any frequent concurrent page faults ! This is because rcu safe vma freeing prevents recently released vmas from being immediately reused in a new thread. Signed-off-by: Michel Lespinasse --- kernel/fork.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index db92e42d0087..34600fe86743 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -384,10 +384,12 @@ void vm_area_free(struct vm_area_struct *vma) { free_vma_anon_name(vma); #ifdef CONFIG_SPECULATIVE_PAGE_FAULT - call_rcu(&vma->vm_rcu, __vm_area_free); -#else - kmem_cache_free(vm_area_cachep, vma); + if (atomic_read(&vma->vm_mm->mm_users) > 1) { + call_rcu(&vma->vm_rcu, __vm_area_free); + return; + } #endif + kmem_cache_free(vm_area_cachep, vma); } =20 static void account_kernel_stack(struct task_struct *tsk, int account) --=20 2.20.1 From nobody Tue Jun 30 00:50:12 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28B1FC433F5 for ; Fri, 28 Jan 2022 13:20:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348873AbiA1NUI (ORCPT ); Fri, 28 Jan 2022 08:20:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55002 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348731AbiA1NTK (ORCPT ); Fri, 28 Jan 2022 08:19:10 -0500 Received: from server.lespinasse.org (server.lespinasse.org [IPv6:2001:470:82ab::100:0]) by lindbergh.monkeyblade.net 
(Postfix) with ESMTPS id BAEB1C061753 for ; Fri, 28 Jan 2022 05:19:08 -0800 (PST) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-ed; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=48igdK4XD2qyyw4H7eAGAtmCU5dM/sdUsGGChgdX+mc=; b=Mhjb3jaW8n5cT4q9t7IwQYixbkD0AoAJ+xw9LX+B1BKrkYOiNMDcF1BsuBXLDnxS9YJi9 RhVexvhKEcXF04/AQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lespinasse.org; i=@lespinasse.org; q=dns/txt; s=srv-52-rsa; t=1643375407; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : from; bh=48igdK4XD2qyyw4H7eAGAtmCU5dM/sdUsGGChgdX+mc=; b=5s8QcT/Pe0E6zMe8gQOZSoZuokKcmnSQyMMr7nyXpESmSSSMVXIWQd1aYnbbC9YGDNAj8 aUl/zfdnGrEFhEn3N4PAhTN8iMevYWBb6Z1psZnu98HvYz2v3h6snm6FOsHBDKTsSbVPAxm 1lziHA61mdd2LDX8yvJ7QZwvQ6OgO+zeNo/C8AlUMoVxDsdhJZjPnx+2rDjkAk7vhHok6LF 7NWRmtIwOS/7e6XRDdV9Y8L3MDQlu9fqXoQiprHms0xKtY2BwwUSFE6Z6qN/B5SEhwxcv6F mWwDUtp8fbL7RoWGeghYSQm1a2HibCixrOU4/RQ3B+3ErxvL/xVd5daVs3vA== Received: from zeus.lespinasse.org (zeus.lespinasse.org [10.0.0.150]) by server.lespinasse.org (Postfix) with ESMTPS id 3AA38160AC8; Fri, 28 Jan 2022 05:10:07 -0800 (PST) Received: by zeus.lespinasse.org (Postfix, from userid 1000) id 1F74A20459; Fri, 28 Jan 2022 05:10:07 -0800 (PST) From: Michel Lespinasse To: Linux-MM , linux-kernel@vger.kernel.org, Andrew Morton Cc: kernel-team@fb.com, Laurent Dufour , Jerome Glisse , Peter Zijlstra , Michal Hocko , Vlastimil Babka , Davidlohr Bueso , Matthew Wilcox , Liam Howlett , Rik van Riel , Paul McKenney , Song Liu , Suren Baghdasaryan , Minchan Kim , Joel Fernandes , David Rientjes , Axel Rasmussen , Andy Lutomirski , Michel Lespinasse Subject: [PATCH v2 30/35] mm: create new include/linux/vm_event.h header file Date: Fri, 28 Jan 2022 05:10:01 -0800 Message-Id: 
<20220128131006.67712-31-michel@lespinasse.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org> References: <20220128131006.67712-1-michel@lespinasse.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Split off the definitions necessary to update event counters from vmstat.h into a new vm_event.h file. The rationale is to allow header files included from mm.h to update counter events. vmstat.h can not be included from such header files, because it refers to page_pgdat() which is only defined later down in mm.h, and thus results in compile errors. vm_event.h does not refer to page_pgdat() and thus does not result in such errors. Signed-off-by: Michel Lespinasse --- include/linux/vm_event.h | 105 +++++++++++++++++++++++++++++++++++++++ include/linux/vmstat.h | 95 +---------------------------------- 2 files changed, 106 insertions(+), 94 deletions(-) create mode 100644 include/linux/vm_event.h diff --git a/include/linux/vm_event.h b/include/linux/vm_event.h new file mode 100644 index 000000000000..b3ae108a3841 --- /dev/null +++ b/include/linux/vm_event.h @@ -0,0 +1,105 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_VM_EVENT_H +#define _LINUX_VM_EVENT_H + +#include +#include +#include +#include +#include + +#ifdef CONFIG_VM_EVENT_COUNTERS +/* + * Light weight per cpu counter implementation. + * + * Counters should only be incremented and no critical kernel component + * should rely on the counter values. + * + * Counters are handled completely inline. On many platforms the code + * generated will simply be the increment of a global address. + */ + +struct vm_event_state { + unsigned long event[NR_VM_EVENT_ITEMS]; +}; + +DECLARE_PER_CPU(struct vm_event_state, vm_event_states); + +/* + * vm counters are allowed to be racy. Use raw_cpu_ops to avoid the + * local_irq_disable overhead. 
+ */ +static inline void __count_vm_event(enum vm_event_item item) +{ + raw_cpu_inc(vm_event_states.event[item]); +} + +static inline void count_vm_event(enum vm_event_item item) +{ + this_cpu_inc(vm_event_states.event[item]); +} + +static inline void __count_vm_events(enum vm_event_item item, long delta) +{ + raw_cpu_add(vm_event_states.event[item], delta); +} + +static inline void count_vm_events(enum vm_event_item item, long delta) +{ + this_cpu_add(vm_event_states.event[item], delta); +} + +extern void all_vm_events(unsigned long *); + +extern void vm_events_fold_cpu(int cpu); + +#else + +/* Disable counters */ +static inline void count_vm_event(enum vm_event_item item) +{ +} +static inline void count_vm_events(enum vm_event_item item, long delta) +{ +} +static inline void __count_vm_event(enum vm_event_item item) +{ +} +static inline void __count_vm_events(enum vm_event_item item, long delta) +{ +} +static inline void all_vm_events(unsigned long *ret) +{ +} +static inline void vm_events_fold_cpu(int cpu) +{ +} + +#endif /* CONFIG_VM_EVENT_COUNTERS */ + +#ifdef CONFIG_NUMA_BALANCING +#define count_vm_numa_event(x) count_vm_event(x) +#define count_vm_numa_events(x, y) count_vm_events(x, y) +#else +#define count_vm_numa_event(x) do {} while (0) +#define count_vm_numa_events(x, y) do { (void)(y); } while (0) +#endif /* CONFIG_NUMA_BALANCING */ + +#ifdef CONFIG_DEBUG_TLBFLUSH +#define count_vm_tlb_event(x) count_vm_event(x) +#define count_vm_tlb_events(x, y) count_vm_events(x, y) +#else +#define count_vm_tlb_event(x) do {} while (0) +#define count_vm_tlb_events(x, y) do { (void)(y); } while (0) +#endif + +#ifdef CONFIG_DEBUG_VM_VMACACHE +#define count_vm_vmacache_event(x) count_vm_event(x) +#else +#define count_vm_vmacache_event(x) do {} while (0) +#endif + +#define __count_zid_vm_events(item, zid, delta) \ + __count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta) + +#endif /* _LINUX_VM_EVENT_H */ diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h 
index bfe38869498d..7c3c892ce89a 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -40,100 +41,6 @@ enum writeback_stat_item { NR_VM_WRITEBACK_STAT_ITEMS, }; =20 -#ifdef CONFIG_VM_EVENT_COUNTERS -/* - * Light weight per cpu counter implementation. - * - * Counters should only be incremented and no critical kernel component - * should rely on the counter values. - * - * Counters are handled completely inline. On many platforms the code - * generated will simply be the increment of a global address. - */ - -struct vm_event_state { - unsigned long event[NR_VM_EVENT_ITEMS]; -}; - -DECLARE_PER_CPU(struct vm_event_state, vm_event_states); - -/* - * vm counters are allowed to be racy. Use raw_cpu_ops to avoid the - * local_irq_disable overhead. - */ -static inline void __count_vm_event(enum vm_event_item item) -{ - raw_cpu_inc(vm_event_states.event[item]); -} - -static inline void count_vm_event(enum vm_event_item item) -{ - this_cpu_inc(vm_event_states.event[item]); -} - -static inline void __count_vm_events(enum vm_event_item item, long delta) -{ - raw_cpu_add(vm_event_states.event[item], delta); -} - -static inline void count_vm_events(enum vm_event_item item, long delta) -{ - this_cpu_add(vm_event_states.event[item], delta); -} - -extern void all_vm_events(unsigned long *); - -extern void vm_events_fold_cpu(int cpu); - -#else - -/* Disable counters */ -static inline void count_vm_event(enum vm_event_item item) -{ -} -static inline void count_vm_events(enum vm_event_item item, long delta) -{ -} -static inline void __count_vm_event(enum vm_event_item item) -{ -} -static inline void __count_vm_events(enum vm_event_item item, long delta) -{ -} -static inline void all_vm_events(unsigned long *ret) -{ -} -static inline void vm_events_fold_cpu(int cpu) -{ -} - -#endif /* CONFIG_VM_EVENT_COUNTERS */ - -#ifdef CONFIG_NUMA_BALANCING -#define count_vm_numa_event(x) count_vm_event(x) 
-#define count_vm_numa_events(x, y) count_vm_events(x, y)
-#else
-#define count_vm_numa_event(x) do {} while (0)
-#define count_vm_numa_events(x, y) do { (void)(y); } while (0)
-#endif /* CONFIG_NUMA_BALANCING */
-
-#ifdef CONFIG_DEBUG_TLBFLUSH
-#define count_vm_tlb_event(x) count_vm_event(x)
-#define count_vm_tlb_events(x, y) count_vm_events(x, y)
-#else
-#define count_vm_tlb_event(x) do {} while (0)
-#define count_vm_tlb_events(x, y) do { (void)(y); } while (0)
-#endif
-
-#ifdef CONFIG_DEBUG_VM_VMACACHE
-#define count_vm_vmacache_event(x) count_vm_event(x)
-#else
-#define count_vm_vmacache_event(x) do {} while (0)
-#endif
-
-#define __count_zid_vm_events(item, zid, delta) \
-	__count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta)
-
 /*
  * Zone and node-based page accounting with per cpu differentials.
  */
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
To: Linux-MM, linux-kernel@vger.kernel.org, Andrew Morton
Cc: kernel-team@fb.com, Laurent Dufour, Jerome Glisse, Peter Zijlstra, Michal Hocko, Vlastimil Babka, Davidlohr Bueso, Matthew Wilcox, Liam Howlett, Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan, Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen, Andy Lutomirski, Michel Lespinasse
Subject: [PATCH v2 31/35] mm: anon spf statistics
Date: Fri, 28 Jan 2022 05:10:02 -0800
Message-Id: <20220128131006.67712-32-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

Add a new
CONFIG_SPECULATIVE_PAGE_FAULT_STATS config option, and dump extra
statistics about executed spf cases and abort reasons when the option is set.

Signed-off-by: Michel Lespinasse
---
 arch/x86/mm/fault.c           | 18 ++++++++---
 include/linux/mmap_lock.h     | 19 ++++++++++--
 include/linux/vm_event.h      |  6 ++++
 include/linux/vm_event_item.h | 21 +++++++++++++
 mm/Kconfig.debug              |  7 +++++
 mm/memory.c                   | 56 ++++++++++++++++++++++++++++-------
 mm/vmstat.c                   | 21 +++++++++++++
 7 files changed, 131 insertions(+), 17 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d6f8d4967c49..a5a19561c319 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1337,21 +1337,31 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	count_vm_event(SPF_ATTEMPT);
 	seq = mmap_seq_read_start(mm);
-	if (seq & 1)
+	if (seq & 1) {
+		count_vm_spf_event(SPF_ABORT_ODD);
 		goto spf_abort;
+	}
 	rcu_read_lock();
 	vma = __find_vma(mm, address);
-	if (!vma || vma->vm_start > address || !vma_is_anonymous(vma)) {
+	if (!vma || vma->vm_start > address) {
 		rcu_read_unlock();
+		count_vm_spf_event(SPF_ABORT_UNMAPPED);
+		goto spf_abort;
+	}
+	if (!vma_is_anonymous(vma)) {
+		rcu_read_unlock();
+		count_vm_spf_event(SPF_ABORT_NO_SPECULATE);
 		goto spf_abort;
 	}
 	pvma = *vma;
 	rcu_read_unlock();
-	if (!mmap_seq_read_check(mm, seq))
+	if (!mmap_seq_read_check(mm, seq, SPF_ABORT_VMA_COPY))
 		goto spf_abort;
 	vma = &pvma;
-	if (unlikely(access_error(error_code, vma)))
+	if (unlikely(access_error(error_code, vma))) {
+		count_vm_spf_event(SPF_ABORT_ACCESS_ERROR);
 		goto spf_abort;
+	}
 	fault = do_handle_mm_fault(vma, address,
 				   flags | FAULT_FLAG_SPECULATIVE, seq, regs);
 
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index a2459eb15a33..747805ce07b8 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_SPECULATIVE_PAGE_FAULT
 #define MMAP_LOCK_SEQ_INITIALIZER(name) \
@@ -104,12 +105,26 @@ static inline unsigned long mmap_seq_read_start(struct mm_struct *mm)
 	return seq;
 }
 
-static inline bool mmap_seq_read_check(struct mm_struct *mm, unsigned long seq)
+static inline bool __mmap_seq_read_check(struct mm_struct *mm,
+					 unsigned long seq)
 {
 	smp_rmb();
 	return seq == READ_ONCE(mm->mmap_seq);
 }
-#endif
+
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT_STATS
+static inline bool mmap_seq_read_check(struct mm_struct *mm, unsigned long seq,
+		enum vm_event_item fail_event)
+{
+	if (__mmap_seq_read_check(mm, seq))
+		return true;
+	count_vm_event(fail_event);
+	return false;
+}
+#else
+#define mmap_seq_read_check(mm, seq, fail) __mmap_seq_read_check(mm, seq)
+#endif /* CONFIG_SPECULATIVE_PAGE_FAULT_STATS */
+#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
 
 static inline void mmap_write_lock(struct mm_struct *mm)
 {
diff --git a/include/linux/vm_event.h b/include/linux/vm_event.h
index b3ae108a3841..689a21387dad 100644
--- a/include/linux/vm_event.h
+++ b/include/linux/vm_event.h
@@ -77,6 +77,12 @@ static inline void vm_events_fold_cpu(int cpu)
 
 #endif /* CONFIG_VM_EVENT_COUNTERS */
 
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT_STATS
+#define count_vm_spf_event(x) count_vm_event(x)
+#else
+#define count_vm_spf_event(x) do {} while (0)
+#endif
+
 #ifdef CONFIG_NUMA_BALANCING
 #define count_vm_numa_event(x) count_vm_event(x)
 #define count_vm_numa_events(x, y) count_vm_events(x, y)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index f00b3e36ff39..0390b81b1e71 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -137,6 +137,27 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_SPECULATIVE_PAGE_FAULT
 		SPF_ATTEMPT,
 		SPF_ABORT,
+#endif
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT_STATS
+		SPF_ABORT_ODD,
+		SPF_ABORT_UNMAPPED,
+		SPF_ABORT_NO_SPECULATE,
+		SPF_ABORT_VMA_COPY,
+		SPF_ABORT_ACCESS_ERROR,
+		SPF_ABORT_PUD,
+		SPF_ABORT_PMD,
+		SPF_ABORT_ANON_VMA,
+		SPF_ABORT_PTE_MAP_LOCK_SEQ1,
+		SPF_ABORT_PTE_MAP_LOCK_PMD,
+		SPF_ABORT_PTE_MAP_LOCK_PTL,
+		SPF_ABORT_PTE_MAP_LOCK_SEQ2,
+		SPF_ABORT_USERFAULTFD,
+		SPF_ABORT_FAULT,
+		SPF_ABORT_SWAP,
+		SPF_ATTEMPT_ANON,
+		SPF_ATTEMPT_NUMA,
+		SPF_ATTEMPT_PTE,
+		SPF_ATTEMPT_WP,
 #endif
 		NR_VM_EVENT_ITEMS
 };
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 5bd5bb097252..73b61cc95562 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -174,3 +174,10 @@ config PTDUMP_DEBUGFS
 	  kernel.
 
 	  If in doubt, say N.
+
+config SPECULATIVE_PAGE_FAULT_STATS
+	bool "Additional statistics for speculative page faults"
+	depends on SPECULATIVE_PAGE_FAULT
+	help
+	  Additional statistics for speculative page faults.
+	  If in doubt, say N.
diff --git a/mm/memory.c b/mm/memory.c
index 7f8dbd729dce..a5754309eaae 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2762,7 +2762,8 @@ bool __pte_map_lock(struct vm_fault *vmf)
 	}
 
 	speculative_page_walk_begin();
-	if (!mmap_seq_read_check(vmf->vma->vm_mm, vmf->seq))
+	if (!mmap_seq_read_check(vmf->vma->vm_mm, vmf->seq,
+				 SPF_ABORT_PTE_MAP_LOCK_SEQ1))
 		goto fail;
 	/*
 	 * The mmap sequence count check guarantees that the page
@@ -2775,8 +2776,10 @@ bool __pte_map_lock(struct vm_fault *vmf)
 	 * is not a huge collapse operation in progress in our back.
 	 */
 	pmdval = READ_ONCE(*vmf->pmd);
-	if (!pmd_same(pmdval, vmf->orig_pmd))
+	if (!pmd_same(pmdval, vmf->orig_pmd)) {
+		count_vm_spf_event(SPF_ABORT_PTE_MAP_LOCK_PMD);
 		goto fail;
+	}
 #endif
 	ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
 	if (!pte)
@@ -2793,9 +2796,12 @@ bool __pte_map_lock(struct vm_fault *vmf)
 	 * We also don't want to retry until spin_trylock() succeeds,
 	 * because of the starvation potential against a stream of lockers.
 	 */
-	if (unlikely(!spin_trylock(ptl)))
+	if (unlikely(!spin_trylock(ptl))) {
+		count_vm_spf_event(SPF_ABORT_PTE_MAP_LOCK_PTL);
 		goto fail;
-	if (!mmap_seq_read_check(vmf->vma->vm_mm, vmf->seq))
+	}
+	if (!mmap_seq_read_check(vmf->vma->vm_mm, vmf->seq,
+				 SPF_ABORT_PTE_MAP_LOCK_SEQ2))
 		goto unlock_fail;
 	speculative_page_walk_end();
 	vmf->pte = pte;
@@ -3091,6 +3097,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 
 	if (unlikely(!vma->anon_vma)) {
 		if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
+			count_vm_spf_event(SPF_ABORT_ANON_VMA);
 			ret = VM_FAULT_RETRY;
 			goto out;
 		}
@@ -3367,10 +3374,15 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 
+	if (vmf->flags & FAULT_FLAG_SPECULATIVE)
+		count_vm_spf_event(SPF_ATTEMPT_WP);
+
 	if (userfaultfd_pte_wp(vma, *vmf->pte)) {
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		if (vmf->flags & FAULT_FLAG_SPECULATIVE)
+		if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
+			count_vm_spf_event(SPF_ABORT_USERFAULTFD);
 			return VM_FAULT_RETRY;
+		}
 		return handle_userfault(vmf, VM_UFFD_WP);
 	}
 
@@ -3620,6 +3632,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
 		pte_unmap(vmf->pte);
+		count_vm_spf_event(SPF_ABORT_SWAP);
 		return VM_FAULT_RETRY;
 	}
 
@@ -3852,6 +3865,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	vm_fault_t ret = 0;
 	pte_t entry;
 
+	if (vmf->flags & FAULT_FLAG_SPECULATIVE)
+		count_vm_spf_event(SPF_ATTEMPT_ANON);
+
 	/* File mapping without ->vm_ops ? */
 	if (vma->vm_flags & VM_SHARED)
 		return VM_FAULT_SIGBUS;
@@ -3881,8 +3897,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	} else {
 		/* Allocate our own private page. */
 		if (unlikely(!vma->anon_vma)) {
-			if (vmf->flags & FAULT_FLAG_SPECULATIVE)
+			if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
+				count_vm_spf_event(SPF_ABORT_ANON_VMA);
 				return VM_FAULT_RETRY;
+			}
 			if (__anon_vma_prepare(vma))
 				goto oom;
 		}
@@ -3925,8 +3943,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		if (page)
 			put_page(page);
-		if (vmf->flags & FAULT_FLAG_SPECULATIVE)
+		if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
+			count_vm_spf_event(SPF_ABORT_USERFAULTFD);
 			return VM_FAULT_RETRY;
+		}
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
 
@@ -4470,6 +4490,9 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	bool was_writable = pte_savedwrite(vmf->orig_pte);
 	int flags = 0;
 
+	if (vmf->flags & FAULT_FLAG_SPECULATIVE)
+		count_vm_spf_event(SPF_ATTEMPT_NUMA);
+
 	/*
 	 * The "pte" at this point cannot be used safely without
 	 * validation through pte_unmap_same(). It's of NUMA type but
@@ -4651,6 +4674,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
 		return do_numa_page(vmf);
 
+	if (vmf->flags & FAULT_FLAG_SPECULATIVE)
+		count_vm_spf_event(SPF_ATTEMPT_PTE);
+
 	if (!pte_spinlock(vmf))
 		return VM_FAULT_RETRY;
 	entry = vmf->orig_pte;
@@ -4718,20 +4744,26 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		speculative_page_walk_begin();
 		pgd = pgd_offset(mm, address);
 		pgdval = READ_ONCE(*pgd);
-		if (pgd_none(pgdval) || unlikely(pgd_bad(pgdval)))
+		if (pgd_none(pgdval) || unlikely(pgd_bad(pgdval))) {
+			count_vm_spf_event(SPF_ABORT_PUD);
 			goto spf_fail;
+		}
 
 		p4d = p4d_offset(pgd, address);
 		p4dval = READ_ONCE(*p4d);
-		if (p4d_none(p4dval) || unlikely(p4d_bad(p4dval)))
+		if (p4d_none(p4dval) || unlikely(p4d_bad(p4dval))) {
+			count_vm_spf_event(SPF_ABORT_PUD);
 			goto spf_fail;
+		}
 
 		vmf.pud = pud_offset(p4d, address);
 		pudval = READ_ONCE(*vmf.pud);
 		if (pud_none(pudval) || unlikely(pud_bad(pudval)) ||
 		    unlikely(pud_trans_huge(pudval)) ||
-		    unlikely(pud_devmap(pudval)))
+		    unlikely(pud_devmap(pudval))) {
+			count_vm_spf_event(SPF_ABORT_PUD);
 			goto spf_fail;
+		}
 
 		vmf.pmd = pmd_offset(vmf.pud, address);
 		vmf.orig_pmd = READ_ONCE(*vmf.pmd);
@@ -4749,8 +4781,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		if (unlikely(pmd_none(vmf.orig_pmd) ||
 			     is_swap_pmd(vmf.orig_pmd) ||
 			     pmd_trans_huge(vmf.orig_pmd) ||
-			     pmd_devmap(vmf.orig_pmd)))
+			     pmd_devmap(vmf.orig_pmd))) {
+			count_vm_spf_event(SPF_ABORT_PMD);
 			goto spf_fail;
+		}
 
 		/*
 		 * The above does not allocate/instantiate page-tables because
diff --git a/mm/vmstat.c b/mm/vmstat.c
index dbb0160e5558..20ac17cf582a 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1394,6 +1394,27 @@ const char * const vmstat_text[] = {
 	"spf_attempt",
 	"spf_abort",
 #endif
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT_STATS
+	"SPF_ABORT_ODD",
+	"SPF_ABORT_UNMAPPED",
+	"SPF_ABORT_NO_SPECULATE",
+	"SPF_ABORT_VMA_COPY",
+	"SPF_ABORT_ACCESS_ERROR",
+	"SPF_ABORT_PUD",
+	"SPF_ABORT_PMD",
+	"SPF_ABORT_ANON_VMA",
+	"SPF_ABORT_PTE_MAP_LOCK_SEQ1",
+	"SPF_ABORT_PTE_MAP_LOCK_PMD",
+	"SPF_ABORT_PTE_MAP_LOCK_PTL",
+	"SPF_ABORT_PTE_MAP_LOCK_SEQ2",
+	"SPF_ABORT_USERFAULTFD",
+	"SPF_ABORT_FAULT",
+	"SPF_ABORT_SWAP",
+	"SPF_ATTEMPT_ANON",
+	"SPF_ATTEMPT_NUMA",
+	"SPF_ATTEMPT_PTE",
+	"SPF_ATTEMPT_WP",
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
To: Linux-MM, linux-kernel@vger.kernel.org, Andrew Morton
Cc: kernel-team@fb.com, Laurent Dufour, Jerome Glisse, Peter Zijlstra, Michal Hocko, Vlastimil Babka, Davidlohr Bueso, Matthew Wilcox, Liam Howlett, Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan, Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen, Andy Lutomirski, Michel Lespinasse
Subject: [PATCH v2 32/35] arm64/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
Date: Fri, 28 Jan 2022 05:10:03 -0800
Message-Id: <20220128131006.67712-33-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT so that the speculative fault
handling code can be compiled on this architecture.

Signed-off-by: Michel Lespinasse
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6978140edfa4..e764329b11a7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -222,6 +222,7 @@ config ARM64
 	select THREAD_INFO_IN_TASK
 	select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
 	select TRACE_IRQFLAGS_SUPPORT
+	select ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
 	help
 	  ARM 64-bit (AArch64) Linux support.
 
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
To: Linux-MM, linux-kernel@vger.kernel.org, Andrew Morton
Cc: kernel-team@fb.com, Laurent Dufour, Jerome Glisse, Peter Zijlstra, Michal Hocko, Vlastimil Babka, Davidlohr Bueso, Matthew Wilcox, Liam Howlett, Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan, Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen, Andy Lutomirski, Michel Lespinasse
Subject: [PATCH v2 33/35] arm64/mm: attempt speculative mm faults first
Date: Fri, 28 Jan 2022 05:10:04 -0800
Message-Id: <20220128131006.67712-34-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

Attempt speculative mm fault handling first, and fall back to the
existing (non-speculative) code if that fails.

This follows the lines of the x86 speculative fault handling code,
but with some minor arch differences such as the way that the
VM_FAULT_BADACCESS case is handled.
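
For illustration only, the "speculative first, locked path as fallback"
control flow described above can be sketched in plain userspace C. This is
a simplified model, not the kernel code: handle_fault(), struct fault_stats,
the fault_fn callbacks and the three sample handlers are hypothetical names,
and the two counters merely stand in for SPF_ATTEMPT and SPF_ABORT.

```c
#include <stdbool.h>

/* Hypothetical result codes standing in for "done" vs. VM_FAULT_RETRY. */
enum fault_result { FAULT_DONE, FAULT_RETRY };

/* Hypothetical counters standing in for SPF_ATTEMPT / SPF_ABORT. */
struct fault_stats {
	unsigned long attempt;
	unsigned long abort;
};

typedef enum fault_result (*fault_fn)(void *ctx);

/*
 * Try the speculative handler first; if it asks for a retry, count the
 * abort and fall back to the locked handler.  Kernel faults and
 * single-threaded processes skip the speculative attempt entirely.
 */
static enum fault_result handle_fault(fault_fn speculative, fault_fn locked,
				      void *ctx, bool multithreaded_user,
				      struct fault_stats *st)
{
	if (multithreaded_user) {
		st->attempt++;
		if (speculative(ctx) != FAULT_RETRY)
			return FAULT_DONE;
		st->abort++;
	}
	return locked(ctx);
}

/* Sample handlers: ctx points at an int counting locked-path calls. */
static enum fault_result spec_ok(void *ctx)
{
	(void)ctx;
	return FAULT_DONE;
}

static enum fault_result spec_retry(void *ctx)
{
	(void)ctx;
	return FAULT_RETRY;
}

static enum fault_result locked_path(void *ctx)
{
	++*(int *)ctx;
	return FAULT_DONE;
}
```

In this model the locked handler runs only when the speculative attempt is
skipped or aborted, which is the property the arch patches rely on.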

Signed-off-by: Michel Lespinasse
---
 arch/arm64/mm/fault.c | 62 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 77341b160aca..2598795f4e70 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -524,6 +525,11 @@ static int __kprobes do_page_fault(unsigned long far, unsigned int esr,
 	unsigned long vm_flags;
 	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
 	unsigned long addr = untagged_addr(far);
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	struct vm_area_struct *vma;
+	struct vm_area_struct pvma;
+	unsigned long seq;
+#endif
 
 	if (kprobe_page_fault(regs, esr))
 		return 0;
@@ -574,6 +580,59 @@ static int __kprobes do_page_fault(unsigned long far, unsigned int esr,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	/*
+	 * No need to try speculative faults for kernel or
+	 * single threaded user space.
+	 */
+	if (!(mm_flags & FAULT_FLAG_USER) || atomic_read(&mm->mm_users) == 1)
+		goto no_spf;
+
+	count_vm_event(SPF_ATTEMPT);
+	seq = mmap_seq_read_start(mm);
+	if (seq & 1) {
+		count_vm_spf_event(SPF_ABORT_ODD);
+		goto spf_abort;
+	}
+	rcu_read_lock();
+	vma = __find_vma(mm, addr);
+	if (!vma || vma->vm_start > addr) {
+		rcu_read_unlock();
+		count_vm_spf_event(SPF_ABORT_UNMAPPED);
+		goto spf_abort;
+	}
+	if (!vma_is_anonymous(vma)) {
+		rcu_read_unlock();
+		count_vm_spf_event(SPF_ABORT_NO_SPECULATE);
+		goto spf_abort;
+	}
+	pvma = *vma;
+	rcu_read_unlock();
+	if (!mmap_seq_read_check(mm, seq, SPF_ABORT_VMA_COPY))
+		goto spf_abort;
+	vma = &pvma;
+	if (!(vma->vm_flags & vm_flags)) {
+		count_vm_spf_event(SPF_ABORT_ACCESS_ERROR);
+		goto spf_abort;
+	}
+	fault = do_handle_mm_fault(vma, addr & PAGE_MASK,
+			mm_flags | FAULT_FLAG_SPECULATIVE, seq, regs);
+
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
+		return 0;
+	}
+	if (!(fault & VM_FAULT_RETRY))
+		goto done;
+
+spf_abort:
+	count_vm_event(SPF_ABORT);
+no_spf:
+
+#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
+
 	/*
 	 * As per x86, we may deadlock here. However, since the kernel only
 	 * validly references user space from well defined areas of the code,
@@ -612,6 +671,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned int esr,
 		goto retry;
 	}
 	mmap_read_unlock(mm);
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+done:
+#endif
 
 	/*
 	 * Handle the "normal" (no error) case first.
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
To: Linux-MM, linux-kernel@vger.kernel.org, Andrew Morton
Cc: kernel-team@fb.com, Laurent Dufour, Jerome Glisse, Peter Zijlstra, Michal Hocko, Vlastimil Babka, Davidlohr Bueso, Matthew Wilcox, Liam Howlett, Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan, Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen, Andy Lutomirski, Michel Lespinasse
Subject: [PATCH v2 34/35] powerpc/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
Date: Fri, 28 Jan 2022 05:10:05 -0800
Message-Id: <20220128131006.67712-35-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT so that the speculative fault
handling code can be compiled on this architecture.

Signed-off-by: Michel Lespinasse
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b779603978e1..5f82bc7eee0b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -144,6 +144,7 @@ config PPC
 	select ARCH_STACKWALK
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC if PPC_BOOK3S || PPC_8xx || 40x
+	select ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT if PPC_BOOK3S_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF if PPC64
 	select ARCH_USE_MEMTEST
-- 
2.20.1

From nobody Tue Jun 30 00:50:12 2026
From: Michel Lespinasse
To: Linux-MM, linux-kernel@vger.kernel.org, Andrew Morton
Cc: kernel-team@fb.com, Laurent Dufour, Jerome Glisse, Peter Zijlstra, Michal Hocko, Vlastimil Babka, Davidlohr Bueso, Matthew Wilcox, Liam Howlett, Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan, Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen, Andy Lutomirski, Michel Lespinasse
Subject: [PATCH v2 35/35] powerpc/mm: attempt speculative mm faults first
Date: Fri, 28 Jan 2022 05:10:06 -0800
Message-Id: <20220128131006.67712-36-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>

Attempt speculative mm fault handling first, and fall back to the
existing (non-speculative) code if that fails.

This follows the lines of the x86 speculative fault handling code,
but with some minor arch differences such as the way that the
access_pkey_error case is handled.

Signed-off-by: Michel Lespinasse
---
 arch/powerpc/mm/fault.c | 64 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index eb8ecd7343a9..3f039504e8fd 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -395,6 +395,10 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	int is_write = page_fault_is_write(error_code);
 	vm_fault_t fault, major = 0;
 	bool kprobe_fault = kprobe_page_fault(regs, 11);
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	struct vm_area_struct pvma;
+	unsigned long seq;
+#endif
 
 	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
 		return 0;
@@ -451,6 +455,63 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (is_exec)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	/*
+	 * No need to try speculative faults for kernel or
+	 * single threaded user space.
+	 */
+	if (!(flags & FAULT_FLAG_USER) || atomic_read(&mm->mm_users) == 1)
+		goto no_spf;
+
+	count_vm_event(SPF_ATTEMPT);
+	seq = mmap_seq_read_start(mm);
+	if (seq & 1) {
+		count_vm_spf_event(SPF_ABORT_ODD);
+		goto spf_abort;
+	}
+	rcu_read_lock();
+	vma = __find_vma(mm, address);
+	if (!vma || vma->vm_start > address) {
+		rcu_read_unlock();
+		count_vm_spf_event(SPF_ABORT_UNMAPPED);
+		goto spf_abort;
+	}
+	if (!vma_is_anonymous(vma)) {
+		rcu_read_unlock();
+		count_vm_spf_event(SPF_ABORT_NO_SPECULATE);
+		goto spf_abort;
+	}
+	pvma = *vma;
+	rcu_read_unlock();
+	if (!mmap_seq_read_check(mm, seq, SPF_ABORT_VMA_COPY))
+		goto spf_abort;
+	vma = &pvma;
+#ifdef CONFIG_PPC_MEM_KEYS
+	if (unlikely(access_pkey_error(is_write, is_exec,
+				       (error_code & DSISR_KEYFAULT), vma))) {
+		count_vm_spf_event(SPF_ABORT_ACCESS_ERROR);
+		goto spf_abort;
+	}
+#endif /* CONFIG_PPC_MEM_KEYS */
+	if (unlikely(access_error(is_write, is_exec, vma))) {
+		count_vm_spf_event(SPF_ABORT_ACCESS_ERROR);
+		goto spf_abort;
+	}
+	fault = do_handle_mm_fault(vma, address,
+				   flags | FAULT_FLAG_SPECULATIVE, seq, regs);
+	major |= fault & VM_FAULT_MAJOR;
+
+	if (fault_signal_pending(fault, regs))
+		return user_mode(regs) ? 0 : SIGBUS;
+	if (!(fault & VM_FAULT_RETRY))
+		goto done;
+
+spf_abort:
+	count_vm_event(SPF_ABORT);
+no_spf:
+
+#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
+
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space. All other faults represent errors in the
 	 * kernel and should generate an OOPS. Unfortunately, in the case of an
@@ -522,6 +583,9 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	}
 
 	mmap_read_unlock(current->mm);
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+done:
+#endif
 
 	if (unlikely(fault & VM_FAULT_ERROR))
 		return mm_fault_error(regs, address, fault);
-- 
2.20.1
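
As a reading aid for the series, the per-reason abort counting added in
patch 31 can be modelled in userspace C roughly as follows. This is a
sketch under stated assumptions, not the kernel implementation: SPF_STATS,
spf_stats[], map_seq, count_spf_event(), seq_read_start(), seq_read_check(),
writer_begin()/writer_end() and speculative_read() are all hypothetical
names, the counters are plain globals rather than per-CPU data, and C11
atomics stand in for READ_ONCE() and smp_rmb().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_SPECULATIVE_PAGE_FAULT_STATS. */
#define SPF_STATS 1

enum spf_event { ABORT_ODD, ABORT_VMA_COPY, NR_EVENTS };

static unsigned long spf_stats[NR_EVENTS];	/* plain globals, not per-CPU */
static atomic_ulong map_seq;			/* odd while a writer is active */

/* Compiles to nothing when statistics are disabled, as in the patch. */
#if SPF_STATS
#define count_spf_event(e)	(spf_stats[(e)]++)
#else
#define count_spf_event(e)	do { (void)(e); } while (0)
#endif

static unsigned long seq_read_start(void)
{
	return atomic_load_explicit(&map_seq, memory_order_acquire);
}

/* Validate the sequence count; record why the check failed if it did. */
static bool seq_read_check(unsigned long seq, enum spf_event fail_event)
{
	atomic_thread_fence(memory_order_acquire);
	if (seq == atomic_load_explicit(&map_seq, memory_order_relaxed))
		return true;
	count_spf_event(fail_event);
	return false;
}

/* Simulated writer: the count is odd between begin and end. */
static void writer_begin(void) { atomic_fetch_add(&map_seq, 1); }
static void writer_end(void) { atomic_fetch_add(&map_seq, 1); }

/* Returns true when the speculative read section validated. */
static bool speculative_read(bool writer_interferes)
{
	unsigned long seq = seq_read_start();

	if (seq & 1) {
		count_spf_event(ABORT_ODD);
		return false;
	}
	if (writer_interferes) {	/* simulate a concurrent update */
		writer_begin();
		writer_end();
	}
	return seq_read_check(seq, ABORT_VMA_COPY);
}
```

Passing the failure event into the check helper keeps each call site to a
single line while still attributing every abort to a distinct reason.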