From nobody Sat Feb 7 12:40:45 2026
Date: Wed, 28 Jun 2023 00:17:55 -0700
In-Reply-To: <20230628071800.544800-1-surenb@google.com>
References: <20230628071800.544800-1-surenb@google.com>
Message-ID: <20230628071800.544800-2-surenb@google.com>
Subject: [PATCH v4 1/6] swap: remove remnants of polling from read_swap_cache_async
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com,
	josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
	laurent.dufour@fr.ibm.com, michel@lespinasse.org,
	liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
	minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com,
	lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com,
	peterx@redhat.com, ying.huang@intel.com, david@redhat.com,
	yuzhao@google.com, dhowells@redhat.com, hughd@google.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org,
	pasha.tatashin@soleen.com, surenb@google.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, Christoph Hellwig
X-Mailing-List: linux-kernel@vger.kernel.org

Commit [1] introduced IO polling support during swapin to reduce swap read
latency for block devices that can be polled. However, a later commit [2]
removed polling support. Therefore it seems safe to remove the do_poll
parameter of read_swap_cache_async and always call swap_readpage with
synchronous=false, waiting for IO completion in folio_lock_or_retry.

[1] commit 23955622ff8d ("swap: add block io poll in swapin path")
[2] commit 9650b453a3d4 ("block: ignore RWF_HIPRI hint for sync dio")

Suggested-by: "Huang, Ying"
Signed-off-by: Suren Baghdasaryan
Reviewed-by: "Huang, Ying"
Reviewed-by: Christoph Hellwig
---
 mm/madvise.c    |  4 ++--
 mm/swap.h       |  1 -
 mm/swap_state.c | 12 +++++-------
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index b5ffbaf616f5..b1e8adf1234e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -215,7 +215,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 			continue;
 
 		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
-					     vma, index, false, &splug);
+					     vma, index, &splug);
 		if (page)
 			put_page(page);
 	}
@@ -252,7 +252,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
 		rcu_read_unlock();
 
 		page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
-					     NULL, 0, false, &splug);
+					     NULL, 0, &splug);
 		if (page)
 			put_page(page);
 
diff --git a/mm/swap.h b/mm/swap.h
index 7c033d793f15..8a3c7a0ace4f 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -46,7 +46,6 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping,
 struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 				   struct vm_area_struct *vma,
 				   unsigned long addr,
-				   bool do_poll,
 				   struct swap_iocb **plug);
 struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 				     struct vm_area_struct *vma,
diff --git a/mm/swap_state.c b/mm/swap_state.c
index b76a65ac28b3..a3839de71f3f 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -517,15 +517,14 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
  */
 struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 				   struct vm_area_struct *vma,
-				   unsigned long addr, bool do_poll,
-				   struct swap_iocb **plug)
+				   unsigned long addr, struct swap_iocb **plug)
 {
 	bool page_was_allocated;
 	struct page *retpage = __read_swap_cache_async(entry, gfp_mask,
 			vma, addr, &page_was_allocated);
 
 	if (page_was_allocated)
-		swap_readpage(retpage, do_poll, plug);
+		swap_readpage(retpage, false, plug);
 
 	return retpage;
 }
@@ -620,7 +619,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	struct swap_info_struct *si = swp_swap_info(entry);
 	struct blk_plug plug;
 	struct swap_iocb *splug = NULL;
-	bool do_poll = true, page_allocated;
+	bool page_allocated;
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address;
 
@@ -628,7 +627,6 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	if (!mask)
 		goto skip;
 
-	do_poll = false;
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
 	end_offset = offset | mask;
@@ -660,7 +658,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 skip:
 	/* The page was likely read above, so no need for plugging here */
-	return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll, NULL);
+	return read_swap_cache_async(entry, gfp_mask, vma, addr, NULL);
 }
 
 int init_swap_address_space(unsigned int type, unsigned long nr_pages)
@@ -825,7 +823,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 skip:
 	/* The page was likely read above, so no need for plugging here */
 	return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address,
-				     ra_info.win == 1, NULL);
+				     NULL);
 }
 
 /**
-- 
2.41.0.162.gfafddb0af9-goog

From nobody Sat Feb 7 12:40:45 2026
Date: Wed, 28 Jun 2023 00:17:56 -0700
In-Reply-To: <20230628071800.544800-1-surenb@google.com>
References: <20230628071800.544800-1-surenb@google.com>
Message-ID: <20230628071800.544800-3-surenb@google.com>
Subject: [PATCH v4 2/6] mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com,
	josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
	laurent.dufour@fr.ibm.com, michel@lespinasse.org,
	liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
	minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com,
	lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com,
	peterx@redhat.com, ying.huang@intel.com, david@redhat.com,
	yuzhao@google.com, dhowells@redhat.com, hughd@google.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org,
	pasha.tatashin@soleen.com, surenb@google.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com
X-Mailing-List: linux-kernel@vger.kernel.org

VM_FAULT_RESULT_TRACE should contain an element for every vm_fault_reason
to be used as flag_array inside trace_print_flags_seq(). The element for
VM_FAULT_COMPLETED is missing; add it.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Peter Xu
---
 include/linux/mm_types.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..79765e3dd8f3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1070,7 +1070,8 @@ enum vm_fault_reason {
 	{ VM_FAULT_RETRY, "RETRY" }, \
 	{ VM_FAULT_FALLBACK, "FALLBACK" }, \
 	{ VM_FAULT_DONE_COW, "DONE_COW" }, \
-	{ VM_FAULT_NEEDDSYNC, "NEEDDSYNC" }
+	{ VM_FAULT_NEEDDSYNC, "NEEDDSYNC" }, \
+	{ VM_FAULT_COMPLETED, "COMPLETED" }
 
 struct vm_special_mapping {
 	const char *name;	/* The name, e.g. "[vdso]".
*/
-- 
2.41.0.162.gfafddb0af9-goog

From nobody Sat Feb 7 12:40:45 2026
Date: Wed, 28 Jun 2023 00:17:57 -0700
In-Reply-To: <20230628071800.544800-1-surenb@google.com>
References: <20230628071800.544800-1-surenb@google.com>
Message-ID: <20230628071800.544800-4-surenb@google.com>
Subject: [PATCH v4 3/6] mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com,
	josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
	laurent.dufour@fr.ibm.com, michel@lespinasse.org,
	liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
	minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com,
	lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com,
	peterx@redhat.com, ying.huang@intel.com, david@redhat.com,
	yuzhao@google.com, dhowells@redhat.com, hughd@google.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org,
	pasha.tatashin@soleen.com, surenb@google.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com
X-Mailing-List: linux-kernel@vger.kernel.org

handle_mm_fault returning VM_FAULT_RETRY or VM_FAULT_COMPLETED means
mmap_lock has been released. However, with per-VMA locks the behavior is
different and the caller should still release the per-VMA lock. To make
the rules consistent for the caller, drop the per-VMA lock when returning
VM_FAULT_RETRY or VM_FAULT_COMPLETED. Currently the only path returning
VM_FAULT_RETRY under per-VMA locks is do_swap_page, and no path returns
VM_FAULT_COMPLETED for now.

Signed-off-by: Suren Baghdasaryan
Acked-by: Peter Xu
---
 arch/arm64/mm/fault.c   | 3 ++-
 arch/powerpc/mm/fault.c | 3 ++-
 arch/s390/mm/fault.c    | 3 ++-
 arch/x86/mm/fault.c     | 3 ++-
 mm/memory.c             | 1 +
 5 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index c85b6d70b222..9c06c53a9ff3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -612,7 +612,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 531177a4ee08..4697c5dca31c 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -494,7 +494,8 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	}
 
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index b65144c392b0..cccefe41038b 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -418,7 +418,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
 		goto out;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e4399983c50c..d69c85c1c04e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1347,7 +1347,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/mm/memory.c b/mm/memory.c
index f69fbc251198..f14d45957b83 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3713,6 +3713,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
 		ret = VM_FAULT_RETRY;
+		vma_end_read(vma);
 		goto out;
 	}
 
-- 
2.41.0.162.gfafddb0af9-goog

From nobody Sat Feb 7 12:40:45 2026
Date: Wed, 28 Jun 2023 00:17:58 -0700
In-Reply-To: <20230628071800.544800-1-surenb@google.com>
References: <20230628071800.544800-1-surenb@google.com>
Message-ID: <20230628071800.544800-5-surenb@google.com>
Subject: [PATCH v4 4/6] mm: change folio_lock_or_retry to use vm_fault directly
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com,
	josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
	laurent.dufour@fr.ibm.com, michel@lespinasse.org,
	liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
	minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com,
	lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com,
	peterx@redhat.com, ying.huang@intel.com, david@redhat.com,
	yuzhao@google.com, dhowells@redhat.com, hughd@google.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org,
	pasha.tatashin@soleen.com, surenb@google.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com
X-Mailing-List: linux-kernel@vger.kernel.org

Change folio_lock_or_retry to accept a vm_fault struct and return a
vm_fault_t directly.
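
For illustration only, here is a rough userspace sketch of the calling
convention this change introduces. It is not kernel code: the sketch_*
types and helpers and the flag value are stand-ins assumed for the example,
and the slow path is reduced to "always ask for a retry". Only the shape of
the return-value handling mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel types; the value is illustrative. */
typedef unsigned int sketch_vm_fault_t;
#define SKETCH_VM_FAULT_RETRY 0x400u

struct sketch_folio { bool locked; };
struct sketch_vm_fault { unsigned int flags; };

/* Take the folio lock only if nobody holds it. */
static bool sketch_folio_trylock(struct sketch_folio *folio)
{
	if (folio->locked)
		return false;
	folio->locked = true;
	return true;
}

/* Simplified slow path: give up and request a retry. */
static sketch_vm_fault_t sketch_slow_lock_or_retry(struct sketch_folio *folio,
						   struct sketch_vm_fault *vmf)
{
	(void)folio;
	(void)vmf;
	return SKETCH_VM_FAULT_RETRY;
}

/* New-style helper: 0 means locked, non-zero is a fault code to return. */
static sketch_vm_fault_t sketch_lock_or_retry(struct sketch_folio *folio,
					      struct sketch_vm_fault *vmf)
{
	return sketch_folio_trylock(folio) ? 0 :
		sketch_slow_lock_or_retry(folio, vmf);
}

/* Caller pattern: OR the result into ret and bail out on RETRY. */
static sketch_vm_fault_t sketch_handle_fault(struct sketch_folio *folio,
					     struct sketch_vm_fault *vmf)
{
	sketch_vm_fault_t ret = 0;

	ret |= sketch_lock_or_retry(folio, vmf);
	if (ret & SKETCH_VM_FAULT_RETRY)
		return ret;

	/* Folio is locked here; do the work, then unlock. */
	assert(folio->locked);
	folio->locked = false;
	return ret;
}
```

The point of the sketch is that the caller no longer translates a boolean
into VM_FAULT_RETRY itself; it just propagates whatever the helper returns.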

Suggested-by: Matthew Wilcox
Signed-off-by: Suren Baghdasaryan
Acked-by: Peter Xu
---
 include/linux/pagemap.h |  9 ++++-----
 mm/filemap.c            | 22 ++++++++++++----------
 mm/memory.c             | 14 ++++++--------
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a4..59d070c55c97 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -896,8 +896,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 
 void __folio_lock(struct folio *folio);
 int __folio_lock_killable(struct folio *folio);
-bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
-				unsigned int flags);
+vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf);
 void unlock_page(struct page *page);
 void folio_unlock(struct folio *folio);
 
@@ -1001,11 +1000,11 @@ static inline int folio_lock_killable(struct folio *folio)
  * Return value and mmap_lock implications depend on flags; see
  * __folio_lock_or_retry().
  */
-static inline bool folio_lock_or_retry(struct folio *folio,
-		struct mm_struct *mm, unsigned int flags)
+static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
+					     struct vm_fault *vmf)
 {
 	might_sleep();
-	return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags);
+	return folio_trylock(folio) ? 0 : __folio_lock_or_retry(folio, vmf);
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 00f01d8ead47..52bcf12dcdbf 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1701,32 +1701,34 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 
 /*
  * Return values:
- * true - folio is locked; mmap_lock is still held.
- * false - folio is not locked.
+ * 0 - folio is locked.
+ * VM_FAULT_RETRY - folio is not locked.
  *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
  *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
  *     which case mmap_lock is still held.
  *
- * If neither ALLOW_RETRY nor KILLABLE are set, will always return true
+ * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
  * with the folio locked and the mmap_lock unperturbed.
  */
-bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
-			 unsigned int flags)
+vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
 {
+	struct mm_struct *mm = vmf->vma->vm_mm;
+	unsigned int flags = vmf->flags;
+
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
-		 * even though return 0.
+		 * even though return VM_FAULT_RETRY.
 		 */
 		if (flags & FAULT_FLAG_RETRY_NOWAIT)
-			return false;
+			return VM_FAULT_RETRY;
 
 		mmap_read_unlock(mm);
 		if (flags & FAULT_FLAG_KILLABLE)
 			folio_wait_locked_killable(folio);
 		else
 			folio_wait_locked(folio);
-		return false;
+		return VM_FAULT_RETRY;
 	}
 	if (flags & FAULT_FLAG_KILLABLE) {
 		bool ret;
@@ -1734,13 +1736,13 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 		ret = __folio_lock_killable(folio);
 		if (ret) {
 			mmap_read_unlock(mm);
-			return false;
+			return VM_FAULT_RETRY;
 		}
 	} else {
 		__folio_lock(folio);
 	}
 
-	return true;
+	return 0;
 }
 
 /**
diff --git a/mm/memory.c b/mm/memory.c
index f14d45957b83..345080052003 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3568,6 +3568,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	struct folio *folio = page_folio(vmf->page);
 	struct vm_area_struct *vma = vmf->vma;
 	struct mmu_notifier_range range;
+	vm_fault_t ret;
 
 	/*
 	 * We need a reference to lock the folio because we don't hold
@@ -3580,9 +3581,10 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	if (!folio_try_get(folio))
 		return 0;
 
-	if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+	ret = folio_lock_or_retry(folio, vmf);
+	if (ret) {
 		folio_put(folio);
-		return VM_FAULT_RETRY;
+		return ret;
 	}
 	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
 				vma->vm_mm, vmf->address & PAGE_MASK,
@@ -3704,7 +3706,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	bool exclusive = false;
 	swp_entry_t entry;
 	pte_t pte;
-	int locked;
 	vm_fault_t ret = 0;
 	void *shadow = NULL;
 
@@ -3826,12 +3827,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	locked = folio_lock_or_retry(folio, vma->vm_mm, vmf->flags);
-
-	if (!locked) {
-		ret |= VM_FAULT_RETRY;
+	ret |= folio_lock_or_retry(folio, vmf);
+	if (ret & VM_FAULT_RETRY)
 		goto out_release;
-	}
 
 	if (swapcache) {
 		/*
-- 
2.41.0.162.gfafddb0af9-goog

From nobody Sat Feb 7 12:40:45 2026
Date: Wed, 28 Jun 2023 00:17:59 -0700
In-Reply-To: <20230628071800.544800-1-surenb@google.com>
References: <20230628071800.544800-1-surenb@google.com>
Message-ID: <20230628071800.544800-6-surenb@google.com>
Subject: [PATCH v4 5/6] mm: handle swap page faults under per-VMA lock
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com,
	josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
	laurent.dufour@fr.ibm.com, michel@lespinasse.org,
	liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
	minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com,
	lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com,
	peterx@redhat.com, ying.huang@intel.com, david@redhat.com,
	yuzhao@google.com, dhowells@redhat.com, hughd@google.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org,
	pasha.tatashin@soleen.com, surenb@google.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com
X-Mailing-List: linux-kernel@vger.kernel.org

When a page fault is handled under per-VMA lock protection, all swap page
faults are retried with mmap_lock because folio_lock_or_retry has to drop
and reacquire mmap_lock if the folio could not be immediately locked.
Follow the same pattern as mmap_lock: drop the per-VMA lock when waiting
for the folio and retry once the folio is available.

With this obstacle removed, enable do_swap_page to operate under
per-VMA lock protection. Drivers implementing ops->migrate_to_ram might
still rely on mmap_lock, therefore we have to fall back to mmap_lock in
that particular case.

Note that the only time do_swap_page calls synchronous swap_readpage
is when SWP_SYNCHRONOUS_IO is set, which is only set for
QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and
pmem). Therefore we don't sleep in this path, and there's no need to
drop the mmap or per-VMA lock.
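
The lock-dropping pattern described above can be sketched in userspace
roughly as follows. This is a simplified model, not the kernel
implementation: the sketch_* structures, the reader counters standing in
for the real locks, and the flag values are all assumptions made for the
example.

```c
#include <assert.h>

/* Illustrative flag bits; the real kernel values may differ. */
#define SKETCH_FLAG_VMA_LOCK     0x1u
#define SKETCH_FLAG_RETRY_NOWAIT 0x2u
#define SKETCH_FAULT_RETRY       0x400u

/* Reader counters stand in for mmap_lock and the per-VMA lock. */
struct sketch_mm { int mmap_readers; };
struct sketch_vma { struct sketch_mm *mm; int vma_readers; };
struct sketch_fault { struct sketch_vma *vma; unsigned int flags; };

/* Drop whichever read lock the fault path took, chosen by the flag. */
static void sketch_release_fault_lock(struct sketch_fault *vmf)
{
	if (vmf->flags & SKETCH_FLAG_VMA_LOCK) {
		assert(vmf->vma->vma_readers > 0);
		vmf->vma->vma_readers--;
	} else {
		assert(vmf->vma->mm->mmap_readers > 0);
		vmf->vma->mm->mmap_readers--;
	}
}

/* Folio is busy: bail out with the lock held, or drop it and retry. */
static unsigned int sketch_busy_folio(struct sketch_fault *vmf)
{
	if (vmf->flags & SKETCH_FLAG_RETRY_NOWAIT)
		return SKETCH_FAULT_RETRY;	/* lock is still held */

	sketch_release_fault_lock(vmf);
	/* A real implementation would wait for the folio here. */
	return SKETCH_FAULT_RETRY;
}
```

In this model the caller sees the same rule for both lock types: a retry
result means the lock is already gone, except in the no-wait case.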
Signed-off-by: Suren Baghdasaryan
Acked-by: Peter Xu
---
 mm/filemap.c | 25 ++++++++++++++++---------
 mm/memory.c  | 16 ++++++++++------
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 52bcf12dcdbf..7ee078e1a0d2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1699,31 +1699,38 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 	return ret;
 }
 
+static void release_fault_lock(struct vm_fault *vmf)
+{
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		vma_end_read(vmf->vma);
+	else
+		mmap_read_unlock(vmf->vma->vm_mm);
+}
+
 /*
  * Return values:
  * 0 - folio is locked.
  * VM_FAULT_RETRY - folio is not locked.
- *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
- *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
- *     which case mmap_lock is still held.
+ *     mmap_lock or per-VMA lock has been released (mmap_read_unlock() or
+ *     vma_end_read()), unless flags had both FAULT_FLAG_ALLOW_RETRY and
+ *     FAULT_FLAG_RETRY_NOWAIT set, in which case the lock is still held.
  *
  * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
- * with the folio locked and the mmap_lock unperturbed.
+ * with the folio locked and the mmap_lock/per-VMA lock is left unperturbed.
  */
 vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
 {
-	struct mm_struct *mm = vmf->vma->vm_mm;
 	unsigned int flags = vmf->flags;
 
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
-		 * CAUTION! In this case, mmap_lock is not released
-		 * even though return VM_FAULT_RETRY.
+		 * CAUTION! In this case, mmap_lock/per-VMA lock is not
+		 * released even though returning VM_FAULT_RETRY.
 		 */
 		if (flags & FAULT_FLAG_RETRY_NOWAIT)
 			return VM_FAULT_RETRY;
 
-		mmap_read_unlock(mm);
+		release_fault_lock(vmf);
 		if (flags & FAULT_FLAG_KILLABLE)
 			folio_wait_locked_killable(folio);
 		else
@@ -1735,7 +1742,7 @@ vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
 
 		ret = __folio_lock_killable(folio);
 		if (ret) {
-			mmap_read_unlock(mm);
+			release_fault_lock(vmf);
 			return VM_FAULT_RETRY;
 		}
 	} else {
diff --git a/mm/memory.c b/mm/memory.c
index 345080052003..76c7907e7286 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3712,12 +3712,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!pte_unmap_same(vmf))
 		goto out;
 
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-		ret = VM_FAULT_RETRY;
-		vma_end_read(vma);
-		goto out;
-	}
-
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
@@ -3727,6 +3721,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			vmf->page = pfn_swap_entry_to_page(entry);
 			ret = remove_device_exclusive_entry(vmf);
 		} else if (is_device_private_entry(entry)) {
+			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+				/*
+				 * migrate_to_ram is not yet ready to operate
+				 * under VMA lock.
+				 */
+				vma_end_read(vma);
+				ret |= VM_FAULT_RETRY;
+				goto out;
+			}
+
 			vmf->page = pfn_swap_entry_to_page(entry);
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
-- 
2.41.0.162.gfafddb0af9-goog

From nobody Sat Feb 7 12:40:45 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36BF1EB64D7 for ; Wed, 28 Jun 2023 08:39:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234351AbjF1Ii6 (ORCPT ); Wed, 28 Jun 2023 04:38:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42234 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235234AbjF1Ift (ORCPT ); Wed, 28 Jun 2023 04:35:49 -0400 Received: from mail-oo1-xc4a.google.com (mail-oo1-xc4a.google.com [IPv6:2607:f8b0:4864:20::c4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3468F3582 for ; Wed, 28 Jun 2023 01:27:08 -0700 (PDT) Received: by mail-oo1-xc4a.google.com with SMTP id 006d021491bc7-56340665b09so2716650eaf.2 for ; Wed, 28 Jun 2023 01:27:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687940827; x=1690532827; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=P/PrTNNebDD3ET79IWIr+QJG2wgn5QmJDayPL9+A73A=; b=HgU6PDIaGLI3i9IQDTlFfVp8Tq3qnFFEnHPHIV/6IN3ChvwT+7vqljCmxVTAopTOPx ZyZhwz0qkw1w5pQL05SM+sYKlVqVWYoMTfgCJb20YfVCqVAbM/Qs194aLplgtFKc9VTZ n3AcFDQk6gsUiAnZ25G8NZtYoQ2drUX+Hil92tIFJyG+bsgp4enFYHyRRg7tRo0BUNrq qPSZLe+WsA07DYHM6UR718iLA0Q9TumlE3VuDFWgkcK/G37Pr7CATqjy53T/8HN01HLK 18CsOhqu6MGIzRk/rhwv+xZpxe2huEAdbt3eNQdax5ZQ1kH5sFtMmIWnzf+QoGLeCG1N u2uw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687940827; x=1690532827;
h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=P/PrTNNebDD3ET79IWIr+QJG2wgn5QmJDayPL9+A73A=; b=kEwNEKokY+1ANb4H0nv+PvJSKEcFbAZDPRwDArCvIWK7yc0B7PEYgYmcI9wn/AiePf GCZ2Lw+87akLDLAMxL/2LRS6Lz1+HvDoiDfW3SgjzvwaDtGGsHW9I1GjT+bMmkfYdvgS hWPjA7tiT9tx9knRbaad0yTJboHwjIirSh8iUcLzYLzK/t4LxvySfOYe8UrLpHQSEk1p okwxK2DSb03QVoR4q330q/S6HSB2w3mJrEMJzWdko1zAVFiWqdiNhKW1TgQ6BsXPSRWL dXBxgD0pi5DUy1N681E7aQRL1SNlG5BDUickSNzhq3pX+vZThJoSlsDYZzipb5M6K+bW zUgg== X-Gm-Message-State: AC+VfDyjBW8orWKOz4+s1N3e0RJvOIXwz4p2HWjMXwFlPHio8TXVeOmz +4ZHU5xCsfHo3q+ouigR0yMxtZd6F2A= X-Google-Smtp-Source: ACHHUZ7aI7icl9yP+XTVAxjb/B56LWhJlYyY/bkxia+e7VEwFYxsmG4eFO4RUCr5qbQXOHl5LdpkQKpzOJA= X-Received: from surenb-desktop.mtv.corp.google.com ([2620:15c:211:201:6664:8bd3:57fd:c83a]) (user=surenb job=sendgmr) by 2002:a25:198a:0:b0:c00:a33:7 with SMTP id 132-20020a25198a000000b00c000a330007mr9718444ybz.8.1687936698504; Wed, 28 Jun 2023 00:18:18 -0700 (PDT) Date: Wed, 28 Jun 2023 00:18:00 -0700 In-Reply-To: <20230628071800.544800-1-surenb@google.com> Mime-Version: 1.0 References: <20230628071800.544800-1-surenb@google.com> X-Mailer: git-send-email 2.41.0.162.gfafddb0af9-goog Message-ID: <20230628071800.544800-7-surenb@google.com> Subject: [PATCH v4 6/6] mm: handle userfaults under VMA lock From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com, peterx@redhat.com, ying.huang@intel.com, david@redhat.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com, surenb@google.com, 
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Enable handle_userfault to operate under the VMA lock by releasing the
VMA lock instead of mmap_lock and retrying. Note that
FAULT_FLAG_RETRY_NOWAIT should never be used when handling faults under
per-VMA lock protection because that would break the assumption that the
lock is dropped on retry.

Signed-off-by: Suren Baghdasaryan
---
 fs/userfaultfd.c   | 39 ++++++++++++++++++---------------------
 include/linux/mm.h | 39 +++++++++++++++++++++++++++++++++++++++
 mm/filemap.c       |  8 --------
 mm/memory.c        |  9 ---------
 4 files changed, 57 insertions(+), 38 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 4e800bb7d2ab..d019e7df6f15 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -277,17 +277,16 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
  * hugepmd ranges.
  */
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
-					 struct vm_area_struct *vma,
-					 unsigned long address,
-					 unsigned long flags,
-					 unsigned long reason)
+					      struct vm_fault *vmf,
+					      unsigned long reason)
 {
+	struct vm_area_struct *vma = vmf->vma;
 	pte_t *ptep, pte;
 	bool ret = true;
 
-	mmap_assert_locked(ctx->mm);
+	assert_fault_locked(ctx->mm, vmf);
 
-	ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma));
+	ptep = hugetlb_walk(vma, vmf->address, vma_mmu_pagesize(vma));
 	if (!ptep)
 		goto out;
 
@@ -308,10 +307,8 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 }
 #else
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
-					 struct vm_area_struct *vma,
-					 unsigned long address,
-					 unsigned long flags,
-					 unsigned long reason)
+					      struct vm_fault *vmf,
+					      unsigned long reason)
 {
 	return false;	/* should never get here */
 }
@@ -325,11 +322,11 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
  * threads.
  */
 static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
-					 unsigned long address,
-					 unsigned long flags,
+					 struct vm_fault *vmf,
 					 unsigned long reason)
 {
 	struct mm_struct *mm = ctx->mm;
+	unsigned long address = vmf->address;
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
@@ -337,7 +334,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	pte_t *pte;
 	bool ret = true;
 
-	mmap_assert_locked(mm);
+	assert_fault_locked(mm, vmf);
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -445,7 +442,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 * Coredumping runs without mmap_lock so we can only check that
 	 * the mmap_lock is held, if PF_DUMPCORE was not set.
 	 */
-	mmap_assert_locked(mm);
+	assert_fault_locked(mm, vmf);
 
 	ctx = vma->vm_userfaultfd_ctx.ctx;
 	if (!ctx)
@@ -522,8 +519,11 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 * and wait.
 	 */
 	ret = VM_FAULT_RETRY;
-	if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
+	if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) {
+		/* Per-VMA lock is expected to be dropped on VM_FAULT_RETRY */
+		BUG_ON(vmf->flags & FAULT_FLAG_VMA_LOCK);
 		goto out;
+	}
 
 	/* take the reference before dropping the mmap_lock */
 	userfaultfd_ctx_get(ctx);
@@ -561,15 +561,12 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
 	if (!is_vm_hugetlb_page(vma))
-		must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
-						  reason);
+		must_wait = userfaultfd_must_wait(ctx, vmf, reason);
 	else
-		must_wait = userfaultfd_huge_must_wait(ctx, vma,
-						       vmf->address,
-						       vmf->flags, reason);
+		must_wait = userfaultfd_huge_must_wait(ctx, vmf, reason);
 	if (is_vm_hugetlb_page(vma))
 		hugetlb_vma_unlock_read(vma);
-	mmap_read_unlock(mm);
+	release_fault_lock(vmf);
 
 	if (likely(must_wait && !READ_ONCE(ctx->released))) {
 		wake_up_poll(&ctx->fd_wqh, EPOLLIN);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fec149585985..70bb2f923e33 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -705,6 +705,17 @@ static inline bool vma_try_start_write(struct vm_area_struct *vma)
 	return true;
 }
 
+static inline void vma_assert_locked(struct vm_area_struct *vma)
+{
+	int mm_lock_seq;
+
+	if (__is_vma_write_locked(vma, &mm_lock_seq))
+		return;
+
+	lockdep_assert_held(&vma->vm_lock->lock);
+	VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_lock->lock), vma);
+}
+
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 {
 	int mm_lock_seq;
@@ -723,6 +734,23 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 					  unsigned long address);
 
+static inline
+void assert_fault_locked(struct mm_struct *mm, struct vm_fault *vmf)
+{
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		vma_assert_locked(vmf->vma);
+	else
+		mmap_assert_locked(mm);
+}
+
+static inline void release_fault_lock(struct vm_fault *vmf)
+{
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		vma_end_read(vmf->vma);
+	else
+		mmap_read_unlock(vmf->vma->vm_mm);
+}
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_init_lock(struct vm_area_struct *vma) {}
@@ -736,6 +764,17 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
 static inline void vma_mark_detached(struct vm_area_struct *vma,
 				     bool detached) {}
 
+static inline
+void assert_fault_locked(struct mm_struct *mm, struct vm_fault *vmf)
+{
+	mmap_assert_locked(mm);
+}
+
+static inline void release_fault_lock(struct vm_fault *vmf)
+{
+	mmap_read_unlock(vmf->vma->vm_mm);
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 7ee078e1a0d2..d4d8f474e0c5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1699,14 +1699,6 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 	return ret;
 }
 
-static void release_fault_lock(struct vm_fault *vmf)
-{
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
-		vma_end_read(vmf->vma);
-	else
-		mmap_read_unlock(vmf->vma->vm_mm);
-}
-
 /*
  * Return values:
  * 0 - folio is locked.
diff --git a/mm/memory.c b/mm/memory.c
index 76c7907e7286..c6c759922f39 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5294,15 +5294,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma_start_read(vma))
 		goto inval;
 
-	/*
-	 * Due to the possibility of userfault handler dropping mmap_lock, avoid
-	 * it for now and fall back to page fault handling under mmap_lock.
-	 */
-	if (userfaultfd_armed(vma)) {
-		vma_end_read(vma);
-		goto inval;
-	}
-
 	/* Check since vm_start/vm_end might change before we lock the VMA */
 	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
 		vma_end_read(vma);
-- 
2.41.0.162.gfafddb0af9-goog
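
As a reading aid for patch 6/6, the rule from its commit message (that
FAULT_FLAG_RETRY_NOWAIT must never be combined with per-VMA lock
handling) can be modeled in a few lines of userspace C. This is a sketch
under stated assumptions, not kernel code: the toy_* names, the boolean
lock state and the flag values are hypothetical, and asserts stand in
for the kernel's lock assertions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel fault flags; values are arbitrary. */
enum {
	TOY_FLAG_RETRY_NOWAIT = 1 << 0,
	TOY_FLAG_VMA_LOCK     = 1 << 1,
};

/* Booleans model "held for read" for mmap_lock and the per-VMA lock. */
struct toy_vmf {
	unsigned int flags;
	bool mmap_held;
	bool vma_held;
};

/* True if the lock this fault is supposed to run under is held. */
static bool toy_fault_locked(const struct toy_vmf *vmf)
{
	if (vmf->flags & TOY_FLAG_VMA_LOCK)
		return vmf->vma_held;
	return vmf->mmap_held;
}

/* NOWAIT keeps the lock on retry, which the per-VMA path cannot tolerate. */
static bool toy_flags_valid(unsigned int flags)
{
	return !((flags & TOY_FLAG_VMA_LOCK) && (flags & TOY_FLAG_RETRY_NOWAIT));
}

/* Drop whichever lock the fault was started under. */
static void toy_release_fault_lock(struct toy_vmf *vmf)
{
	if (vmf->flags & TOY_FLAG_VMA_LOCK)
		vmf->vma_held = false;
	else
		vmf->mmap_held = false;
}

/* Retry path; returns true if the fault lock was dropped before retrying. */
static bool toy_userfault_retry(struct toy_vmf *vmf)
{
	assert(toy_fault_locked(vmf));
	assert(toy_flags_valid(vmf->flags));

	if (vmf->flags & TOY_FLAG_RETRY_NOWAIT)
		return false;	/* mmap_lock path only: lock stays held */

	toy_release_fault_lock(vmf);
	return true;
}
```

In this model a caller that started under the per-VMA lock always gets
"lock dropped" back from the retry path, which is the assumption the
commit message says RETRY_NOWAIT would break.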