From nobody Fri May 3 03:43:22 2024
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand, Alistair Popple, Andrew Morton, Andrea Arcangeli,
    "Kirill A. Shutemov", Johannes Weiner, John Hubbard, Naoya Horiguchi,
    peterx@redhat.com, Muhammad Usama Anjum, Hugh Dickins, Mike Rapoport
Subject: [PATCH 1/4] mm/mprotect: Retry on pmd_trans_unstable()
Date: Fri, 2 Jun 2023 19:05:49 -0400
Message-Id: <20230602230552.350731-2-peterx@redhat.com>
In-Reply-To: <20230602230552.350731-1-peterx@redhat.com>
References: <20230602230552.350731-1-peterx@redhat.com>

When we hit an unstable pmd, retry it once more, because it most likely
means we raced with a THP insertion. Skipping it silently can be a
problem, since no error is reported to the caller: the user will expect
the protection change (e.g. mprotect() or a userfaultfd wr-protection)
to have been applied when it actually was not.

To achieve this, move the pmd_trans_unstable() call out of
change_pte_range(), which makes the retry easier since the return value
of change_pte_range() can be kept untouched.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 92d3d3ca390a..e4756899d40c 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -94,15 +94,6 @@ static long change_pte_range(struct mmu_gather *tlb,
 
 	tlb_change_page_size(tlb, PAGE_SIZE);
 
-	/*
-	 * Can be called with only the mmap_lock for reading by
-	 * prot_numa so we must check the pmd isn't constantly
-	 * changing from under us from pmd_none to pmd_trans_huge
-	 * and/or the other way around.
-	 */
-	if (pmd_trans_unstable(pmd))
-		return 0;
-
 	/*
 	 * The pmd points to a regular pte so the pmd can't change
 	 * from under us even if the mmap_lock is only hold for
@@ -411,6 +402,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 			pages = ret;
 			break;
 		}
+again:
 		/*
 		 * Automatic NUMA balancing walks the tables with mmap_lock
 		 * held for read. It's possible a parallel update to occur
@@ -465,6 +457,16 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 			}
 			/* fall through, the trans huge pmd just split */
 		}
+
+		/*
+		 * Can be called with only the mmap_lock for reading by
+		 * prot_numa or userfaultfd-wp, so we must check the pmd
+		 * isn't constantly changing from under us from pmd_none to
+		 * pmd_trans_huge and/or the other way around.
+		 */
+		if (pmd_trans_unstable(pmd))
+			goto again;
+
 		pages += change_pte_range(tlb, vma, pmd, addr, next,
 					  newprot, cp_flags);
 next:
-- 
2.40.1

From nobody Fri May 3 03:43:22 2024
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand, Alistair Popple, Andrew Morton, Andrea Arcangeli,
    "Kirill A. Shutemov", Johannes Weiner, John Hubbard, Naoya Horiguchi,
    peterx@redhat.com, Muhammad Usama Anjum, Hugh Dickins, Mike Rapoport
Subject: [PATCH 2/4] mm/migrate: Unify and retry an unstable pmd when hit
Date: Fri, 2 Jun 2023 19:05:50 -0400
Message-Id: <20230602230552.350731-3-peterx@redhat.com>
In-Reply-To: <20230602230552.350731-1-peterx@redhat.com>
References: <20230602230552.350731-1-peterx@redhat.com>

There is one pmd_bad() check here, but it would be better to use
pmd_clear_bad(), which is part of pmd_trans_unstable(). That alone is
still not enough, because a THP insertion can race with us by the time
we reach pmd_bad(): the pmd can be !bad yet be a THP, which makes the
pte walk illegal. There is also one spot where the function already
used pmd_trans_unstable(), but only for the pmd-split path. Merge the
two checks into one and, when an unstable pmd is hit, retry the whole
pmd.
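[Editor's note, not part of the patch: the difference between the old
"skip silently" behavior and the retry introduced by patches 1-2 can be
sketched in plain userspace C. Everything below is a stand-in simulation
with hypothetical names; pmd_trans_unstable_sim() merely mimics a pmd
that settles after a racing THP insertion finishes.]

```c
#include <assert.h>
#include <stdbool.h>

/* Number of times the simulated pmd still looks unstable. */
static int unstable_reads_left;

/* Stand-in for pmd_trans_unstable(): true while a racing THP
 * insertion is still in flight, false once the pmd has settled. */
static bool pmd_trans_unstable_sim(void)
{
	return unstable_reads_left-- > 0;
}

/* Old behavior: bail out on instability and report 0 pages changed.
 * The caller sees "success" even though nothing was protected. */
static long change_range_skip(void)
{
	if (pmd_trans_unstable_sim())
		return 0;	/* silently skipped */
	return 512;		/* pretend we changed 512 ptes */
}

/* New behavior: retry until the pmd settles, so the protection
 * change is actually applied. */
static long change_range_retry(void)
{
again:
	if (pmd_trans_unstable_sim())
		goto again;
	return 512;
}
```

The sketch shows why the silent skip is the real bug: the return value
looks identical to success, so the caller cannot tell that a range was
left unprotected.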
Cc: Alistair Popple
Cc: John Hubbard
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/migrate_device.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index d30c9de60b0d..6fc54c053c05 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -83,9 +83,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 		if (is_huge_zero_page(page)) {
 			spin_unlock(ptl);
 			split_huge_pmd(vma, pmdp, addr);
-			if (pmd_trans_unstable(pmdp))
-				return migrate_vma_collect_skip(start, end,
-								walk);
 		} else {
 			int ret;
 
@@ -106,8 +103,10 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 		}
 	}
 
-	if (unlikely(pmd_bad(*pmdp)))
-		return migrate_vma_collect_skip(start, end, walk);
+	if (unlikely(pmd_trans_unstable(pmdp))) {
+		walk->action = ACTION_AGAIN;
+		return 0;
+	}
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
 	arch_enter_lazy_mmu_mode();
-- 
2.40.1

From nobody Fri May 3 03:43:22 2024
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand, Alistair Popple, Andrew Morton, Andrea Arcangeli,
    "Kirill A. Shutemov", Johannes Weiner, John Hubbard, Naoya Horiguchi,
    peterx@redhat.com, Muhammad Usama Anjum, Hugh Dickins, Mike Rapoport
Subject: [PATCH 3/4] mm: Warn for unstable pmd in move_page_tables()
Date: Fri, 2 Jun 2023 19:05:51 -0400
Message-Id: <20230602230552.350731-4-peterx@redhat.com>
In-Reply-To: <20230602230552.350731-1-peterx@redhat.com>
References: <20230602230552.350731-1-peterx@redhat.com>

We already hold the write mmap lock here, so it is not possible to
trigger an unstable pmd. Make it a WARN_ON_ONCE() instead.
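[Editor's note, not part of the patch: WARN_ON_ONCE() evaluates its
condition every time but emits the warning splat at most once. A minimal
userspace sketch of those semantics, using GNU C statement expressions
as the kernel macro does; MY_WARN_ON_ONCE and warnings_emitted are
hypothetical names standing in for the kernel's macro and its splat.]

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how many times the "splat" fired; stands in for the
 * kernel printing a warning backtrace. */
static int warnings_emitted;

/* Sketch of WARN_ON_ONCE(): the per-call-site `static bool` latch is
 * what makes the warning fire only on the first true evaluation. */
#define MY_WARN_ON_ONCE(cond) ({			\
	static bool __warned;				\
	bool __ret = !!(cond);				\
	if (__ret && !__warned) {			\
		__warned = true;			\
		warnings_emitted++;			\
	}						\
	__ret;						\
})

/* Evaluate a true condition three times: it should warn exactly once. */
static int run_checks(void)
{
	for (int i = 0; i < 3; i++)
		MY_WARN_ON_ONCE(true);
	return warnings_emitted;
}
```

This is why the patch's conversion is cheap: an unstable pmd under the
write mmap lock would be a kernel bug, and a once-only warning flags it
without flooding the log.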
Cc: Naoya Horiguchi
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mremap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index da107f2c71bf..9303e4da4e7f 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -544,8 +544,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 					  old_pmd, new_pmd, need_rmap_locks))
 				continue;
 			split_huge_pmd(vma, old_pmd, old_addr);
-			if (pmd_trans_unstable(old_pmd))
-				continue;
+			/* We're with the mmap write lock, not possible to happen */
+			WARN_ON_ONCE(pmd_trans_unstable(old_pmd));
 		} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PMD) &&
 			   extent == PMD_SIZE) {
 			/*
-- 
2.40.1

From nobody Fri May 3 03:43:22 2024
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand, Alistair Popple, Andrew Morton, Andrea Arcangeli,
    "Kirill A. Shutemov", Johannes Weiner, John Hubbard, Naoya Horiguchi,
    peterx@redhat.com, Muhammad Usama Anjum, Hugh Dickins, Mike Rapoport
Subject: [PATCH 4/4] mm: Make most page walk paths with pmd_trans_unstable() retry
Date: Fri, 2 Jun 2023 19:05:52 -0400
Message-Id: <20230602230552.350731-5-peterx@redhat.com>
In-Reply-To: <20230602230552.350731-1-peterx@redhat.com>
References: <20230602230552.350731-1-peterx@redhat.com>

For most of the page walk paths, it is logically always better to retry
the pmd when the pmd_trans_unstable() race is hit. We could treat it as
a none pmd (per the comment above pmd_trans_unstable()), but in most
cases we are not even treating it as that. If it is to be fixed anyway,
a retry is the most accurate fix.

I went over all the pmd_trans_unstable() special cases, and this patch
should cover all the remaining places where we should properly retry on
an unstable pmd. With ACTION_AGAIN, introduced in 2020, that is easy to
achieve.
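[Editor's note, not part of the patch: the ACTION_AGAIN contract is that
when a pmd_entry callback sets it, the pagewalk core re-runs the same
entry instead of advancing. A minimal userspace sketch of that control
flow; struct sim_walk, walk_one_pmd() and friends are stand-ins, not
the kernel's struct mm_walk / walk_page_range() API.]

```c
#include <assert.h>

enum walk_action { ACTION_CONTINUE, ACTION_AGAIN };

/* Toy walker state: how many times the pmd still looks unstable,
 * plus counters so we can observe the retry behavior. */
struct sim_walk {
	enum walk_action action;
	int unstable_left;	/* pmd_trans_unstable() analogue */
	int entries_processed;
	int retries;
};

/* pmd_entry callback: on an unstable pmd, request a retry via
 * ACTION_AGAIN instead of silently skipping the range. */
static int pmd_entry_cb(struct sim_walk *walk)
{
	if (walk->unstable_left > 0) {
		walk->unstable_left--;
		walk->action = ACTION_AGAIN;
		return 0;
	}
	walk->entries_processed++;
	return 0;
}

/* Core loop for one pmd entry: honor ACTION_AGAIN by re-running the
 * callback on the same entry until it stops asking for a retry. */
static void walk_one_pmd(struct sim_walk *walk)
{
again:
	walk->action = ACTION_CONTINUE;
	pmd_entry_cb(walk);
	if (walk->action == ACTION_AGAIN) {
		walk->retries++;
		goto again;
	}
}
```

The point of the sketch: the callback never has to loop itself; setting
the action flag is enough, which is why converting the call sites below
is a small, mechanical change.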
These are the call sites that I think should be fixed with it:

*** fs/proc/task_mmu.c:
smaps_pte_range[634]                       if (pmd_trans_unstable(pmd))
clear_refs_pte_range[1194]                 if (pmd_trans_unstable(pmd))
pagemap_pmd_range[1542]                    if (pmd_trans_unstable(pmdp))
gather_pte_stats[1891]                     if (pmd_trans_unstable(pmd))
*** mm/memcontrol.c:
mem_cgroup_count_precharge_pte_range[6024] if (pmd_trans_unstable(pmd))
mem_cgroup_move_charge_pte_range[6244]     if (pmd_trans_unstable(pmd))
*** mm/memory-failure.c:
hwpoison_pte_range[794]                    if (pmd_trans_unstable(pmdp))
*** mm/mempolicy.c:
queue_folios_pte_range[517]                if (pmd_trans_unstable(pmd))
*** mm/madvise.c:
madvise_cold_or_pageout_pte_range[425]     if (pmd_trans_unstable(pmd))
madvise_free_pte_range[625]                if (pmd_trans_unstable(pmd))

IIUC, most of these may not be a big issue even without a retry, because
they are already not strict (smaps, pte_stats, MADV_COLD, ...): the
worst case is e.g. an inaccurate statistic or one fewer 2M chunk made
cold. But some of them can have a functional error without the retry,
AFAIU: e.g. in pagemap the output buffer can be shifted over the
unstable pmd range, so the pagemap result can be wrong.
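[Editor's note, not part of the patch: the pagemap "shifted output
buffer" failure mode mentioned above can be sketched in userspace. The
collector names and the fixed-slot buffer below are hypothetical
illustrations, not the real pagemap_pmd_range() code.]

```c
#include <assert.h>
#include <string.h>

#define NPAGES 4

/* Buggy collector: the "unstable" page is skipped but the output
 * cursor keeps moving, so every later entry lands one slot early. */
static void collect_shifted(int unstable_page, int out[NPAGES])
{
	int cursor = 0;

	for (int page = 0; page < NPAGES; page++) {
		if (page == unstable_page)
			continue;	/* skipped: no slot written */
		out[cursor++] = page;
	}
}

/* Retrying collector: the unstable page settles on retry (modeled
 * here as settling immediately), so each page fills its own slot. */
static void collect_retry(int unstable_page, int out[NPAGES])
{
	(void)unstable_page;	/* settles on retry in this model */

	for (int page = 0; page < NPAGES; page++)
		out[page] = page;
}
```

With the buggy collector, the entry for page 3 ends up in page 2's
slot, which is exactly the kind of wrong-slot result the commit message
is worried about for pagemap readers.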
While these call sites all look fine and don't need any change:

*** include/linux/pgtable.h:
pmd_devmap_trans_unstable[1418]  return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
*** mm/gup.c:
follow_pmd_mask[695]             if (pmd_trans_unstable(pmd))
*** mm/mapping_dirty_helpers.c:
wp_clean_pmd_entry[131]          if (!pmd_trans_unstable(&pmdval))
*** mm/memory.c:
do_anonymous_page[4060]          if (unlikely(pmd_trans_unstable(vmf->pmd)))
*** mm/migrate_device.c:
migrate_vma_insert_page[616]     if (unlikely(pmd_trans_unstable(pmdp)))
*** mm/mincore.c:
mincore_pte_range[116]           if (pmd_trans_unstable(pmd)) {

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/proc/task_mmu.c  | 17 +++++++++++++----
 mm/madvise.c        |  8 ++++++--
 mm/memcontrol.c     |  8 ++++++--
 mm/memory-failure.c |  4 +++-
 mm/mempolicy.c      |  4 +++-
 5 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6259dd432eeb..823eaba5c6bf 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -631,8 +631,11 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		goto out;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		goto out;
+	}
+
 	/*
 	 * The mmap_lock held all the way back in m_start() is what
 	 * keeps khugepaged out of here and from collapsing things
@@ -1191,8 +1194,10 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
@@ -1539,8 +1544,10 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		return err;
 	}
 
-	if (pmd_trans_unstable(pmdp))
+	if (pmd_trans_unstable(pmdp)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 	/*
@@ -1888,8 +1895,10 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 #endif
 	orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	do {
diff --git a/mm/madvise.c b/mm/madvise.c
index 78cd12581628..0fd81712022c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -424,8 +424,10 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	}
 
 regular_folio:
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 #endif
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
@@ -626,8 +628,10 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
 		goto next;
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6ee433be4c3b..15e50f033e41 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6021,8 +6021,10 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE)
 		if (get_mctgt_type(vma, addr, *pte, NULL))
@@ -6241,8 +6243,10 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 retry:
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; addr += PAGE_SIZE) {
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 004a02f44271..c97fb2b7ab4a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -791,8 +791,10 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
 		goto out;
 	}
 
-	if (pmd_trans_unstable(pmdp))
+	if (pmd_trans_unstable(pmdp)) {
+		walk->action = ACTION_AGAIN;
 		goto out;
+	}
 
 	mapped_pte = ptep = pte_offset_map_lock(walk->vma->vm_mm, pmdp,
 						addr, &ptl);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f06ca8c18e62..af8907b4aad1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -514,8 +514,10 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 	if (ptl)
 		return queue_folios_pmd(pmd, ptl, addr, end, walk);
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 
 	mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
-- 
2.40.1