From nobody Sat Feb 7 19:45:54 2026
Date: Fri, 23 Jun 2023 16:40:12 +0000
In-Reply-To: <20230623164015.3431990-1-jiaqiyan@google.com>
References: <20230623164015.3431990-1-jiaqiyan@google.com>
X-Mailer: git-send-email 2.41.0.162.gfafddb0af9-goog
Message-ID: <20230623164015.3431990-2-jiaqiyan@google.com>
Subject: [PATCH v2 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp
From: Jiaqi Yan
To: mike.kravetz@oracle.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, shy828301@gmail.com, linmiaohe@huawei.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 duenwen@google.com, axelrasmussen@google.com, jthoughton@google.com, Jiaqi Yan
X-Mailing-List: linux-kernel@vger.kernel.org

Traversal on llist (e.g. llist_for_each_safe) is only safe AFTER entries
are deleted from the llist.

llist_del_all is lock-free with respect to itself, so the
folio_clear_hugetlb_hwpoison() calls from __update_and_free_hugetlb_folio
and memory_failure do not need explicit locking when freeing the
raw_hwp_list.

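As an illustration of the detach-then-traverse pattern described above,
here is a minimal userspace sketch in C11. It is not the kernel code:
struct node, struct list_head, list_add, list_del_all, free_all and
demo_free_all are hypothetical stand-ins for the kernel's llist API, built
on <stdatomic.h>. The only property it relies on is that detaching the whole
chain is a single atomic exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel's llist types. */
struct node {
	struct node *next;
	int value;
};

struct list_head {
	_Atomic(struct node *) first;
};

/* Lock-free push, analogous in spirit to llist_add(). */
static void list_add(struct list_head *head, struct node *n)
{
	struct node *old = atomic_load(&head->first);

	do {
		n->next = old;	/* old is refreshed when the CAS fails */
	} while (!atomic_compare_exchange_weak(&head->first, &old, n));
}

/* Atomically detach the whole chain, analogous to llist_del_all(). */
static struct node *list_del_all(struct list_head *head)
{
	return atomic_exchange(&head->first, (struct node *)NULL);
}

/*
 * Detach first, then walk. After the exchange no other thread can reach
 * these nodes through the head, so the walk and free() need no lock.
 */
static unsigned long free_all(struct list_head *head)
{
	struct node *n = list_del_all(head);
	unsigned long count = 0;

	while (n) {
		struct node *next = n->next;	/* read before freeing n */

		free(n);
		count++;
		n = next;
	}
	return count;
}

/* Push three nodes, free them all, and return how many were freed. */
unsigned long demo_free_all(void)
{
	struct list_head head;
	unsigned long freed;

	atomic_init(&head.first, NULL);
	for (int i = 0; i < 3; i++) {
		struct node *n = malloc(sizeof(*n));

		assert(n != NULL);
		n->value = i;
		list_add(&head, n);
	}
	freed = free_all(&head);
	assert(list_del_all(&head) == NULL);	/* the list is now empty */
	return freed;
}
```

Calling demo_free_all() from a main() returns 3. Two concurrent callers of
free_all() would each receive a disjoint chain (one of them possibly empty),
which is why the sketch needs no lock around the traversal.
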
Signed-off-by: Jiaqi Yan
Acked-by: Mike Kravetz
Acked-by: Naoya Horiguchi
---
 mm/memory-failure.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 004a02f44271..c415c3c462a3 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1825,12 +1825,11 @@ static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
 
 static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
 {
-	struct llist_head *head;
-	struct llist_node *t, *tnode;
+	struct llist_node *t, *tnode, *head;
 	unsigned long count = 0;
 
-	head = raw_hwp_list_head(folio);
-	llist_for_each_safe(tnode, t, head->first) {
+	head = llist_del_all(raw_hwp_list_head(folio));
+	llist_for_each_safe(tnode, t, head) {
 		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
 
 		if (move_flag)
@@ -1840,7 +1839,6 @@ static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
 		kfree(p);
 		count++;
 	}
-	llist_del_all(head);
 	return count;
 }
 
-- 
2.41.0.162.gfafddb0af9-goog

From nobody Sat Feb 7 19:45:54 2026
Date: Fri, 23 Jun 2023 16:40:13 +0000
In-Reply-To: <20230623164015.3431990-1-jiaqiyan@google.com>
References: <20230623164015.3431990-1-jiaqiyan@google.com>
Message-ID: <20230623164015.3431990-3-jiaqiyan@google.com>
Subject: [PATCH v2 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
From: Jiaqi Yan
To: mike.kravetz@oracle.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, shy828301@gmail.com, linmiaohe@huawei.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 duenwen@google.com, axelrasmussen@google.com, jthoughton@google.com, Jiaqi Yan

Add the functionality to tell whether a subpage of a hugetlb folio is a
raw HWPOISON page. This functionality relies on RawHwpUnreliable not
being set; otherwise the hugepage's HWPOISON list becomes meaningless.

Export this functionality so that it can be used immediately in the read
operation for hugetlbfs.

Signed-off-by: Jiaqi Yan
Acked-by: Naoya Horiguchi
Reviewed-by: Mike Kravetz
---
 include/linux/hugetlb.h | 19 +++++++++++++++++++
 include/linux/mm.h      |  7 +++++++
 mm/hugetlb.c            | 10 ++++++++++
 mm/memory-failure.c     | 34 ++++++++++++++++++++++++----------
 4 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 21f942025fec..8b73a12b7b38 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1013,6 +1013,25 @@ void hugetlb_register_node(struct node *node);
 void hugetlb_unregister_node(struct node *node);
 #endif
 
+/*
+ * Struct raw_hwp_page represents information about "raw error page",
+ * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
+ */
+struct raw_hwp_page {
+	struct llist_node node;
+	struct page *page;
+};
+
+static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
+{
+	return (struct llist_head *)&folio->_hugetlb_hwpoison;
+}
+
+/*
+ * Check if a given raw @subpage in a hugepage @folio is HWPOISON.
+ */
+bool is_raw_hwp_subpage(struct folio *folio, struct page *subpage);
+
 #else /* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 66032f0d515c..41a283bd41a7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3671,6 +3671,7 @@ extern const struct attribute_group memory_failure_attr_group;
 extern void memory_failure_queue(unsigned long pfn, int flags);
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 					bool *migratable_cleared);
+extern bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage);
 void num_poisoned_pages_inc(unsigned long pfn);
 void num_poisoned_pages_sub(unsigned long pfn, long i);
 struct task_struct *task_early_kill(struct task_struct *tsk, int force_early);
@@ -3685,6 +3686,12 @@ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 	return 0;
 }
 
+static inline bool __is_raw_hwp_subpage(struct folio *folio,
+					struct page *subpage)
+{
+	return false;
+}
+
 static inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea24718db4af..6b860de87590 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7377,6 +7377,16 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 	return ret;
 }
 
+bool is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
+{
+	bool ret;
+
+	spin_lock_irq(&hugetlb_lock);
+	ret = __is_raw_hwp_subpage(folio, subpage);
+	spin_unlock_irq(&hugetlb_lock);
+	return ret;
+}
+
 void folio_putback_active_hugetlb(struct folio *folio)
 {
 	spin_lock_irq(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c415c3c462a3..891248e2930e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1809,18 +1809,32 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
 #endif /* CONFIG_FS_DAX */
 
 #ifdef CONFIG_HUGETLB_PAGE
-/*
- * Struct raw_hwp_page represents information about "raw error page",
- * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
- */
-struct raw_hwp_page {
-	struct llist_node node;
-	struct page *page;
-};
 
-static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
+bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
 {
-	return (struct llist_head *)&folio->_hugetlb_hwpoison;
+	struct llist_head *raw_hwp_head;
+	struct raw_hwp_page *p, *tmp;
+	bool ret = false;
+
+	if (!folio_test_hwpoison(folio))
+		return false;
+
+	/*
+	 * When RawHwpUnreliable is set, kernel lost track of which subpages
+	 * are HWPOISON. So return as if ALL subpages are HWPOISONed.
+	 */
+	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+		return true;
+
+	raw_hwp_head = raw_hwp_list_head(folio);
+	llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
+		if (subpage == p->page) {
+			ret = true;
+			break;
+		}
+	}
+
+	return ret;
 }
 
 static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
-- 
2.41.0.162.gfafddb0af9-goog

From nobody Sat Feb 7 19:45:54 2026
Date: Fri, 23 Jun 2023 16:40:14 +0000
In-Reply-To: <20230623164015.3431990-1-jiaqiyan@google.com>
References: <20230623164015.3431990-1-jiaqiyan@google.com>
Message-ID: <20230623164015.3431990-4-jiaqiyan@google.com>
Subject: [PATCH v2 3/4] hugetlbfs: improve read HWPOISON hugepage
From: Jiaqi Yan
To: mike.kravetz@oracle.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, shy828301@gmail.com, linmiaohe@huawei.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 duenwen@google.com, axelrasmussen@google.com, jthoughton@google.com, Jiaqi Yan

When a hugepage contains HWPOISON pages, read() fails to read any byte
of the hugepage and returns -EIO, although many bytes in the HWPOISON
hugepage are readable.

Improve this by allowing hugetlbfs_read_iter to return as many bytes as
possible. For a requested range [offset, offset + len) that contains a
HWPOISON page, return [offset, first HWPOISON page addr); the next read
attempt will fail and return -EIO.

Signed-off-by: Jiaqi Yan
Reviewed-by: Mike Kravetz
Reviewed-by: Naoya Horiguchi
---
 fs/hugetlbfs/inode.c | 58 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 52 insertions(+), 6 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 90361a922cec..86879ca3ff1e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -282,6 +282,42 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 }
 #endif
 
+/*
+ * Someone wants to read @bytes from a HWPOISON hugetlb @page from @offset.
+ * Returns the maximum number of bytes one can read without touching the 1st raw
+ * HWPOISON subpage.
+ *
+ * The implementation borrows the iteration logic from copy_page_to_iter*.
+ */
+static size_t adjust_range_hwpoison(struct page *page, size_t offset, size_t bytes)
+{
+	size_t n = 0;
+	size_t res = 0;
+	struct folio *folio = page_folio(page);
+
+	/* First subpage to start the loop. */
+	page += offset / PAGE_SIZE;
+	offset %= PAGE_SIZE;
+	while (1) {
+		if (is_raw_hwp_subpage(folio, page))
+			break;
+
+		/* Safe to read n bytes without touching HWPOISON subpage. */
+		n = min(bytes, (size_t)PAGE_SIZE - offset);
+		res += n;
+		bytes -= n;
+		if (!bytes || !n)
+			break;
+		offset += n;
+		if (offset == PAGE_SIZE) {
+			page++;
+			offset = 0;
+		}
+	}
+
+	return res;
+}
+
 /*
  * Support for read() - Find the page attached to f_mapping and copy out the
  * data. This provides functionality similar to filemap_read().
@@ -300,7 +336,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 	while (iov_iter_count(to)) {
 		struct page *page;
-		size_t nr, copied;
+		size_t nr, copied, want;
 
 		/* nr is the maximum number of bytes to copy from this page */
 		nr = huge_page_size(h);
@@ -328,16 +364,26 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		} else {
 			unlock_page(page);
 
-			if (PageHWPoison(page)) {
-				put_page(page);
-				retval = -EIO;
-				break;
+			if (!PageHWPoison(page))
+				want = nr;
+			else {
+				/*
+				 * Adjust how many bytes safe to read without
+				 * touching the 1st raw HWPOISON subpage after
+				 * offset.
+				 */
+				want = adjust_range_hwpoison(page, offset, nr);
+				if (want == 0) {
+					put_page(page);
+					retval = -EIO;
+					break;
+				}
 			}
 
 			/*
 			 * We have the page, copy it to user space buffer.
 			 */
-			copied = copy_page_to_iter(page, offset, nr, to);
+			copied = copy_page_to_iter(page, offset, want, to);
 			put_page(page);
 		}
 		offset += copied;
-- 
2.41.0.162.gfafddb0af9-goog

From nobody Sat Feb 7 19:45:54 2026
Date: Fri, 23 Jun 2023 16:40:15 +0000
In-Reply-To: <20230623164015.3431990-1-jiaqiyan@google.com>
References: <20230623164015.3431990-1-jiaqiyan@google.com>
Message-ID: <20230623164015.3431990-5-jiaqiyan@google.com>
Subject: [PATCH v2 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
From: Jiaqi Yan
To: mike.kravetz@oracle.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, shy828301@gmail.com, linmiaohe@huawei.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 duenwen@google.com, axelrasmussen@google.com, jthoughton@google.com, Jiaqi Yan

Add tests for the improvement made to the read operation on a HWPOISON
hugetlb page, using different read granularities.
For each chunk size, three read scenarios are tested:
1. Simple regression test on read without HWPOISON.
2. Sequential read page by page should succeed until it encounters the
   1st raw HWPOISON subpage.
3. After skipping the raw HWPOISON subpage with lseek, read()s always
   succeed.

Signed-off-by: Jiaqi Yan
Acked-by: Mike Kravetz
Reviewed-by: Naoya Horiguchi
---
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
 3 files changed, 324 insertions(+)
 create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c

diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index 5599cf287694..37419296bf79 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -5,6 +5,7 @@ hugepage-mremap
 hugepage-shm
 hugepage-vmemmap
 hugetlb-madvise
+hugetlb-read-hwpoison
 khugepaged
 map_hugetlb
 map_populate
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 95acb099315e..63fcc7e3e9f0 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -38,6 +38,7 @@ TEST_GEN_PROGS += gup_longterm
 TEST_GEN_PROGS += gup_test
 TEST_GEN_PROGS += hmm-tests
 TEST_GEN_PROGS += hugetlb-madvise
+TEST_GEN_PROGS += hugetlb-read-hwpoison
 TEST_GEN_PROGS += hugepage-mmap
 TEST_GEN_PROGS += hugepage-mremap
 TEST_GEN_PROGS += hugepage-shm
diff --git a/tools/testing/selftests/mm/hugetlb-read-hwpoison.c b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
new file mode 100644
index 000000000000..ba6cc6f9cabc
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
@@ -0,0 +1,322 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <linux/magic.h>
+#include <sys/mman.h>
+#include <sys/statfs.h>
+#include <errno.h>
+#include <stdbool.h>
+
+#include "../kselftest.h"
+
+#define PREFIX " ... "
+#define ERROR_PREFIX " !!! "
+
+#define MAX_WRITE_READ_CHUNK_SIZE (getpagesize() * 16)
+#define MAX(a, b) (((a) > (b)) ? (a) : (b))
+
+enum test_status {
+	TEST_PASSED = 0,
+	TEST_FAILED = 1,
+	TEST_SKIPPED = 2,
+};
+
+static char *status_to_str(enum test_status status)
+{
+	switch (status) {
+	case TEST_PASSED:
+		return "TEST_PASSED";
+	case TEST_FAILED:
+		return "TEST_FAILED";
+	case TEST_SKIPPED:
+		return "TEST_SKIPPED";
+	default:
+		return "TEST_???";
+	}
+}
+
+static int setup_filemap(char *filemap, size_t len, size_t wr_chunk_size)
+{
+	char iter = 0;
+
+	for (size_t offset = 0; offset < len;
+	     offset += wr_chunk_size) {
+		iter++;
+		memset(filemap + offset, iter, wr_chunk_size);
+	}
+
+	return 0;
+}
+
+static bool verify_chunk(char *buf, size_t len, char val)
+{
+	size_t i;
+
+	for (i = 0; i < len; ++i) {
+		if (buf[i] != val) {
+			printf(PREFIX ERROR_PREFIX "check fail: buf[%lu] = %u != %u\n",
+				i, buf[i], val);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+static bool seek_read_hugepage_filemap(int fd, size_t len, size_t wr_chunk_size,
+				       off_t offset, size_t expected)
+{
+	char buf[MAX_WRITE_READ_CHUNK_SIZE];
+	ssize_t ret_count = 0;
+	ssize_t total_ret_count = 0;
+	char val = offset / wr_chunk_size + offset % wr_chunk_size;
+
+	printf(PREFIX PREFIX "init val=%u with offset=0x%lx\n", val, offset);
+	printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
+	       expected);
+	if (lseek(fd, offset, SEEK_SET) < 0) {
+		perror(PREFIX ERROR_PREFIX "seek failed");
+		return false;
+	}
+
+	while (offset + total_ret_count < len) {
+		ret_count = read(fd, buf, wr_chunk_size);
+		if (ret_count == 0) {
+			printf(PREFIX PREFIX "read reach end of the file\n");
+			break;
+		} else if (ret_count < 0) {
+			perror(PREFIX ERROR_PREFIX "read failed");
+			break;
+		}
+		++val;
+		if (!verify_chunk(buf, ret_count, val))
+			return false;
+
+		total_ret_count += ret_count;
+	}
+	printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
+	       total_ret_count);
+
+	return total_ret_count == expected;
+}
+
+static bool read_hugepage_filemap(int fd, size_t len,
+				  size_t wr_chunk_size, size_t expected)
+{
+	char buf[MAX_WRITE_READ_CHUNK_SIZE];
+	ssize_t ret_count = 0;
+	ssize_t total_ret_count = 0;
+	char val = 0;
+
+	printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
+	       expected);
+	while (total_ret_count < len) {
+		ret_count = read(fd, buf, wr_chunk_size);
+		if (ret_count == 0) {
+			printf(PREFIX PREFIX "read reach end of the file\n");
+			break;
+		} else if (ret_count < 0) {
+			perror(PREFIX ERROR_PREFIX "read failed");
+			break;
+		}
+		++val;
+		if (!verify_chunk(buf, ret_count, val))
+			return false;
+
+		total_ret_count += ret_count;
+	}
+	printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
+	       total_ret_count);
+
+	return total_ret_count == expected;
+}
+
+static enum test_status
+test_hugetlb_read(int fd, size_t len, size_t wr_chunk_size)
+{
+	enum test_status status = TEST_SKIPPED;
+	char *filemap = NULL;
+
+	if (ftruncate(fd, len) < 0) {
+		perror(PREFIX ERROR_PREFIX "ftruncate failed");
+		return status;
+	}
+
+	filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+		       MAP_SHARED | MAP_POPULATE, fd, 0);
+	if (filemap == MAP_FAILED) {
+		perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
+		goto done;
+	}
+
+	setup_filemap(filemap, len, wr_chunk_size);
+	status = TEST_FAILED;
+
+	if (read_hugepage_filemap(fd, len, wr_chunk_size, len))
+		status = TEST_PASSED;
+
+	munmap(filemap, len);
+done:
+	if (ftruncate(fd, 0) < 0) {
+		perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
+		status = TEST_FAILED;
+	}
+
+	return status;
+}
+
+static enum test_status
+test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size,
+			   bool skip_hwpoison_page)
+{
+	enum test_status status = TEST_SKIPPED;
+	char *filemap = NULL;
+	char *hwp_addr = NULL;
+	const unsigned long pagesize = getpagesize();
+
+	if (ftruncate(fd, len) < 0) {
+		perror(PREFIX ERROR_PREFIX "ftruncate failed");
+		return status;
+	}
+
+	filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+		       MAP_SHARED | MAP_POPULATE, fd, 0);
+	if (filemap == MAP_FAILED) {
+		perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
+		goto done;
+	}
+
+	setup_filemap(filemap, len, wr_chunk_size);
+	status = TEST_FAILED;
+
+	/*
+	 * Poisoned hugetlb page layout (assume hugepagesize=2MB):
+	 * |<---------------------- 1MB ---------------------->|
+	 * |<---- healthy page ---->|<---- HWPOISON page ----->|
+	 * |<------------------- (1MB - 8KB) ----------------->|
+	 */
+	hwp_addr = filemap + len / 2 + pagesize;
+	if (madvise(hwp_addr, pagesize, MADV_HWPOISON) < 0) {
+		perror(PREFIX ERROR_PREFIX "MADV_HWPOISON failed");
+		goto unmap;
+	}
+
+	if (!skip_hwpoison_page) {
+		/*
+		 * Userspace should be able to read (1MB + 1 page) from
+		 * the beginning of the HWPOISONed hugepage.
+		 */
+		if (read_hugepage_filemap(fd, len, wr_chunk_size,
+					  len / 2 + pagesize))
+			status = TEST_PASSED;
+	} else {
+		/*
+		 * Userspace should be able to read (1MB - 2 pages) from
+		 * HWPOISONed hugepage.
+		 */
+		if (seek_read_hugepage_filemap(fd, len, wr_chunk_size,
+					       len / 2 + MAX(2 * pagesize, wr_chunk_size),
+					       len / 2 - MAX(2 * pagesize, wr_chunk_size)))
+			status = TEST_PASSED;
+	}
+
+unmap:
+	munmap(filemap, len);
+done:
+	if (ftruncate(fd, 0) < 0) {
+		perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
+		status = TEST_FAILED;
+	}
+
+	return status;
+}
+
+static int create_hugetlbfs_file(struct statfs *file_stat)
+{
+	int fd;
+
+	fd = memfd_create("hugetlb_tmp", MFD_HUGETLB);
+	if (fd < 0) {
+		perror(PREFIX ERROR_PREFIX "could not open hugetlbfs file");
+		return -1;
+	}
+
+	memset(file_stat, 0, sizeof(*file_stat));
+	if (fstatfs(fd, file_stat)) {
+		perror(PREFIX ERROR_PREFIX "fstatfs failed");
+		goto close;
+	}
+	if (file_stat->f_type != HUGETLBFS_MAGIC) {
+		printf(PREFIX ERROR_PREFIX "not hugetlbfs file\n");
+		goto close;
+	}
+
+	return fd;
+close:
+	close(fd);
+	return -1;
+}
+
+int main(void)
+{
+	int fd;
+	struct statfs file_stat;
+	enum test_status status;
+	/* Test read() in different granularity. */
+	size_t wr_chunk_sizes[] = {
+		getpagesize() / 2, getpagesize(),
+		getpagesize() * 2, getpagesize() * 4
+	};
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(wr_chunk_sizes); ++i) {
+		printf("Write/read chunk size=0x%lx\n",
+		       wr_chunk_sizes[i]);
+
+		fd = create_hugetlbfs_file(&file_stat);
+		if (fd < 0)
+			goto create_failure;
+		printf(PREFIX "HugeTLB read regression test...\n");
+		status = test_hugetlb_read(fd, file_stat.f_bsize,
+					   wr_chunk_sizes[i]);
+		printf(PREFIX "HugeTLB read regression test...%s\n",
+		       status_to_str(status));
+		close(fd);
+		if (status == TEST_FAILED)
+			return -1;
+
+		fd = create_hugetlbfs_file(&file_stat);
+		if (fd < 0)
+			goto create_failure;
+		printf(PREFIX "HugeTLB read HWPOISON test...\n");
+		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+						    wr_chunk_sizes[i], false);
+		printf(PREFIX "HugeTLB read HWPOISON test...%s\n",
+		       status_to_str(status));
+		close(fd);
+		if (status == TEST_FAILED)
+			return -1;
+
+		fd = create_hugetlbfs_file(&file_stat);
+		if (fd < 0)
+			goto create_failure;
+		printf(PREFIX "HugeTLB seek then read HWPOISON test...\n");
+		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+						    wr_chunk_sizes[i], true);
+		printf(PREFIX "HugeTLB seek then read HWPOISON test...%s\n",
+		       status_to_str(status));
+		close(fd);
+		if (status == TEST_FAILED)
+			return -1;
+	}
+
+	return 0;
+
+create_failure:
+	printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n");
+	return -1;
+}
-- 
2.41.0.162.gfafddb0af9-goog