From nobody Tue May 21 23:27:44 2024
From: Miaohe Lin
To:
CC:
Subject: [PATCH 1/2] mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
Date: Thu, 18 Apr 2024 10:19:59 +0800
Message-ID: <20240418022000.3524229-2-linmiaohe@huawei.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20240418022000.3524229-1-linmiaohe@huawei.com>
References: <20240418022000.3524229-1-linmiaohe@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When I did memory failure tests recently, the following warning occurred:

DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 8 PID: 1011 at kernel/locking/lockdep.c:232 __lock_acquire+0xccb/0x1ca0
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:__lock_acquire+0xccb/0x1ca0
RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
FS:  00007ff9f32aa740(0000) GS:ffffa1ce5fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff9f3134ba0 CR3: 00000008484e4000 CR4: 00000000000006f0
Call Trace:
 lock_acquire+0xbe/0x2d0
 _raw_spin_lock_irqsave+0x3a/0x60
 hugepage_subpool_put_pages.part.0+0xe/0xc0
 free_huge_folio+0x253/0x3f0
 dissolve_free_huge_page+0x147/0x210
 __page_handle_poison+0x9/0x70
 memory_failure+0x4e6/0x8c0
 hard_offline_page_store+0x55/0xa0
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xbc/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff9f3114887
RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
Kernel panic - not syncing: kernel: panic_on_warn set ...
CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 panic+0x326/0x350
 check_panic_on_warn+0x4f/0x50
 __warn+0x98/0x190
 report_bug+0x18e/0x1a0
 handle_bug+0x3d/0x70
 exc_invalid_op+0x18/0x70
 asm_exc_invalid_op+0x1a/0x20
RIP: 0010:__lock_acquire+0xccb/0x1ca0
RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
 lock_acquire+0xbe/0x2d0
 _raw_spin_lock_irqsave+0x3a/0x60
 hugepage_subpool_put_pages.part.0+0xe/0xc0
 free_huge_folio+0x253/0x3f0
 dissolve_free_huge_page+0x147/0x210
 __page_handle_poison+0x9/0x70
 memory_failure+0x4e6/0x8c0
 hard_offline_page_store+0x55/0xa0
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xbc/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff9f3114887
RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00

After git bisecting and digging into the code, I believe the root cause is
that the _deferred_list field of struct folio is unioned with the
_hugetlb_subpool field. In __update_and_free_hugetlb_folio(),
folio->_deferred_list is always initialized, which corrupts
folio->_hugetlb_subpool when the folio is a hugetlb folio. Later
free_huge_folio() dereferences _hugetlb_subpool and the above warning
occurs. Fix this by only initialising folio->_deferred_list when the folio
is not a hugetlb folio.
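The overlap can be illustrated with a small userspace sketch (the union
below is a simplified stand-in for the corresponding part of struct folio;
the struct name, the helper layout and the 0x1234 value are made up purely
for illustration):

#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/*
 * Simplified stand-in for the part of struct folio where the hugetlb
 * metadata and the THP deferred-split list share storage. Only the
 * union matters here; everything else is omitted.
 */
struct folio_sketch {
	union {
		struct {
			void *_hugetlb_subpool;
			void *_hugetlb_cgroup;
		};
		struct list_head _deferred_list;
	};
};

int main(void)
{
	/* Pretend the hugetlb code stored a subpool pointer earlier. */
	struct folio_sketch folio = { ._hugetlb_subpool = (void *)0x1234 };

	/* What __update_and_free_hugetlb_folio() did unconditionally: */
	folio._deferred_list.next = &folio._deferred_list;
	folio._deferred_list.prev = &folio._deferred_list;

	/* The subpool pointer is gone; it now holds a stack address. */
	printf("_hugetlb_subpool = %p\n", folio._hugetlb_subpool);
	return 0;
}

Running this prints a stack address rather than 0x1234, which is the kind
of clobbered pointer free_huge_folio() then trips over in
hugepage_subpool_put_pages().
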
Fixes: b6952b6272dd ("mm: always initialise folio->_deferred_list")
CC: stable@vger.kernel.org
Signed-off-by: Miaohe Lin
---
 mm/hugetlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 26ab9dfc7d63..1da9a14a5513 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1788,7 +1788,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 		destroy_compound_gigantic_folio(folio, huge_page_order(h));
 		free_gigantic_folio(folio, huge_page_order(h));
 	} else {
-		INIT_LIST_HEAD(&folio->_deferred_list);
+		if (!folio_test_hugetlb(folio))
+			INIT_LIST_HEAD(&folio->_deferred_list);
 		folio_put(folio);
 	}
 }
-- 
2.33.0

From nobody Tue May 21 23:27:44 2024
From: Miaohe Lin
To:
CC:
Subject: [PATCH 2/2] mm/hugetlb: fix unable to handle page fault for address dead000000000108
Date: Thu, 18 Apr 2024 10:20:00 +0800
Message-ID: <20240418022000.3524229-3-linmiaohe@huawei.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20240418022000.3524229-1-linmiaohe@huawei.com>
References: <20240418022000.3524229-1-linmiaohe@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The following panic occurred when I ran memory failure tests:

BUG: unable to handle page fault for address: dead000000000108
PGD 0 P4D 0
Oops: Oops: 0001 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 1073 Comm: bash Not tainted 6.9.0-rc4-next-20240417-dirty #52
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:enqueue_hugetlb_folio+0x46/0xe0
RSP: 0018:ffff9e0207f03d10 EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000122
RDX: ffffcbb244460008 RSI: dead000000000100 RDI: ffff976a09da6f90
RBP: ffffcbb244460000 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 7a088d6100000000 R12: ffffffffbcc93160
R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fdb749b1740(0000) GS:ffff97711fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: dead000000000108 CR3: 00000001078ac000 CR4: 00000000000006f0
Call Trace:
 free_huge_folio+0x28d/0x420
 dissolve_free_hugetlb_folio+0x135/0x1d0
 __page_handle_poison+0x18/0xb0
 memory_failure+0x712/0xd30
 hard_offline_page_store+0x55/0xa0
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xbc/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fdb74714887
RSP: 002b:00007ffdfc7074e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fdb74714887
RDX: 000000000000000c RSI: 00005653ec7c0e10 RDI: 0000000000000001
RBP: 00005653ec7c0e10 R08: 00007fdb747d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007fdb7481b780 R14: 00007fdb74817600 R15: 00007fdb74816a00
Modules linked in: mce_inject hwpoison_inject
CR2: dead000000000108
---[ end trace 0000000000000000 ]---
RIP: 0010:enqueue_hugetlb_folio+0x46/0xe0
RSP: 0018:ffff9e0207f03d10 EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000122
RDX: ffffcbb244460008 RSI: dead000000000100 RDI: ffff976a09da6f90
RBP: ffffcbb244460000 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 7a088d6100000000 R12: ffffffffbcc93160
R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fdb749b1740(0000) GS:ffff97711fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: dead000000000108 CR3: 00000001078ac000 CR4: 00000000000006f0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x38a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---

The root cause is that list_del() is used to remove the folio from the
list in dissolve_free_hugetlb_folio(), which poisons folio->lru. But
free_huge_folio() might later use list_move() to re-enqueue the hugetlb
folio, and list_move() starts by unlinking the entry from its current
list, so it dereferences the poisoned pointers and triggers the above
panic. Fix this issue by using list_del_init() to remove the folio, so
that folio->lru remains a valid (empty) list head.

Signed-off-by: Miaohe Lin
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1da9a14a5513..08634732dca4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1642,7 +1642,7 @@ static void __remove_hugetlb_folio(struct hstate *h, struct folio *folio,
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
-	list_del(&folio->lru);
+	list_del_init(&folio->lru);
 
 	if (folio_test_hugetlb_freed(folio)) {
 		h->free_huge_pages--;
-- 
2.33.0
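
For reference, the difference between the two list helpers can be sketched
in userspace as follows (sketch_list_del()/sketch_list_del_init() only
mirror the semantics of the kernel's list_del()/list_del_init(); they are
not the include/linux/list.h implementations, and the hand-rolled enqueue
in main() is for brevity only):

#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Same poison constants the kernel uses on 64-bit (include/linux/poison.h). */
#define LIST_POISON1 ((struct list_head *)0xdead000000000100)
#define LIST_POISON2 ((struct list_head *)0xdead000000000122)

static void sketch_list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = LIST_POISON1;	/* a later unlink writes to LIST_POISON1->prev */
	e->prev = LIST_POISON2;
}

static void sketch_list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e;		/* entry stays a valid, empty list head */
	e->prev = e;
}

int main(void)
{
	struct list_head head = { &head, &head }, folio_lru;

	/* Enqueue folio_lru onto head, as a free hugetlb folio would be. */
	folio_lru.next = head.next;
	folio_lru.prev = &head;
	head.next->prev = &folio_lru;
	head.next = &folio_lru;

	/* What the dissolve_free_hugetlb_folio() path did before the fix. */
	sketch_list_del(&folio_lru);

	/*
	 * list_move() (via enqueue_hugetlb_folio()) would now begin with
	 * "folio_lru.next->prev = folio_lru.prev", i.e. a store to
	 * LIST_POISON1 + 8. Print that address without dereferencing it.
	 */
	printf("would write to %p\n", (void *)&folio_lru.next->prev);
	return 0;
}

Compiled and run, this prints 0xdead000000000108, matching the faulting
address in the oops above; with sketch_list_del_init() instead, next/prev
point back at the entry itself and the later re-enqueue is harmless.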