From nobody Fri Dec 19 12:33:09 2025 Received: from out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7FDCE16403; Wed, 20 Dec 2023 06:52:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R221e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046056;MF=baolin.wang@linux.alibaba.com;NM=1;PH=DS;RN=15;SR=0;TI=SMTPD_---0VytDXe5_1703055119; Received: from localhost(mailfrom:baolin.wang@linux.alibaba.com fp:SMTPD_---0VytDXe5_1703055119) by smtp.aliyun-inc.com; Wed, 20 Dec 2023 14:52:00 +0800 From: Baolin Wang To: akpm@linux-foundation.org Cc: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, nphamcs@gmail.com, david@redhat.com, ying.huang@intel.com, shy828301@gmail.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: [PATCH] mm: memcg: fix split queue list crash when large folio migration Date: Wed, 20 Dec 2023 14:51:40 +0800 Message-Id: <61273e5e9b490682388377c20f52d19de4a80460.1703054559.git.baolin.wang@linux.alibaba.com> X-Mailer: git-send-email 2.39.3 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When running autonuma with enabling multi-size THP, I encountered the follo= wing kernel crash issue: [ 134.290216] list_del corruption. prev->next should be fffff9ad42e1c490, but was dead000000000100. (prev=3Dfffff9ad42399890) [ 134.290877] kernel BUG at lib/list_debug.c:62! [ 134.291052] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 134.291210] CPU: 56 PID: 8037 Comm: numa01 Kdump: loaded Tainted: G E 6.7.0-rc4+ #20 [ 134.291649] RIP: 0010:__list_del_entry_valid_or_report+0x97/0xb0 ...... [ 134.294252] Call Trace: [ 134.294362] [ 134.294440] ? die+0x33/0x90 [ 134.294561] ? do_trap+0xe0/0x110 ...... [ 134.295681] ? __list_del_entry_valid_or_report+0x97/0xb0 [ 134.295842] folio_undo_large_rmappable+0x99/0x100 [ 134.296003] destroy_large_folio+0x68/0x70 [ 134.296172] migrate_folio_move+0x12e/0x260 [ 134.296264] ? __pfx_remove_migration_pte+0x10/0x10 [ 134.296389] migrate_pages_batch+0x495/0x6b0 [ 134.296523] migrate_pages+0x1d0/0x500 [ 134.296646] ? __pfx_alloc_misplaced_dst_folio+0x10/0x10 [ 134.296799] migrate_misplaced_folio+0x12d/0x2b0 [ 134.296953] do_numa_page+0x1f4/0x570 [ 134.297121] __handle_mm_fault+0x2b0/0x6c0 [ 134.297254] handle_mm_fault+0x107/0x270 [ 134.300897] do_user_addr_fault+0x167/0x680 [ 134.304561] exc_page_fault+0x65/0x140 [ 134.307919] asm_exc_page_fault+0x22/0x30 The reason for the crash is that, the commit 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") removed the charging and uncharging operations of the migration folios and cleared the memcg data of the old fo= lio. During the subsequent release process of the old large folio in destroy_lar= ge_folio(), if the large folio needs to be removed from the split queue, an incorrect s= plit queue can be obtained (which is pgdat->deferred_split_queue) because the old folio's memcg is NULL now. This can lead to list operations being performed under the wrong split queue lock protection, resulting in a list crash as a= bove. After the migration, the old folio is going to be freed, so we can remove it from the split queue in mem_cgroup_migrate() a bit earlier before clearing = the memcg data to avoid getting incorrect split queue. Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migratio= n") Signed-off-by: Baolin Wang Reviewed-by: Nhat Pham --- mm/huge_memory.c | 2 +- mm/memcontrol.c | 11 +++++++++++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 6be1a380a298..c50dc2e1483f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3124,7 +3124,7 @@ void folio_undo_large_rmappable(struct folio *folio) spin_lock_irqsave(&ds_queue->split_queue_lock, flags); if (!list_empty(&folio->_deferred_list)) { ds_queue->split_queue_len--; - list_del(&folio->_deferred_list); + list_del_init(&folio->_deferred_list); } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ae8c62c7aa53..e66e0811cccc 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -7575,6 +7575,17 @@ void mem_cgroup_migrate(struct folio *old, struct fo= lio *new) =20 /* Transfer the charge and the css ref */ commit_charge(new, memcg); + /* + * If the old folio a large folio and is in the split queue, it needs + * to be removed from the split queue now, in case getting an incorrect + * split queue in destroy_large_folio() after the memcg of the old folio + * is cleared. + * + * In addition, the old folio is about to be freed after migration, so + * removing from the split queue a bit earlier seems reasonable. + */ + if (folio_test_large(old) && folio_test_large_rmappable(old)) + folio_undo_large_rmappable(old); old->memcg_data =3D 0; } =20 --=20 2.39.3