From nobody Fri Dec 19 15:23:15 2025 Received: from out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 451061D0491 for ; Wed, 4 Sep 2024 11:13:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.101 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725448442; cv=none; b=rrN7gNeWs1TDJm5vkU3W32O8KQlBRj2JcpXqERB7Pbf3vh+U3W2i/XowbN2LzI3AffVgh3vN090vXKT56MY01DcP0mY1ETpqhtrg65NheOMQS49OZrnNyOGvS3OtCJ8dvjkuiUoDjTgb7mLWowitIltdcj2lY9ABl3tuSc7Txj0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725448442; c=relaxed/simple; bh=+Lu83yZcAgbnrKOwHQOf/9z+C6xsqgLzgxJ9XXGb+jI=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=GYQ1DBHZU0ad2heTbKmZpGYmUcD255IBG27ix+Ap3gR7S4QbTOZ9b873aJg/mdxmG8U0WfCstML9qGlqIf9Vl9/QFAgD3fIuWMLxiyf05gHhpj1EJAjAKhZWtLF1O5HWmkIhMPfBlnUHSUvU1wbQ/g+3RX9/+v04/lNA2us9tkE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=ZEYDAc/i; arc=none smtp.client-ip=115.124.30.101 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="ZEYDAc/i" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1725448436; h=From:To:Subject:Date:Message-ID:MIME-Version; bh=1Wpx06Mx9JHuiExsk7qakCiyNk5ZorpNX57U9/Sy8hc=; b=ZEYDAc/i2Mf8YbzUZMu5BdmpYJzYXi8G1NdNJW6N15Mg0ywd6ZVgbOXlWVIyCfnr6/+WJkN5HBytu8J1SpxG7epnfQ0adJQCTlXoqIC07xssZ6FcjrPTimeyzwg0GolEJw3m4k7mgJetKxVbEbXZjJfPODLT8sqIyuNE/b6lBn0= Received: from x31i01179.sqa.na131.tbsite.net(mailfrom:hsiangkao@linux.alibaba.com fp:SMTPD_---0WEHSWbl_1725448416) by smtp.aliyun-inc.com; Wed, 04 Sep 2024 19:13:56 +0800 From: Gao Xiang To: linux-erofs@lists.ozlabs.org Cc: LKML , Gao Xiang , syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com Subject: [PATCH] erofs: handle overlapped pclusters out of crafted images properly Date: Wed, 4 Sep 2024 19:13:34 +0800 Message-ID: <20240904111334.995920-1-hsiangkao@linux.alibaba.com> X-Mailer: git-send-email 2.43.5 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" syzbot reported a task hang issue due to a deadlock case where it is waiting for the folio lock of a cached folio that will be used for cache I/Os. After looking into the crafted fuzzed image, I found it's formed with several overlapped big pclusters as below: Ext: logical offset | length : physical offset | length 0: 0.. 16384 | 16384 : 151552.. 167936 | 16384 1: 16384.. 32768 | 16384 : 155648.. 172032 | 16384 2: 32768.. 49152 | 16384 : 537223168.. 537239552 | 16384 ... Here, extent 0/1 are physically overlapped although it's entirely _impossible_ for normal filesystem images generated by mkfs. First, managed folios containing compressed data will be marked as up-to-date and then unlocked immediately (unlike in-place folios) when compressed I/Os are complete. If physical blocks are not submitted in the incremental order, there should be separate BIOs to avoid dependency issues. However, the current code mis-arranges z_erofs_fill_bio_vec() and BIO submission which causes unexpected BIO waits. Second, managed folios will be connected to their own pclusters for efficient inter-queries. However, this is somewhat hard to implement easily if overlapped big pclusters exist. Again, these only appear in fuzzed images so let's simply fall back to temporary short-lived pages for correctness. Additionally, it justifies that referenced managed folios cannot be truncated for now and reverts part of commit 2080ca1ed3e4 ("erofs: tidy up `struct z_erofs_bvec`") for simplicity although it shouldn't be any difference. Reported-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/0000000000002fda01061e334873@google.com Fixes: 8e6c8fa9f2e9 ("erofs: enable big pcluster feature") Signed-off-by: Gao Xiang --- fs/erofs/zdata.c | 56 +++++++++++++++++++++++++----------------------- 1 file changed, 29 insertions(+), 27 deletions(-) diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 424f656cd765..382e2e4e60e5 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -1428,6 +1428,7 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec, struct z_erofs_bvec zbv; struct address_space *mapping; struct folio *folio; + struct page *page; int bs =3D i_blocksize(f->inode); =20 /* Except for inplace folios, the entire folio can be used for I/Os */ @@ -1450,7 +1451,6 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec, * file-backed folios will be used instead. */ if (folio->private =3D=3D (void *)Z_EROFS_PREALLOCATED_PAGE) { - folio->private =3D 0; tocache =3D true; goto out_tocache; } @@ -1468,7 +1468,7 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec, } =20 folio_lock(folio); - if (folio->mapping =3D=3D mc) { + if (likely(folio->mapping =3D=3D mc)) { /* * The cached folio is still in managed cache but without * a valid `->private` pcluster hint. Let's reconnect them. @@ -1478,41 +1478,44 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bv= ec, /* compressed_bvecs[] already takes a ref before */ folio_put(folio); } - - /* no need to submit if it is already up-to-date */ - if (folio_test_uptodate(folio)) { - folio_unlock(folio); - bvec->bv_page =3D NULL; + if (likely(folio->private =3D=3D pcl)) { + /* don't submit cache I/Os again if already uptodate */ + if (folio_test_uptodate(folio)) { + folio_unlock(folio); + bvec->bv_page =3D NULL; + } + return; } - return; + /* + * Already linked with another pcluster, which only appears in + * crafted images by fuzzers for now. But handle this anyway. + */ + tocache =3D false; /* use temporary short-lived pages */ + } else { + DBG_BUGON(1); /* referenced managed folios can't be truncated */ + tocache =3D true; } - - /* - * It has been truncated, so it's unsafe to reuse this one. Let's - * allocate a new page for compressed data. - */ - DBG_BUGON(folio->mapping); - tocache =3D true; folio_unlock(folio); folio_put(folio); out_allocfolio: - zbv.page =3D erofs_allocpage(&f->pagepool, gfp | __GFP_NOFAIL); + page =3D erofs_allocpage(&f->pagepool, gfp | __GFP_NOFAIL); spin_lock(&pcl->obj.lockref.lock); - if (pcl->compressed_bvecs[nr].page) { - erofs_pagepool_add(&f->pagepool, zbv.page); + if (unlikely(pcl->compressed_bvecs[nr].page !=3D zbv.page)) { + erofs_pagepool_add(&f->pagepool, page); spin_unlock(&pcl->obj.lockref.lock); cond_resched(); goto repeat; } - bvec->bv_page =3D pcl->compressed_bvecs[nr].page =3D zbv.page; - folio =3D page_folio(zbv.page); - /* first mark it as a temporary shortlived folio (now 1 ref) */ - folio->private =3D (void *)Z_EROFS_SHORTLIVED_PAGE; + bvec->bv_page =3D pcl->compressed_bvecs[nr].page =3D page; + folio =3D page_folio(page); spin_unlock(&pcl->obj.lockref.lock); out_tocache: if (!tocache || bs !=3D PAGE_SIZE || - filemap_add_folio(mc, folio, pcl->obj.index + nr, gfp)) + filemap_add_folio(mc, folio, pcl->obj.index + nr, gfp)) { + /* turn into a temporary shortlived folio (1 ref) */ + folio->private =3D (void *)Z_EROFS_SHORTLIVED_PAGE; return; + } folio_attach_private(folio, pcl); /* drop a refcount added by allocpage (then 2 refs in total here) */ folio_put(folio); @@ -1647,10 +1650,6 @@ static void z_erofs_submit_queue(struct z_erofs_deco= mpress_frontend *f, cur =3D mdev.m_pa; end =3D cur + pcl->pclustersize; do { - z_erofs_fill_bio_vec(&bvec, f, pcl, i++, mc); - if (!bvec.bv_page) - continue; - if (bio && (cur !=3D last_pa || bio->bi_bdev !=3D mdev.m_bdev)) { io_retry: @@ -1665,6 +1664,9 @@ static void z_erofs_submit_queue(struct z_erofs_decom= press_frontend *f, } bio =3D NULL; } + z_erofs_fill_bio_vec(&bvec, f, pcl, i++, mc); + if (!bvec.bv_page) + continue; =20 if (unlikely(PageWorkingset(bvec.bv_page)) && !memstall) { --=20 2.43.5