From: Namjae Jeon
To: viro@zeniv.linux.org.uk, brauner@kernel.org, hch@lst.de, tytso@mit.edu, willy@infradead.org, jack@suse.cz, djwong@kernel.org, dsterba@suse.com, pali@kernel.org, amir73il@gmail.com, xiang@kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Namjae Jeon, Hyunchul Lee
Subject: [PATCH v8 06/17] ntfs: update mft operations
Date: Fri, 6 Feb 2026 16:18:49 +0900
Message-Id: <20260206071900.6800-7-linkinjeon@kernel.org>
X-Mailer: git-send-email
2.25.1
In-Reply-To: <20260206071900.6800-1-linkinjeon@kernel.org>
References: <20260206071900.6800-1-linkinjeon@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Refactor MFT record handling to use folio APIs with consistency
validation, and improve the allocation extension and writeback paths
for $MFT and $MFTMirr.

Acked-by: Christoph Hellwig
Signed-off-by: Hyunchul Lee
Signed-off-by: Namjae Jeon
---
 fs/ntfs/mft.c | 2497 +++++++++++++++++++++++++------------------------
 fs/ntfs/mst.c |   65 +-
 2 files changed, 1290 insertions(+), 1272 deletions(-)

diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
index 6fd1dc4b08c8..79fbe90e2bb4 100644
--- a/fs/ntfs/mft.c
+++ b/fs/ntfs/mft.c
@@ -1,57 +1,119 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /*
- * mft.c - NTFS kernel mft record operations. Part of the Linux-NTFS project.
+ * NTFS kernel mft record operations.
+ * Part of this file is based on code from the NTFS-3G.
  *
  * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
  * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
  */
 
-#include
-#include
-#include
+#include
 #include
+#include
 
-#include "attrib.h"
-#include "aops.h"
 #include "bitmap.h"
-#include "debug.h"
-#include "dir.h"
 #include "lcnalloc.h"
-#include "malloc.h"
 #include "mft.h"
 #include "ntfs.h"
 
-#define MAX_BHS (PAGE_SIZE / NTFS_BLOCK_SIZE)
+/*
+ * ntfs_mft_record_check - Check the consistency of an MFT record
+ *
+ * Make sure its general fields are safe, then examine all its
+ * attributes and apply generic checks to them.
+ *
+ * Returns 0 if the checks are successful. If not, return -EIO.
+ */
+int ntfs_mft_record_check(const struct ntfs_volume *vol, struct mft_record *m,
+		unsigned long mft_no)
+{
+	struct attr_record *a;
+	struct super_block *sb = vol->sb;
+
+	if (!ntfs_is_file_record(m->magic)) {
+		ntfs_error(sb, "Record %llu has no FILE magic (0x%x)\n",
+				(unsigned long long)mft_no, le32_to_cpu(*(__le32 *)m));
+		goto err_out;
+	}
+
+	if (le16_to_cpu(m->usa_ofs) & 0x1 ||
+	    (vol->mft_record_size >> NTFS_BLOCK_SIZE_BITS) + 1 != le16_to_cpu(m->usa_count) ||
+	    le16_to_cpu(m->usa_ofs) + le16_to_cpu(m->usa_count) * 2 > vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu has corrupt fix-up values fields\n",
+				(unsigned long long)mft_no);
+		goto err_out;
+	}
+
+	if (le32_to_cpu(m->bytes_allocated) != vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu has corrupt allocation size (%u <> %u)\n",
+				(unsigned long long)mft_no,
+				vol->mft_record_size,
+				le32_to_cpu(m->bytes_allocated));
+		goto err_out;
+	}
+
+	if (le32_to_cpu(m->bytes_in_use) > vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu has corrupt in-use size (%u > %u)\n",
+				(unsigned long long)mft_no,
+				le32_to_cpu(m->bytes_in_use),
+				vol->mft_record_size);
+		goto err_out;
+	}
+
+	if (le16_to_cpu(m->attrs_offset) & 7) {
+		ntfs_error(sb, "Attributes badly aligned in record %llu\n",
+				(unsigned long long)mft_no);
+		goto err_out;
+	}
+
+	a = (struct attr_record *)((char *)m + le16_to_cpu(m->attrs_offset));
+	if ((char *)a < (char *)m || (char *)a > (char *)m + vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu is corrupt\n",
+				(unsigned long long)mft_no);
+		goto err_out;
+	}
+
+	return 0;
+
+err_out:
+	return -EIO;
+}
 
-/**
- * map_mft_record_page - map the page in which a specific mft record resides
+/*
+ * map_mft_record_folio - map the folio in which a specific mft record resides
  * @ni: ntfs inode whose mft record page to map
  *
- * This maps the page in which the mft record of the ntfs inode @ni is situated
- * and returns a pointer to the mft record within the mapped page.
+ * This maps the folio in which the mft record of the ntfs inode @ni is
+ * situated.
+ *
+ * This allocates a new buffer (@ni->mrec), copies the MFT record data from
+ * the mapped folio into this buffer, and applies the MST (Multi Sector
+ * Transfer) fixups on the copy.
  *
- * Return value needs to be checked with IS_ERR() and if that is true PTR_ERR()
- * contains the negative error code returned.
+ * The folio is pinned (referenced) in @ni->folio to ensure the data remains
+ * valid in the page cache, but the returned pointer is the allocated copy.
+ *
+ * Return: A pointer to the allocated and fixed-up mft record (@ni->mrec).
+ * The return value needs to be checked with IS_ERR(). If it is true,
+ * PTR_ERR() contains the negative error code.
  */
-static inline MFT_RECORD *map_mft_record_page(ntfs_inode *ni)
+static inline struct mft_record *map_mft_record_folio(struct ntfs_inode *ni)
 {
 	loff_t i_size;
-	ntfs_volume *vol = ni->vol;
+	struct ntfs_volume *vol = ni->vol;
 	struct inode *mft_vi = vol->mft_ino;
-	struct page *page;
+	struct folio *folio;
 	unsigned long index, end_index;
-	unsigned ofs;
+	unsigned int ofs;
 
-	BUG_ON(ni->page);
+	WARN_ON(ni->folio);
 	/*
 	 * The index into the page cache and the offset within the page cache
-	 * page of the wanted mft record. FIXME: We need to check for
-	 * overflowing the unsigned long, but I don't think we would ever get
-	 * here if the volume was that big...
+	 * page of the wanted mft record.
 	 */
-	index = (u64)ni->mft_no << vol->mft_record_size_bits >>
-			PAGE_SHIFT;
-	ofs = (ni->mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+	index = NTFS_MFT_NR_TO_PIDX(vol, ni->mft_no);
+	ofs = NTFS_MFT_NR_TO_POFS(vol, ni->mft_no);
 
 	i_size = i_size_read(mft_vi);
 	/* The maximum valid index into the page cache for $MFT's data. */
@@ -61,169 +123,126 @@ static inline MFT_RECORD *map_mft_record_page(ntfs_inode *ni)
 	if (unlikely(index >= end_index)) {
 		if (index > end_index || (i_size & ~PAGE_MASK) < ofs +
 				vol->mft_record_size) {
-			page = ERR_PTR(-ENOENT);
-			ntfs_error(vol->sb, "Attempt to read mft record 0x%lx, "
-					"which is beyond the end of the mft. "
-					"This is probably a bug in the ntfs "
-					"driver.", ni->mft_no);
+			folio = ERR_PTR(-ENOENT);
+			ntfs_error(vol->sb,
+				"Attempt to read mft record 0x%lx, which is beyond the end of the mft. This is probably a bug in the ntfs driver.",
+				ni->mft_no);
 			goto err_out;
 		}
 	}
-	/* Read, map, and pin the page. */
-	page = ntfs_map_page(mft_vi->i_mapping, index);
-	if (!IS_ERR(page)) {
+
+	/* Read, map, and pin the folio. */
+	folio = read_mapping_folio(mft_vi->i_mapping, index, NULL);
+	if (!IS_ERR(folio)) {
+		u8 *addr;
+
+		ni->mrec = kmalloc(vol->mft_record_size, GFP_NOFS);
+		if (!ni->mrec) {
+			folio_put(folio);
+			folio = ERR_PTR(-ENOMEM);
+			goto err_out;
+		}
+
+		addr = kmap_local_folio(folio, 0);
+		memcpy(ni->mrec, addr + ofs, vol->mft_record_size);
+		post_read_mst_fixup((struct ntfs_record *)ni->mrec, vol->mft_record_size);
+
 		/* Catch multi sector transfer fixup errors. */
-		if (likely(ntfs_is_mft_recordp((le32*)(page_address(page) +
-				ofs)))) {
-			ni->page = page;
-			ni->page_ofs = ofs;
-			return page_address(page) + ofs;
+		if (!ntfs_mft_record_check(vol, (struct mft_record *)ni->mrec, ni->mft_no)) {
+			kunmap_local(addr);
+			ni->folio = folio;
+			ni->folio_ofs = ofs;
+			return ni->mrec;
 		}
-		ntfs_error(vol->sb, "Mft record 0x%lx is corrupt. "
-				"Run chkdsk.", ni->mft_no);
-		ntfs_unmap_page(page);
-		page = ERR_PTR(-EIO);
+		kunmap_local(addr);
+		folio_put(folio);
+		kfree(ni->mrec);
+		ni->mrec = NULL;
+		folio = ERR_PTR(-EIO);
 		NVolSetErrors(vol);
 	}
 err_out:
-	ni->page = NULL;
-	ni->page_ofs = 0;
-	return (void*)page;
+	ni->folio = NULL;
+	ni->folio_ofs = 0;
+	return (struct mft_record *)folio;
 }
 
-/**
- * map_mft_record - map, pin and lock an mft record
+/*
+ * map_mft_record - map and pin an mft record
  * @ni: ntfs inode whose MFT record to map
  *
- * First, take the mrec_lock mutex. We might now be sleeping, while waiting
- * for the mutex if it was already locked by someone else.
- *
- * The page of the record is mapped using map_mft_record_page() before being
- * returned to the caller.
- *
- * This in turn uses ntfs_map_page() to get the page containing the wanted mft
- * record (it in turn calls read_cache_page() which reads it in from disk if
- * necessary, increments the use count on the page so that it cannot disappear
- * under us and returns a reference to the page cache page).
- *
- * If read_cache_page() invokes ntfs_readpage() to load the page from disk, it
- * sets PG_locked and clears PG_uptodate on the page. Once I/O has completed
- * and the post-read mst fixups on each mft record in the page have been
- * performed, the page gets PG_uptodate set and PG_locked cleared (this is done
- * in our asynchronous I/O completion handler end_buffer_read_mft_async()).
- * ntfs_map_page() waits for PG_locked to become clear and checks if
- * PG_uptodate is set and returns an error code if not. This provides
- * sufficient protection against races when reading/using the page.
- *
- * However there is the write mapping to think about. Doing the above described
- * checking here will be fine, because when initiating the write we will set
- * PG_locked and clear PG_uptodate making sure nobody is touching the page
- * contents. Doing the locking this way means that the commit to disk code in
- * the page cache code paths is automatically sufficiently locked with us as
- * we will not touch a page that has been locked or is not uptodate. The only
- * locking problem then is them locking the page while we are accessing it.
- *
- * So that code will end up having to own the mrec_lock of all mft
- * records/inodes present in the page before I/O can proceed. In that case we
- * wouldn't need to bother with PG_locked and PG_uptodate as nobody will be
- * accessing anything without owning the mrec_lock mutex. But we do need to
- * use them because of the read_cache_page() invocation and the code becomes so
- * much simpler this way that it is well worth it.
- *
- * The mft record is now ours and we return a pointer to it. You need to check
- * the returned pointer with IS_ERR() and if that is true, PTR_ERR() will return
- * the error code.
- *
- * NOTE: Caller is responsible for setting the mft record dirty before calling
- * unmap_mft_record(). This is obviously only necessary if the caller really
- * modified the mft record...
- * Q: Do we want to recycle one of the VFS inode state bits instead?
- * A: No, the inode ones mean we want to change the mft record, not we want to
- * write it out.
+ * This function ensures the MFT record for the given inode is mapped and
+ * accessible.
+ *
+ * It increments the reference count of the ntfs inode. If the record is
+ * already mapped (@ni->folio is set), it returns the cached record
+ * immediately.
+ *
+ * Otherwise, it calls map_mft_record_folio() to read the folio from disk
+ * (if necessary via read_mapping_folio), allocate a buffer, and copy the
+ * record data.
+ *
+ * Return: A pointer to the mft record. You need to check the returned
+ * pointer with IS_ERR().
  */
-MFT_RECORD *map_mft_record(ntfs_inode *ni)
+struct mft_record *map_mft_record(struct ntfs_inode *ni)
 {
-	MFT_RECORD *m;
+	struct mft_record *m;
+
+	if (!ni)
+		return ERR_PTR(-EINVAL);
 
 	ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
 
 	/* Make sure the ntfs inode doesn't go away. */
 	atomic_inc(&ni->count);
 
-	/* Serialize access to this mft record. */
-	mutex_lock(&ni->mrec_lock);
+	if (ni->folio)
+		return (struct mft_record *)ni->mrec;
 
-	m = map_mft_record_page(ni);
+	m = map_mft_record_folio(ni);
 	if (!IS_ERR(m))
 		return m;
 
-	mutex_unlock(&ni->mrec_lock);
 	atomic_dec(&ni->count);
 	ntfs_error(ni->vol->sb, "Failed with error code %lu.", -PTR_ERR(m));
 	return m;
 }
 
-/**
- * unmap_mft_record_page - unmap the page in which a specific mft record resides
- * @ni: ntfs inode whose mft record page to unmap
- *
- * This unmaps the page in which the mft record of the ntfs inode @ni is
- * situated and returns. This is a NOOP if highmem is not configured.
- *
- * The unmap happens via ntfs_unmap_page() which in turn decrements the use
- * count on the page thus releasing it from the pinned state.
- *
- * We do not actually unmap the page from memory of course, as that will be
- * done by the page cache code itself when memory pressure increases or
- * whatever.
- */
-static inline void unmap_mft_record_page(ntfs_inode *ni)
-{
-	BUG_ON(!ni->page);
-
-	// TODO: If dirty, blah...
-	ntfs_unmap_page(ni->page);
-	ni->page = NULL;
-	ni->page_ofs = 0;
-	return;
-}
-
-/**
- * unmap_mft_record - release a mapped mft record
+/*
+ * unmap_mft_record - release a reference to a mapped mft record
  * @ni: ntfs inode whose MFT record to unmap
  *
- * We release the page mapping and the mrec_lock mutex which unmaps the mft
- * record and releases it for others to get hold of. We also release the ntfs
- * inode by decrementing the ntfs inode reference count.
+ * This decrements the reference count of the ntfs inode.
+ *
+ * It releases the caller's hold on the inode. If the reference count indicates
+ * that there are still other users (count > 1), the function returns
+ * immediately, keeping the resources (folio and mrec buffer) pinned for
+ * those users.
  *
  * NOTE: If caller has modified the mft record, it is imperative to set the mft
  * record dirty BEFORE calling unmap_mft_record().
  */
-void unmap_mft_record(ntfs_inode *ni)
+void unmap_mft_record(struct ntfs_inode *ni)
 {
-	struct page *page = ni->page;
+	struct folio *folio;
 
-	BUG_ON(!page);
+	if (!ni)
+		return;
 
 	ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
 
-	unmap_mft_record_page(ni);
-	mutex_unlock(&ni->mrec_lock);
-	atomic_dec(&ni->count);
-	/*
-	 * If pure ntfs_inode, i.e. no vfs inode attached, we leave it to
-	 * ntfs_clear_extent_inode() in the extent inode case, and to the
-	 * caller in the non-extent, yet pure ntfs inode case, to do the actual
-	 * tear down of all structures and freeing of all allocated memory.
-	 */
-	return;
+	folio = ni->folio;
+	if (atomic_dec_return(&ni->count) > 1)
+		return;
+	WARN_ON(!folio);
 }
 
-/**
+/*
  * map_extent_mft_record - load an extent inode and attach it to its base
  * @base_ni: base ntfs inode
  * @mref: mft reference of the extent inode to load
- * @ntfs_ino: on successful return, pointer to the ntfs_inode structure
+ * @ntfs_ino: on successful return, pointer to the struct ntfs_inode structure
  *
  * Load the extent mft record @mref and attach it to its base inode @base_ni.
  * Return the mapped extent mft record if IS_ERR(result) is false.  Otherwise
@@ -232,12 +251,12 @@ void unmap_mft_record(ntfs_inode *ni)
  * On successful return, @ntfs_ino contains a pointer to the ntfs_inode
  * structure of the mapped extent inode.
  */
-MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
-		ntfs_inode **ntfs_ino)
+struct mft_record *map_extent_mft_record(struct ntfs_inode *base_ni, u64 mref,
+		struct ntfs_inode **ntfs_ino)
 {
-	MFT_RECORD *m;
-	ntfs_inode *ni = NULL;
-	ntfs_inode **extent_nis = NULL;
+	struct mft_record *m;
+	struct ntfs_inode *ni = NULL;
+	struct ntfs_inode **extent_nis = NULL;
 	int i;
 	unsigned long mft_no = MREF(mref);
 	u16 seq_no = MSEQNO(mref);
@@ -252,6 +271,7 @@ MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
 	 * in which case just return it. If not found, add it to the base
 	 * inode before returning it.
 	 */
+retry:
 	mutex_lock(&base_ni->extent_lock);
 	if (base_ni->nr_extents > 0) {
 		extent_nis = base_ni->ext.extent_ntfs_inos;
@@ -279,20 +299,21 @@ MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
 				return m;
 			}
 			unmap_mft_record(ni);
-			ntfs_error(base_ni->vol->sb, "Found stale extent mft "
-					"reference! Corrupt filesystem. "
-					"Run chkdsk.");
+			ntfs_error(base_ni->vol->sb,
+					"Found stale extent mft reference! Corrupt filesystem. Run chkdsk.");
 			return ERR_PTR(-EIO);
 		}
 map_err_out:
-		ntfs_error(base_ni->vol->sb, "Failed to map extent "
-				"mft record, error code %ld.", -PTR_ERR(m));
+		ntfs_error(base_ni->vol->sb,
+				"Failed to map extent mft record, error code %ld.",
+				-PTR_ERR(m));
 		return m;
 	}
+	mutex_unlock(&base_ni->extent_lock);
+
 	/* Record wasn't there. Get a new ntfs inode and initialize it. */
 	ni = ntfs_new_extent_inode(base_ni->vol->sb, mft_no);
 	if (unlikely(!ni)) {
-		mutex_unlock(&base_ni->extent_lock);
 		atomic_dec(&base_ni->count);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -303,37 +324,44 @@ MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
 	/* Now map the record. */
 	m = map_mft_record(ni);
 	if (IS_ERR(m)) {
-		mutex_unlock(&base_ni->extent_lock);
 		atomic_dec(&base_ni->count);
 		ntfs_clear_extent_inode(ni);
 		goto map_err_out;
 	}
 	/* Verify the sequence number if it is present. */
 	if (seq_no && (le16_to_cpu(m->sequence_number) != seq_no)) {
-		ntfs_error(base_ni->vol->sb, "Found stale extent mft "
-				"reference! Corrupt filesystem. Run chkdsk.");
+		ntfs_error(base_ni->vol->sb,
+				"Found stale extent mft reference! Corrupt filesystem. Run chkdsk.");
 		destroy_ni = true;
 		m = ERR_PTR(-EIO);
-		goto unm_err_out;
+		goto unm_nolock_err_out;
+	}
+
+	mutex_lock(&base_ni->extent_lock);
+	for (i = 0; i < base_ni->nr_extents; i++) {
+		if (mft_no == extent_nis[i]->mft_no) {
+			mutex_unlock(&base_ni->extent_lock);
+			ntfs_clear_extent_inode(ni);
+			goto retry;
+		}
 	}
 	/* Attach extent inode to base inode, reallocating memory if needed. */
 	if (!(base_ni->nr_extents & 3)) {
-		ntfs_inode **tmp;
-		int new_size = (base_ni->nr_extents + 4) * sizeof(ntfs_inode *);
+		struct ntfs_inode **tmp;
+		int new_size = (base_ni->nr_extents + 4) * sizeof(struct ntfs_inode *);
 
-		tmp = kmalloc(new_size, GFP_NOFS);
+		tmp = kvzalloc(new_size, GFP_NOFS);
 		if (unlikely(!tmp)) {
-			ntfs_error(base_ni->vol->sb, "Failed to allocate "
-					"internal buffer.");
+			ntfs_error(base_ni->vol->sb, "Failed to allocate internal buffer.");
 			destroy_ni = true;
 			m = ERR_PTR(-ENOMEM);
 			goto unm_err_out;
 		}
 		if (base_ni->nr_extents) {
-			BUG_ON(!base_ni->ext.extent_ntfs_inos);
+			WARN_ON(!base_ni->ext.extent_ntfs_inos);
 			memcpy(tmp, base_ni->ext.extent_ntfs_inos, new_size -
-					4 * sizeof(ntfs_inode *));
-			kfree(base_ni->ext.extent_ntfs_inos);
+					4 * sizeof(struct ntfs_inode *));
+			kvfree(base_ni->ext.extent_ntfs_inos);
 		}
 		base_ni->ext.extent_ntfs_inos = tmp;
 	}
@@ -344,8 +372,9 @@ MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
 	*ntfs_ino = ni;
 	return m;
 unm_err_out:
-	unmap_mft_record(ni);
 	mutex_unlock(&base_ni->extent_lock);
+unm_nolock_err_out:
+	unmap_mft_record(ni);
 	atomic_dec(&base_ni->count);
 	/*
 	 * If the extent inode was not attached to the base inode we need to
@@ -356,18 +385,14 @@ MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
 	return m;
 }
 
-#ifdef NTFS_RW
-
-/**
- * __mark_mft_record_dirty - set the mft record and the page containing it dirty
+/*
+ * __mark_mft_record_dirty - mark the base vfs inode dirty
  * @ni: ntfs inode describing the mapped mft record
  *
  * Internal function. Users should call mark_mft_record_dirty() instead.
  *
- * Set the mapped (extent) mft record of the (base or extent) ntfs inode @ni,
- * as well as the page containing the mft record, dirty. Also, mark the base
- * vfs inode dirty. This ensures that any changes to the mft record are
- * written out to disk.
+ * This function determines the base ntfs inode (in case @ni is an extent
+ * inode) and marks the corresponding VFS inode dirty.
  *
  * NOTE: We only set I_DIRTY_DATASYNC (and not I_DIRTY_PAGES)
  * on the base vfs inode, because even though file data may have been modified,
@@ -381,64 +406,40 @@ MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
  * I_DIRTY_SYNC, since the file data has not actually hit the block device yet,
  * which is not what I_DIRTY_SYNC on its own would suggest.
  */
-void __mark_mft_record_dirty(ntfs_inode *ni)
+void __mark_mft_record_dirty(struct ntfs_inode *ni)
 {
-	ntfs_inode *base_ni;
+	struct ntfs_inode *base_ni;
 
 	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
-	BUG_ON(NInoAttr(ni));
-	mark_ntfs_record_dirty(ni->page, ni->page_ofs);
+	WARN_ON(NInoAttr(ni));
 	/* Determine the base vfs inode and mark it dirty, too. */
-	mutex_lock(&ni->extent_lock);
 	if (likely(ni->nr_extents >= 0))
 		base_ni = ni;
 	else
 		base_ni = ni->ext.base_ntfs_ino;
-	mutex_unlock(&ni->extent_lock);
 	__mark_inode_dirty(VFS_I(base_ni), I_DIRTY_DATASYNC);
 }
 
-static const char *ntfs_please_email = "Please email "
-		"linux-ntfs-dev@lists.sourceforge.net and say that you saw "
-		"this message. Thank you.";
-
-/**
- * ntfs_sync_mft_mirror_umount - synchronise an mft record to the mft mirror
- * @vol: ntfs volume on which the mft record to synchronize resides
- * @mft_no: mft record number of mft record to synchronize
- * @m: mapped, mst protected (extent) mft record to synchronize
- *
- * Write the mapped, mst protected (extent) mft record @m with mft record
- * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol,
- * bypassing the page cache and the $MFTMirr inode itself.
- *
- * This function is only for use at umount time when the mft mirror inode has
- * already been disposed off. We BUG() if we are called while the mft mirror
- * inode is still attached to the volume.
- *
- * On success return 0. On error return -errno.
+/*
+ * ntfs_bio_end_io - bio completion callback for MFT record writes
  *
- * NOTE: This function is not implemented yet as I am not convinced it can
- * actually be triggered considering the sequence of commits we do in super.c::
- * ntfs_put_super(). But just in case we provide this place holder as the
- * alternative would be either to BUG() or to get a NULL pointer dereference
- * and Oops.
+ * Decrements the folio reference count that was incremented before
+ * submit_bio(). This prevents a race condition where umount could
+ * evict the inode and release the folio while I/O is still in flight,
+ * potentially causing data corruption or use-after-free.
  */
-static int ntfs_sync_mft_mirror_umount(ntfs_volume *vol,
-		const unsigned long mft_no, MFT_RECORD *m)
+static void ntfs_bio_end_io(struct bio *bio)
 {
-	BUG_ON(vol->mftmirr_ino);
-	ntfs_error(vol->sb, "Umount time mft mirror syncing is not "
-			"implemented yet. %s", ntfs_please_email);
-	return -EOPNOTSUPP;
+	if (bio->bi_private)
+		folio_put((struct folio *)bio->bi_private);
+	bio_put(bio);
 }
 
-/**
+/*
  * ntfs_sync_mft_mirror - synchronize an mft record to the mft mirror
  * @vol: ntfs volume on which the mft record to synchronize resides
  * @mft_no: mft record number of mft record to synchronize
  * @m: mapped, mst protected (extent) mft record to synchronize
- * @sync: if true, wait for i/o completion
  *
  * Write the mapped, mst protected (extent) mft record @m with mft record
  * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol.
@@ -446,187 +447,81 @@ static int ntfs_sync_mft_mirror_umount(ntfs_volume *vol,
  * On success return 0. On error return -errno and set the volume errors flag
  * in the ntfs volume @vol.
  *
- * NOTE: We always perform synchronous i/o and ignore the @sync parameter.
- *
- * TODO: If @sync is false, want to do truly asynchronous i/o, i.e. just
- * schedule i/o via ->writepage or do it via kntfsd or whatever.
+ * NOTE: We always perform synchronous i/o.
  */
-int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
-		MFT_RECORD *m, int sync)
+int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const unsigned long mft_no,
+		struct mft_record *m)
 {
-	struct page *page;
-	unsigned int blocksize = vol->sb->s_blocksize;
-	int max_bhs = vol->mft_record_size / blocksize;
-	struct buffer_head *bhs[MAX_BHS];
-	struct buffer_head *bh, *head;
-	u8 *kmirr;
-	runlist_element *rl;
-	unsigned int block_start, block_end, m_start, m_end, page_ofs;
-	int i_bhs, nr_bhs, err = 0;
-	unsigned char blocksize_bits = vol->sb->s_blocksize_bits;
+	u8 *kmirr = NULL;
+	struct folio *folio;
+	unsigned int folio_ofs, lcn_folio_off = 0;
+	int err = 0;
+	struct bio *bio;
 
 	ntfs_debug("Entering for inode 0x%lx.", mft_no);
-	BUG_ON(!max_bhs);
-	if (WARN_ON(max_bhs > MAX_BHS))
-		return -EINVAL;
+
 	if (unlikely(!vol->mftmirr_ino)) {
 		/* This could happen during umount... */
-		err = ntfs_sync_mft_mirror_umount(vol, mft_no, m);
-		if (likely(!err))
-			return err;
+		err = -EIO;
 		goto err_out;
 	}
 	/* Get the page containing the mirror copy of the mft record @m. */
-	page = ntfs_map_page(vol->mftmirr_ino->i_mapping, mft_no >>
-			(PAGE_SHIFT - vol->mft_record_size_bits));
-	if (IS_ERR(page)) {
+	folio = read_mapping_folio(vol->mftmirr_ino->i_mapping,
+			NTFS_MFT_NR_TO_PIDX(vol, mft_no), NULL);
+	if (IS_ERR(folio)) {
 		ntfs_error(vol->sb, "Failed to map mft mirror page.");
-		err = PTR_ERR(page);
+		err = PTR_ERR(folio);
 		goto err_out;
 	}
-	lock_page(page);
-	BUG_ON(!PageUptodate(page));
-	ClearPageUptodate(page);
+
+	folio_lock(folio);
+	folio_clear_uptodate(folio);
 	/* Offset of the mft mirror record inside the page. */
-	page_ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+	folio_ofs = NTFS_MFT_NR_TO_POFS(vol, mft_no);
 	/* The address in the page of the mirror copy of the mft record @m. */
-	kmirr = page_address(page) + page_ofs;
+	kmirr = kmap_local_folio(folio, 0) + folio_ofs;
 	/* Copy the mst protected mft record to the mirror. */
 	memcpy(kmirr, m, vol->mft_record_size);
-	/* Create uptodate buffers if not present. */
-	if (unlikely(!page_has_buffers(page))) {
-		struct buffer_head *tail;
-
-		bh = head = alloc_page_buffers(page, blocksize, true);
-		do {
-			set_buffer_uptodate(bh);
-			tail = bh;
-			bh = bh->b_this_page;
-		} while (bh);
-		tail->b_this_page = head;
-		attach_page_private(page, head);
-	}
-	bh = head = page_buffers(page);
-	BUG_ON(!bh);
-	rl = NULL;
-	nr_bhs = 0;
-	block_start = 0;
-	m_start = kmirr - (u8*)page_address(page);
-	m_end = m_start + vol->mft_record_size;
-	do {
-		block_end = block_start + blocksize;
-		/* If the buffer is outside the mft record, skip it. */
-		if (block_end <= m_start)
-			continue;
-		if (unlikely(block_start >= m_end))
-			break;
-		/* Need to map the buffer if it is not mapped already. */
-		if (unlikely(!buffer_mapped(bh))) {
-			VCN vcn;
-			LCN lcn;
-			unsigned int vcn_ofs;
-
-			bh->b_bdev = vol->sb->s_bdev;
-			/* Obtain the vcn and offset of the current block. */
-			vcn = ((VCN)mft_no << vol->mft_record_size_bits) +
-					(block_start - m_start);
-			vcn_ofs = vcn & vol->cluster_size_mask;
-			vcn >>= vol->cluster_size_bits;
-			if (!rl) {
-				down_read(&NTFS_I(vol->mftmirr_ino)->
-						runlist.lock);
-				rl = NTFS_I(vol->mftmirr_ino)->runlist.rl;
-				/*
-				 * $MFTMirr always has the whole of its runlist
-				 * in memory.
-				 */
-				BUG_ON(!rl);
-			}
-			/* Seek to element containing target vcn. */
-			while (rl->length && rl[1].vcn <= vcn)
-				rl++;
-			lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
-			/* For $MFTMirr, only lcn >= 0 is a successful remap. */
-			if (likely(lcn >= 0)) {
-				/* Setup buffer head to correct block. */
-				bh->b_blocknr = ((lcn <<
-						vol->cluster_size_bits) +
-						vcn_ofs) >> blocksize_bits;
-				set_buffer_mapped(bh);
-			} else {
-				bh->b_blocknr = -1;
-				ntfs_error(vol->sb, "Cannot write mft mirror "
-						"record 0x%lx because its "
-						"location on disk could not "
-						"be determined (error code "
-						"%lli).", mft_no,
-						(long long)lcn);
-				err = -EIO;
-			}
-		}
-		BUG_ON(!buffer_uptodate(bh));
-		BUG_ON(!nr_bhs && (m_start != block_start));
-		BUG_ON(nr_bhs >= max_bhs);
-		bhs[nr_bhs++] = bh;
-		BUG_ON((nr_bhs >= max_bhs) && (m_end != block_end));
-	} while (block_start = block_end, (bh = bh->b_this_page) != head);
-	if (unlikely(rl))
-		up_read(&NTFS_I(vol->mftmirr_ino)->runlist.lock);
-	if (likely(!err)) {
-		/* Lock buffers and start synchronous write i/o on them. */
-		for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
-			struct buffer_head *tbh = bhs[i_bhs];
-
-			if (!trylock_buffer(tbh))
-				BUG();
-			BUG_ON(!buffer_uptodate(tbh));
-			clear_buffer_dirty(tbh);
-			get_bh(tbh);
-			tbh->b_end_io = end_buffer_write_sync;
-			submit_bh(REQ_OP_WRITE, tbh);
-		}
-		/* Wait on i/o completion of buffers. */
-		for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
-			struct buffer_head *tbh = bhs[i_bhs];
 
-			wait_on_buffer(tbh);
-			if (unlikely(!buffer_uptodate(tbh))) {
-				err = -EIO;
-				/*
-				 * Set the buffer uptodate so the page and
-				 * buffer states do not become out of sync.
-				 */
-				set_buffer_uptodate(tbh);
-			}
-		}
-	} else /* if (unlikely(err)) */ {
-		/* Clean the buffers. */
-		for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++)
-			clear_buffer_dirty(bhs[i_bhs]);
+	if (vol->cluster_size_bits > PAGE_SHIFT) {
+		lcn_folio_off = folio->index << PAGE_SHIFT;
+		lcn_folio_off &= vol->cluster_size_mask;
 	}
+
+	bio = bio_alloc(vol->sb->s_bdev, 1, REQ_OP_WRITE, GFP_NOIO);
+	bio->bi_iter.bi_sector =
+		NTFS_B_TO_SECTOR(vol, NTFS_CLU_TO_B(vol, vol->mftmirr_lcn) +
+				lcn_folio_off + folio_ofs);
+
+	if (!bio_add_folio(bio, folio, vol->mft_record_size, folio_ofs)) {
+		err = -EIO;
+		bio_put(bio);
+		goto unlock_folio;
+	}
+
+	bio->bi_end_io = ntfs_bio_end_io;
+	submit_bio(bio);
 	/* Current state: all buffers are clean, unlocked, and uptodate. */
-	/* Remove the mst protection fixups again. */
-	post_write_mst_fixup((NTFS_RECORD*)kmirr);
-	flush_dcache_page(page);
-	SetPageUptodate(page);
-	unlock_page(page);
-	ntfs_unmap_page(page);
+	folio_mark_uptodate(folio);
+
+unlock_folio:
+	folio_unlock(folio);
+	kunmap_local(kmirr);
+	folio_put(folio);
 	if (likely(!err)) {
 		ntfs_debug("Done.");
 	} else {
-		ntfs_error(vol->sb, "I/O error while writing mft mirror "
-				"record 0x%lx!", mft_no);
+		ntfs_error(vol->sb, "I/O error while writing mft mirror record 0x%lx!", mft_no);
 err_out:
-		ntfs_error(vol->sb, "Failed to synchronize $MFTMirr (error "
-				"code %i). Volume will be left marked dirty "
-				"on umount. Run ntfsfix on the partition "
-				"after umounting to correct this.", -err);
+		ntfs_error(vol->sb,
+			"Failed to synchronize $MFTMirr (error code %i). Volume will be left marked dirty on umount. Run chkdsk on the partition after umounting to correct this.",
+			err);
 		NVolSetErrors(vol);
 	}
 	return err;
 }
 
-/**
+/*
  * write_mft_record_nolock - write out a mapped (extent) mft record
  * @ni: ntfs inode describing the mapped (extent) mft record
  * @m: mapped (extent) mft record to write
@@ -636,201 +531,103 @@ int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
  * ntfs inode @ni to backing store. If the mft record @m has a counterpart in
  * the mft mirror, that is also updated.
  *
- * We only write the mft record if the ntfs inode @ni is dirty and the first
- * buffer belonging to its mft record is dirty, too. We ignore the dirty state
- * of subsequent buffers because we could have raced with
- * fs/ntfs/aops.c::mark_ntfs_record_dirty().
- *
- * On success, clean the mft record and return 0. On error, leave the mft
- * record dirty and return -errno.
- *
- * NOTE: We always perform synchronous i/o and ignore the @sync parameter.
- * However, if the mft record has a counterpart in the mft mirror and @sync is
- * true, we write the mft record, wait for i/o completion, and only then write
- * the mft mirror copy. This ensures that if the system crashes either the mft
- * or the mft mirror will contain a self-consistent mft record @m. If @sync is
- * false on the other hand, we start i/o on both and then wait for completion
- * on them. This provides a speedup but no longer guarantees that you will end
- * up with a self-consistent mft record in the case of a crash but if you asked
- * for asynchronous writing you probably do not care about that anyway.
- *
- * TODO: If @sync is false, want to do truly asynchronous i/o, i.e. just
- * schedule i/o via ->writepage or do it via kntfsd or whatever.
+ * We only write the mft record if the ntfs inode @ni is dirty.
+ *
+ * On success, clean the mft record and return 0.
+ * On error (specifically ENOMEM), we redirty the record so it can be retried.
+ * For other errors, we mark the volume with errors. */ -int write_mft_record_nolock(ntfs_inode *ni, MFT_RECORD *m, int sync) +int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, i= nt sync) { - ntfs_volume *vol =3D ni->vol; - struct page *page =3D ni->page; - unsigned int blocksize =3D vol->sb->s_blocksize; - unsigned char blocksize_bits =3D vol->sb->s_blocksize_bits; - int max_bhs =3D vol->mft_record_size / blocksize; - struct buffer_head *bhs[MAX_BHS]; - struct buffer_head *bh, *head; - runlist_element *rl; - unsigned int block_start, block_end, m_start, m_end; - int i_bhs, nr_bhs, err =3D 0; + struct ntfs_volume *vol =3D ni->vol; + struct folio *folio =3D ni->folio; + int err =3D 0, i =3D 0; + u8 *kaddr; + struct mft_record *fixup_m; + struct bio *bio; + unsigned int offset =3D 0, folio_size; =20 ntfs_debug("Entering for inode 0x%lx.", ni->mft_no); - BUG_ON(NInoAttr(ni)); - BUG_ON(!max_bhs); - BUG_ON(!PageLocked(page)); - if (WARN_ON(max_bhs > MAX_BHS)) { - err =3D -EINVAL; - goto err_out; - } + + WARN_ON(NInoAttr(ni)); + WARN_ON(!folio_test_locked(folio)); + /* - * If the ntfs_inode is clean no need to do anything. If it is dirty, + * If the struct ntfs_inode is clean no need to do anything. If it is di= rty, * mark it as clean now so that it can be redirtied later on if needed. * There is no danger of races since the caller is holding the locks * for the mft record @m and the page it is in. */ if (!NInoTestClearDirty(ni)) goto done; - bh =3D head =3D page_buffers(page); - BUG_ON(!bh); - rl =3D NULL; - nr_bhs =3D 0; - block_start =3D 0; - m_start =3D ni->page_ofs; - m_end =3D m_start + vol->mft_record_size; - do { - block_end =3D block_start + blocksize; - /* If the buffer is outside the mft record, skip it. 
*/ - if (block_end <=3D m_start) - continue; - if (unlikely(block_start >=3D m_end)) - break; - /* - * If this block is not the first one in the record, we ignore - * the buffer's dirty state because we could have raced with a - * parallel mark_ntfs_record_dirty(). - */ - if (block_start =3D=3D m_start) { - /* This block is the first one in the record. */ - if (!buffer_dirty(bh)) { - BUG_ON(nr_bhs); - /* Clean records are not written out. */ - break; - } - } - /* Need to map the buffer if it is not mapped already. */ - if (unlikely(!buffer_mapped(bh))) { - VCN vcn; - LCN lcn; - unsigned int vcn_ofs; - - bh->b_bdev =3D vol->sb->s_bdev; - /* Obtain the vcn and offset of the current block. */ - vcn =3D ((VCN)ni->mft_no << vol->mft_record_size_bits) + - (block_start - m_start); - vcn_ofs =3D vcn & vol->cluster_size_mask; - vcn >>=3D vol->cluster_size_bits; - if (!rl) { - down_read(&NTFS_I(vol->mft_ino)->runlist.lock); - rl =3D NTFS_I(vol->mft_ino)->runlist.rl; - BUG_ON(!rl); - } - /* Seek to element containing target vcn. */ - while (rl->length && rl[1].vcn <=3D vcn) - rl++; - lcn =3D ntfs_rl_vcn_to_lcn(rl, vcn); - /* For $MFT, only lcn >=3D 0 is a successful remap. */ - if (likely(lcn >=3D 0)) { - /* Setup buffer head to correct block. 
*/ - bh->b_blocknr =3D ((lcn << - vol->cluster_size_bits) + - vcn_ofs) >> blocksize_bits; - set_buffer_mapped(bh); - } else { - bh->b_blocknr =3D -1; - ntfs_error(vol->sb, "Cannot write mft record " - "0x%lx because its location " - "on disk could not be " - "determined (error code %lli).", - ni->mft_no, (long long)lcn); - err =3D -EIO; - } - } - BUG_ON(!buffer_uptodate(bh)); - BUG_ON(!nr_bhs && (m_start !=3D block_start)); - BUG_ON(nr_bhs >=3D max_bhs); - bhs[nr_bhs++] =3D bh; - BUG_ON((nr_bhs >=3D max_bhs) && (m_end !=3D block_end)); - } while (block_start =3D block_end, (bh =3D bh->b_this_page) !=3D head); - if (unlikely(rl)) - up_read(&NTFS_I(vol->mft_ino)->runlist.lock); - if (!nr_bhs) - goto done; - if (unlikely(err)) - goto cleanup_out; + + kaddr =3D kmap_local_folio(folio, 0); + fixup_m =3D (struct mft_record *)(kaddr + ni->folio_ofs); + memcpy(fixup_m, m, vol->mft_record_size); + /* Apply the mst protection fixups. */ - err =3D pre_write_mst_fixup((NTFS_RECORD*)m, vol->mft_record_size); + err =3D pre_write_mst_fixup((struct ntfs_record *)fixup_m, vol->mft_recor= d_size); if (err) { ntfs_error(vol->sb, "Failed to apply mst fixups!"); - goto cleanup_out; - } - flush_dcache_mft_record_page(ni); - /* Lock buffers and start synchronous write i/o on them. */ - for (i_bhs =3D 0; i_bhs < nr_bhs; i_bhs++) { - struct buffer_head *tbh =3D bhs[i_bhs]; - - if (!trylock_buffer(tbh)) - BUG(); - BUG_ON(!buffer_uptodate(tbh)); - clear_buffer_dirty(tbh); - get_bh(tbh); - tbh->b_end_io =3D end_buffer_write_sync; - submit_bh(REQ_OP_WRITE, tbh); - } - /* Synchronize the mft mirror now if not @sync. */ - if (!sync && ni->mft_no < vol->mftmirr_size) - ntfs_sync_mft_mirror(vol, ni->mft_no, m, sync); - /* Wait on i/o completion of buffers. 
*/ - for (i_bhs =3D 0; i_bhs < nr_bhs; i_bhs++) { - struct buffer_head *tbh =3D bhs[i_bhs]; - - wait_on_buffer(tbh); - if (unlikely(!buffer_uptodate(tbh))) { + goto err_out; + } + + folio_size =3D vol->mft_record_size / ni->mft_lcn_count; + while (i < ni->mft_lcn_count) { + unsigned int clu_off; + + clu_off =3D (unsigned int)((s64)ni->mft_no * vol->mft_record_size + offs= et) & + vol->cluster_size_mask; + + bio =3D bio_alloc(vol->sb->s_bdev, 1, REQ_OP_WRITE, GFP_NOIO); + bio->bi_iter.bi_sector =3D + NTFS_B_TO_SECTOR(vol, NTFS_CLU_TO_B(vol, ni->mft_lcn[i]) + + clu_off); + + if (!bio_add_folio(bio, folio, folio_size, + ni->folio_ofs + offset)) { err =3D -EIO; - /* - * Set the buffer uptodate so the page and buffer - * states do not become out of sync. - */ - if (PageUptodate(page)) - set_buffer_uptodate(tbh); + goto put_bio_out; } + + /* Synchronize the mft mirror now if not @sync. */ + if (!sync && ni->mft_no < vol->mftmirr_size) + ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m); + + folio_get(folio); + bio->bi_private =3D folio; + bio->bi_end_io =3D ntfs_bio_end_io; + submit_bio(bio); + offset +=3D vol->cluster_size; + i++; } + /* If @sync, now synchronize the mft mirror. */ if (sync && ni->mft_no < vol->mftmirr_size) - ntfs_sync_mft_mirror(vol, ni->mft_no, m, sync); - /* Remove the mst protection fixups again. */ - post_write_mst_fixup((NTFS_RECORD*)m); - flush_dcache_mft_record_page(ni); + ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m); + kunmap_local(kaddr); if (unlikely(err)) { /* I/O error during writing. This is really bad! */ - ntfs_error(vol->sb, "I/O error while writing mft record " - "0x%lx! Marking base inode as bad. You " - "should unmount the volume and run chkdsk.", - ni->mft_no); + ntfs_error(vol->sb, + "I/O error while writing mft record 0x%lx! Marking base inode as bad. = You should unmount the volume and run chkdsk.", + ni->mft_no); goto err_out; } done: ntfs_debug("Done."); return 0; -cleanup_out: - /* Clean the buffers. 
*/ - for (i_bhs =3D 0; i_bhs < nr_bhs; i_bhs++) - clear_buffer_dirty(bhs[i_bhs]); +put_bio_out: + bio_put(bio); err_out: /* * Current state: all buffers are clean, unlocked, and uptodate. * The caller should mark the base inode as bad so that no more i/o - * happens. ->clear_inode() will still be invoked so all extent inodes + * happens. ->drop_inode() will still be invoked so all extent inodes * and other allocated memory will be freed. */ if (err =3D=3D -ENOMEM) { - ntfs_error(vol->sb, "Not enough memory to write mft record. " - "Redirtying so the write is retried later."); + ntfs_error(vol->sb, + "Not enough memory to write mft record. Redirtying so the write is retr= ied later."); mark_mft_record_dirty(ni); err =3D 0; } else @@ -838,12 +635,46 @@ int write_mft_record_nolock(ntfs_inode *ni, MFT_RECOR= D *m, int sync) return err; } =20 -/** +static int ntfs_test_inode_wb(struct inode *vi, unsigned long ino, void *d= ata) +{ + struct ntfs_attr *na =3D data; + + if (!ntfs_test_inode(vi, na)) + return 0; + + /* + * Without this, ntfs_write_mst_block() could call iput_final(), + * and ntfs_evict_big_inode() could try to unlink this inode, + * and the context could be blocked indefinitely in map_mft_record(). + */ + if (NInoBeingDeleted(NTFS_I(vi))) { + na->state =3D NI_BeingDeleted; + return -1; + } + + /* + * This condition prevents ntfs_write_mst_block() from + * applying/undoing fixups while ntfs_create() is being + * called. + */ + spin_lock(&vi->i_lock); + if (inode_state_read_once(vi) & I_CREATING) { + spin_unlock(&vi->i_lock); + na->state =3D NI_BeingCreated; + return -1; + } + spin_unlock(&vi->i_lock); + + return igrab(vi) ? 
1 : -1; +} + +/* * ntfs_may_write_mft_record - check if an mft record may be written out * @vol: [IN] ntfs volume on which the mft record to check resides * @mft_no: [IN] mft record number of the mft record to check * @m: [IN] mapped mft record to check * @locked_ni: [OUT] caller has to unlock this ntfs inode if one is return= ed + * @ref_vi: [OUT] caller has to drop this vfs inode if one is returned * * Check if the mapped (base or extent) mft record @m with mft record numb= er * @mft_no belonging to the ntfs volume @vol may be written out. If neces= sary @@ -852,23 +683,23 @@ int write_mft_record_nolock(ntfs_inode *ni, MFT_RECOR= D *m, int sync) * caller is responsible for unlocking the ntfs inode and unpinning the ba= se * vfs inode. * + * To avoid deadlock when the caller holds a folio lock, if the function + * returns @ref_vi it defers dropping the vfs inode reference by returning + * it in @ref_vi instead of calling iput() directly. The caller must call + * iput() on @ref_vi after releasing the folio lock. + * * Return 'true' if the mft record may be written out and 'false' if not. * * The caller has locked the page and cleared the uptodate flag on it which * means that we can safely write out any dirty mft records that do not ha= ve - * their inodes in icache as determined by ilookup5() as anyone - * opening/creating such an inode would block when attempting to map the m= ft - * record in read_cache_page() until we are finished with the write out. + * their inodes in icache as determined by find_inode_nowait(). * * Here is a description of the tests we perform: * * If the inode is found in icache we know the mft record must be a base m= ft * record. If it is dirty, we do not write it and return 'false' as the v= fs * inode write paths will result in the access times being updated which w= ould - * cause the base mft record to be redirtied and written out again. 
(We k= now - * the access time update will modify the base mft record because Windows - * chkdsk complains if the standard information attribute is not in the ba= se - * mft record.) + * cause the base mft record to be redirtied and written out again. * * If the inode is in icache and not dirty, we attempt to lock the mft rec= ord * and if we find the lock was already taken, it is not safe to write the = mft @@ -879,9 +710,7 @@ int write_mft_record_nolock(ntfs_inode *ni, MFT_RECORD = *m, int sync) * @locked_ni to the locked ntfs inode and return 'true'. * * Note we cannot just lock the mft record and sleep while waiting for the= lock - * because this would deadlock due to lock reversal (normally the mft reco= rd is - * locked before the page is locked but we already have the page locked he= re - * when we try to lock the mft record). + * because this would deadlock due to lock reversal. * * If the inode is not in icache we need to perform further checks. * @@ -889,58 +718,46 @@ int write_mft_record_nolock(ntfs_inode *ni, MFT_RECOR= D *m, int sync) * safely write it and return 'true'. * * We now know the mft record is an extent mft record. We check if the in= ode - * corresponding to its base mft record is in icache and obtain a referenc= e to - * it if it is. If it is not, we can safely write it and return 'true'. + * corresponding to its base mft record is in icache. If it is not, we can= not + * safely determine the state of the extent inode, so we return 'false'. * * We now have the base inode for the extent mft record. We check if it h= as an - * ntfs inode for the extent mft record attached and if not it is safe to = write + * ntfs inode for the extent mft record attached. If not, it is safe to wr= ite * the extent mft record and we return 'true'. 
* - * The ntfs inode for the extent mft record is attached to the base inode = so we - * attempt to lock the extent mft record and if we find the lock was alrea= dy - * taken, it is not safe to write the extent mft record and we return 'fal= se'. + * If the extent inode is attached, we check if it is dirty. If so, we ret= urn + * 'false' (letting the standard write_inode path handle it). + * + * If it is not dirty, we attempt to lock the extent mft record. If the lo= ck + * was already taken, it is not safe to write and we return 'false'. * * If we manage to obtain the lock we have exclusive access to the extent = mft - * record, which also allows us safe writeout of the extent mft record. We - * set the ntfs inode of the extent mft record clean and then set @locked_= ni to - * the now locked ntfs inode and return 'true'. - * - * Note, the reason for actually writing dirty mft records here and not ju= st - * relying on the vfs inode dirty code paths is that we can have mft recor= ds - * modified without them ever having actual inodes in memory. Also we can= have - * dirty mft records with clean ntfs inodes in memory. None of the descri= bed - * cases would result in the dirty mft records being written out if we only - * relied on the vfs inode dirty code paths. And these cases can really o= ccur - * during allocation of new mft records and in particular when the - * initialized_size of the $MFT/$DATA attribute is extended and the new sp= ace - * is initialized using ntfs_mft_record_format(). The clean inode can then - * appear if the mft record is reused for a new inode before it got written - * out. + * record. We set @locked_ni to the now locked ntfs inode and return 'true= '. 
*/ -bool ntfs_may_write_mft_record(ntfs_volume *vol, const unsigned long mft_n= o, - const MFT_RECORD *m, ntfs_inode **locked_ni) +bool ntfs_may_write_mft_record(struct ntfs_volume *vol, const unsigned lon= g mft_no, + const struct mft_record *m, struct ntfs_inode **locked_ni, + struct inode **ref_vi) { struct super_block *sb =3D vol->sb; struct inode *mft_vi =3D vol->mft_ino; struct inode *vi; - ntfs_inode *ni, *eni, **extent_nis; + struct ntfs_inode *ni, *eni, **extent_nis; int i; - ntfs_attr na; + struct ntfs_attr na =3D {0}; =20 ntfs_debug("Entering for inode 0x%lx.", mft_no); /* * Normally we do not return a locked inode so set @locked_ni to NULL. */ - BUG_ON(!locked_ni); *locked_ni =3D NULL; + *ref_vi =3D NULL; + /* * Check if the inode corresponding to this mft record is in the VFS * inode cache and obtain a reference to it if it is. */ ntfs_debug("Looking for inode 0x%lx in icache.", mft_no); na.mft_no =3D mft_no; - na.name =3D NULL; - na.name_len =3D 0; na.type =3D AT_UNUSED; /* * Optimize inode 0, i.e. $MFT itself, since we have it in memory and @@ -949,16 +766,16 @@ bool ntfs_may_write_mft_record(ntfs_volume *vol, cons= t unsigned long mft_no, if (!mft_no) { /* Balance the below iput(). */ vi =3D igrab(mft_vi); - BUG_ON(vi !=3D mft_vi); + WARN_ON(vi !=3D mft_vi); } else { /* - * Have to use ilookup5_nowait() since ilookup5() waits for the - * inode lock which causes ntfs to deadlock when a concurrent - * inode write via the inode dirty code paths and the page - * dirty code path of the inode dirty code path when writing - * $MFT occurs. 
+ * Have to use find_inode_nowait() since ilookup5_nowait() + * waits for inode with I_FREEING, which causes ntfs to deadlock + * when inodes are unlinked concurrently */ - vi =3D ilookup5_nowait(sb, mft_no, ntfs_test_inode, &na); + vi =3D find_inode_nowait(sb, mft_no, ntfs_test_inode_wb, &na); + if (na.state =3D=3D NI_BeingDeleted || na.state =3D=3D NI_BeingCreated) + return false; } if (vi) { ntfs_debug("Base inode 0x%lx is in icache.", mft_no); @@ -971,16 +788,15 @@ bool ntfs_may_write_mft_record(ntfs_volume *vol, cons= t unsigned long mft_no, ntfs_debug("Inode 0x%lx is dirty, do not write it.", mft_no); atomic_dec(&ni->count); - iput(vi); + *ref_vi =3D vi; return false; } ntfs_debug("Inode 0x%lx is not dirty.", mft_no); /* The inode is not dirty, try to take the mft record lock. */ if (unlikely(!mutex_trylock(&ni->mrec_lock))) { - ntfs_debug("Mft record 0x%lx is already locked, do " - "not write it.", mft_no); + ntfs_debug("Mft record 0x%lx is already locked, do not write it.", mft_= no); atomic_dec(&ni->count); - iput(vi); + *ref_vi =3D vi; return false; } ntfs_debug("Managed to lock mft record 0x%lx, write it.", @@ -1012,24 +828,21 @@ bool ntfs_may_write_mft_record(ntfs_volume *vol, con= st unsigned long mft_no, * is. */ na.mft_no =3D MREF_LE(m->base_mft_record); - ntfs_debug("Mft record 0x%lx is an extent record. Looking for base " - "inode 0x%lx in icache.", mft_no, na.mft_no); + na.state =3D 0; + ntfs_debug("Mft record 0x%lx is an extent record. Looking for base inode= 0x%lx in icache.", + mft_no, na.mft_no); if (!na.mft_no) { /* Balance the below iput(). */ vi =3D igrab(mft_vi); - BUG_ON(vi !=3D mft_vi); - } else - vi =3D ilookup5_nowait(sb, na.mft_no, ntfs_test_inode, - &na); - if (!vi) { - /* - * The base inode is not in icache, write this extent mft - * record. 
- */ - ntfs_debug("Base inode 0x%lx is not in icache, write the " - "extent record.", na.mft_no); - return true; + WARN_ON(vi !=3D mft_vi); + } else { + vi =3D find_inode_nowait(sb, mft_no, ntfs_test_inode_wb, &na); + if (na.state =3D=3D NI_BeingDeleted || na.state =3D=3D NI_BeingCreated) + return false; } + + if (!vi) + return false; ntfs_debug("Base inode 0x%lx is in icache.", na.mft_no); /* * The base inode is in icache. Check if it has the extent inode @@ -1043,9 +856,9 @@ bool ntfs_may_write_mft_record(ntfs_volume *vol, const= unsigned long mft_no, * extent mft record. */ mutex_unlock(&ni->extent_lock); - iput(vi); - ntfs_debug("Base inode 0x%lx has no attached extent inodes, " - "write the extent record.", na.mft_no); + *ref_vi =3D vi; + ntfs_debug("Base inode 0x%lx has no attached extent inodes, write the ex= tent record.", + na.mft_no); return true; } /* Iterate over the attached extent inodes. */ @@ -1066,9 +879,8 @@ bool ntfs_may_write_mft_record(ntfs_volume *vol, const= unsigned long mft_no, */ if (!eni) { mutex_unlock(&ni->extent_lock); - iput(vi); - ntfs_debug("Extent inode 0x%lx is not attached to its base " - "inode 0x%lx, write the extent record.", + *ref_vi =3D vi; + ntfs_debug("Extent inode 0x%lx is not attached to its base inode 0x%lx, = write the extent record.", mft_no, na.mft_no); return true; } @@ -1077,22 +889,27 @@ bool ntfs_may_write_mft_record(ntfs_volume *vol, con= st unsigned long mft_no, /* Take a reference to the extent ntfs inode. */ atomic_inc(&eni->count); mutex_unlock(&ni->extent_lock); + + /* if extent inode is dirty, write_inode will write it */ + if (NInoDirty(eni)) { + atomic_dec(&eni->count); + *ref_vi =3D vi; + return false; + } + /* * Found the extent inode coresponding to this extent mft record. * Try to take the mft record lock. 
*/ if (unlikely(!mutex_trylock(&eni->mrec_lock))) { atomic_dec(&eni->count); - iput(vi); - ntfs_debug("Extent mft record 0x%lx is already locked, do " - "not write it.", mft_no); + *ref_vi =3D vi; + ntfs_debug("Extent mft record 0x%lx is already locked, do not write it.", + mft_no); return false; } ntfs_debug("Managed to lock extent mft record 0x%lx, write it.", mft_no); - if (NInoTestClearDirty(eni)) - ntfs_debug("Extent inode 0x%lx is dirty, marking it clean.", - mft_no); /* * The write has to occur while we hold the mft record lock so return * the locked extent ntfs inode. @@ -1101,10 +918,11 @@ bool ntfs_may_write_mft_record(ntfs_volume *vol, con= st unsigned long mft_no, return true; } =20 -static const char *es =3D " Leaving inconsistent metadata. Unmount and r= un " - "chkdsk."; +static const char *es =3D " Leaving inconsistent metadata. Unmount and r= un chkdsk."; + +#define RESERVED_MFT_RECORDS 64 =20 -/** +/* * ntfs_mft_bitmap_find_and_alloc_free_rec_nolock - see name * @vol: volume on which to search for a free mft record * @base_ni: open base inode if allocating an extent mft record or NULL @@ -1123,19 +941,18 @@ static const char *es =3D " Leaving inconsistent me= tadata. Unmount and run " * * Locking: Caller must hold vol->mftbmp_lock for writing. 
*/ -static int ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(ntfs_volume *vol, - ntfs_inode *base_ni) +static int ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(struct ntfs_volu= me *vol, + struct ntfs_inode *base_ni) { s64 pass_end, ll, data_pos, pass_start, ofs, bit; unsigned long flags; struct address_space *mftbmp_mapping; - u8 *buf, *byte; - struct page *page; - unsigned int page_ofs, size; + u8 *buf =3D NULL, *byte; + struct folio *folio; + unsigned int folio_ofs, size; u8 pass, b; =20 - ntfs_debug("Searching for free mft record in the currently " - "initialized mft bitmap."); + ntfs_debug("Searching for free mft record in the currently initialized mf= t bitmap."); mftbmp_mapping =3D vol->mftbmp_ino->i_mapping; /* * Set the end of the pass making sure we do not overflow the mft @@ -1155,26 +972,30 @@ static int ntfs_mft_bitmap_find_and_alloc_free_rec_n= olock(ntfs_volume *vol, data_pos =3D vol->mft_data_pos; else data_pos =3D base_ni->mft_no + 1; - if (data_pos < 24) - data_pos =3D 24; + if (data_pos < RESERVED_MFT_RECORDS) + data_pos =3D RESERVED_MFT_RECORDS; if (data_pos >=3D pass_end) { - data_pos =3D 24; + data_pos =3D RESERVED_MFT_RECORDS; pass =3D 2; /* This happens on a freshly formatted volume. */ if (data_pos >=3D pass_end) return -ENOSPC; } + + if (base_ni && base_ni->mft_no =3D=3D FILE_MFT) { + data_pos =3D 0; + pass =3D 2; + } + pass_start =3D data_pos; - ntfs_debug("Starting bitmap search: pass %u, pass_start 0x%llx, " - "pass_end 0x%llx, data_pos 0x%llx.", pass, - (long long)pass_start, (long long)pass_end, - (long long)data_pos); + ntfs_debug("Starting bitmap search: pass %u, pass_start 0x%llx, pass_end = 0x%llx, data_pos 0x%llx.", + pass, pass_start, pass_end, data_pos); /* Loop until a free mft record is found. */ for (; pass <=3D 2;) { /* Cap size to pass_end. 
*/ ofs =3D data_pos >> 3; - page_ofs =3D ofs & ~PAGE_MASK; - size =3D PAGE_SIZE - page_ofs; + folio_ofs =3D ofs & ~PAGE_MASK; + size =3D PAGE_SIZE - folio_ofs; ll =3D ((pass_end + 7) >> 3) - ofs; if (size > ll) size =3D ll; @@ -1184,21 +1005,32 @@ static int ntfs_mft_bitmap_find_and_alloc_free_rec_= nolock(ntfs_volume *vol, * for a zero bit. */ if (size) { - page =3D ntfs_map_page(mftbmp_mapping, - ofs >> PAGE_SHIFT); - if (IS_ERR(page)) { - ntfs_error(vol->sb, "Failed to read mft " - "bitmap, aborting."); - return PTR_ERR(page); + folio =3D read_mapping_folio(mftbmp_mapping, + ofs >> PAGE_SHIFT, NULL); + if (IS_ERR(folio)) { + ntfs_error(vol->sb, "Failed to read mft bitmap, aborting."); + return PTR_ERR(folio); } - buf =3D (u8*)page_address(page) + page_ofs; + folio_lock(folio); + buf =3D (u8 *)kmap_local_folio(folio, 0) + folio_ofs; bit =3D data_pos & 7; data_pos &=3D ~7ull; - ntfs_debug("Before inner for loop: size 0x%x, " - "data_pos 0x%llx, bit 0x%llx", size, - (long long)data_pos, (long long)bit); + ntfs_debug("Before inner for loop: size 0x%x, data_pos 0x%llx, bit 0x%l= lx", + size, data_pos, bit); for (; bit < size && data_pos + bit < pass_end; bit &=3D ~7ull, bit +=3D 8) { + /* + * If we're extending $MFT and running out of the first + * mft record (base record), then give up searching since + * there is no guarantee that the found record will be accessible. 
+ */ + if (base_ni && base_ni->mft_no =3D=3D FILE_MFT && bit > 400) { + folio_unlock(folio); + kunmap_local(buf); + folio_put(folio); + return -ENOSPC; + } + byte =3D buf + (bit >> 3); if (*byte =3D=3D 0xff) continue; @@ -1206,25 +1038,27 @@ static int ntfs_mft_bitmap_find_and_alloc_free_rec_= nolock(ntfs_volume *vol, if (b < 8 && b >=3D (bit & 7)) { ll =3D data_pos + (bit & ~7ull) + b; if (unlikely(ll > (1ll << 32))) { - ntfs_unmap_page(page); + folio_unlock(folio); + kunmap_local(buf); + folio_put(folio); return -ENOSPC; } *byte |=3D 1 << b; - flush_dcache_page(page); - set_page_dirty(page); - ntfs_unmap_page(page); - ntfs_debug("Done. (Found and " - "allocated mft record " - "0x%llx.)", - (long long)ll); + folio_mark_dirty(folio); + folio_unlock(folio); + kunmap_local(buf); + folio_put(folio); + ntfs_debug("Done. (Found and allocated mft record 0x%llx.)", + ll); return ll; } } - ntfs_debug("After inner for loop: size 0x%x, " - "data_pos 0x%llx, bit 0x%llx", size, - (long long)data_pos, (long long)bit); + ntfs_debug("After inner for loop: size 0x%x, data_pos 0x%llx, bit 0x%ll= x", + size, data_pos, bit); data_pos +=3D size; - ntfs_unmap_page(page); + folio_unlock(folio); + kunmap_local(buf); + folio_put(folio); /* * If the end of the pass has not been reached yet, * continue searching the mft bitmap for a zero bit. @@ -1239,21 +1073,48 @@ static int ntfs_mft_bitmap_find_and_alloc_free_rec_= nolock(ntfs_volume *vol, * part of the zone which we omitted earlier. */ pass_end =3D pass_start; - data_pos =3D pass_start =3D 24; - ntfs_debug("pass %i, pass_start 0x%llx, pass_end " - "0x%llx.", pass, (long long)pass_start, - (long long)pass_end); + data_pos =3D pass_start =3D RESERVED_MFT_RECORDS; + ntfs_debug("pass %i, pass_start 0x%llx, pass_end 0x%llx.", + pass, pass_start, pass_end); if (data_pos >=3D pass_end) break; } } /* No free mft records in currently initialized mft bitmap. */ - ntfs_debug("Done. 
(No free mft records left in currently initialized " - "mft bitmap.)"); + ntfs_debug("Done. (No free mft records left in currently initialized mft= bitmap.)"); return -ENOSPC; } =20 -/** +static int ntfs_mft_attr_extend(struct ntfs_inode *ni) +{ + int ret =3D 0; + struct ntfs_inode *base_ni; + + if (NInoAttr(ni)) + base_ni =3D ni->ext.base_ntfs_ino; + else + base_ni =3D ni; + + if (!NInoAttrList(base_ni)) { + ret =3D ntfs_inode_add_attrlist(base_ni); + if (ret) { + pr_err("Can not add attrlist\n"); + goto out; + } else { + ret =3D -EAGAIN; + goto out; + } + } + + ret =3D ntfs_attr_update_mapping_pairs(ni, 0); + if (ret) + pr_err("MP update failed\n"); + +out: + return ret; +} + +/* * ntfs_mft_bitmap_extend_allocation_nolock - extend mft bitmap by a clust= er * @vol: volume on which to extend the mft bitmap attribute * @@ -1270,17 +1131,17 @@ static int ntfs_mft_bitmap_find_and_alloc_free_rec_= nolock(ntfs_volume *vol, * - This function takes vol->lcnbmp_lock for writing and releases it * before returning. 
*/ -static int ntfs_mft_bitmap_extend_allocation_nolock(ntfs_volume *vol) +static int ntfs_mft_bitmap_extend_allocation_nolock(struct ntfs_volume *vo= l) { - LCN lcn; + s64 lcn; s64 ll; unsigned long flags; - struct page *page; - ntfs_inode *mft_ni, *mftbmp_ni; - runlist_element *rl, *rl2 =3D NULL; - ntfs_attr_search_ctx *ctx =3D NULL; - MFT_RECORD *mrec; - ATTR_RECORD *a =3D NULL; + struct folio *folio; + struct ntfs_inode *mft_ni, *mftbmp_ni; + struct runlist_element *rl, *rl2 =3D NULL; + struct ntfs_attr_search_ctx *ctx =3D NULL; + struct mft_record *mrec; + struct attr_record *a =3D NULL; int ret, mp_size; u32 old_alen =3D 0; u8 *b, tb; @@ -1288,7 +1149,9 @@ static int ntfs_mft_bitmap_extend_allocation_nolock(n= tfs_volume *vol) u8 added_cluster:1; u8 added_run:1; u8 mp_rebuilt:1; - } status =3D { 0, 0, 0 }; + u8 mp_extended:1; + } status =3D { 0, 0, 0, 0 }; + size_t new_rl_count; =20 ntfs_debug("Extending mft bitmap allocation."); mft_ni =3D NTFS_I(vol->mft_ino); @@ -1302,11 +1165,11 @@ static int ntfs_mft_bitmap_extend_allocation_nolock= (ntfs_volume *vol) ll =3D mftbmp_ni->allocated_size; read_unlock_irqrestore(&mftbmp_ni->size_lock, flags); rl =3D ntfs_attr_find_vcn_nolock(mftbmp_ni, - (ll - 1) >> vol->cluster_size_bits, NULL); + NTFS_B_TO_CLU(vol, ll - 1), NULL); if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) { up_write(&mftbmp_ni->runlist.lock); - ntfs_error(vol->sb, "Failed to determine last allocated " - "cluster of mft bitmap attribute."); + ntfs_error(vol->sb, + "Failed to determine last allocated cluster of mft bitmap attribute."); if (!IS_ERR(rl)) ret =3D -EIO; else @@ -1322,54 +1185,59 @@ static int ntfs_mft_bitmap_extend_allocation_nolock= (ntfs_volume *vol) * to us. 
*/ ll =3D lcn >> 3; - page =3D ntfs_map_page(vol->lcnbmp_ino->i_mapping, - ll >> PAGE_SHIFT); - if (IS_ERR(page)) { + folio =3D read_mapping_folio(vol->lcnbmp_ino->i_mapping, + ll >> PAGE_SHIFT, NULL); + if (IS_ERR(folio)) { up_write(&mftbmp_ni->runlist.lock); ntfs_error(vol->sb, "Failed to read from lcn bitmap."); - return PTR_ERR(page); + return PTR_ERR(folio); } - b =3D (u8*)page_address(page) + (ll & ~PAGE_MASK); - tb =3D 1 << (lcn & 7ull); + down_write(&vol->lcnbmp_lock); + folio_lock(folio); + b =3D (u8 *)kmap_local_folio(folio, 0) + (ll & ~PAGE_MASK); + tb =3D 1 << (lcn & 7ull); if (*b !=3D 0xff && !(*b & tb)) { /* Next cluster is free, allocate it. */ *b |=3D tb; - flush_dcache_page(page); - set_page_dirty(page); + folio_mark_dirty(folio); + folio_unlock(folio); + kunmap_local(b); + folio_put(folio); up_write(&vol->lcnbmp_lock); - ntfs_unmap_page(page); /* Update the mft bitmap runlist. */ rl->length++; rl[1].vcn++; status.added_cluster =3D 1; ntfs_debug("Appending one cluster to mft bitmap."); } else { + folio_unlock(folio); + kunmap_local(b); + folio_put(folio); up_write(&vol->lcnbmp_lock); - ntfs_unmap_page(page); /* Allocate a cluster from the DATA_ZONE. 
*/ rl2 =3D ntfs_cluster_alloc(vol, rl[1].vcn, 1, lcn, DATA_ZONE, - true); + true, false, false); if (IS_ERR(rl2)) { up_write(&mftbmp_ni->runlist.lock); - ntfs_error(vol->sb, "Failed to allocate a cluster for " - "the mft bitmap."); + ntfs_error(vol->sb, + "Failed to allocate a cluster for the mft bitmap."); return PTR_ERR(rl2); } - rl =3D ntfs_runlists_merge(mftbmp_ni->runlist.rl, rl2); + rl =3D ntfs_runlists_merge(&mftbmp_ni->runlist, rl2, 0, &new_rl_count); if (IS_ERR(rl)) { up_write(&mftbmp_ni->runlist.lock); - ntfs_error(vol->sb, "Failed to merge runlists for mft " - "bitmap."); + ntfs_error(vol->sb, "Failed to merge runlists for mft bitmap."); if (ntfs_cluster_free_from_rl(vol, rl2)) { - ntfs_error(vol->sb, "Failed to deallocate " - "allocated cluster.%s", es); + ntfs_error(vol->sb, "Failed to deallocate allocated cluster.%s", + es); NVolSetErrors(vol); } - ntfs_free(rl2); + kvfree(rl2); return PTR_ERR(rl); } mftbmp_ni->runlist.rl =3D rl; + mftbmp_ni->runlist.count =3D new_rl_count; status.added_run =3D 1; ntfs_debug("Adding one run to mft bitmap."); /* Find the last run in the new runlist. */ @@ -1396,26 +1264,26 @@ static int ntfs_mft_bitmap_extend_allocation_nolock= (ntfs_volume *vol) mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx); if (unlikely(ret)) { - ntfs_error(vol->sb, "Failed to find last attribute extent of " - "mft bitmap attribute."); + ntfs_error(vol->sb, + "Failed to find last attribute extent of mft bitmap attribute."); if (ret =3D=3D -ENOENT) ret =3D -EIO; goto undo_alloc; } a =3D ctx->attr; - ll =3D sle64_to_cpu(a->data.non_resident.lowest_vcn); + ll =3D le64_to_cpu(a->data.non_resident.lowest_vcn); /* Search back for the previous last allocated cluster of mft bitmap. 
*/ for (rl2 =3D rl; rl2 > mftbmp_ni->runlist.rl; rl2--) { if (ll >=3D rl2->vcn) break; } - BUG_ON(ll < rl2->vcn); - BUG_ON(ll >=3D rl2->vcn + rl2->length); + WARN_ON(ll < rl2->vcn); + WARN_ON(ll >=3D rl2->vcn + rl2->length); /* Get the size for the new mapping pairs array for this extent. */ - mp_size =3D ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1); + mp_size =3D ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1, -1); if (unlikely(mp_size <=3D 0)) { - ntfs_error(vol->sb, "Get size for mapping pairs failed for " - "mft bitmap attribute extent."); + ntfs_error(vol->sb, + "Get size for mapping pairs failed for mft bitmap attribute extent."); ret =3D mp_size; if (!ret) ret =3D -EIO; @@ -1426,76 +1294,67 @@ static int ntfs_mft_bitmap_extend_allocation_nolock= (ntfs_volume *vol) ret =3D ntfs_attr_record_resize(ctx->mrec, a, mp_size + le16_to_cpu(a->data.non_resident.mapping_pairs_offset)); if (unlikely(ret)) { - if (ret !=3D -ENOSPC) { - ntfs_error(vol->sb, "Failed to resize attribute " - "record for mft bitmap attribute."); - goto undo_alloc; - } - // TODO: Deal with this by moving this extent to a new mft - // record or by starting a new extent in a new mft record or by - // moving other attributes out of this mft record. - // Note: It will need to be a special mft record and if none of - // those are available it gets rather complicated... - ntfs_error(vol->sb, "Not enough space in this mft record to " - "accommodate extended mft bitmap attribute " - "extent. Cannot handle this yet."); - ret =3D -EOPNOTSUPP; + ret =3D ntfs_mft_attr_extend(mftbmp_ni); + if (!ret) + goto extended_ok; + status.mp_extended =3D 1; goto undo_alloc; } status.mp_rebuilt =3D 1; /* Generate the mapping pairs array directly into the attr record. 
*/ - ret =3D ntfs_mapping_pairs_build(vol, (u8*)a + + ret =3D ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(a->data.non_resident.mapping_pairs_offset), - mp_size, rl2, ll, -1, NULL); + mp_size, rl2, ll, -1, NULL, NULL, NULL); if (unlikely(ret)) { - ntfs_error(vol->sb, "Failed to build mapping pairs array for " - "mft bitmap attribute."); + ntfs_error(vol->sb, + "Failed to build mapping pairs array for mft bitmap attribute."); goto undo_alloc; } /* Update the highest_vcn. */ - a->data.non_resident.highest_vcn =3D cpu_to_sle64(rl[1].vcn - 1); + a->data.non_resident.highest_vcn =3D cpu_to_le64(rl[1].vcn - 1); /* * We now have extended the mft bitmap allocated_size by one cluster. - * Reflect this in the ntfs_inode structure and the attribute record. + * Reflect this in the struct ntfs_inode structure and the attribute reco= rd. */ if (a->data.non_resident.lowest_vcn) { /* * We are not in the first attribute extent, switch to it, but * first ensure the changes will make it to disk later. */ - flush_dcache_mft_record_page(ctx->ntfs_ino); mark_mft_record_dirty(ctx->ntfs_ino); ntfs_attr_reinit_search_ctx(ctx); ret =3D ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name, mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx); if (unlikely(ret)) { - ntfs_error(vol->sb, "Failed to find first attribute " - "extent of mft bitmap attribute."); + ntfs_error(vol->sb, + "Failed to find first attribute extent of mft bitmap attribute."); goto restore_undo_alloc; } a =3D ctx->attr; } + +extended_ok: write_lock_irqsave(&mftbmp_ni->size_lock, flags); mftbmp_ni->allocated_size +=3D vol->cluster_size; a->data.non_resident.allocated_size =3D - cpu_to_sle64(mftbmp_ni->allocated_size); + cpu_to_le64(mftbmp_ni->allocated_size); write_unlock_irqrestore(&mftbmp_ni->size_lock, flags); /* Ensure the changes make it to disk. 
*/ - flush_dcache_mft_record_page(ctx->ntfs_ino); mark_mft_record_dirty(ctx->ntfs_ino); ntfs_attr_put_search_ctx(ctx); unmap_mft_record(mft_ni); up_write(&mftbmp_ni->runlist.lock); ntfs_debug("Done."); return 0; + restore_undo_alloc: ntfs_attr_reinit_search_ctx(ctx); if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name, mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx)) { - ntfs_error(vol->sb, "Failed to find last attribute extent of " - "mft bitmap attribute.%s", es); + ntfs_error(vol->sb, + "Failed to find last attribute extent of mft bitmap attribute.%s", es); write_lock_irqsave(&mftbmp_ni->size_lock, flags); mftbmp_ni->allocated_size +=3D vol->cluster_size; write_unlock_irqrestore(&mftbmp_ni->size_lock, flags); @@ -1510,7 +1369,7 @@ static int ntfs_mft_bitmap_extend_allocation_nolock(n= tfs_volume *vol) return ret; } a =3D ctx->attr; - a->data.non_resident.highest_vcn =3D cpu_to_sle64(rl[1].vcn - 2); + a->data.non_resident.highest_vcn =3D cpu_to_le64(rl[1].vcn - 2); undo_alloc: if (status.added_cluster) { /* Truncate the last run in the runlist by one cluster. */ @@ -1521,31 +1380,33 @@ static int ntfs_mft_bitmap_extend_allocation_nolock= (ntfs_volume *vol) /* Remove the last run from the runlist. */ rl->lcn =3D rl[1].lcn; rl->length =3D 0; + mftbmp_ni->runlist.count--; } /* Deallocate the cluster. 
*/ down_write(&vol->lcnbmp_lock); if (ntfs_bitmap_clear_bit(vol->lcnbmp_ino, lcn)) { ntfs_error(vol->sb, "Failed to free allocated cluster.%s", es); NVolSetErrors(vol); - } + } else + ntfs_inc_free_clusters(vol, 1); up_write(&vol->lcnbmp_lock); if (status.mp_rebuilt) { - if (ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu( + if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu( a->data.non_resident.mapping_pairs_offset), old_alen - le16_to_cpu( a->data.non_resident.mapping_pairs_offset), - rl2, ll, -1, NULL)) { - ntfs_error(vol->sb, "Failed to restore mapping pairs " - "array.%s", es); + rl2, ll, -1, NULL, NULL, NULL)) { + ntfs_error(vol->sb, "Failed to restore mapping pairs array.%s", es); NVolSetErrors(vol); } if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) { - ntfs_error(vol->sb, "Failed to restore attribute " - "record.%s", es); + ntfs_error(vol->sb, "Failed to restore attribute record.%s", es); NVolSetErrors(vol); } - flush_dcache_mft_record_page(ctx->ntfs_ino); mark_mft_record_dirty(ctx->ntfs_ino); + } else if (status.mp_extended && ntfs_attr_update_mapping_pairs(mftbmp_ni= , 0)) { + ntfs_error(vol->sb, "Failed to restore mapping pairs.%s", es); + NVolSetErrors(vol); } if (ctx) ntfs_attr_put_search_ctx(ctx); @@ -1555,7 +1416,7 @@ static int ntfs_mft_bitmap_extend_allocation_nolock(n= tfs_volume *vol) return ret; } =20 -/** +/* * ntfs_mft_bitmap_extend_initialized_nolock - extend mftbmp initialized d= ata * @vol: volume on which to extend the mft bitmap attribute * @@ -1569,15 +1430,15 @@ static int ntfs_mft_bitmap_extend_allocation_nolock= (ntfs_volume *vol) * * Locking: Caller must hold vol->mftbmp_lock for writing. 
*/ -static int ntfs_mft_bitmap_extend_initialized_nolock(ntfs_volume *vol) +static int ntfs_mft_bitmap_extend_initialized_nolock(struct ntfs_volume *v= ol) { s64 old_data_size, old_initialized_size; unsigned long flags; struct inode *mftbmp_vi; - ntfs_inode *mft_ni, *mftbmp_ni; - ntfs_attr_search_ctx *ctx; - MFT_RECORD *mrec; - ATTR_RECORD *a; + struct ntfs_inode *mft_ni, *mftbmp_ni; + struct ntfs_attr_search_ctx *ctx; + struct mft_record *mrec; + struct attr_record *a; int ret; =20 ntfs_debug("Extending mft bitmap initiailized (and data) size."); @@ -1599,8 +1460,8 @@ static int ntfs_mft_bitmap_extend_initialized_nolock(= ntfs_volume *vol) ret =3D ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name, mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx); if (unlikely(ret)) { - ntfs_error(vol->sb, "Failed to find first attribute extent of " - "mft bitmap attribute."); + ntfs_error(vol->sb, + "Failed to find first attribute extent of mft bitmap attribute."); if (ret =3D=3D -ENOENT) ret =3D -EIO; goto put_err_out; @@ -1616,23 +1477,22 @@ static int ntfs_mft_bitmap_extend_initialized_noloc= k(ntfs_volume *vol) */ mftbmp_ni->initialized_size +=3D 8; a->data.non_resident.initialized_size =3D - cpu_to_sle64(mftbmp_ni->initialized_size); + cpu_to_le64(mftbmp_ni->initialized_size); if (mftbmp_ni->initialized_size > old_data_size) { i_size_write(mftbmp_vi, mftbmp_ni->initialized_size); a->data.non_resident.data_size =3D - cpu_to_sle64(mftbmp_ni->initialized_size); + cpu_to_le64(mftbmp_ni->initialized_size); } write_unlock_irqrestore(&mftbmp_ni->size_lock, flags); /* Ensure the changes make it to disk. */ - flush_dcache_mft_record_page(ctx->ntfs_ino); mark_mft_record_dirty(ctx->ntfs_ino); ntfs_attr_put_search_ctx(ctx); unmap_mft_record(mft_ni); /* Initialize the mft bitmap attribute value with zeroes. */ ret =3D ntfs_attr_set(mftbmp_ni, old_initialized_size, 8, 0); if (likely(!ret)) { - ntfs_debug("Done. (Wrote eight initialized bytes to mft " - "bitmap."); + ntfs_debug("Done. 
(Wrote eight initialized bytes to mft bitmap."); + ntfs_inc_free_mft_records(vol, 8 * 8); return 0; } ntfs_error(vol->sb, "Failed to write to mft bitmap."); @@ -1651,8 +1511,8 @@ static int ntfs_mft_bitmap_extend_initialized_nolock(= ntfs_volume *vol) } if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name, mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx)) { - ntfs_error(vol->sb, "Failed to find first attribute extent of " - "mft bitmap attribute.%s", es); + ntfs_error(vol->sb, + "Failed to find first attribute extent of mft bitmap attribute.%s", es); NVolSetErrors(vol); put_err_out: ntfs_attr_put_search_ctx(ctx); @@ -1664,30 +1524,27 @@ static int ntfs_mft_bitmap_extend_initialized_noloc= k(ntfs_volume *vol) write_lock_irqsave(&mftbmp_ni->size_lock, flags); mftbmp_ni->initialized_size =3D old_initialized_size; a->data.non_resident.initialized_size =3D - cpu_to_sle64(old_initialized_size); + cpu_to_le64(old_initialized_size); if (i_size_read(mftbmp_vi) !=3D old_data_size) { i_size_write(mftbmp_vi, old_data_size); - a->data.non_resident.data_size =3D cpu_to_sle64(old_data_size); + a->data.non_resident.data_size =3D cpu_to_le64(old_data_size); } write_unlock_irqrestore(&mftbmp_ni->size_lock, flags); - flush_dcache_mft_record_page(ctx->ntfs_ino); mark_mft_record_dirty(ctx->ntfs_ino); ntfs_attr_put_search_ctx(ctx); unmap_mft_record(mft_ni); #ifdef DEBUG read_lock_irqsave(&mftbmp_ni->size_lock, flags); - ntfs_debug("Restored status of mftbmp: allocated_size 0x%llx, " - "data_size 0x%llx, initialized_size 0x%llx.", - (long long)mftbmp_ni->allocated_size, - (long long)i_size_read(mftbmp_vi), - (long long)mftbmp_ni->initialized_size); + ntfs_debug("Restored status of mftbmp: allocated_size 0x%llx, data_size 0= x%llx, initialized_size 0x%llx.", + mftbmp_ni->allocated_size, i_size_read(mftbmp_vi), + mftbmp_ni->initialized_size); read_unlock_irqrestore(&mftbmp_ni->size_lock, flags); #endif /* DEBUG */ err_out: return ret; } =20 -/** +/* * 
ntfs_mft_data_extend_allocation_nolock - extend mft data attribute * @vol: volume on which to extend the mft data attribute * @@ -1706,20 +1563,21 @@ static int ntfs_mft_bitmap_extend_initialized_noloc= k(ntfs_volume *vol) * - This function calls functions which take vol->lcnbmp_lock for * writing and release it before returning. */ -static int ntfs_mft_data_extend_allocation_nolock(ntfs_volume *vol) +static int ntfs_mft_data_extend_allocation_nolock(struct ntfs_volume *vol) { - LCN lcn; - VCN old_last_vcn; + s64 lcn; + s64 old_last_vcn; s64 min_nr, nr, ll; unsigned long flags; - ntfs_inode *mft_ni; - runlist_element *rl, *rl2; - ntfs_attr_search_ctx *ctx =3D NULL; - MFT_RECORD *mrec; - ATTR_RECORD *a =3D NULL; + struct ntfs_inode *mft_ni; + struct runlist_element *rl, *rl2; + struct ntfs_attr_search_ctx *ctx =3D NULL; + struct mft_record *mrec; + struct attr_record *a =3D NULL; int ret, mp_size; u32 old_alen =3D 0; - bool mp_rebuilt =3D false; + bool mp_rebuilt =3D false, mp_extended =3D false; + size_t new_rl_count; =20 ntfs_debug("Extending mft data allocation."); mft_ni =3D NTFS_I(vol->mft_ino); @@ -1733,11 +1591,11 @@ static int ntfs_mft_data_extend_allocation_nolock(n= tfs_volume *vol) ll =3D mft_ni->allocated_size; read_unlock_irqrestore(&mft_ni->size_lock, flags); rl =3D ntfs_attr_find_vcn_nolock(mft_ni, - (ll - 1) >> vol->cluster_size_bits, NULL); + NTFS_B_TO_CLU(vol, ll - 1), NULL); if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) { up_write(&mft_ni->runlist.lock); - ntfs_error(vol->sb, "Failed to determine last allocated " - "cluster of mft data attribute."); + ntfs_error(vol->sb, + "Failed to determine last allocated cluster of mft data attribute."); if (!IS_ERR(rl)) ret =3D -EIO; else @@ -1745,9 +1603,9 @@ static int ntfs_mft_data_extend_allocation_nolock(ntf= s_volume *vol) return ret; } lcn =3D rl->lcn + rl->length; - ntfs_debug("Last lcn of mft data attribute is 0x%llx.", (long long)lcn); + ntfs_debug("Last lcn of mft data attribute is 
0x%llx.", lcn); /* Minimum allocation is one mft record worth of clusters. */ - min_nr =3D vol->mft_record_size >> vol->cluster_size_bits; + min_nr =3D NTFS_B_TO_CLU(vol, vol->mft_record_size); if (!min_nr) min_nr =3D 1; /* Want to allocate 16 mft records worth of clusters. */ @@ -1758,14 +1616,13 @@ static int ntfs_mft_data_extend_allocation_nolock(n= tfs_volume *vol) read_lock_irqsave(&mft_ni->size_lock, flags); ll =3D mft_ni->allocated_size; read_unlock_irqrestore(&mft_ni->size_lock, flags); - if (unlikely((ll + (nr << vol->cluster_size_bits)) >> + if (unlikely((ll + NTFS_CLU_TO_B(vol, nr)) >> vol->mft_record_size_bits >=3D (1ll << 32))) { nr =3D min_nr; - if (unlikely((ll + (nr << vol->cluster_size_bits)) >> + if (unlikely((ll + NTFS_CLU_TO_B(vol, nr)) >> vol->mft_record_size_bits >=3D (1ll << 32))) { - ntfs_warning(vol->sb, "Cannot allocate mft record " - "because the maximum number of inodes " - "(2^32) has already been reached."); + ntfs_warning(vol->sb, + "Cannot allocate mft record because the maximum number of inodes (2^32= ) has already been reached."); up_write(&mft_ni->runlist.lock); return -ENOSPC; } @@ -1773,16 +1630,24 @@ static int ntfs_mft_data_extend_allocation_nolock(n= tfs_volume *vol) ntfs_debug("Trying mft data allocation with %s cluster count %lli.", nr > min_nr ? "default" : "minimal", (long long)nr); old_last_vcn =3D rl[1].vcn; + /* + * We can release the mft_ni runlist lock because this function is + * the only one that extends $MFT data attribute and is called with + * mft_ni->mrec_lock. + * This is required for the lock order, vol->lcnbmp_lock =3D> + * mft_ni->runlist.lock. 
+ */ + up_write(&mft_ni->runlist.lock); + do { rl2 =3D ntfs_cluster_alloc(vol, old_last_vcn, nr, lcn, MFT_ZONE, - true); + true, false, false); if (!IS_ERR(rl2)) break; if (PTR_ERR(rl2) !=3D -ENOSPC || nr =3D=3D min_nr) { - ntfs_error(vol->sb, "Failed to allocate the minimal " - "number of clusters (%lli) for the " - "mft data attribute.", (long long)nr); - up_write(&mft_ni->runlist.lock); + ntfs_error(vol->sb, + "Failed to allocate the minimal number of clusters (%lli) for the mft = data attribute.", + nr); return PTR_ERR(rl2); } /* @@ -1791,32 +1656,36 @@ static int ntfs_mft_data_extend_allocation_nolock(n= tfs_volume *vol) * before failing. */ nr =3D min_nr; - ntfs_debug("Retrying mft data allocation with minimal cluster " - "count %lli.", (long long)nr); + ntfs_debug("Retrying mft data allocation with minimal cluster count %lli= .", nr); } while (1); - rl =3D ntfs_runlists_merge(mft_ni->runlist.rl, rl2); + + down_write(&mft_ni->runlist.lock); + rl =3D ntfs_runlists_merge(&mft_ni->runlist, rl2, 0, &new_rl_count); if (IS_ERR(rl)) { up_write(&mft_ni->runlist.lock); - ntfs_error(vol->sb, "Failed to merge runlists for mft data " - "attribute."); + ntfs_error(vol->sb, "Failed to merge runlists for mft data attribute."); if (ntfs_cluster_free_from_rl(vol, rl2)) { - ntfs_error(vol->sb, "Failed to deallocate clusters " - "from the mft data attribute.%s", es); + ntfs_error(vol->sb, + "Failed to deallocate clusters from the mft data attribute.%s", es); NVolSetErrors(vol); } - ntfs_free(rl2); + kvfree(rl2); return PTR_ERR(rl); } mft_ni->runlist.rl =3D rl; + mft_ni->runlist.count =3D new_rl_count; ntfs_debug("Allocated %lli clusters.", (long long)nr); /* Find the last run in the new runlist. */ for (; rl[1].length; rl++) ; + up_write(&mft_ni->runlist.lock); + /* Update the attribute record as well. 
*/ mrec =3D map_mft_record(mft_ni); if (IS_ERR(mrec)) { ntfs_error(vol->sb, "Failed to map mft record."); ret =3D PTR_ERR(mrec); + down_write(&mft_ni->runlist.lock); goto undo_alloc; } ctx =3D ntfs_attr_get_search_ctx(mft_ni, mrec); @@ -1828,72 +1697,60 @@ static int ntfs_mft_data_extend_allocation_nolock(n= tfs_volume *vol) ret =3D ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx); if (unlikely(ret)) { - ntfs_error(vol->sb, "Failed to find last attribute extent of " - "mft data attribute."); + ntfs_error(vol->sb, "Failed to find last attribute extent of mft data at= tribute."); if (ret =3D=3D -ENOENT) ret =3D -EIO; goto undo_alloc; } a =3D ctx->attr; - ll =3D sle64_to_cpu(a->data.non_resident.lowest_vcn); + ll =3D le64_to_cpu(a->data.non_resident.lowest_vcn); + + down_write(&mft_ni->runlist.lock); /* Search back for the previous last allocated cluster of mft bitmap. */ for (rl2 =3D rl; rl2 > mft_ni->runlist.rl; rl2--) { if (ll >=3D rl2->vcn) break; } - BUG_ON(ll < rl2->vcn); - BUG_ON(ll >=3D rl2->vcn + rl2->length); + WARN_ON(ll < rl2->vcn); + WARN_ON(ll >=3D rl2->vcn + rl2->length); /* Get the size for the new mapping pairs array for this extent. */ - mp_size =3D ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1); + mp_size =3D ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1, -1); if (unlikely(mp_size <=3D 0)) { - ntfs_error(vol->sb, "Get size for mapping pairs failed for " - "mft data attribute extent."); + ntfs_error(vol->sb, + "Get size for mapping pairs failed for mft data attribute extent."); ret =3D mp_size; if (!ret) ret =3D -EIO; + up_write(&mft_ni->runlist.lock); goto undo_alloc; } + up_write(&mft_ni->runlist.lock); + /* Expand the attribute record if necessary. 
*/ old_alen =3D le32_to_cpu(a->length); ret =3D ntfs_attr_record_resize(ctx->mrec, a, mp_size + le16_to_cpu(a->data.non_resident.mapping_pairs_offset)); if (unlikely(ret)) { - if (ret !=3D -ENOSPC) { - ntfs_error(vol->sb, "Failed to resize attribute " - "record for mft data attribute."); - goto undo_alloc; - } - // TODO: Deal with this by moving this extent to a new mft - // record or by starting a new extent in a new mft record or by - // moving other attributes out of this mft record. - // Note: Use the special reserved mft records and ensure that - // this extent is not required to find the mft record in - // question. If no free special records left we would need to - // move an existing record away, insert ours in its place, and - // then place the moved record into the newly allocated space - // and we would then need to update all references to this mft - // record appropriately. This is rather complicated... - ntfs_error(vol->sb, "Not enough space in this mft record to " - "accommodate extended mft data attribute " - "extent. Cannot handle this yet."); - ret =3D -EOPNOTSUPP; + ret =3D ntfs_mft_attr_extend(mft_ni); + if (!ret) + goto extended_ok; + mp_extended =3D true; goto undo_alloc; } mp_rebuilt =3D true; /* Generate the mapping pairs array directly into the attr record. */ - ret =3D ntfs_mapping_pairs_build(vol, (u8*)a + + ret =3D ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(a->data.non_resident.mapping_pairs_offset), - mp_size, rl2, ll, -1, NULL); + mp_size, rl2, ll, -1, NULL, NULL, NULL); if (unlikely(ret)) { - ntfs_error(vol->sb, "Failed to build mapping pairs array of " - "mft data attribute."); + ntfs_error(vol->sb, "Failed to build mapping pairs array of mft data att= ribute."); goto undo_alloc; } /* Update the highest_vcn. */ - a->data.non_resident.highest_vcn =3D cpu_to_sle64(rl[1].vcn - 1); + a->data.non_resident.highest_vcn =3D cpu_to_le64(rl[1].vcn - 1); /* * We now have extended the mft data allocated_size by nr clusters. 
- * Reflect this in the ntfs_inode structure and the attribute record. + * Reflect this in the struct ntfs_inode structure and the attribute reco= rd. * @rl is the last (non-terminator) runlist element of mft data * attribute. */ @@ -1902,40 +1759,39 @@ static int ntfs_mft_data_extend_allocation_nolock(n= tfs_volume *vol) * We are not in the first attribute extent, switch to it, but * first ensure the changes will make it to disk later. */ - flush_dcache_mft_record_page(ctx->ntfs_ino); mark_mft_record_dirty(ctx->ntfs_ino); ntfs_attr_reinit_search_ctx(ctx); ret =3D ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx); if (unlikely(ret)) { - ntfs_error(vol->sb, "Failed to find first attribute " - "extent of mft data attribute."); + ntfs_error(vol->sb, + "Failed to find first attribute extent of mft data attribute."); goto restore_undo_alloc; } a =3D ctx->attr; } + +extended_ok: write_lock_irqsave(&mft_ni->size_lock, flags); - mft_ni->allocated_size +=3D nr << vol->cluster_size_bits; + mft_ni->allocated_size +=3D NTFS_CLU_TO_B(vol, nr); a->data.non_resident.allocated_size =3D - cpu_to_sle64(mft_ni->allocated_size); + cpu_to_le64(mft_ni->allocated_size); write_unlock_irqrestore(&mft_ni->size_lock, flags); /* Ensure the changes make it to disk. 
*/ - flush_dcache_mft_record_page(ctx->ntfs_ino); mark_mft_record_dirty(ctx->ntfs_ino); ntfs_attr_put_search_ctx(ctx); unmap_mft_record(mft_ni); - up_write(&mft_ni->runlist.lock); ntfs_debug("Done."); return 0; restore_undo_alloc: ntfs_attr_reinit_search_ctx(ctx); if (ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx)) { - ntfs_error(vol->sb, "Failed to find last attribute extent of " - "mft data attribute.%s", es); + ntfs_error(vol->sb, + "Failed to find last attribute extent of mft data attribute.%s", es); write_lock_irqsave(&mft_ni->size_lock, flags); - mft_ni->allocated_size +=3D nr << vol->cluster_size_bits; + mft_ni->allocated_size +=3D NTFS_CLU_TO_B(vol, nr); write_unlock_irqrestore(&mft_ni->size_lock, flags); ntfs_attr_put_search_ctx(ctx); unmap_mft_record(mft_ni); @@ -1948,17 +1804,20 @@ static int ntfs_mft_data_extend_allocation_nolock(n= tfs_volume *vol) return ret; } ctx->attr->data.non_resident.highest_vcn =3D - cpu_to_sle64(old_last_vcn - 1); + cpu_to_le64(old_last_vcn - 1); undo_alloc: if (ntfs_cluster_free(mft_ni, old_last_vcn, -1, ctx) < 0) { - ntfs_error(vol->sb, "Failed to free clusters from mft data " - "attribute.%s", es); + ntfs_error(vol->sb, "Failed to free clusters from mft data attribute.%s"= , es); NVolSetErrors(vol); } =20 if (ntfs_rl_truncate_nolock(vol, &mft_ni->runlist, old_last_vcn)) { - ntfs_error(vol->sb, "Failed to truncate mft data attribute " - "runlist.%s", es); + ntfs_error(vol->sb, "Failed to truncate mft data attribute runlist.%s", = es); + NVolSetErrors(vol); + } + if (mp_extended && ntfs_attr_update_mapping_pairs(mft_ni, 0)) { + ntfs_error(vol->sb, "Failed to restore mapping pairs.%s", + es); NVolSetErrors(vol); } if (ctx) { @@ -1968,32 +1827,27 @@ static int ntfs_mft_data_extend_allocation_nolock(n= tfs_volume *vol) a->data.non_resident.mapping_pairs_offset), old_alen - le16_to_cpu( a->data.non_resident.mapping_pairs_offset), - rl2, ll, -1, NULL)) { - 
ntfs_error(vol->sb, "Failed to restore mapping pairs " - "array.%s", es); + rl2, ll, -1, NULL, NULL, NULL)) { + ntfs_error(vol->sb, "Failed to restore mapping pairs array.%s", es); NVolSetErrors(vol); } if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) { - ntfs_error(vol->sb, "Failed to restore attribute " - "record.%s", es); + ntfs_error(vol->sb, "Failed to restore attribute record.%s", es); NVolSetErrors(vol); } - flush_dcache_mft_record_page(ctx->ntfs_ino); mark_mft_record_dirty(ctx->ntfs_ino); } else if (IS_ERR(ctx->mrec)) { - ntfs_error(vol->sb, "Failed to restore attribute search " - "context.%s", es); + ntfs_error(vol->sb, "Failed to restore attribute search context.%s", es= ); NVolSetErrors(vol); } ntfs_attr_put_search_ctx(ctx); } if (!IS_ERR(mrec)) unmap_mft_record(mft_ni); - up_write(&mft_ni->runlist.lock); return ret; } =20 -/** +/* * ntfs_mft_record_layout - layout an mft record into a memory buffer * @vol: volume to which the mft record will belong * @mft_no: mft reference specifying the mft record number @@ -2006,24 +1860,24 @@ static int ntfs_mft_data_extend_allocation_nolock(n= tfs_volume *vol) * * Return 0 on success and -errno on error. */ -static int ntfs_mft_record_layout(const ntfs_volume *vol, const s64 mft_no, - MFT_RECORD *m) +static int ntfs_mft_record_layout(const struct ntfs_volume *vol, const s64= mft_no, + struct mft_record *m) { - ATTR_RECORD *a; + struct attr_record *a; =20 ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no); if (mft_no >=3D (1ll << 32)) { - ntfs_error(vol->sb, "Mft record number 0x%llx exceeds " - "maximum of 2^32.", (long long)mft_no); + ntfs_error(vol->sb, "Mft record number 0x%llx exceeds maximum of 2^32.", + (long long)mft_no); return -ERANGE; } /* Start by clearing the whole mft record to gives us a clean slate. */ memset(m, 0, vol->mft_record_size); /* Aligned to 2-byte boundary. 
*/ if (vol->major_ver < 3 || (vol->major_ver =3D=3D 3 && !vol->minor_ver)) - m->usa_ofs =3D cpu_to_le16((sizeof(MFT_RECORD_OLD) + 1) & ~1); + m->usa_ofs =3D cpu_to_le16((sizeof(struct mft_record_old) + 1) & ~1); else { - m->usa_ofs =3D cpu_to_le16((sizeof(MFT_RECORD) + 1) & ~1); + m->usa_ofs =3D cpu_to_le16((sizeof(struct mft_record) + 1) & ~1); /* * Set the NTFS 3.1+ specific fields while we know that the * volume version is 3.1+. @@ -2037,16 +1891,11 @@ static int ntfs_mft_record_layout(const ntfs_volume= *vol, const s64 mft_no, NTFS_BLOCK_SIZE + 1); else { m->usa_count =3D cpu_to_le16(1); - ntfs_warning(vol->sb, "Sector size is bigger than mft record " - "size. Setting usa_count to 1. If chkdsk " - "reports this as corruption, please email " - "linux-ntfs-dev@lists.sourceforge.net stating " - "that you saw this message and that the " - "modified filesystem created was corrupt. " - "Thank you."); + ntfs_warning(vol->sb, + "Sector size is bigger than mft record size. Setting usa_count to 1. = If chkdsk reports this as corruption"); } /* Set the update sequence number to 1. */ - *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs)) =3D cpu_to_le16(1); + *(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs)) =3D cpu_to_le16(1); m->lsn =3D 0; m->sequence_number =3D cpu_to_le16(1); m->link_count =3D 0; @@ -2067,14 +1916,14 @@ static int ntfs_mft_record_layout(const ntfs_volume= *vol, const s64 mft_no, m->base_mft_record =3D 0; m->next_attr_instance =3D 0; /* Add the termination attribute. 
*/ - a =3D (ATTR_RECORD*)((u8*)m + le16_to_cpu(m->attrs_offset)); + a =3D (struct attr_record *)((u8 *)m + le16_to_cpu(m->attrs_offset)); a->type =3D AT_END; a->length =3D 0; ntfs_debug("Done."); return 0; } =20 -/** +/* * ntfs_mft_record_format - format an mft record on an ntfs volume * @vol: volume on which to format the mft record * @mft_no: mft record number to format @@ -2085,12 +1934,12 @@ static int ntfs_mft_record_layout(const ntfs_volume= *vol, const s64 mft_no, * * Return 0 on success and -errno on error. */ -static int ntfs_mft_record_format(const ntfs_volume *vol, const s64 mft_no) +static int ntfs_mft_record_format(const struct ntfs_volume *vol, const s64= mft_no) { loff_t i_size; struct inode *mft_vi =3D vol->mft_ino; - struct page *page; - MFT_RECORD *m; + struct folio *folio; + struct mft_record *m; pgoff_t index, end_index; unsigned int ofs; int err; @@ -2100,59 +1949,62 @@ static int ntfs_mft_record_format(const ntfs_volume= *vol, const s64 mft_no) * The index into the page cache and the offset within the page cache * page of the wanted mft record. */ - index =3D mft_no << vol->mft_record_size_bits >> PAGE_SHIFT; - ofs =3D (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK; + index =3D NTFS_MFT_NR_TO_PIDX(vol, mft_no); + ofs =3D NTFS_MFT_NR_TO_POFS(vol, mft_no); /* The maximum valid index into the page cache for $MFT's data. */ i_size =3D i_size_read(mft_vi); end_index =3D i_size >> PAGE_SHIFT; if (unlikely(index >=3D end_index)) { - if (unlikely(index > end_index || ofs + vol->mft_record_size >=3D - (i_size & ~PAGE_MASK))) { - ntfs_error(vol->sb, "Tried to format non-existing mft " - "record 0x%llx.", (long long)mft_no); + if (unlikely(index > end_index || + ofs + vol->mft_record_size > (i_size & ~PAGE_MASK))) { + ntfs_error(vol->sb, "Tried to format non-existing mft record 0x%llx.", + (long long)mft_no); return -ENOENT; } } - /* Read, map, and pin the page containing the mft record. 
*/ - page =3D ntfs_map_page(mft_vi->i_mapping, index); - if (IS_ERR(page)) { - ntfs_error(vol->sb, "Failed to map page containing mft record " - "to format 0x%llx.", (long long)mft_no); - return PTR_ERR(page); - } - lock_page(page); - BUG_ON(!PageUptodate(page)); - ClearPageUptodate(page); - m =3D (MFT_RECORD*)((u8*)page_address(page) + ofs); + + /* Read, map, and pin the folio containing the mft record. */ + folio =3D read_mapping_folio(mft_vi->i_mapping, index, NULL); + if (IS_ERR(folio)) { + ntfs_error(vol->sb, "Failed to map page containing mft record to format = 0x%llx.", + (long long)mft_no); + return PTR_ERR(folio); + } + folio_lock(folio); + folio_clear_uptodate(folio); + m =3D (struct mft_record *)((u8 *)kmap_local_folio(folio, 0) + ofs); err =3D ntfs_mft_record_layout(vol, mft_no, m); if (unlikely(err)) { ntfs_error(vol->sb, "Failed to layout mft record 0x%llx.", (long long)mft_no); - SetPageUptodate(page); - unlock_page(page); - ntfs_unmap_page(page); + folio_mark_uptodate(folio); + folio_unlock(folio); + kunmap_local(m); + folio_put(folio); return err; } - flush_dcache_page(page); - SetPageUptodate(page); - unlock_page(page); + pre_write_mst_fixup((struct ntfs_record *)m, vol->mft_record_size); + folio_mark_uptodate(folio); /* * Make sure the mft record is written out to disk. We could use * ilookup5() to check if an inode is in icache and so on but this is * unnecessary as ntfs_writepage() will write the dirty record anyway. */ - mark_ntfs_record_dirty(page, ofs); - ntfs_unmap_page(page); + ntfs_mft_mark_dirty(folio); + folio_unlock(folio); + kunmap_local(m); + folio_put(folio); ntfs_debug("Done."); return 0; } =20 -/** +/* * ntfs_mft_record_alloc - allocate an mft record on an ntfs volume * @vol: [IN] volume on which to allocate the mft record * @mode: [IN] mode if want a file or directory, i.e. 
base inode or 0 + * @ni: [OUT] on success, set to the allocated ntfs inode * @base_ni: [IN] open base inode if allocating an extent mft record or N= ULL - * @mrec: [OUT] on successful return this is the mapped mft record + * @ni_mrec: [OUT] on successful return this is the mapped mft record * * Allocate an mft record in $MFT/$DATA of an open ntfs volume @vol. * @@ -2180,8 +2032,8 @@ static int ntfs_mft_record_format(const ntfs_volume *= vol, const s64 mft_no) * optimize this we start scanning at the place specified by @base_ni or if * @base_ni is NULL we start where we last stopped and we perform wrap aro= und * when we reach the end. Note, we do not try to allocate mft records bel= ow - * number 24 because numbers 0 to 15 are the defined system files anyway a= nd 16 - * to 24 are special in that they are used for storing extension mft recor= ds + * number 64 because numbers 0 to 15 are the defined system files anyway a= nd 16 + * to 64 are special in that they are used for storing extension mft recor= ds * for the $DATA attribute of $MFT. This is required to avoid the possibi= lity * of creating a runlist with a circular dependency which once written to = disk * can never be read in again. Windows will only use records 16 to 24 for @@ -2191,7 +2043,7 @@ static int ntfs_mft_record_format(const ntfs_volume *= vol, const s64 mft_no) * doing this at some later time, it does not matter much for now. * * When scanning the mft bitmap, we only search up to the last allocated m= ft - * record. If there are no free records left in the range 24 to number of + * record. If there are no free records left in the range 64 to number of * allocated mft records, then we extend the $MFT/$DATA attribute in order= to * create free mft records. We extend the allocated size of $MFT/$DATA by= 16 * records at a time or one cluster, if cluster size is above 16kiB. 
If t= here @@ -2200,24 +2052,24 @@ static int ntfs_mft_record_format(const ntfs_volume= *vol, const s64 mft_no) * * No matter how many mft records we allocate, we initialize only the first * allocated mft record, incrementing mft data size and initialized size - * accordingly, open an ntfs_inode for it and return it to the caller, unl= ess - * there are less than 24 mft records, in which case we allocate and initi= alize - * mft records until we reach record 24 which we consider as the first fre= e mft + * accordingly, open a struct ntfs_inode for it and return it to the call= er, unless + * there are less than 64 mft records, in which case we allocate and initi= alize + * mft records until we reach record 64 which we consider as the first fre= e mft * record for use by normal files. * * If during any stage we overflow the initialized data in the mft bitmap,= we * extend the initialized size (and data size) by 8 bytes, allocating anot= her * cluster if required. The bitmap data size has to be at least equal to = the * number of mft records in the mft, but it can be bigger, in which case t= he - * superflous bits are padded with zeroes. + * superfluous bits are padded with zeroes. * * Thus, when we return successfully (IS_ERR() is false), we will have: * - initialized / extended the mft bitmap if necessary, * - initialized / extended the mft data if necessary, * - set the bit corresponding to the mft record being allocated in the * mft bitmap, - * - opened an ntfs_inode for the allocated mft record, and we will have - * - returned the ntfs_inode as well as the allocated mapped, pinned, and + * - opened a struct ntfs_inode for the allocated mft record, and we will= have + * - returned the struct ntfs_inode as well as the allocated mapped, pinne= d, and * locked mft record.
* * On error, the volume will be left in a consistent state and no record w= ill @@ -2237,42 +2089,46 @@ static int ntfs_mft_record_format(const ntfs_volume= *vol, const s64 mft_no) * easier because otherwise there might be circular invocations of functio= ns * when reading the bitmap. */ -ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, const int mode, - ntfs_inode *base_ni, MFT_RECORD **mrec) +int ntfs_mft_record_alloc(struct ntfs_volume *vol, const int mode, + struct ntfs_inode **ni, struct ntfs_inode *base_ni, + struct mft_record **ni_mrec) { s64 ll, bit, old_data_initialized, old_data_size; unsigned long flags; - struct inode *vi; - struct page *page; - ntfs_inode *mft_ni, *mftbmp_ni, *ni; - ntfs_attr_search_ctx *ctx; - MFT_RECORD *m; - ATTR_RECORD *a; + struct folio *folio; + struct ntfs_inode *mft_ni, *mftbmp_ni; + struct ntfs_attr_search_ctx *ctx; + struct mft_record *m =3D NULL; + struct attr_record *a; pgoff_t index; unsigned int ofs; int err; - le16 seq_no, usn; + __le16 seq_no, usn; bool record_formatted =3D false; + unsigned int memalloc_flags; =20 - if (base_ni) { - ntfs_debug("Entering (allocating an extent mft record for " - "base mft record 0x%llx).", + if (base_ni && *ni) + return -EINVAL; + + /* @mode and @base_ni are mutually exclusive. */ + if (mode && base_ni) + return -EINVAL; + + if (base_ni) + ntfs_debug("Entering (allocating an extent mft record for base mft recor= d 0x%llx).", (long long)base_ni->mft_no); - /* @mode and @base_ni are mutually exclusive. */ - BUG_ON(mode); - } else + else ntfs_debug("Entering (allocating a base mft record)."); - if (mode) { - /* @mode and @base_ni are mutually exclusive. */ - BUG_ON(base_ni); - /* We only support creation of normal files and directories. 
*/ - if (!S_ISREG(mode) && !S_ISDIR(mode)) - return ERR_PTR(-EOPNOTSUPP); - } - BUG_ON(!mrec); + + memalloc_flags =3D memalloc_nofs_save(); + mft_ni =3D NTFS_I(vol->mft_ino); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + mutex_lock(&mft_ni->mrec_lock); mftbmp_ni =3D NTFS_I(vol->mftbmp_ino); - down_write(&vol->mftbmp_lock); +search_free_rec: + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + down_write(&vol->mftbmp_lock); bit =3D ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(vol, base_ni); if (bit >=3D 0) { ntfs_debug("Found and allocated free record (#1), bit 0x%llx.", @@ -2280,9 +2136,19 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, = const int mode, goto have_alloc_rec; } if (bit !=3D -ENOSPC) { - up_write(&vol->mftbmp_lock); - return ERR_PTR(bit); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) { + up_write(&vol->mftbmp_lock); + mutex_unlock(&mft_ni->mrec_lock); + } + memalloc_nofs_restore(memalloc_flags); + return bit; + } + + if (base_ni && base_ni->mft_no =3D=3D FILE_MFT) { + memalloc_nofs_restore(memalloc_flags); + return bit; } + /* * No free mft records left. 
If the mft bitmap already covers more * than the currently used mft records, the next records are all free, @@ -2297,10 +2163,11 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, read_lock_irqsave(&mftbmp_ni->size_lock, flags); old_data_initialized =3D mftbmp_ni->initialized_size; read_unlock_irqrestore(&mftbmp_ni->size_lock, flags); - if (old_data_initialized << 3 > ll && old_data_initialized > 3) { + if (old_data_initialized << 3 > ll && + old_data_initialized > RESERVED_MFT_RECORDS / 8) { bit =3D ll; - if (bit < 24) - bit =3D 24; + if (bit < RESERVED_MFT_RECORDS) + bit =3D RESERVED_MFT_RECORDS; if (unlikely(bit >=3D (1ll << 32))) goto max_err_out; ntfs_debug("Found free record (#2), bit 0x%llx.", @@ -2317,28 +2184,28 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, goto max_err_out; read_lock_irqsave(&mftbmp_ni->size_lock, flags); old_data_size =3D mftbmp_ni->allocated_size; - ntfs_debug("Status of mftbmp before extension: allocated_size 0x%llx, " - "data_size 0x%llx, initialized_size 0x%llx.", - (long long)old_data_size, - (long long)i_size_read(vol->mftbmp_ino), - (long long)old_data_initialized); + ntfs_debug("Status of mftbmp before extension: allocated_size 0x%llx, dat= a_size 0x%llx, initialized_size 0x%llx.", + old_data_size, i_size_read(vol->mftbmp_ino), + old_data_initialized); read_unlock_irqrestore(&mftbmp_ni->size_lock, flags); if (old_data_initialized + 8 > old_data_size) { /* Need to extend bitmap by one more cluster. 
*/ ntfs_debug("mftbmp: initialized_size + 8 > allocated_size."); err =3D ntfs_mft_bitmap_extend_allocation_nolock(vol); + if (err =3D=3D -EAGAIN) + err =3D ntfs_mft_bitmap_extend_allocation_nolock(vol); + if (unlikely(err)) { - up_write(&vol->mftbmp_lock); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + up_write(&vol->mftbmp_lock); goto err_out; } #ifdef DEBUG read_lock_irqsave(&mftbmp_ni->size_lock, flags); - ntfs_debug("Status of mftbmp after allocation extension: " - "allocated_size 0x%llx, data_size 0x%llx, " - "initialized_size 0x%llx.", - (long long)mftbmp_ni->allocated_size, - (long long)i_size_read(vol->mftbmp_ino), - (long long)mftbmp_ni->initialized_size); + ntfs_debug("Status of mftbmp after allocation extension: allocated_size = 0x%llx, data_size 0x%llx, initialized_size 0x%llx.", + mftbmp_ni->allocated_size, + i_size_read(vol->mftbmp_ino), + mftbmp_ni->initialized_size); read_unlock_irqrestore(&mftbmp_ni->size_lock, flags); #endif /* DEBUG */ } @@ -2349,17 +2216,16 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, */ err =3D ntfs_mft_bitmap_extend_initialized_nolock(vol); if (unlikely(err)) { - up_write(&vol->mftbmp_lock); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + up_write(&vol->mftbmp_lock); goto err_out; } #ifdef DEBUG read_lock_irqsave(&mftbmp_ni->size_lock, flags); - ntfs_debug("Status of mftbmp after initialized extension: " - "allocated_size 0x%llx, data_size 0x%llx, " - "initialized_size 0x%llx.", - (long long)mftbmp_ni->allocated_size, - (long long)i_size_read(vol->mftbmp_ino), - (long long)mftbmp_ni->initialized_size); + ntfs_debug("Status of mftbmp after initialized extension: allocated_size = 0x%llx, data_size 0x%llx, initialized_size 0x%llx.", + mftbmp_ni->allocated_size, + i_size_read(vol->mftbmp_ino), + mftbmp_ni->initialized_size); read_unlock_irqrestore(&mftbmp_ni->size_lock, flags); #endif /* DEBUG */ ntfs_debug("Found free record (#3), bit 0x%llx.", (long long)bit); @@ -2369,7 +2235,8 @@ ntfs_inode 
*ntfs_mft_record_alloc(ntfs_volume *vol, c= onst int mode, err =3D ntfs_bitmap_set_bit(vol->mftbmp_ino, bit); if (unlikely(err)) { ntfs_error(vol->sb, "Failed to allocate bit in mft bitmap."); - up_write(&vol->mftbmp_lock); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + up_write(&vol->mftbmp_lock); goto err_out; } ntfs_debug("Set bit 0x%llx in mft bitmap.", (long long)bit); @@ -2397,34 +2264,35 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, * actually traversed more than once when a freshly formatted volume is * first written to so it optimizes away nicely in the common case. */ - read_lock_irqsave(&mft_ni->size_lock, flags); - ntfs_debug("Status of mft data before extension: " - "allocated_size 0x%llx, data_size 0x%llx, " - "initialized_size 0x%llx.", - (long long)mft_ni->allocated_size, - (long long)i_size_read(vol->mft_ino), - (long long)mft_ni->initialized_size); - while (ll > mft_ni->allocated_size) { - read_unlock_irqrestore(&mft_ni->size_lock, flags); - err =3D ntfs_mft_data_extend_allocation_nolock(vol); - if (unlikely(err)) { - ntfs_error(vol->sb, "Failed to extend mft data " - "allocation."); - goto undo_mftbmp_alloc_nolock; - } + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) { read_lock_irqsave(&mft_ni->size_lock, flags); - ntfs_debug("Status of mft data after allocation extension: " - "allocated_size 0x%llx, data_size 0x%llx, " - "initialized_size 0x%llx.", - (long long)mft_ni->allocated_size, - (long long)i_size_read(vol->mft_ino), - (long long)mft_ni->initialized_size); + ntfs_debug("Status of mft data before extension: allocated_size 0x%llx, = data_size 0x%llx, initialized_size 0x%llx.", + mft_ni->allocated_size, i_size_read(vol->mft_ino), + mft_ni->initialized_size); + while (ll > mft_ni->allocated_size) { + read_unlock_irqrestore(&mft_ni->size_lock, flags); + err =3D ntfs_mft_data_extend_allocation_nolock(vol); + if (err =3D=3D -EAGAIN) + err =3D ntfs_mft_data_extend_allocation_nolock(vol); + + if (unlikely(err)) { + 
ntfs_error(vol->sb, "Failed to extend mft data allocation."); + goto undo_mftbmp_alloc_nolock; + } + read_lock_irqsave(&mft_ni->size_lock, flags); + ntfs_debug("Status of mft data after allocation extension: allocated_si= ze 0x%llx, data_size 0x%llx, initialized_size 0x%llx.", + mft_ni->allocated_size, i_size_read(vol->mft_ino), + mft_ni->initialized_size); + } + read_unlock_irqrestore(&mft_ni->size_lock, flags); + } else if (ll > mft_ni->allocated_size) { + err =3D -ENOSPC; + goto undo_mftbmp_alloc_nolock; } - read_unlock_irqrestore(&mft_ni->size_lock, flags); /* * Extend mft data initialized size (and data size of course) to reach * the allocated mft record, formatting the mft records allong the way. - * Note: We only modify the ntfs_inode structure as that is all that is + * Note: We only modify the struct ntfs_inode structure as that is all th= at is * needed by ntfs_mft_record_format(). We will update the attribute * record itself in one fell swoop later on. */ @@ -2433,7 +2301,7 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, c= onst int mode, old_data_size =3D vol->mft_ino->i_size; while (ll > mft_ni->initialized_size) { s64 new_initialized_size, mft_no; - =09 + new_initialized_size =3D mft_ni->initialized_size + vol->mft_record_size; mft_no =3D mft_ni->initialized_size >> vol->mft_record_size_bits; @@ -2469,8 +2337,7 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, c= onst int mode, err =3D ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx); if (unlikely(err)) { - ntfs_error(vol->sb, "Failed to find first attribute extent of " - "mft data attribute."); + ntfs_error(vol->sb, "Failed to find first attribute extent of mft data a= ttribute."); ntfs_attr_put_search_ctx(ctx); unmap_mft_record(mft_ni); goto undo_data_init; @@ -2478,24 +2345,20 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, a =3D ctx->attr; read_lock_irqsave(&mft_ni->size_lock, flags); 
a->data.non_resident.initialized_size =3D - cpu_to_sle64(mft_ni->initialized_size); + cpu_to_le64(mft_ni->initialized_size); a->data.non_resident.data_size =3D - cpu_to_sle64(i_size_read(vol->mft_ino)); + cpu_to_le64(i_size_read(vol->mft_ino)); read_unlock_irqrestore(&mft_ni->size_lock, flags); /* Ensure the changes make it to disk. */ - flush_dcache_mft_record_page(ctx->ntfs_ino); mark_mft_record_dirty(ctx->ntfs_ino); ntfs_attr_put_search_ctx(ctx); unmap_mft_record(mft_ni); read_lock_irqsave(&mft_ni->size_lock, flags); - ntfs_debug("Status of mft data after mft record initialization: " - "allocated_size 0x%llx, data_size 0x%llx, " - "initialized_size 0x%llx.", - (long long)mft_ni->allocated_size, - (long long)i_size_read(vol->mft_ino), - (long long)mft_ni->initialized_size); - BUG_ON(i_size_read(vol->mft_ino) > mft_ni->allocated_size); - BUG_ON(mft_ni->initialized_size > i_size_read(vol->mft_ino)); + ntfs_debug("Status of mft data after mft record initialization: allocated= _size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.", + mft_ni->allocated_size, i_size_read(vol->mft_ino), + mft_ni->initialized_size); + WARN_ON(i_size_read(vol->mft_ino) > mft_ni->allocated_size); + WARN_ON(mft_ni->initialized_size > i_size_read(vol->mft_ino)); read_unlock_irqrestore(&mft_ni->size_lock, flags); mft_rec_already_initialized: /* @@ -2507,41 +2370,39 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, * that it is allocated in the mft bitmap means that no-one will try to * allocate it either. */ - up_write(&vol->mftbmp_lock); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + up_write(&vol->mftbmp_lock); /* * We now have allocated and initialized the mft record. Calculate the * index of and the offset within the page cache page the record is in. */ - index =3D bit << vol->mft_record_size_bits >> PAGE_SHIFT; - ofs =3D (bit << vol->mft_record_size_bits) & ~PAGE_MASK; - /* Read, map, and pin the page containing the mft record. 
*/ - page =3D ntfs_map_page(vol->mft_ino->i_mapping, index); - if (IS_ERR(page)) { - ntfs_error(vol->sb, "Failed to map page containing allocated " - "mft record 0x%llx.", (long long)bit); - err =3D PTR_ERR(page); + index =3D NTFS_MFT_NR_TO_PIDX(vol, bit); + ofs =3D NTFS_MFT_NR_TO_POFS(vol, bit); + /* Read, map, and pin the folio containing the mft record. */ + folio =3D read_mapping_folio(vol->mft_ino->i_mapping, index, NULL); + if (IS_ERR(folio)) { + ntfs_error(vol->sb, "Failed to map page containing allocated mft record = 0x%llx.", + bit); + err =3D PTR_ERR(folio); goto undo_mftbmp_alloc; } - lock_page(page); - BUG_ON(!PageUptodate(page)); - ClearPageUptodate(page); - m =3D (MFT_RECORD*)((u8*)page_address(page) + ofs); + folio_lock(folio); + folio_clear_uptodate(folio); + m =3D (struct mft_record *)((u8 *)kmap_local_folio(folio, 0) + ofs); /* If we just formatted the mft record no need to do it again. */ if (!record_formatted) { /* Sanity check that the mft record is really not in use. */ if (ntfs_is_file_record(m->magic) && (m->flags & MFT_RECORD_IN_USE)) { - ntfs_error(vol->sb, "Mft record 0x%llx was marked " - "free in mft bitmap but is marked " - "used itself. Corrupt filesystem. " - "Unmount and run chkdsk.", - (long long)bit); - err =3D -EIO; - SetPageUptodate(page); - unlock_page(page); - ntfs_unmap_page(page); + ntfs_warning(vol->sb, + "Mft record 0x%llx was marked free in mft bitmap but is marked used it= self. Unmount and run chkdsk.", + bit); + folio_mark_uptodate(folio); + folio_unlock(folio); + kunmap_local(m); + folio_put(folio); NVolSetErrors(vol); - goto undo_mftbmp_alloc; + goto search_free_rec; } /* * We need to (re-)format the mft record, preserving the @@ -2551,29 +2412,30 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, * wrong with the previous mft record. 
*/ seq_no =3D m->sequence_number; - usn =3D *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs)); + usn =3D *(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs)); err =3D ntfs_mft_record_layout(vol, bit, m); if (unlikely(err)) { - ntfs_error(vol->sb, "Failed to layout allocated mft " - "record 0x%llx.", (long long)bit); - SetPageUptodate(page); - unlock_page(page); - ntfs_unmap_page(page); + ntfs_error(vol->sb, "Failed to layout allocated mft record 0x%llx.", + bit); + folio_mark_uptodate(folio); + folio_unlock(folio); + kunmap_local(m); + folio_put(folio); goto undo_mftbmp_alloc; } if (seq_no) m->sequence_number =3D seq_no; if (usn && le16_to_cpu(usn) !=3D 0xffff) - *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs)) =3D usn; + *(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs)) =3D usn; + pre_write_mst_fixup((struct ntfs_record *)m, vol->mft_record_size); } /* Set the mft record itself in use. */ m->flags |=3D MFT_RECORD_IN_USE; if (S_ISDIR(mode)) m->flags |=3D MFT_RECORD_IS_DIRECTORY; - flush_dcache_page(page); - SetPageUptodate(page); + folio_mark_uptodate(folio); if (base_ni) { - MFT_RECORD *m_tmp; + struct mft_record *m_tmp; =20 /* * Setup the base mft record in the extent mft record. This @@ -2587,22 +2449,24 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, * attach it to the base inode @base_ni and map, pin, and lock * its, i.e. the allocated, mft record. */ - m_tmp =3D map_extent_mft_record(base_ni, bit, &ni); + m_tmp =3D map_extent_mft_record(base_ni, + MK_MREF(bit, le16_to_cpu(m->sequence_number)), + ni); if (IS_ERR(m_tmp)) { - ntfs_error(vol->sb, "Failed to map allocated extent " - "mft record 0x%llx.", (long long)bit); + ntfs_error(vol->sb, "Failed to map allocated extent mft record 0x%llx.", + bit); err =3D PTR_ERR(m_tmp); /* Set the mft record itself not in use. */ m->flags &=3D cpu_to_le16( ~le16_to_cpu(MFT_RECORD_IN_USE)); - flush_dcache_page(page); /* Make sure the mft record is written out to disk. 
*/ - mark_ntfs_record_dirty(page, ofs); - unlock_page(page); - ntfs_unmap_page(page); + ntfs_mft_mark_dirty(folio); + folio_unlock(folio); + kunmap_local(m); + folio_put(folio); goto undo_mftbmp_alloc; } - BUG_ON(m !=3D m_tmp); + /* * Make sure the allocated mft record is written out to disk. * No need to set the inode dirty because the caller is going @@ -2610,96 +2474,20 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, * record (e.g. at a minimum a new attribute will be added to * the mft record. */ - mark_ntfs_record_dirty(page, ofs); - unlock_page(page); + ntfs_mft_mark_dirty(folio); + folio_unlock(folio); /* * Need to unmap the page since map_extent_mft_record() mapped * it as well so we have it mapped twice at the moment. */ - ntfs_unmap_page(page); + kunmap_local(m); + folio_put(folio); } else { - /* - * Allocate a new VFS inode and set it up. NOTE: @vi->i_nlink - * is set to 1 but the mft record->link_count is 0. The caller - * needs to bear this in mind. - */ - vi =3D new_inode(vol->sb); - if (unlikely(!vi)) { - err =3D -ENOMEM; - /* Set the mft record itself not in use. */ - m->flags &=3D cpu_to_le16( - ~le16_to_cpu(MFT_RECORD_IN_USE)); - flush_dcache_page(page); - /* Make sure the mft record is written out to disk. */ - mark_ntfs_record_dirty(page, ofs); - unlock_page(page); - ntfs_unmap_page(page); - goto undo_mftbmp_alloc; - } - vi->i_ino =3D bit; - - /* The owner and group come from the ntfs volume. */ - vi->i_uid =3D vol->uid; - vi->i_gid =3D vol->gid; - - /* Initialize the ntfs specific part of @vi. */ - ntfs_init_big_inode(vi); - ni =3D NTFS_I(vi); - /* - * Set the appropriate mode, attribute type, and name. For - * directories, also setup the index values to the defaults. 
- */ - if (S_ISDIR(mode)) { - vi->i_mode =3D S_IFDIR | S_IRWXUGO; - vi->i_mode &=3D ~vol->dmask; - - NInoSetMstProtected(ni); - ni->type =3D AT_INDEX_ALLOCATION; - ni->name =3D I30; - ni->name_len =3D 4; - - ni->itype.index.block_size =3D 4096; - ni->itype.index.block_size_bits =3D ntfs_ffs(4096) - 1; - ni->itype.index.collation_rule =3D COLLATION_FILE_NAME; - if (vol->cluster_size <=3D ni->itype.index.block_size) { - ni->itype.index.vcn_size =3D vol->cluster_size; - ni->itype.index.vcn_size_bits =3D - vol->cluster_size_bits; - } else { - ni->itype.index.vcn_size =3D vol->sector_size; - ni->itype.index.vcn_size_bits =3D - vol->sector_size_bits; - } - } else { - vi->i_mode =3D S_IFREG | S_IRWXUGO; - vi->i_mode &=3D ~vol->fmask; - - ni->type =3D AT_DATA; - ni->name =3D NULL; - ni->name_len =3D 0; - } - if (IS_RDONLY(vi)) - vi->i_mode &=3D ~S_IWUGO; - - /* Set the inode times to the current time. */ - simple_inode_init_ts(vi); - /* - * Set the file size to 0, the ntfs inode sizes are set to 0 by - * the call to ntfs_init_big_inode() below. - */ - vi->i_size =3D 0; - vi->i_blocks =3D 0; - - /* Set the sequence number. */ - vi->i_generation =3D ni->seq_no =3D le16_to_cpu(m->sequence_number); /* * Manually map, pin, and lock the mft record as we already * have its page mapped and it is very easy to do. */ - atomic_inc(&ni->count); - mutex_lock(&ni->mrec_lock); - ni->page =3D page; - ni->page_ofs =3D ofs; + (*ni)->seq_no =3D le16_to_cpu(m->sequence_number); /* * Make sure the allocated mft record is written out to disk. * NOTE: We do not set the ntfs inode dirty because this would @@ -2710,23 +2498,40 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol,= const int mode, * a minimum some new attributes will be added to the mft * record. */ - mark_ntfs_record_dirty(page, ofs); - unlock_page(page); =20 - /* Add the inode to the inode hash for the superblock. 
*/ - insert_inode_hash(vi); + (*ni)->mrec =3D kmalloc(vol->mft_record_size, GFP_NOFS); + if (!(*ni)->mrec) { + folio_unlock(folio); + kunmap_local(m); + folio_put(folio); + goto undo_mftbmp_alloc; + } =20 + memcpy((*ni)->mrec, m, vol->mft_record_size); + post_read_mst_fixup((struct ntfs_record *)(*ni)->mrec, vol->mft_record_s= ize); + ntfs_mft_mark_dirty(folio); + folio_unlock(folio); + (*ni)->folio =3D folio; + (*ni)->folio_ofs =3D ofs; + atomic_inc(&(*ni)->count); /* Update the default mft allocation position. */ vol->mft_data_pos =3D bit + 1; } + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + mutex_unlock(&mft_ni->mrec_lock); + memalloc_nofs_restore(memalloc_flags); + /* * Return the opened, allocated inode of the allocated mft record as * well as the mapped, pinned, and locked mft record. */ ntfs_debug("Returning opened, allocated %sinode 0x%llx.", - base_ni ? "extent " : "", (long long)bit); - *mrec =3D m; - return ni; + base_ni ? "extent " : "", bit); + (*ni)->mft_no =3D bit; + if (ni_mrec) + *ni_mrec =3D (*ni)->mrec; + ntfs_dec_free_mft_records(vol, 1); + return 0; undo_data_init: write_lock_irqsave(&mft_ni->size_lock, flags); mft_ni->initialized_size =3D old_data_initialized; @@ -2734,114 +2539,83 @@ ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol= , const int mode, write_unlock_irqrestore(&mft_ni->size_lock, flags); goto undo_mftbmp_alloc_nolock; undo_mftbmp_alloc: - down_write(&vol->mftbmp_lock); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + down_write(&vol->mftbmp_lock); undo_mftbmp_alloc_nolock: if (ntfs_bitmap_clear_bit(vol->mftbmp_ino, bit)) { ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es); NVolSetErrors(vol); } - up_write(&vol->mftbmp_lock); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + up_write(&vol->mftbmp_lock); err_out: - return ERR_PTR(err); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) + mutex_unlock(&mft_ni->mrec_lock); + memalloc_nofs_restore(memalloc_flags); + return err; max_err_out: - 
ntfs_warning(vol->sb, "Cannot allocate mft record because the maximum " - "number of inodes (2^32) has already been reached."); - up_write(&vol->mftbmp_lock); - return ERR_PTR(-ENOSPC); + ntfs_warning(vol->sb, + "Cannot allocate mft record because the maximum number of inodes (2^32) = has already been reached."); + if (!base_ni || base_ni->mft_no !=3D FILE_MFT) { + up_write(&vol->mftbmp_lock); + mutex_unlock(&mft_ni->mrec_lock); + } + memalloc_nofs_restore(memalloc_flags); + return -ENOSPC; } =20 -/** - * ntfs_extent_mft_record_free - free an extent mft record on an ntfs volu= me - * @ni: ntfs inode of the mapped extent mft record to free - * @m: mapped extent mft record of the ntfs inode @ni - * - * Free the mapped extent mft record @m of the extent ntfs inode @ni. - * - * Note that this function unmaps the mft record and closes and destroys @= ni - * internally and hence you cannot use either @ni nor @m any more after th= is - * function returns success. +/* + * ntfs_mft_record_free - free an mft record on an ntfs volume + * @vol: volume on which to free the mft record + * @ni: open ntfs inode of the mft record to free * - * On success return 0 and on error return -errno. @ni and @m are still v= alid - * in this case and have not been freed. + * Free the mft record of the open inode @ni on the mounted ntfs volume @v= ol. + * Note that this function calls ntfs_inode_close() internally and hence y= ou + * cannot use the pointer @ni any more after this function returns success. * - * For some errors an error message is displayed and the success code 0 is - * returned and the volume is then left dirty on umount. This makes sense= in - * case we could not rollback the changes that were already done since the - * caller no longer wants to reference this mft record so it does not matt= er to - * the caller if something is wrong with it as long as it is properly deta= ched - * from the base inode. 
+ * On success return 0 and on error return -errno. */ -int ntfs_extent_mft_record_free(ntfs_inode *ni, MFT_RECORD *m) +int ntfs_mft_record_free(struct ntfs_volume *vol, struct ntfs_inode *ni) { - unsigned long mft_no =3D ni->mft_no; - ntfs_volume *vol =3D ni->vol; - ntfs_inode *base_ni; - ntfs_inode **extent_nis; - int i, err; - le16 old_seq_no; + u64 mft_no; + int err; u16 seq_no; -=09 - BUG_ON(NInoAttr(ni)); - BUG_ON(ni->nr_extents !=3D -1); - - mutex_lock(&ni->extent_lock); - base_ni =3D ni->ext.base_ntfs_ino; - mutex_unlock(&ni->extent_lock); - - BUG_ON(base_ni->nr_extents <=3D 0); - - ntfs_debug("Entering for extent inode 0x%lx, base inode 0x%lx.\n", - mft_no, base_ni->mft_no); - - mutex_lock(&base_ni->extent_lock); + __le16 old_seq_no; + struct mft_record *ni_mrec; + unsigned int memalloc_flags; + struct ntfs_inode *base_ni; =20 - /* Make sure we are holding the only reference to the extent inode. */ - if (atomic_read(&ni->count) > 2) { - ntfs_error(vol->sb, "Tried to free busy extent inode 0x%lx, " - "not freeing.", base_ni->mft_no); - mutex_unlock(&base_ni->extent_lock); - return -EBUSY; - } - - /* Dissociate the ntfs inode from the base inode.
*/ - extent_nis =3D base_ni->ext.extent_ntfs_inos; - err =3D -ENOENT; - for (i =3D 0; i < base_ni->nr_extents; i++) { - if (ni !=3D extent_nis[i]) - continue; - extent_nis +=3D i; - base_ni->nr_extents--; - memmove(extent_nis, extent_nis + 1, (base_ni->nr_extents - i) * - sizeof(ntfs_inode*)); - err =3D 0; - break; - } + if (!vol || !ni) + return -EINVAL; =20 - mutex_unlock(&base_ni->extent_lock); + ntfs_debug("Entering for inode 0x%llx.\n", (long long)ni->mft_no); =20 - if (unlikely(err)) { - ntfs_error(vol->sb, "Extent inode 0x%lx is not attached to " - "its base inode 0x%lx.", mft_no, - base_ni->mft_no); - BUG(); - } + ni_mrec =3D map_mft_record(ni); + if (IS_ERR(ni_mrec)) + return -EIO; =20 - /* - * The extent inode is no longer attached to the base inode so no one - * can get a reference to it any more. - */ + /* Cache the mft reference for later. */ + mft_no =3D ni->mft_no; =20 /* Mark the mft record as not in use. */ - m->flags &=3D ~MFT_RECORD_IN_USE; + ni_mrec->flags &=3D ~MFT_RECORD_IN_USE; =20 /* Increment the sequence number, skipping zero, if it is not zero. */ - old_seq_no =3D m->sequence_number; + old_seq_no =3D ni_mrec->sequence_number; seq_no =3D le16_to_cpu(old_seq_no); if (seq_no =3D=3D 0xffff) seq_no =3D 1; else if (seq_no) seq_no++; - m->sequence_number =3D cpu_to_le16(seq_no); + ni_mrec->sequence_number =3D cpu_to_le16(seq_no); + + down_read(&NTFS_I(vol->mft_ino)->runlist.lock); + err =3D ntfs_get_block_mft_record(NTFS_I(vol->mft_ino), ni); + up_read(&NTFS_I(vol->mft_ino)->runlist.lock); + if (err) { + unmap_mft_record(ni); + return err; + } =20 /* * Set the ntfs inode dirty and write it out. We do not need to worry @@ -2849,59 +2623,298 @@ int ntfs_extent_mft_record_free(ntfs_inode *ni, MF= T_RECORD *m) * record to be freed is guaranteed to do it already. 
*/ NInoSetDirty(ni); - err =3D write_mft_record(ni, m, 0); - if (unlikely(err)) { - ntfs_error(vol->sb, "Failed to write mft record 0x%lx, not " - "freeing.", mft_no); - goto rollback; - } -rollback_error: - /* Unmap and throw away the now freed extent inode. */ - unmap_extent_mft_record(ni); - ntfs_clear_extent_inode(ni); + err =3D write_mft_record(ni, ni_mrec, 0); + if (err) + goto sync_rollback; + + if (likely(ni->nr_extents >=3D 0)) + base_ni =3D ni; + else + base_ni =3D ni->ext.base_ntfs_ino; =20 /* Clear the bit in the $MFT/$BITMAP corresponding to this record. */ - down_write(&vol->mftbmp_lock); + memalloc_flags =3D memalloc_nofs_save(); + if (base_ni->mft_no !=3D FILE_MFT) + down_write(&vol->mftbmp_lock); err =3D ntfs_bitmap_clear_bit(vol->mftbmp_ino, mft_no); - up_write(&vol->mftbmp_lock); - if (unlikely(err)) { - /* - * The extent inode is gone but we failed to deallocate it in - * the mft bitmap. Just emit a warning and leave the volume - * dirty on umount. - */ - ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es); - NVolSetErrors(vol); - } + if (base_ni->mft_no !=3D FILE_MFT) + up_write(&vol->mftbmp_lock); + memalloc_nofs_restore(memalloc_flags); + if (err) + goto bitmap_rollback; + + unmap_mft_record(ni); + ntfs_inc_free_mft_records(vol, 1); return 0; -rollback: + /* Rollback what we did... */ - mutex_lock(&base_ni->extent_lock); - extent_nis =3D base_ni->ext.extent_ntfs_inos; - if (!(base_ni->nr_extents & 3)) { - int new_size =3D (base_ni->nr_extents + 4) * sizeof(ntfs_inode*); +bitmap_rollback: + memalloc_flags =3D memalloc_nofs_save(); + if (base_ni->mft_no !=3D FILE_MFT) + down_write(&vol->mftbmp_lock); + if (ntfs_bitmap_set_bit(vol->mftbmp_ino, mft_no)) + ntfs_error(vol->sb, "ntfs_bitmap_set_bit failed in bitmap_rollback\n"); + if (base_ni->mft_no !=3D FILE_MFT) + up_write(&vol->mftbmp_lock); + memalloc_nofs_restore(memalloc_flags); +sync_rollback: + ntfs_error(vol->sb, + "Eeek! Rollback failed in %s. 
Leaving inconsistent metadata!\n", __func_= _); + ni_mrec->flags |=3D MFT_RECORD_IN_USE; + ni_mrec->sequence_number =3D old_seq_no; + NInoSetDirty(ni); + write_mft_record(ni, ni_mrec, 0); + unmap_mft_record(ni); + return err; +} =20 - extent_nis =3D kmalloc(new_size, GFP_NOFS); - if (unlikely(!extent_nis)) { - ntfs_error(vol->sb, "Failed to allocate internal " - "buffer during rollback.%s", es); - mutex_unlock(&base_ni->extent_lock); - NVolSetErrors(vol); - goto rollback_error; - } - if (base_ni->nr_extents) { - BUG_ON(!base_ni->ext.extent_ntfs_inos); - memcpy(extent_nis, base_ni->ext.extent_ntfs_inos, - new_size - 4 * sizeof(ntfs_inode*)); - kfree(base_ni->ext.extent_ntfs_inos); - } - base_ni->ext.extent_ntfs_inos =3D extent_nis; +static s64 lcn_from_index(struct ntfs_volume *vol, struct ntfs_inode *ni, + unsigned long index) +{ + s64 vcn; + s64 lcn; + + vcn =3D ntfs_pidx_to_cluster(vol, index); + + down_read(&ni->runlist.lock); + lcn =3D ntfs_attr_vcn_to_lcn_nolock(ni, vcn, false); + up_read(&ni->runlist.lock); + + return lcn; +} + +/* + * ntfs_write_mft_block - Write back a folio containing MFT records + * @folio: The folio to write back (contains one or more MFT records) + * @wbc: Writeback control structure + * + * This function is called as part of the address_space_operations + * .writepages implementation for the $MFT inode (or $MFTMirr). + * It handles writing one folio (normally 4KiB page) worth of MFT records + * to the underlying block device. + * + * Return: 0 on success, or -errno on error. 
+ */ +static int ntfs_write_mft_block(struct folio *folio, struct writeback_cont= rol *wbc) +{ + struct address_space *mapping =3D folio->mapping; + struct inode *vi =3D mapping->host; + struct ntfs_inode *ni =3D NTFS_I(vi); + struct ntfs_volume *vol =3D ni->vol; + u8 *kaddr; + struct ntfs_inode *locked_nis[PAGE_SIZE / NTFS_BLOCK_SIZE]; + int nr_locked_nis =3D 0, err =3D 0, mft_ofs, prev_mft_ofs; + struct inode *ref_inos[PAGE_SIZE / NTFS_BLOCK_SIZE]; + int nr_ref_inos =3D 0; + struct bio *bio =3D NULL; + unsigned long mft_no; + struct ntfs_inode *tni; + s64 lcn; + s64 vcn =3D ntfs_pidx_to_cluster(vol, folio->index); + s64 end_vcn =3D ntfs_bytes_to_cluster(vol, ni->allocated_size); + unsigned int folio_sz; + struct runlist_element *rl; + loff_t i_size =3D i_size_read(vi); + + ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, folio index 0x= %lx.", + vi->i_ino, ni->type, folio->index); + + /* We have to zero every time due to mmap-at-end-of-file. */ + if (folio->index >=3D (i_size >> folio_shift(folio))) + /* The page straddles i_size. */ + folio_zero_segment(folio, + offset_in_folio(folio, i_size), + folio_size(folio)); + + lcn =3D lcn_from_index(vol, ni, folio->index); + if (lcn <=3D LCN_HOLE) { + folio_start_writeback(folio); + folio_unlock(folio); + folio_end_writeback(folio); + return -EIO; + } + + /* Map folio so we can access its contents. */ + kaddr =3D kmap_local_folio(folio, 0); + /* Clear the page uptodate flag whilst the mst fixups are applied. */ + folio_clear_uptodate(folio); + + for (mft_ofs =3D 0; mft_ofs < PAGE_SIZE && vcn < end_vcn; + mft_ofs +=3D vol->mft_record_size) { + /* Get the mft record number. */ + mft_no =3D (((s64)folio->index << PAGE_SHIFT) + mft_ofs) >> + vol->mft_record_size_bits; + vcn =3D ntfs_mft_no_to_cluster(vol, mft_no); + /* Check whether to write this mft record. 
*/ + tni = NULL; + if (ntfs_may_write_mft_record(vol, mft_no, + (struct mft_record *)(kaddr + mft_ofs), + &tni, &ref_inos[nr_ref_inos])) { + unsigned int mft_record_off = 0; + s64 vcn_off = vcn; + + /* + * Skip $MFT extent mft records and let them be written + * by writeback to avoid deadlocks. The $MFT runlist + * lock must be taken before $MFT extent mrec_lock is taken. + */ + if (tni && tni->nr_extents < 0 && + tni->ext.base_ntfs_ino == NTFS_I(vol->mft_ino)) { + mutex_unlock(&tni->mrec_lock); + atomic_dec(&tni->count); + iput(vol->mft_ino); + continue; + } + + /* + * The record should be written. If a locked ntfs + * inode was returned, add it to the array of locked + * ntfs inodes. + */ + if (tni) + locked_nis[nr_locked_nis++] = tni; + else if (ref_inos[nr_ref_inos]) + nr_ref_inos++; + + if (bio && (mft_ofs != prev_mft_ofs + vol->mft_record_size)) { +flush_bio: + bio->bi_end_io = ntfs_bio_end_io; + submit_bio(bio); + bio = NULL; + } + + if (vol->cluster_size < folio_size(folio)) { + down_write(&ni->runlist.lock); + rl = ntfs_attr_vcn_to_rl(ni, vcn_off, &lcn); + up_write(&ni->runlist.lock); + if (IS_ERR(rl) || lcn < 0) { + err = -EIO; + goto unm_done; + } + + if (bio && + (bio_end_sector(bio) >> (vol->cluster_size_bits - 9)) != + lcn) { + bio->bi_end_io = ntfs_bio_end_io; + submit_bio(bio); + bio = NULL; + } + } + + if (!bio) { + unsigned int off; + + off = ((mft_no << vol->mft_record_size_bits) + + mft_record_off) & vol->cluster_size_mask; + + bio = bio_alloc(vol->sb->s_bdev, 1, REQ_OP_WRITE, + GFP_NOIO); + bio->bi_iter.bi_sector = + ntfs_bytes_to_sector(vol, + ntfs_cluster_to_bytes(vol, lcn) + off); + } + + if (vol->cluster_size == NTFS_BLOCK_SIZE && + (mft_record_off || + rl->length - (vcn_off - rl->vcn) == 1 || + mft_ofs + NTFS_BLOCK_SIZE >= PAGE_SIZE)) + folio_sz = NTFS_BLOCK_SIZE; + else + folio_sz = vol->mft_record_size; + if (!bio_add_folio(bio, folio, folio_sz, + mft_ofs + mft_record_off)) { + err =
-EIO; + bio_put(bio); + goto unm_done; + } + mft_record_off += folio_sz; + + if (mft_record_off != vol->mft_record_size) { + vcn_off++; + goto flush_bio; + } + prev_mft_ofs = mft_ofs; + + if (mft_no < vol->mftmirr_size) + ntfs_sync_mft_mirror(vol, mft_no, + (struct mft_record *)(kaddr + mft_ofs)); + } else if (ref_inos[nr_ref_inos]) + nr_ref_inos++; } - m->flags |= MFT_RECORD_IN_USE; - m->sequence_number = old_seq_no; - extent_nis[base_ni->nr_extents++] = ni; - mutex_unlock(&base_ni->extent_lock); - mark_mft_record_dirty(ni); + + if (bio) { + bio->bi_end_io = ntfs_bio_end_io; + submit_bio(bio); + } +unm_done: + folio_mark_uptodate(folio); + kunmap_local(kaddr); + + folio_start_writeback(folio); + folio_unlock(folio); + folio_end_writeback(folio); + + /* Unlock any locked inodes. */ + while (nr_locked_nis-- > 0) { + struct ntfs_inode *base_tni; + + tni = locked_nis[nr_locked_nis]; + mutex_unlock(&tni->mrec_lock); + + /* Get the base inode. */ + mutex_lock(&tni->extent_lock); + if (tni->nr_extents >= 0) + base_tni = tni; + else + base_tni = tni->ext.base_ntfs_ino; + mutex_unlock(&tni->extent_lock); + ntfs_debug("Unlocking %s inode 0x%lx.", + tni == base_tni ? "base" : "extent", + tni->mft_no); + atomic_dec(&tni->count); + iput(VFS_I(base_tni)); + } + + /* Dropping deferred references */ + while (nr_ref_inos-- > 0) { + if (ref_inos[nr_ref_inos]) + iput(ref_inos[nr_ref_inos]); + } + + if (unlikely(err && err != -ENOMEM)) + NVolSetErrors(vol); + if (likely(!err)) + ntfs_debug("Done."); return err; } -#endif /* NTFS_RW */ + +/* + * ntfs_mft_writepages - Write back dirty folios for the $MFT inode + * @mapping: address space of the $MFT inode + * @wbc: writeback control + * + * Writeback iterator for MFT records. Iterates over dirty folios and + * delegates actual writing to ntfs_write_mft_block() for each folio. + * Called from the address_space_operations .writepages vector of the + * $MFT inode.
+ * + * Returns 0 on success, or the first error encountered. + */ +int ntfs_mft_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + struct folio *folio = NULL; + int error; + + if (NVolShutdown(NTFS_I(mapping->host)->vol)) + return -EIO; + + while ((folio = writeback_iter(mapping, wbc, folio, &error))) + error = ntfs_write_mft_block(folio, wbc); + return error; +} + +void ntfs_mft_mark_dirty(struct folio *folio) +{ + iomap_dirty_folio(folio->mapping, folio); +} diff --git a/fs/ntfs/mst.c b/fs/ntfs/mst.c index 16b3c884abfc..7f9faad924ad 100644 --- a/fs/ntfs/mst.c +++ b/fs/ntfs/mst.c @@ -1,14 +1,15 @@ // SPDX-License-Identifier: GPL-2.0-or-later /* - * mst.c - NTFS multi sector transfer protection handling code. Part of the - * Linux-NTFS project. + * NTFS multi sector transfer protection handling code. * * Copyright (c) 2001-2004 Anton Altaparmakov */ +#include + #include "ntfs.h" -/** +/* * post_read_mst_fixup - deprotect multi sector transfer protected data * @b: pointer to the data to deprotect * @size: size in bytes of @b @@ -25,7 +26,7 @@ * be fixed up. Thus, we return success and not failure in this case. This is * in contrast to pre_write_mst_fixup(), see below. */ -int post_read_mst_fixup(NTFS_RECORD *b, const u32 size) +int post_read_mst_fixup(struct ntfs_record *b, const u32 size) { u16 usa_ofs, usa_count, usn; u16 *usa_pos, *data_pos; @@ -35,13 +36,12 @@ int post_read_mst_fixup(NTFS_RECORD *b, const u32 size) /* Decrement usa_count to get number of fixups. */ usa_count = le16_to_cpu(b->usa_count) - 1; /* Size and alignment checks. */ - if ( size & (NTFS_BLOCK_SIZE - 1) || - usa_ofs & 1 || - usa_ofs + (usa_count * 2) > size || - (size >> NTFS_BLOCK_SIZE_BITS) != usa_count) + if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1 || + usa_ofs + (usa_count * 2) > size || + (size >> NTFS_BLOCK_SIZE_BITS) != usa_count) return 0; /* Position of usn in update sequence array.
*/ - usa_pos = (u16*)b + usa_ofs/sizeof(u16); + usa_pos = (u16 *)b + usa_ofs/sizeof(u16); /* * The update sequence number which has to be equal to each of the * u16 values before they are fixed up. Note no need to care for @@ -53,12 +53,18 @@ int post_read_mst_fixup(NTFS_RECORD *b, const u32 size) /* * Position in protected data of first u16 that needs fixing up. */ - data_pos = (u16*)b + NTFS_BLOCK_SIZE/sizeof(u16) - 1; + data_pos = (u16 *)b + NTFS_BLOCK_SIZE / sizeof(u16) - 1; /* * Check for incomplete multi sector transfer(s). */ while (usa_count--) { if (*data_pos != usn) { + struct mft_record *m = (struct mft_record *)b; + + pr_err_ratelimited("ntfs: Incomplete multi sector transfer detected! (Record magic : 0x%x, mft number : 0x%x, base mft number : 0x%lx, mft in use : %d, data : 0x%x, usn 0x%x)\n", + le32_to_cpu(m->magic), le32_to_cpu(m->mft_record_number), + MREF_LE(m->base_mft_record), m->flags & MFT_RECORD_IN_USE, + *data_pos, usn); /* * Incomplete multi sector transfer detected! )-: * Set the magic to "BAAD" and return failure. @@ -67,11 +73,11 @@ int post_read_mst_fixup(NTFS_RECORD *b, const u32 size) b->magic = magic_BAAD; return -EINVAL; } - data_pos += NTFS_BLOCK_SIZE/sizeof(u16); + data_pos += NTFS_BLOCK_SIZE / sizeof(u16); } /* Re-setup the variables. */ usa_count = le16_to_cpu(b->usa_count) - 1; - data_pos = (u16*)b + NTFS_BLOCK_SIZE/sizeof(u16) - 1; + data_pos = (u16 *)b + NTFS_BLOCK_SIZE / sizeof(u16) - 1; /* Fixup all sectors. */ while (usa_count--) { /* @@ -85,7 +91,7 @@ int post_read_mst_fixup(NTFS_RECORD *b, const u32 size) return 0; } -/** +/* * pre_write_mst_fixup - apply multi sector transfer protection * @b: pointer to the data to protect * @size: size in bytes of @b @@ -106,28 +112,27 @@ int post_read_mst_fixup(NTFS_RECORD *b, const u32 size) * otherwise a random word will be used (whatever was in the record at that * position at that time).
*/ -int pre_write_mst_fixup(NTFS_RECORD *b, const u32 size) +int pre_write_mst_fixup(struct ntfs_record *b, const u32 size) { - le16 *usa_pos, *data_pos; + __le16 *usa_pos, *data_pos; u16 usa_ofs, usa_count, usn; - le16 le_usn; + __le16 le_usn; /* Sanity check + only fixup if it makes sense. */ if (!b || ntfs_is_baad_record(b->magic) || - ntfs_is_hole_record(b->magic)) + ntfs_is_hole_record(b->magic)) return -EINVAL; /* Setup the variables. */ usa_ofs = le16_to_cpu(b->usa_ofs); /* Decrement usa_count to get number of fixups. */ usa_count = le16_to_cpu(b->usa_count) - 1; /* Size and alignment checks. */ - if ( size & (NTFS_BLOCK_SIZE - 1) || - usa_ofs & 1 || - usa_ofs + (usa_count * 2) > size || - (size >> NTFS_BLOCK_SIZE_BITS) != usa_count) + if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1 || + usa_ofs + (usa_count * 2) > size || + (size >> NTFS_BLOCK_SIZE_BITS) != usa_count) return -EINVAL; /* Position of usn in update sequence array. */ - usa_pos = (le16*)((u8*)b + usa_ofs); + usa_pos = (__le16 *)((u8 *)b + usa_ofs); /* * Cyclically increment the update sequence number * (skipping 0 and -1, i.e. 0xffff). @@ -138,7 +143,7 @@ int pre_write_mst_fixup(NTFS_RECORD *b, const u32 size) le_usn = cpu_to_le16(usn); *usa_pos = le_usn; /* Position in data of first u16 that needs fixing up. */ - data_pos = (le16*)b + NTFS_BLOCK_SIZE/sizeof(le16) - 1; + data_pos = (__le16 *)b + NTFS_BLOCK_SIZE/sizeof(__le16) - 1; /* Fixup all sectors. */ while (usa_count--) { /* @@ -149,12 +154,12 @@ int pre_write_mst_fixup(NTFS_RECORD *b, const u32 size) /* Apply fixup to data. */ *data_pos = le_usn; /* Increment position in data as well.
*/ - data_pos += NTFS_BLOCK_SIZE/sizeof(le16); + data_pos += NTFS_BLOCK_SIZE / sizeof(__le16); } return 0; } -/** +/* * post_write_mst_fixup - fast deprotect multi sector transfer protected data * @b: pointer to the data to deprotect * @@ -162,18 +167,18 @@ int pre_write_mst_fixup(NTFS_RECORD *b, const u32 size) * for any errors, because we assume we have just used pre_write_mst_fixup(), * thus the data will be fine or we would never have gotten here. */ -void post_write_mst_fixup(NTFS_RECORD *b) +void post_write_mst_fixup(struct ntfs_record *b) { - le16 *usa_pos, *data_pos; + __le16 *usa_pos, *data_pos; u16 usa_ofs = le16_to_cpu(b->usa_ofs); u16 usa_count = le16_to_cpu(b->usa_count) - 1; /* Position of usn in update sequence array. */ - usa_pos = (le16*)b + usa_ofs/sizeof(le16); + usa_pos = (__le16 *)b + usa_ofs/sizeof(__le16); /* Position in protected data of first u16 that needs fixing up. */ - data_pos = (le16*)b + NTFS_BLOCK_SIZE/sizeof(le16) - 1; + data_pos = (__le16 *)b + NTFS_BLOCK_SIZE/sizeof(__le16) - 1; /* Fixup all sectors. */ while (usa_count--) { @@ -184,6 +189,6 @@ void post_write_mst_fixup(NTFS_RECORD *b) *data_pos = *(++usa_pos); /* Increment position in data as well. */ - data_pos += NTFS_BLOCK_SIZE/sizeof(le16); + data_pos += NTFS_BLOCK_SIZE/sizeof(__le16); } } -- 2.25.1
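Note for readers of the mst.c hunks above: the following is a minimal user-space sketch of the multi sector transfer protection round trip, not the kernel code. It assumes a record header of a 4-byte magic followed by 16-bit usa_ofs and usa_count fields, uses host-endian access instead of the __le16 conversions in the patch, and is slightly stricter than the patch on the array bound. All sketch_* names are hypothetical. Unlike post_read_mst_fixup(), it does not set the magic to "BAAD" on failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same sector granularity as NTFS_BLOCK_SIZE in the patch. */
#define SKETCH_BLOCK_SIZE 512u
/* Assumed header layout: 4-byte magic, then usa_ofs and usa_count. */
#define SKETCH_USA_OFS_FIELD 4u
#define SKETCH_USA_COUNT_FIELD 6u

/* Hypothetical helpers: unaligned, host-endian 16-bit access. */
uint16_t sketch_get16(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

void sketch_put16(uint8_t *p, uint16_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Last 16-bit word of sector i, the word that gets replaced by the usn. */
static uint8_t *sketch_tail(uint8_t *buf, int i)
{
	return buf + (size_t)(i + 1) * SKETCH_BLOCK_SIZE - 2;
}

/* Size and alignment checks; returns the number of fixups or -1. */
int sketch_fixup_count(const uint8_t *buf, uint32_t size)
{
	uint16_t usa_ofs, n;

	if (size < SKETCH_BLOCK_SIZE)	/* sketch-only guard */
		return -1;
	usa_ofs = sketch_get16(buf + SKETCH_USA_OFS_FIELD);
	/* usa_count includes the usn itself, so subtract one. */
	n = (uint16_t)(sketch_get16(buf + SKETCH_USA_COUNT_FIELD) - 1);
	if (size & (SKETCH_BLOCK_SIZE - 1) || usa_ofs & 1 ||
	    usa_ofs + (n + 1) * 2u > size || size / SKETCH_BLOCK_SIZE != n)
		return -1;
	return n;
}

/* Protect: bump the usn, save each sector tail, stamp the usn over it. */
int sketch_pre_write_fixup(uint8_t *buf, uint32_t size)
{
	int i, n = sketch_fixup_count(buf, size);
	uint8_t *usa;
	uint16_t usn;

	if (n < 0)
		return -1;
	usa = buf + sketch_get16(buf + SKETCH_USA_OFS_FIELD);
	/* Cyclically increment the usn, skipping 0 and 0xffff. */
	usn = (uint16_t)(sketch_get16(usa) + 1);
	if (usn == 0xffff || usn == 0)
		usn = 1;
	sketch_put16(usa, usn);
	for (i = 0; i < n; i++) {
		sketch_put16(usa + 2 * (i + 1), sketch_get16(sketch_tail(buf, i)));
		sketch_put16(sketch_tail(buf, i), usn);
	}
	return 0;
}

/* Deprotect: verify every sector tail equals the usn, then restore. */
int sketch_post_read_fixup(uint8_t *buf, uint32_t size)
{
	int i, n = sketch_fixup_count(buf, size);
	const uint8_t *usa;
	uint16_t usn;

	if (n < 0)
		return 0;	/* layout not understood: leave data untouched */
	usa = buf + sketch_get16(buf + SKETCH_USA_OFS_FIELD);
	usn = sketch_get16(usa);
	for (i = 0; i < n; i++)
		if (sketch_get16(sketch_tail(buf, i)) != usn)
			return -1;	/* torn (incomplete) transfer */
	for (i = 0; i < n; i++)
		sketch_put16(sketch_tail(buf, i), sketch_get16(usa + 2 * (i + 1)));
	return 0;
}
```

With a 1024-byte record, usa_ofs 48 and usa_count 3, a sketch_pre_write_fixup() followed by sketch_post_read_fixup() restores the original sector tails. If one tail is overwritten with a stale usn in between, the read side reports failure, which is the condition the new pr_err_ratelimited() in post_read_mst_fixup() logs.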