From nobody Mon Dec 1 22:02:17 2025
From: Namjae Jeon
To: viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, hch@lst.de, tytso@mit.edu, willy@infradead.org, jack@suse.cz, djwong@kernel.org, josef@toxicpanda.com, sandeen@sandeen.net, rgoldwyn@suse.com, xiang@kernel.org, dsterba@suse.com, pali@kernel.org, ebiggers@kernel.org, neil@brown.name, amir73il@gmail.com
Cc: linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, iamjoonsoo.kim@lge.com, cheol.lee@lge.com, jay.sim@lge.com, gunho.lee@lge.com, Namjae Jeon
Subject: [PATCH v2 01/11] ntfsplus: in-memory, on-disk structures and headers
Date: Thu, 27 Nov 2025 13:59:34 +0900
Message-Id: <20251127045944.26009-2-linkinjeon@kernel.org>
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>
References: <20251127045944.26009-1-linkinjeon@kernel.org>

This adds in-memory, on-disk structures, headers and documentation.

Signed-off-by: Namjae Jeon
---
 Documentation/filesystems/index.rst    |    1 +
 Documentation/filesystems/ntfsplus.rst |  199 +++
 fs/ntfsplus/aops.h                     |   92 +
 fs/ntfsplus/attrib.h                   |  159 ++
 fs/ntfsplus/attrlist.h                 |   21 +
 fs/ntfsplus/bitmap.h                   |   93 +
 fs/ntfsplus/collate.h                  |   37 +
 fs/ntfsplus/dir.h                      |   33 +
 fs/ntfsplus/ea.h                       |   25 +
 fs/ntfsplus/index.h                    |  127 ++
 fs/ntfsplus/inode.h                    |  353 ++++
 fs/ntfsplus/layout.h                   | 2288 ++++++++++++++++++++++++
 fs/ntfsplus/lcnalloc.h                 |  127 ++
 fs/ntfsplus/logfile.h                  |  316 ++++
 fs/ntfsplus/mft.h                      |   92 +
 fs/ntfsplus/misc.h                     |  218 +++
 fs/ntfsplus/ntfs.h                     |  180 ++
 fs/ntfsplus/ntfs_iomap.h               |   22 +
 fs/ntfsplus/reparse.h                  |   15 +
 fs/ntfsplus/runlist.h                  |   91 +
 fs/ntfsplus/volume.h                   |  254 +++
 include/uapi/linux/ntfs.h              |   23 +
 22 files changed, 4766 insertions(+)
 create mode 100644 Documentation/filesystems/ntfsplus.rst
 create mode 100644 fs/ntfsplus/aops.h
 create mode 100644 fs/ntfsplus/attrib.h
 create mode 100644 fs/ntfsplus/attrlist.h
 create mode 100644 fs/ntfsplus/bitmap.h
 create mode 100644 fs/ntfsplus/collate.h
 create mode 100644 fs/ntfsplus/dir.h
 create mode 100644 fs/ntfsplus/ea.h
 create mode 100644 fs/ntfsplus/index.h
 create mode 100644 fs/ntfsplus/inode.h
 create mode 100644 fs/ntfsplus/layout.h
 create mode 100644 fs/ntfsplus/lcnalloc.h
 create mode 100644 fs/ntfsplus/logfile.h
 create mode 100644 fs/ntfsplus/mft.h
 create mode 100644 fs/ntfsplus/misc.h
 create mode 100644 fs/ntfsplus/ntfs.h
 create mode 100644 fs/ntfsplus/ntfs_iomap.h
 create mode 100644 fs/ntfsplus/reparse.h
 create mode 100644 fs/ntfsplus/runlist.h
 create mode 100644 fs/ntfsplus/volume.h
 create mode 100644 include/uapi/linux/ntfs.h

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index af516e528ded..dec2d3d393d3 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -101,6 +101,7 @@ Documentation for filesystem implementations.
    nilfs2
    nfs/index
    ntfs3
+   ntfsplus
    ocfs2
    ocfs2-online-filecheck
    omfs
diff --git a/Documentation/filesystems/ntfsplus.rst b/Documentation/filesystems/ntfsplus.rst
new file mode 100644
index 000000000000..d12b55e0fb97
--- /dev/null
+++ b/Documentation/filesystems/ntfsplus.rst
@@ -0,0 +1,199 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================
+The Linux NTFS+ filesystem driver
+=================================
+
+
+.. Table of contents
+
+   - Overview
+   - Features
+   - Utilities support
+   - Supported mount options
+
+
+Overview
+========
+
+ntfsplus is an NTFS implementation that adds write support and current
+kernel interfaces (iomap, no buffer heads) on top of the classic
+read-only NTFS driver. The old read-only ntfs code is much cleaner,
+extensively commented, and readable enough to make understanding NTFS
+easier, which is why ntfsplus was developed on that base. The goals are
+to follow current trends (iomap, no buffer heads, folios), enhance
+performance, remain maintainable, and provide utility support,
+including fsck.
+
+Features
+========
+
+- Write support:
+  Implement write support on classic read-only NTFS.
+  Additionally, integrate delayed allocation to enhance write performance
+  through multi-cluster allocation and minimized fragmentation of the
+  cluster bitmap.
+
+- Switch to using iomap:
+  Use iomap for buffered IO writes, reads, direct IO, file extent mapping,
+  readpages and writepages operations.
+
+- Stop using buffer heads:
+  Remove the use of buffer heads from the old ntfs code and switch to
+  folios instead. As a result, the CONFIG_BUFFER_HEAD dependency is also
+  removed from Kconfig.
+
+- Performance enhancements:
+  Write, file list browsing and mount performance are improved by the
+  following:
+
+  - Use iomap aops.
+  - Delayed allocation support.
+  - Optimized zero-out of newly allocated clusters.
+  - Reduced runlist merge overhead for small chunk sizes.
+  - Pre-load mft (inode) blocks and index (dentry) blocks to improve
+    readdir + stat performance.
+  - Load the lcn bitmap in the background.
+
+- Stability improvements:
+  a. Pass more xfstests tests:
+     ntfsplus implements fallocate, idmapped mounts and permissions, etc.,
+     resulting in a significantly higher number (287) of passed xfstests.
+  b. Bonnie++ issue[3]:
+     The Bonnie++ benchmark fails on ntfs3 with a "Directory not empty"
+     error during file deletion. ntfs3 currently iterates directory
+     entries by reading index blocks one by one. When entries are deleted
+     concurrently, index block merging or entry relocation can cause
+     readdir() to skip some entries, leaving files undeleted in
+     workloads (such as bonnie++) that mix unlink and directory scans.
+     ntfsplus implements leaf chain traversal in readdir to avoid
+     skipping entries on deletion.
+
+
+Utilities support
+=================
+
+While ntfs-3g includes ntfsprogs as a component, it notably lacks an
+fsck implementation. So we have launched a new ntfs utilities project
+called ntfsprogs-plus by forking ntfs-3g and removing the unnecessary
+ntfs fuse implementation.
+fsck.ntfs can be used for ntfs testing with xfstests as well as for
+recovering a corrupted NTFS device. Download ntfsprogs-plus from the
+link below to get mkfs.ntfs and fsck.ntfs:
+
+  https://github.com/ntfsprogs-plus/ntfsprogs-plus
+
+
+Supported mount options
+=======================
+
+The NTFS+ driver supports the following mount options:
+
+======================= =========================================================
+iocharset=name          Deprecated option. Still supported but please use
+                        nls=name in the future. See description for nls=name.
+
+nls=name                Character set to use when returning file names.
+                        Unlike VFAT, NTFS suppresses names that contain
+                        unconvertible characters. Note that most character
+                        sets contain insufficient characters to represent all
+                        possible Unicode characters that can exist on NTFS.
+                        To be sure you are not missing any files, you are
+                        advised to use nls=utf8 which is capable of
+                        representing all Unicode characters.
+
+uid=
+gid=
+umask=                  Provide default owner, group, and access mode mask.
+                        These options work as documented in mount(8). By
+                        default, the files/directories are owned by root,
+                        who has read and write permissions, as well as
+                        browse permission for directories. No one else has
+                        any access permissions. I.e. the mode on all files
+                        is by default rw------- and for directories
+                        rwx------, a consequence of the default fmask=0177
+                        and dmask=0077. Using a umask of zero will grant all
+                        permissions to everyone, i.e. all files and
+                        directories will have mode rwxrwxrwx.
+
+fmask=
+dmask=                  Instead of specifying umask which applies both to
+                        files and directories, fmask applies only to files
+                        and dmask only to directories.
+
+showmeta=
+show_sys_files=         If show_sys_files is specified, show the system files
+                        in directory listings. Otherwise the default
+                        behaviour is to hide the system files.
+                        Note that even when show_sys_files is specified,
+                        "$MFT" will not be visible due to bugs/mis-features
+                        in glibc. Further, note that irrespective of
+                        show_sys_files, all files are accessible by name,
+                        i.e. you can always do "ls -l \$UpCase" for example
+                        to specifically show the system file containing the
+                        Unicode upcase table.
+
+case_sensitive=         If case_sensitive is specified, treat all file names
+                        as case sensitive and create file names in the POSIX
+                        namespace (default behaviour). Note, the Linux NTFS
+                        driver will never create short file names and will
+                        remove them on rename/delete of the corresponding
+                        long file name. Note that files remain accessible
+                        via their short file name, if it exists.
+
+nocase=                 If nocase is specified, treat file names
+                        case-insensitively.
+
+disable_sparse=         If disable_sparse is specified, creation of sparse
+                        regions, i.e. holes, inside files is disabled for
+                        the volume (for the duration of this mount only). By
+                        default, creation of sparse regions is enabled,
+                        which is consistent with the behaviour of
+                        traditional Unix filesystems.
+
+errors=opt              Specify NTFS+ behaviour on critical errors: panic,
+                        remount the partition in read-only mode or continue
+                        without doing anything (default behaviour).
+
+mft_zone_multiplier=    Set the MFT zone multiplier for the volume (this
+                        setting is not persistent across mounts and can be
+                        changed from mount to mount but cannot be changed on
+                        remount). Values of 1 to 4 are allowed, 1 being the
+                        default. The MFT zone multiplier determines how much
+                        space is reserved for the MFT on the volume. If all
+                        other space is used up, then the MFT zone will be
+                        shrunk dynamically, so this has no impact on the
+                        amount of free space. However, it can have an impact
+                        on performance by affecting fragmentation of the
+                        MFT. In general use the default. If you have a lot
+                        of small files then use a higher value. The values
+                        have the following meaning:
+
+                        =====  ================================
+                        Value  MFT zone size (% of volume size)
+                        =====  ================================
+                        1      12.5%
+                        2      25%
+                        3      37.5%
+                        4      50%
+                        =====  ================================
+
+                        Note this option is irrelevant for read-only mounts.
+
+preallocated_size=      Set the preallocation size used to reduce runlist
+                        merge overhead for small chunk sizes (64KB by
+                        default).
+
+acl=                    Enable POSIX ACL support. When specified, POSIX ACLs
+                        stored in extended attributes are enforced. Default
+                        is off. Requires kernel config NTFSPLUS_FS_POSIX_ACL
+                        enabled.
+
+sys_immutable=          Make NTFS system files (e.g. $MFT, $LogFile,
+                        $Bitmap, $UpCase, etc.) immutable to user initiated
+                        modifications for extra safety. Default is off.
+
+nohidden=               Hide files and directories marked with the Windows
+                        "hidden" attribute. By default hidden items are
+                        shown.
+
+hide_dot_files=         Hide names beginning with a dot ("."). By default
+                        dot files are shown. When enabled, files and
+                        directories created with a leading '.' will be
+                        hidden from directory listings.
+
+windows_names=          Refuse creation/rename of files with characters or
+                        reserved device names disallowed on Windows (e.g.
+                        CON, NUL, AUX, COM1, LPT1, etc.). Default is off.
+
+discard=                Issue block device discard for clusters freed on
+                        file deletion/truncation to inform underlying storage.
+======================= =========================================================
diff --git a/fs/ntfsplus/aops.h b/fs/ntfsplus/aops.h
new file mode 100644
index 000000000000..333bbae8c566
--- /dev/null
+++ b/fs/ntfsplus/aops.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/**
+ * Defines for NTFS kernel address space operations and page cache
+ * handling.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_AOPS_H
+#define _LINUX_NTFS_AOPS_H
+
+#include 
+#include 
+
+#include "volume.h"
+#include "inode.h"
+
+/**
+ * ntfs_unmap_folio - release a folio obtained from ntfs_read_mapping_folio()
+ * @folio:	the folio to release
+ * @addr:	kernel mapping of @folio to unmap, or NULL
+ *
+ * Unpin, unmap and release a folio that was obtained from
+ * ntfs_read_mapping_folio().
+ */
+static inline void ntfs_unmap_folio(struct folio *folio, void *addr)
+{
+	if (addr)
+		kunmap_local(addr);
+	folio_put(folio);
+}
+
+/**
+ * ntfs_read_mapping_folio - map a folio into accessible memory, reading it if necessary
+ * @mapping:	address space for which to obtain the folio
+ * @index:	index into the page cache for @mapping of the folio to map
+ *
+ * Read a folio from the page cache of the address space @mapping at position
+ * @index, where @index is in units of PAGE_SIZE, and not in bytes.
+ *
+ * If the folio is not in memory it is loaded from disk first using the
+ * read_folio method defined in the address space operations of @mapping
+ * and the folio is added to the page cache of @mapping in the process.
+ *
+ * If the folio belongs to an mst protected attribute and it is marked as such
+ * in its ntfs inode (NInoMstProtected()) the mst fixups are applied but no
+ * error checking is performed. This means the caller has to verify whether
+ * the ntfs record(s) contained in the folio are valid or not using one of the
+ * ntfs_is_XXXX_record{,p}() macros, where XXXX is the record type you are
+ * expecting to see. (For details of the macros, see fs/ntfsplus/layout.h.)
+ *
+ * If the folio is in high memory it is mapped into memory directly
+ * addressable by the kernel.
+ *
+ * Finally the folio's reference count is incremented, thus pinning the folio
+ * into place.
+ *
+ * The above means that folio_address() can be used on all folios obtained
+ * with ntfs_read_mapping_folio() to get the kernel virtual address of the
+ * folio.
+ *
+ * When finished with the folio, the caller has to call ntfs_unmap_folio() to
+ * unpin, unmap and release the folio.
+ *
+ * Note this does not grant exclusive access. If such is desired, the caller
+ * must provide it independently of the ntfs_{read_mapping,unmap}_folio()
+ * calls by using a {rw_}semaphore or other means of serialization. A spin
+ * lock cannot be used as ntfs_read_mapping_folio() can block.
+ *
+ * The unlocked and uptodate folio is returned on success or an encoded error
+ * on failure. Caller has to test for error using the IS_ERR() macro on the
+ * return value. If that evaluates to 'true', the negative error code can be
+ * obtained using PTR_ERR() on the return value of ntfs_read_mapping_folio().
+ */
+static inline struct folio *ntfs_read_mapping_folio(struct address_space *mapping,
+		unsigned long index)
+{
+	struct folio *folio;
+
+retry:
+	folio = read_mapping_folio(mapping, index, NULL);
+	if (PTR_ERR(folio) == -EINTR)
+		goto retry;
+
+	return folio;
+}
+
+void mark_ntfs_record_dirty(struct folio *folio);
+struct bio *ntfs_setup_bio(struct ntfs_volume *vol, unsigned int opf, s64 lcn,
+		unsigned int pg_ofs);
+int ntfs_dev_read(struct super_block *sb, void *buf, loff_t start, loff_t end);
+int ntfs_dev_write(struct super_block *sb, void *buf, loff_t start,
+		loff_t size, bool wait);
+#endif /* _LINUX_NTFS_AOPS_H */
diff --git a/fs/ntfsplus/attrib.h b/fs/ntfsplus/attrib.h
new file mode 100644
index 000000000000..e7991851dc9a
--- /dev/null
+++ b/fs/ntfsplus/attrib.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for attribute handling in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2005 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_ATTRIB_H
+#define _LINUX_NTFS_ATTRIB_H
+
+#include "ntfs.h"
+#include "dir.h"
+
+extern __le16 AT_UNNAMED[];
+
+/**
+ * ntfs_attr_search_ctx - used in attribute search functions
+ * @mrec:	buffer containing mft record to search
+ * @attr:	attribute record in @mrec where to begin/continue search
+ * @is_first:	if true ntfs_attr_lookup() begins search with @attr, else after
+ *
+ * Structure must be initialized to zero before the first call to one of the
+ * attribute search functions. Initialize @mrec to point to the mft record to
+ * search, and @attr to point to the first attribute within @mrec (not
+ * necessary if calling the _first() functions), and set @is_first to 'true'
+ * (not necessary if calling the _first() functions).
+ *
+ * If @is_first is 'true', the search begins with @attr.
+ * If @is_first is 'false', the search begins after @attr. This is so that,
+ * after the first call to one of the search attribute functions, we can call
+ * the function again, without any modification of the search context, to
+ * automagically get the next matching attribute.
+ */
+struct ntfs_attr_search_ctx {
+	struct mft_record *mrec;
+	bool mapped_mrec;
+	struct attr_record *attr;
+	bool is_first;
+	struct ntfs_inode *ntfs_ino;
+	struct attr_list_entry *al_entry;
+	struct ntfs_inode *base_ntfs_ino;
+	struct mft_record *base_mrec;
+	bool mapped_base_mrec;
+	struct attr_record *base_attr;
+};
+
+enum {	/* ways of processing holes when expanding */
+	HOLES_NO,
+	HOLES_OK,
+};
+
+int ntfs_map_runlist_nolock(struct ntfs_inode *ni, s64 vcn,
+		struct ntfs_attr_search_ctx *ctx);
+int ntfs_map_runlist(struct ntfs_inode *ni, s64 vcn);
+s64 ntfs_attr_vcn_to_lcn_nolock(struct ntfs_inode *ni, const s64 vcn,
+		const bool write_locked);
+struct runlist_element *ntfs_attr_find_vcn_nolock(struct ntfs_inode *ni,
+		const s64 vcn, struct ntfs_attr_search_ctx *ctx);
+struct runlist_element *__ntfs_attr_find_vcn_nolock(struct runlist *runlist,
+		const s64 vcn);
+int ntfs_attr_map_whole_runlist(struct ntfs_inode *ni);
+int ntfs_attr_lookup(const __le32 type, const __le16 *name,
+		const u32 name_len, const u32 ic,
+		const s64 lowest_vcn, const u8 *val, const u32 val_len,
+		struct ntfs_attr_search_ctx *ctx);
+int load_attribute_list(struct ntfs_inode *base_ni,
+		u8 *al_start, const s64 size);
+
+static inline s64 ntfs_attr_size(const struct attr_record *a)
+{
+	if (!a->non_resident)
+		return (s64)le32_to_cpu(a->data.resident.value_length);
+	return le64_to_cpu(a->data.non_resident.data_size);
+}
+
+void ntfs_attr_reinit_search_ctx(struct ntfs_attr_search_ctx *ctx);
+struct ntfs_attr_search_ctx *ntfs_attr_get_search_ctx(struct ntfs_inode *ni,
+		struct mft_record *mrec);
+void ntfs_attr_put_search_ctx(struct ntfs_attr_search_ctx *ctx);
+int ntfs_attr_size_bounds_check(const struct ntfs_volume *vol,
+		const __le32 type, const s64 size);
+int ntfs_attr_can_be_resident(const struct ntfs_volume *vol,
+		const __le32 type);
+int ntfs_attr_map_cluster(struct ntfs_inode *ni, s64 vcn_start, s64 *lcn_start,
+		s64 *lcn_count, s64 max_clu_count, bool *balloc, bool update_mp,
+		bool skip_holes);
+int ntfs_attr_record_resize(struct mft_record *m, struct attr_record *a,
+		u32 new_size);
+int ntfs_resident_attr_value_resize(struct mft_record *m, struct attr_record *a,
+		const u32 new_size);
+int ntfs_attr_make_non_resident(struct ntfs_inode *ni, const u32 data_size);
+int ntfs_attr_set(struct ntfs_inode *ni, const s64 ofs, const s64 cnt,
+		const u8 val);
+int ntfs_attr_set_initialized_size(struct ntfs_inode *ni, loff_t new_size);
+int ntfs_attr_open(struct ntfs_inode *ni, const __le32 type,
+		__le16 *name, u32 name_len);
+void ntfs_attr_close(struct ntfs_inode *n);
+int ntfs_attr_fallocate(struct ntfs_inode *ni, loff_t start, loff_t byte_len,
+		bool keep_size);
+int ntfs_non_resident_attr_insert_range(struct ntfs_inode *ni, s64 start_vcn,
+		s64 len);
+int ntfs_non_resident_attr_collapse_range(struct ntfs_inode *ni, s64 start_vcn,
+		s64 len);
+int ntfs_non_resident_attr_punch_hole(struct ntfs_inode *ni, s64 start_vcn,
+		s64 len);
+int __ntfs_attr_truncate_vfs(struct ntfs_inode *ni, const s64 newsize,
+		const s64 i_size);
+int ntfs_attr_expand(struct ntfs_inode *ni, const s64 newsize,
+		const s64 prealloc_size);
+int ntfs_attr_truncate_i(struct ntfs_inode *ni, const s64 newsize,
+		unsigned int holes);
+int ntfs_attr_truncate(struct ntfs_inode *ni, const s64 newsize);
+int ntfs_attr_rm(struct ntfs_inode *ni);
+int ntfs_attr_exist(struct ntfs_inode *ni, const __le32 type, __le16 *name,
+		u32 name_len);
+int ntfs_attr_remove(struct ntfs_inode *ni, const __le32 type, __le16 *name,
+		u32 name_len);
+int ntfs_attr_record_rm(struct ntfs_attr_search_ctx *ctx);
+int ntfs_attr_record_move_to(struct ntfs_attr_search_ctx *ctx,
+		struct ntfs_inode *ni);
+int ntfs_attr_add(struct ntfs_inode *ni, __le32 type,
+		__le16 *name, u8 name_len, u8 *val, s64 size);
+int ntfs_attr_record_move_away(struct ntfs_attr_search_ctx *ctx, int extra);
+char *ntfs_attr_name_get(const struct ntfs_volume *vol, const __le16 *uname,
+		const int uname_len);
+void ntfs_attr_name_free(unsigned char **name);
+void *ntfs_attr_readall(struct ntfs_inode *ni, const __le32 type,
+		__le16 *name, u32 name_len, s64 *data_size);
+int ntfs_resident_attr_record_add(struct ntfs_inode *ni, __le32 type,
+		__le16 *name, u8 name_len, u8 *val, u32 size,
+		__le16 flags);
+int ntfs_attr_update_mapping_pairs(struct ntfs_inode *ni, s64 from_vcn);
+struct runlist_element *ntfs_attr_vcn_to_rl(struct ntfs_inode *ni, s64 vcn,
+		s64 *lcn);
+
+/**
+ * ntfs_attrs_walk - syntactic sugar for walking all attributes in an inode
+ * @ctx:	initialised attribute search context
+ *
+ * Syntactic sugar for walking attributes in an inode.
+ *
+ * Return 0 on success and a negative error code from ntfs_attr_lookup() on
+ * error.
+ *
+ * Example: When you want to enumerate all attributes in an open ntfs inode
+ * @ni, you can simply do:
+ *
+ *	int err;
+ *	struct ntfs_attr_search_ctx *ctx = ntfs_attr_get_search_ctx(ni, NULL);
+ *	if (!ctx)
+ *		// Allocation failed. Handle this case.
+ *	while (!(err = ntfs_attrs_walk(ctx))) {
+ *		struct attr_record *attr = ctx->attr;
+ *		// attr now contains the next attribute. Do whatever you want
+ *		// with it and then just continue with the while loop.
+ *	}
+ *	if (err && err != -ENOENT)
+ *		// Ooops. An error occurred! You should handle this case.
+ *	// Now finished with all attributes in the inode.
+ */
+static inline int ntfs_attrs_walk(struct ntfs_attr_search_ctx *ctx)
+{
+	return ntfs_attr_lookup(AT_UNUSED, NULL, 0, CASE_SENSITIVE, 0,
+			NULL, 0, ctx);
+}
+#endif /* _LINUX_NTFS_ATTRIB_H */
diff --git a/fs/ntfsplus/attrlist.h b/fs/ntfsplus/attrlist.h
new file mode 100644
index 000000000000..d0eadc5db1b0
--- /dev/null
+++ b/fs/ntfsplus/attrlist.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Exports for attribute list attribute handling.
+ * Originated from Linux-NTFS project.
+ *
+ * Copyright (c) 2004 Anton Altaparmakov
+ * Copyright (c) 2004 Yura Pakhuchiy
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _NTFS_ATTRLIST_H
+#define _NTFS_ATTRLIST_H
+
+#include "attrib.h"
+
+int ntfs_attrlist_need(struct ntfs_inode *ni);
+int ntfs_attrlist_entry_add(struct ntfs_inode *ni, struct attr_record *attr);
+int ntfs_attrlist_entry_rm(struct ntfs_attr_search_ctx *ctx);
+int ntfs_attrlist_update(struct ntfs_inode *base_ni);
+
+#endif /* defined _NTFS_ATTRLIST_H */
diff --git a/fs/ntfsplus/bitmap.h b/fs/ntfsplus/bitmap.h
new file mode 100644
index 000000000000..d58b3ebe5944
--- /dev/null
+++ b/fs/ntfsplus/bitmap.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for NTFS kernel bitmap handling. Part of the Linux-NTFS
+ * project.
+ *
+ * Copyright (c) 2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_BITMAP_H
+#define _LINUX_NTFS_BITMAP_H
+
+#include 
+
+#include "volume.h"
+
+int ntfsp_trim_fs(struct ntfs_volume *vol, struct fstrim_range *range);
+int __ntfs_bitmap_set_bits_in_run(struct inode *vi, const s64 start_bit,
+		const s64 count, const u8 value, const bool is_rollback);
+
+/**
+ * ntfs_bitmap_set_bits_in_run - set a run of bits in a bitmap to a value
+ * @vi:		vfs inode describing the bitmap
+ * @start_bit:	first bit to set
+ * @count:	number of bits to set
+ * @value:	value to set the bits to (i.e. 0 or 1)
+ *
+ * Set @count bits starting at bit @start_bit in the bitmap described by the
+ * vfs inode @vi to @value, where @value is either 0 or 1.
+ */
+static inline int ntfs_bitmap_set_bits_in_run(struct inode *vi,
+		const s64 start_bit, const s64 count, const u8 value)
+{
+	return __ntfs_bitmap_set_bits_in_run(vi, start_bit, count, value,
+			false);
+}
+
+/**
+ * ntfs_bitmap_set_run - set a run of bits in a bitmap
+ * @vi:		vfs inode describing the bitmap
+ * @start_bit:	first bit to set
+ * @count:	number of bits to set
+ *
+ * Set @count bits starting at bit @start_bit in the bitmap described by the
+ * vfs inode @vi.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static inline int ntfs_bitmap_set_run(struct inode *vi, const s64 start_bit,
+		const s64 count)
+{
+	return ntfs_bitmap_set_bits_in_run(vi, start_bit, count, 1);
+}
+
+/**
+ * ntfs_bitmap_clear_run - clear a run of bits in a bitmap
+ * @vi:		vfs inode describing the bitmap
+ * @start_bit:	first bit to clear
+ * @count:	number of bits to clear
+ *
+ * Clear @count bits starting at bit @start_bit in the bitmap described by the
+ * vfs inode @vi.
+ */
+static inline int ntfs_bitmap_clear_run(struct inode *vi, const s64 start_bit,
+		const s64 count)
+{
+	return ntfs_bitmap_set_bits_in_run(vi, start_bit, count, 0);
+}
+
+/**
+ * ntfs_bitmap_set_bit - set a bit in a bitmap
+ * @vi:		vfs inode describing the bitmap
+ * @bit:	bit to set
+ *
+ * Set bit @bit in the bitmap described by the vfs inode @vi.
+ */
+static inline int ntfs_bitmap_set_bit(struct inode *vi, const s64 bit)
+{
+	return ntfs_bitmap_set_run(vi, bit, 1);
+}
+
+/**
+ * ntfs_bitmap_clear_bit - clear a bit in a bitmap
+ * @vi:		vfs inode describing the bitmap
+ * @bit:	bit to clear
+ *
+ * Clear bit @bit in the bitmap described by the vfs inode @vi.
+ */
+static inline int ntfs_bitmap_clear_bit(struct inode *vi, const s64 bit)
+{
+	return ntfs_bitmap_clear_run(vi, bit, 1);
+}
+
+#endif /* defined _LINUX_NTFS_BITMAP_H */
diff --git a/fs/ntfsplus/collate.h b/fs/ntfsplus/collate.h
new file mode 100644
index 000000000000..cf04508340f0
--- /dev/null
+++ b/fs/ntfsplus/collate.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for NTFS kernel collation handling.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2004 Anton Altaparmakov
+ *
+ * Part of this file is based on code from the NTFS-3G project
+ * and is copyrighted by the respective authors below:
+ * Copyright (c) 2004 Anton Altaparmakov
+ * Copyright (c) 2005 Yura Pakhuchiy
+ */
+
+#ifndef _LINUX_NTFS_COLLATE_H
+#define _LINUX_NTFS_COLLATE_H
+
+#include "volume.h"
+
+static inline bool ntfs_is_collation_rule_supported(__le32 cr)
+{
+	int i;
+
+	if (unlikely(cr != COLLATION_BINARY && cr != COLLATION_NTOFS_ULONG &&
+			cr != COLLATION_FILE_NAME &&
+			cr != COLLATION_NTOFS_ULONGS))
+		return false;
+	i = le32_to_cpu(cr);
+	if (likely(((i >= 0) && (i <= 0x02)) ||
+			((i >= 0x10) && (i <= 0x13))))
+		return true;
+	return false;
+}
+
+int ntfs_collate(struct ntfs_volume *vol, __le32 cr,
+		const void *data1, const int data1_len,
+		const void *data2, const int data2_len);
+
+#endif /* _LINUX_NTFS_COLLATE_H */
diff --git a/fs/ntfsplus/dir.h b/fs/ntfsplus/dir.h
new file mode 100644
index 000000000000..5abe21c3d938
--- /dev/null
+++ b/fs/ntfsplus/dir.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for directory handling in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2002-2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_DIR_H
+#define _LINUX_NTFS_DIR_H
+
+#include "inode.h"
+
+/*
+ * ntfs_name is used to return the file name to the caller of
+ * ntfs_lookup_inode_by_name() in order for the caller (namei.c::ntfs_lookup())
+ * to be able to deal with dcache aliasing issues.
+ */
+struct ntfs_name {
+	u64 mref;
+	u8 type;
+	u8 len;
+	__le16 name[];
+} __packed;
+
+/* The little endian Unicode string $I30 as a global constant. */
+extern __le16 I30[5];
+
+u64 ntfs_lookup_inode_by_name(struct ntfs_inode *dir_ni,
+		const __le16 *uname, const int uname_len, struct ntfs_name **res);
+int ntfs_check_empty_dir(struct ntfs_inode *ni, struct mft_record *ni_mrec);
+
+#endif /* _LINUX_NTFS_DIR_H */
diff --git a/fs/ntfsplus/ea.h b/fs/ntfsplus/ea.h
new file mode 100644
index 000000000000..2bad7c0383d7
--- /dev/null
+++ b/fs/ntfsplus/ea.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#define NTFS_EA_UID	BIT(1)
+#define NTFS_EA_GID	BIT(2)
+#define NTFS_EA_MODE	BIT(3)
+
+extern const struct xattr_handler *const ntfsp_xattr_handlers[];
+
+int ntfs_ea_set_wsl_not_symlink(struct ntfs_inode *ni, mode_t mode, dev_t dev);
+int ntfs_ea_get_wsl_inode(struct inode *inode, dev_t *rdevp, unsigned int flags);
+int ntfs_ea_set_wsl_inode(struct inode *inode, dev_t rdev, __le16 *ea_size,
+		unsigned int flags);
+ssize_t ntfsp_listxattr(struct dentry *dentry, char *buffer, size_t size);
+
+#ifdef CONFIG_NTFSPLUS_FS_POSIX_ACL
+struct posix_acl *ntfsp_get_acl(struct mnt_idmap *idmap, struct dentry *dentry,
+		int type);
+int ntfsp_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
+		struct posix_acl *acl, int type);
+int ntfsp_init_acl(struct mnt_idmap *idmap, struct inode *inode,
+		struct inode *dir);
+#else
+#define ntfsp_get_acl NULL
+#define ntfsp_set_acl NULL
+#endif
diff --git a/fs/ntfsplus/index.h b/fs/ntfsplus/index.h
new file mode 100644
index 000000000000..b5c719910ab6
--- /dev/null
+++ b/fs/ntfsplus/index.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for NTFS kernel index handling. Part of the Linux-NTFS
+ * project.
+ *
+ * Copyright (c) 2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_INDEX_H
+#define _LINUX_NTFS_INDEX_H
+
+#include 
+
+#include "attrib.h"
+#include "mft.h"
+#include "aops.h"
+
+#define VCN_INDEX_ROOT_PARENT	((s64)-2)
+
+#define MAX_PARENT_VCN		32
+
+/**
+ * struct ntfs_index_context - context describing an index entry lookup
+ * @idx_ni:	index inode containing the @entry described by this context
+ * @entry:	index entry (points into @ir or @ia)
+ * @data:	index entry data (points into @entry)
+ * @data_len:	length in bytes of @data
+ * @is_in_root:	'true' if @entry is in @ir and 'false' if it is in @ia
+ * @ir:		index root if @is_in_root and NULL otherwise
+ * @actx:	attribute search context if @is_in_root and NULL otherwise
+ * @base_ni:	base inode if @is_in_root and NULL otherwise
+ * @ia:		index block if @is_in_root is 'false' and NULL otherwise
+ * @page:	page if @is_in_root is 'false' and NULL otherwise
+ *
+ * @idx_ni is the index inode this context belongs to.
+ *
+ * @entry is the index entry described by this context. @data and @data_len
+ * are the index entry data and its length in bytes, respectively. @data
+ * simply points into @entry. This is probably what the user is interested in.
+ *
+ * If @is_in_root is 'true', @entry is in the index root attribute @ir
+ * described by the attribute search context @actx and the base inode
+ * @base_ni. @ia and @page are NULL in this case.
+ *
+ * If @is_in_root is 'false', @entry is in the index allocation attribute and
+ * @ia and @page point to the index allocation block and the mapped, locked
+ * page it is in, respectively. @ir, @actx and @base_ni are NULL in this case.
+ *
+ * To obtain a context call ntfs_index_ctx_get().
+ * + * We use this context to allow ntfs_index_lookup() to return the found in= dex + * @entry and its @data without having to allocate a buffer and copy the @= entry + * and/or its @data into it. + * + * When finished with the @entry and its @data, call ntfs_index_ctx_put() = to + * free the context and other associated resources. + * + * If the index entry was modified, call flush_dcache_index_entry_page() + * immediately after the modification and either ntfs_index_entry_mark_dir= ty() + * or ntfs_index_entry_write() before the call to ntfs_index_ctx_put() to + * ensure that the changes are written to disk. + */ +struct ntfs_index_context { + struct ntfs_inode *idx_ni; + __le16 *name; + u32 name_len; + struct index_entry *entry; + __le32 cr; + void *data; + u16 data_len; + bool is_in_root; + struct index_root *ir; + struct ntfs_attr_search_ctx *actx; + struct index_block *ib; + struct ntfs_inode *base_ni; + struct index_block *ia; + struct page *page; + struct ntfs_inode *ia_ni; + int parent_pos[MAX_PARENT_VCN]; /* parent entries' positions */ + s64 parent_vcn[MAX_PARENT_VCN]; /* entry's parent nodes */ + int pindex; /* maximum it's the number of the parent nodes */ + bool ib_dirty; + u32 block_size; + u8 vcn_size_bits; + bool sync_write; +}; + +int ntfs_index_entry_inconsistent(struct ntfs_index_context *icx, struct n= tfs_volume *vol, + const struct index_entry *ie, __le32 collation_rule, u64 inum); +struct ntfs_index_context *ntfs_index_ctx_get(struct ntfs_inode *ni, __le1= 6 *name, + u32 name_len); +void ntfs_index_ctx_put(struct ntfs_index_context *ictx); +int ntfs_index_lookup(const void *key, const int key_len, + struct ntfs_index_context *ictx); + +/** + * ntfs_index_entry_flush_dcache_page - flush_dcache_page() for index entr= ies + * @ictx: ntfs index context describing the index entry + * + * Call flush_dcache_page() for the page in which an index entry resides. 
+ * + * This must be called every time an index entry is modified, just after t= he + * modification. + * + * If the index entry is in the index root attribute, simply flush the page + * containing the mft record containing the index root attribute. + * + * If the index entry is in an index block belonging to the index allocati= on + * attribute, simply flush the page cache page containing the index block. + */ +static inline void ntfs_index_entry_flush_dcache_page(struct ntfs_index_co= ntext *ictx) +{ + if (!ictx->is_in_root) + flush_dcache_page(ictx->page); +} + +void ntfs_index_entry_mark_dirty(struct ntfs_index_context *ictx); +int ntfs_index_add_filename(struct ntfs_inode *ni, struct file_name_attr *= fn, u64 mref); +int ntfs_index_remove(struct ntfs_inode *ni, const void *key, const int ke= ylen); +struct ntfs_inode *ntfs_ia_open(struct ntfs_index_context *icx, struct ntf= s_inode *ni); +struct index_entry *ntfs_index_walk_down(struct index_entry *ie, struct nt= fs_index_context *ictx); +struct index_entry *ntfs_index_next(struct index_entry *ie, struct ntfs_in= dex_context *ictx); +int ntfs_index_rm(struct ntfs_index_context *icx); +void ntfs_index_ctx_reinit(struct ntfs_index_context *icx); +int ntfs_ie_add(struct ntfs_index_context *icx, struct index_entry *ie); +int ntfs_icx_ib_sync_write(struct ntfs_index_context *icx); + +#endif /* _LINUX_NTFS_INDEX_H */ diff --git a/fs/ntfsplus/inode.h b/fs/ntfsplus/inode.h new file mode 100644 index 000000000000..95fee0fd2ddd --- /dev/null +++ b/fs/ntfsplus/inode.h @@ -0,0 +1,353 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Defines for inode structures NTFS Linux kernel driver. Part of + * the Linux-NTFS project. + * + * Copyright (c) 2001-2007 Anton Altaparmakov + * Copyright (c) 2002 Richard Russon + * Copyright (c) 2025 LG Electronics Co., Ltd. 
+ */ + +#ifndef _LINUX_NTFS_INODE_H +#define _LINUX_NTFS_INODE_H + +#include "misc.h" + +enum ntfs_inode_mutex_lock_class { + NTFS_INODE_MUTEX_PARENT, + NTFS_INODE_MUTEX_NORMAL, + NTFS_INODE_MUTEX_PARENT_2, + NTFS_INODE_MUTEX_NORMAL_2, + NTFS_REPARSE_MUTEX_PARENT, + NTFS_EA_MUTEX_NORMAL +}; + +/* + * The NTFS in-memory inode structure. It is just used as an extension to = the + * fields already provided in the VFS inode. + */ +struct ntfs_inode { + rwlock_t size_lock; /* Lock serializing access to inode sizes. */ + unsigned long state; /* + * NTFS specific flags describing this inode. + * See ntfs_inode_state_bits below. + */ + __le32 flags; /* Flags describing the file. (Copy from STANDARD_INFORMAT= ION) */ + unsigned long mft_no; /* Number of the mft record / inode. */ + u16 seq_no; /* Sequence number of the mft record. */ + atomic_t count; /* Inode reference count for book keeping. */ + struct ntfs_volume *vol; /* Pointer to the ntfs volume of this inode. */ + + /* + * If NInoAttr() is true, the below fields describe the attribute which + * this fake inode belongs to. The actual inode of this attribute is + * pointed to by base_ntfs_ino and nr_extents is always set to -1 (see + * below). For real inodes, we also set the type (AT_DATA for files and + * AT_INDEX_ALLOCATION for directories), with the name =3D NULL and + * name_len =3D 0 for files and name =3D I30 (global constant) and + * name_len =3D 4 for directories. + */ + __le32 type; /* Attribute type of this fake inode. */ + __le16 *name; /* Attribute name of this fake inode. */ + u32 name_len; /* Attribute name length of this fake inode. */ + struct runlist runlist; /* + * If state has the NI_NonResident bit set, + * the runlist of the unnamed data attribute + * (if a file) or of the index allocation + * attribute (directory) or of the attribute + * described by the fake inode (if NInoAttr()). + * If runlist.rl is NULL, the runlist has not + * been read in yet or has been unmapped. 
If + * NI_NonResident is clear, the attribute is + * resident (file and fake inode) or there is + * no $I30 index allocation attribute + * (small directory). In the latter case + * runlist.rl is always NULL. + */ + s64 lcn_seek_trunc; + + s64 data_size; /* Copy from the attribute record. */ + s64 initialized_size; /* Copy from the attribute record. */ + s64 allocated_size; /* Copy from the attribute record. */ + + struct timespec64 i_crtime; + + /* + * The following fields are only valid for real inodes and extent + * inodes. + */ + void *mrec; + struct mutex mrec_lock; /* + * Lock for serializing access to the + * mft record belonging to this inode. + */ + struct folio *folio; /* + * The folio containing the mft record of the + * inode. This should only be touched by the + * (un)map_mft_record*() functions. + */ + int folio_ofs; /* + * Offset into the folio at which the mft record + * begins. This should only be touched by the + * (un)map_mft_record*() functions. + */ + s64 mft_lcn[2]; /* s64 number containing the mft record */ + unsigned int mft_lcn_count; + + /* + * Attribute list support (only for use by the attribute lookup + * functions). Setup during read_inode for all inodes with attribute + * lists. Only valid if NI_AttrList is set in state. + */ + u32 attr_list_size; /* Length of attribute list value in bytes. */ + u8 *attr_list; /* Attribute list value itself. */ + + union { + struct { /* It is a directory, $MFT, or an index inode. */ + u32 block_size; /* Size of an index block. */ + u32 vcn_size; /* Size of a vcn in this index. */ + __le32 collation_rule; /* The collation rule for the index. */ + u8 block_size_bits; /* Log2 of the above. */ + u8 vcn_size_bits; /* Log2 of the above. */ + } index; + struct { /* It is a compressed/sparse file/attribute inode. */ + s64 size; /* Copy of compressed_size from $DATA. */ + u32 block_size; /* Size of a compression block (cb). */ + u8 block_size_bits; /* Log2 of the size of a cb. 
*/ + u8 block_clusters; /* Number of clusters per cb. */ + } compressed; + } itype; + struct mutex extent_lock; /* Lock for accessing/modifying the below . */ + s32 nr_extents; /* + * For a base mft record, the number of attached extent\ + * inodes (0 if none), for extent records and for fake + * inodes describing an attribute this is -1. + */ + union { /* This union is only used if nr_extents !=3D 0. */ + struct ntfs_inode **extent_ntfs_inos; /* + * For nr_extents > 0, array of + * the ntfs inodes of the extent + * mft records belonging to + * this base inode which have + * been loaded. + */ + struct ntfs_inode *base_ntfs_ino; /* + * For nr_extents =3D=3D -1, the + * ntfs inode of the base mft + * record. For fake inodes, the + * real (base) inode to which + * the attribute belongs. + */ + } ext; + + unsigned int i_dealloc_clusters; + char *target; +}; + +/* + * Defined bits for the state field in the ntfs_inode structure. + * (f) =3D files only, (d) =3D directories only, (a) =3D attributes/fake i= nodes only + */ +enum { + NI_Dirty, /* 1: Mft record needs to be written to disk. */ + NI_AttrListDirty, /* 1: Mft record contains an attribute list. */ + NI_AttrList, /* 1: Mft record contains an attribute list. */ + NI_AttrListNonResident, /* + * 1: Attribute list is non-resident. Implies + * NI_AttrList is set. + */ + + NI_Attr, /* + * 1: Fake inode for attribute i/o. + * 0: Real inode or extent inode. + */ + + NI_MstProtected, /* + * 1: Attribute is protected by MST fixups. + * 0: Attribute is not protected by fixups. + */ + NI_NonResident, /* + * 1: Unnamed data attr is non-resident (f). + * 1: Attribute is non-resident (a). + */ + NI_IndexAllocPresent, /* 1: $I30 index alloc attr is present (d). */ + NI_Compressed, /* + * 1: Unnamed data attr is compressed (f). + * 1: Create compressed files by default (d). + * 1: Attribute is compressed (a). + */ + NI_Encrypted, /* + * 1: Unnamed data attr is encrypted (f). + * 1: Create encrypted files by default (d). 
+ * 1: Attribute is encrypted (a). + */ + NI_Sparse, /* + * 1: Unnamed data attr is sparse (f). + * 1: Create sparse files by default (d). + * 1: Attribute is sparse (a). + */ + NI_SparseDisabled, /* 1: May not create sparse regions. */ + NI_FullyMapped, + NI_FileNameDirty, + NI_BeingDeleted, + NI_BeingCreated, + NI_HasEA, + NI_RunlistDirty, +}; + +/* + * NOTE: We should be adding dirty mft records to a list somewhere and they + * should be independent of the (ntfs/vfs) inode structure so that an inod= e can + * be removed but the record can be left dirty for syncing later. + */ + +/* + * Macro tricks to expand the NInoFoo(), NInoSetFoo(), and NInoClearFoo() + * functions. + */ +#define NINO_FNS(flag) \ +static inline int NIno##flag(struct ntfs_inode *ni) \ +{ \ + return test_bit(NI_##flag, &(ni)->state); \ +} \ +static inline void NInoSet##flag(struct ntfs_inode *ni) \ +{ \ + set_bit(NI_##flag, &(ni)->state); \ +} \ +static inline void NInoClear##flag(struct ntfs_inode *ni) \ +{ \ + clear_bit(NI_##flag, &(ni)->state); \ +} + +/* + * As above for NInoTestSetFoo() and NInoTestClearFoo(). + */ +#define TAS_NINO_FNS(flag) \ +static inline int NInoTestSet##flag(struct ntfs_inode *ni) \ +{ \ + return test_and_set_bit(NI_##flag, &(ni)->state); \ +} \ +static inline int NInoTestClear##flag(struct ntfs_inode *ni) \ +{ \ + return test_and_clear_bit(NI_##flag, &(ni)->state); \ +} + +/* Emit the ntfs inode bitops functions. */ +NINO_FNS(Dirty) +TAS_NINO_FNS(Dirty) +NINO_FNS(AttrList) +NINO_FNS(AttrListDirty) +NINO_FNS(AttrListNonResident) +NINO_FNS(Attr) +NINO_FNS(MstProtected) +NINO_FNS(NonResident) +NINO_FNS(IndexAllocPresent) +NINO_FNS(Compressed) +NINO_FNS(Encrypted) +NINO_FNS(Sparse) +NINO_FNS(SparseDisabled) +NINO_FNS(FullyMapped) +NINO_FNS(FileNameDirty) +TAS_NINO_FNS(FileNameDirty) +NINO_FNS(BeingDeleted) +NINO_FNS(HasEA) +NINO_FNS(RunlistDirty) + +/* + * The full structure containing a ntfs_inode and a vfs struct inode. 
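The `NINO_FNS()`/`TAS_NINO_FNS()` trick above relies on the `##` token-pasting operator to generate one accessor family per state bit. A simplified, non-atomic userspace sketch of the same pattern (plain bit masks instead of the kernel's `test_bit()`/`set_bit()`; the two-field struct and flag names here are stand-ins, not the driver's real definitions):

```c
#include <assert.h>

/* Userspace stand-in for the kernel's ntfs_inode: just the state word. */
struct ntfs_inode {
	unsigned long state;
};

enum { NI_Dirty, NI_Sparse };

/*
 * Mirror of the kernel's NINO_FNS() macro: "##" pastes the flag name into
 * three generated function names (NInoFoo, NInoSetFoo, NInoClearFoo).
 * Non-atomic here, unlike the kernel bitops.
 */
#define NINO_FNS(flag)						\
static inline int NIno##flag(struct ntfs_inode *ni)		\
{ return !!(ni->state & (1UL << NI_##flag)); }			\
static inline void NInoSet##flag(struct ntfs_inode *ni)		\
{ ni->state |= 1UL << NI_##flag; }				\
static inline void NInoClear##flag(struct ntfs_inode *ni)	\
{ ni->state &= ~(1UL << NI_##flag); }

NINO_FNS(Dirty)
NINO_FNS(Sparse)
```

One macro invocation per flag thus replaces three hand-written accessors, which is why the header can emit the whole family with the short `NINO_FNS(...)` list.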
+ * Used for all real and fake inodes but not for extent inodes which lack the
+ * vfs struct inode.
+ */
+struct big_ntfs_inode {
+	struct ntfs_inode ntfs_inode;
+	struct inode vfs_inode;	/* The vfs inode structure. */
+};
+
+/**
+ * NTFS_I - return the ntfs inode given a vfs inode
+ * @inode:	VFS inode
+ *
+ * NTFS_I() returns the ntfs inode associated with the VFS @inode.
+ */
+static inline struct ntfs_inode *NTFS_I(struct inode *inode)
+{
+	return (struct ntfs_inode *)container_of(inode, struct big_ntfs_inode, vfs_inode);
+}
+
+static inline struct inode *VFS_I(struct ntfs_inode *ni)
+{
+	return &((struct big_ntfs_inode *)ni)->vfs_inode;
+}
+
+/**
+ * ntfs_attr - ntfs in memory attribute structure
+ *
+ * This structure exists only to provide a small structure for the
+ * ntfs_{attr_}iget()/ntfs_test_inode()/ntfs_init_locked_inode() mechanism.
+ *
+ * NOTE: Elements are ordered by size to make the structure as compact as
+ * possible on all architectures.
+ */
+struct ntfs_attr {
+	unsigned long mft_no;
+	__le16 *name;
+	u32 name_len;
+	__le32 type;
+	unsigned long state;
+};
+
+int ntfs_test_inode(struct inode *vi, void *data);
+struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no);
+struct inode *ntfs_attr_iget(struct inode *base_vi, __le32 type,
+		__le16 *name, u32 name_len);
+struct inode *ntfs_index_iget(struct inode *base_vi, __le16 *name,
+		u32 name_len);
+struct inode *ntfs_alloc_big_inode(struct super_block *sb);
+void ntfs_free_big_inode(struct inode *inode);
+int ntfs_drop_big_inode(struct inode *inode);
+void ntfs_evict_big_inode(struct inode *vi);
+void __ntfs_init_inode(struct super_block *sb, struct ntfs_inode *ni);
+
+static inline void ntfs_init_big_inode(struct inode *vi)
+{
+	struct ntfs_inode *ni = NTFS_I(vi);
+
+	ntfs_debug("Entering.");
+	__ntfs_init_inode(vi->i_sb, ni);
+	ni->mft_no = vi->i_ino;
+}
+
+struct ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
+		unsigned long mft_no);
+void ntfs_clear_extent_inode(struct ntfs_inode *ni);
+int ntfs_read_inode_mount(struct inode *vi);
+int ntfs_show_options(struct seq_file *sf, struct dentry *root);
+int ntfs_truncate_vfs(struct inode *vi, loff_t new_size, loff_t i_size);
+
+int ntfsp_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
+		struct iattr *attr);
+int ntfsp_getattr(struct mnt_idmap *idmap, const struct path *path,
+		struct kstat *stat, unsigned int request_mask,
+		unsigned int query_flags);
+
+int __ntfs_write_inode(struct inode *vi, int sync);
+int ntfs_inode_attach_all_extents(struct ntfs_inode *ni);
+int ntfs_inode_add_attrlist(struct ntfs_inode *ni);
+void ntfs_destroy_ext_inode(struct ntfs_inode *ni);
+int ntfs_inode_free_space(struct ntfs_inode *ni, int size);
+s64 ntfs_inode_attr_pread(struct inode *vi, s64 pos, s64 count, u8 *buf);
+s64 ntfs_inode_attr_pwrite(struct inode *vi, s64 pos, s64 count, u8 *buf,
+		bool sync);
+int ntfs_inode_close(struct ntfs_inode *ni);
+
+static inline void ntfs_commit_inode(struct inode *vi)
+{
+	__ntfs_write_inode(vi, 1);
+}
+
+int ntfs_inode_sync_filename(struct ntfs_inode *ni);
+int ntfs_extend_initialized_size(struct inode *vi, const loff_t offset,
+		const loff_t new_size);
+void ntfs_set_vfs_operations(struct inode *inode, mode_t mode, dev_t dev);
+
+#endif /* _LINUX_NTFS_INODE_H */
diff --git a/fs/ntfsplus/layout.h b/fs/ntfsplus/layout.h
new file mode 100644
index 000000000000..d0067e4c975a
--- /dev/null
+++ b/fs/ntfsplus/layout.h
@@ -0,0 +1,2288 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * All NTFS associated on-disk structures. Part of the Linux-NTFS
+ * project.
+ *
+ * Copyright (c) 2001-2005 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ */
+
+#ifndef _LINUX_NTFS_LAYOUT_H
+#define _LINUX_NTFS_LAYOUT_H
+
+#include
+#include
+#include
+#include
+
+/* The NTFS oem_id "NTFS    " */
+#define magicNTFS	cpu_to_le64(0x202020205346544eULL)
+
+/*
+ * Location of bootsector on partition:
+ *	The standard NTFS_BOOT_SECTOR is on sector 0 of the partition.
+ *	On NT4 and above there is one backup copy of the boot sector to
+ *	be found on the last sector of the partition (not normally accessible
+ *	from within Windows as the bootsector contained number of sectors
+ *	value is one less than the actual value!).
+ *	On versions of NT 3.51 and earlier, the backup copy was located at
+ *	number of sectors/2 (integer divide), i.e. in the middle of the volume.
+ */
+
+/*
+ * BIOS parameter block (bpb) structure.
+ */
+struct bios_parameter_block {
+	__le16 bytes_per_sector;	/* Size of a sector in bytes. */
+	u8 sectors_per_cluster;		/* Size of a cluster in sectors. */
+	__le16 reserved_sectors;	/* zero */
+	u8 fats;			/* zero */
+	__le16 root_entries;		/* zero */
+	__le16 sectors;			/* zero */
+	u8 media_type;			/* 0xf8 = hard disk */
+	__le16 sectors_per_fat;		/* zero */
+	__le16 sectors_per_track;	/* irrelevant */
+	__le16 heads;			/* irrelevant */
+	__le32 hidden_sectors;		/* zero */
+	__le32 large_sectors;		/* zero */
+} __packed;
+
+/*
+ * NTFS boot sector structure.
+ */
+struct ntfs_boot_sector {
+	u8 jump[3];			/* Irrelevant (jump to boot up code).*/
+	__le64 oem_id;			/* Magic "NTFS    ". */
+	struct bios_parameter_block bpb; /* See BIOS_PARAMETER_BLOCK. */
+	u8 unused[4];			/*
+					 * zero, NTFS diskedit.exe states that
+					 * this is actually:
+					 *	__u8 physical_drive;	// 0x80
+					 *	__u8 current_head;	// zero
+					 *	__u8 extended_boot_signature;
+					 *		// 0x80
+					 *	__u8 unused;		// zero
+					 */
+	__le64 number_of_sectors;	/*
+					 * Number of sectors in volume. Gives
+					 * maximum volume size of 2^63 sectors.
+					 * Assuming standard sector size of 512
+					 * bytes, the maximum byte size is
+					 * approx. 4.7x10^21 bytes. (-;
+					 */
+	__le64 mft_lcn;			/* Cluster location of mft data. */
+	__le64 mftmirr_lcn;		/* Cluster location of copy of mft. */
+	s8 clusters_per_mft_record;	/* Mft record size in clusters. */
+	u8 reserved0[3];		/* zero */
+	s8 clusters_per_index_record;	/* Index block size in clusters. */
+	u8 reserved1[3];		/* zero */
+	__le64 volume_serial_number;	/* Irrelevant (serial number). */
+	__le32 checksum;		/* Boot sector checksum. */
+	u8 bootstrap[426];		/* Irrelevant (boot up code). */
+	__le16 end_of_sector_marker;	/*
+					 * End of bootsector magic. Always is
+					 * 0xaa55 in little endian.
+					 */
+/* sizeof() = 512 (0x200) bytes */
+} __packed;
+
+/*
+ * Magic identifiers present at the beginning of all ntfs record containing
+ * records (like mft records for example).
+ */
+enum {
+	/* Found in $MFT/$DATA. */
+	magic_FILE = cpu_to_le32(0x454c4946),	/* Mft entry. */
+	magic_INDX = cpu_to_le32(0x58444e49),	/* Index buffer. */
+	magic_HOLE = cpu_to_le32(0x454c4f48),	/* ? (NTFS 3.0+?) */
+
+	/* Found in $LogFile/$DATA. */
+	magic_RSTR = cpu_to_le32(0x52545352),	/* Restart page. */
+	magic_RCRD = cpu_to_le32(0x44524352),	/* Log record page. */
+
+	/* Found in $LogFile/$DATA. (May be found in $MFT/$DATA, also?) */
+	magic_CHKD = cpu_to_le32(0x444b4843),	/* Modified by chkdsk. */
+
+	/* Found in all ntfs record containing records. */
+	magic_BAAD = cpu_to_le32(0x44414142),	/*
+						 * Failed multi sector
+						 * transfer was detected.
+						 */
+	/*
+	 * Found in $LogFile/$DATA when a page is full of 0xff bytes and is
+	 * thus not initialized. Page must be initialized before using it.
+	 */
+	magic_empty = cpu_to_le32(0xffffffff)	/* Record is empty. */
+};
+
+/*
+ * Generic magic comparison macros. Finally found a use for the ## preprocessor
+ * operator!
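The magic values in the enum above are simply the record's ASCII tag bytes read as a little-endian 32-bit integer: `'F' = 0x46` lands in the lowest byte of `0x454c4946`. A userspace sanity check of that equivalence (assumes a little-endian host, which is what `cpu_to_le32()` reduces to on x86/arm64; `magic_from_bytes` is an illustrative helper, not part of the driver):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Reinterpret a four-byte on-disk tag as a 32-bit value. On a
 * little-endian host this matches the cpu_to_le32() constants in the
 * header, e.g. "FILE" -> 0x454c4946.
 */
static uint32_t magic_from_bytes(const char tag[4])
{
	uint32_t v;

	memcpy(&v, tag, 4);	/* no byte swap: little-endian host assumed */
	return v;
}
```

This is also why the comparisons can be a single 32-bit compare instead of a `memcmp()` against the tag string.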
+ * (-8
+ */
+
+static inline bool __ntfs_is_magic(__le32 x, __le32 r)
+{
+	return (x == r);
+}
+#define ntfs_is_magic(x, m)	__ntfs_is_magic(x, magic_##m)
+
+static inline bool __ntfs_is_magicp(__le32 *p, __le32 r)
+{
+	return (*p == r);
+}
+#define ntfs_is_magicp(p, m)	__ntfs_is_magicp(p, magic_##m)
+
+/*
+ * Specialised magic comparison macros for the NTFS_RECORD_TYPEs defined above.
+ */
+#define ntfs_is_file_record(x)	(ntfs_is_magic(x, FILE))
+#define ntfs_is_file_recordp(p)	(ntfs_is_magicp(p, FILE))
+#define ntfs_is_mft_record(x)	(ntfs_is_file_record(x))
+#define ntfs_is_mft_recordp(p)	(ntfs_is_file_recordp(p))
+#define ntfs_is_indx_record(x)	(ntfs_is_magic(x, INDX))
+#define ntfs_is_indx_recordp(p)	(ntfs_is_magicp(p, INDX))
+#define ntfs_is_hole_record(x)	(ntfs_is_magic(x, HOLE))
+#define ntfs_is_hole_recordp(p)	(ntfs_is_magicp(p, HOLE))
+
+#define ntfs_is_rstr_record(x)	(ntfs_is_magic(x, RSTR))
+#define ntfs_is_rstr_recordp(p)	(ntfs_is_magicp(p, RSTR))
+#define ntfs_is_rcrd_record(x)	(ntfs_is_magic(x, RCRD))
+#define ntfs_is_rcrd_recordp(p)	(ntfs_is_magicp(p, RCRD))
+
+#define ntfs_is_chkd_record(x)	(ntfs_is_magic(x, CHKD))
+#define ntfs_is_chkd_recordp(p)	(ntfs_is_magicp(p, CHKD))
+
+#define ntfs_is_baad_record(x)	(ntfs_is_magic(x, BAAD))
+#define ntfs_is_baad_recordp(p)	(ntfs_is_magicp(p, BAAD))
+
+#define ntfs_is_empty_record(x)	(ntfs_is_magic(x, empty))
+#define ntfs_is_empty_recordp(p) (ntfs_is_magicp(p, empty))
+
+/*
+ * The Update Sequence Array (usa) is an array of the __le16 values which belong
+ * to the end of each sector protected by the update sequence record in which
+ * this array is contained. Note that the first entry is the Update Sequence
+ * Number (usn), a cyclic counter of how many times the protected record has
+ * been written to disk. The values 0 and -1 (ie. 0xffff) are not used. All
+ * last le16's of each sector have to be equal to the usn (during reading) or
+ * are set to it (during writing).
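The layout constraint on the update sequence array described here reduces to a single arithmetic check: the array of `usa_count` 16-bit entries starting at `usa_ofs` must fit before the last `__le16` of the first 512-byte sector, i.e. `usa_ofs + usa_count * 2 <= 510`. A minimal userspace sketch of that consistency check (`usa_layout_ok` is an illustrative helper, not a driver function; the example values below assume a typical 1024-byte mft record, which needs a usn plus two fixups):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sanity check on an update sequence array: usa_count __le16 entries at
 * byte offset usa_ofs must end at or before byte 510 of the first
 * 512-byte sector, so the final fixup word itself stays inside it.
 */
static int usa_layout_ok(uint16_t usa_ofs, uint16_t usa_count)
{
	return (uint32_t)usa_ofs + (uint32_t)usa_count * 2 <= 510;
}
```

A record that fails this check cannot have its multi sector transfer fixups applied safely and should be treated as corrupt.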
+ * If they are not, an incomplete multi sector
+ * transfer has occurred when the data was written.
+ * The maximum size for the update sequence array is fixed to:
+ *	maximum size = usa_ofs + (usa_count * 2) = 510 bytes
+ * The 510 bytes comes from the fact that the last __le16 in the array has to
+ * (obviously) finish before the last __le16 of the first 512-byte sector.
+ * This formula can be used as a consistency check in that usa_ofs +
+ * (usa_count * 2) has to be less than or equal to 510.
+ */
+struct ntfs_record {
+	__le32 magic;		/*
+				 * A four-byte magic identifying the record
+				 * type and/or status.
+				 */
+	__le16 usa_ofs;		/*
+				 * Offset to the Update Sequence Array (usa)
+				 * from the start of the ntfs record.
+				 */
+	__le16 usa_count;	/*
+				 * Number of __le16 sized entries in the usa
+				 * including the Update Sequence Number (usn),
+				 * thus the number of fixups is the usa_count
+				 * minus 1.
+				 */
+} __packed;
+
+/*
+ * System files mft record numbers. All these files are always marked as used
+ * in the bitmap attribute of the mft; presumably in order to avoid accidental
+ * allocation for random other mft records. Also, the sequence number for each
+ * of the system files is always equal to their mft record number and it is
+ * never modified.
+ */
+enum {
+	FILE_MFT	= 0,	/*
+				 * Master file table (mft). Data attribute
+				 * contains the entries and bitmap attribute
+				 * records which ones are in use (bit==1).
+				 */
+	FILE_MFTMirr	= 1,	/*
+				 * Mft mirror: copy of first four mft records
+				 * in data attribute. If cluster size > 4kiB,
+				 * copy of first N mft records, with
+				 * N = cluster_size / mft_record_size.
+				 */
+	FILE_LogFile	= 2,	/* Journalling log in data attribute. */
+	FILE_Volume	= 3,	/*
+				 * Volume name attribute and volume information
+				 * attribute (flags and ntfs version). Windows
+				 * refers to this file as volume DASD (Direct
+				 * Access Storage Device).
+				 */
+	FILE_AttrDef	= 4,	/*
+				 * Array of attribute definitions in data
+				 * attribute.
+				 */
+	FILE_root	= 5,	/* Root directory. */
+	FILE_Bitmap	= 6,	/*
+				 * Allocation bitmap of all clusters (lcns) in
+				 * data attribute.
+				 */
+	FILE_Boot	= 7,	/*
+				 * Boot sector (always at cluster 0) in data
+				 * attribute.
+				 */
+	FILE_BadClus	= 8,	/*
+				 * Contains all bad clusters in the non-resident
+				 * data attribute.
+				 */
+	FILE_Secure	= 9,	/*
+				 * Shared security descriptors in data attribute
+				 * and two indexes into the descriptors.
+				 * Appeared in Windows 2000. Before that, this
+				 * file was named $Quota but was unused.
+				 */
+	FILE_UpCase	= 10,	/*
+				 * Uppercase equivalents of all 65536 Unicode
+				 * characters in data attribute.
+				 */
+	FILE_Extend	= 11,	/*
+				 * Directory containing other system files (eg.
+				 * $ObjId, $Quota, $Reparse and $UsnJrnl). This
+				 * is new to NTFS3.0.
+				 */
+	FILE_reserved12	= 12,	/* Reserved for future use (records 12-15). */
+	FILE_reserved13	= 13,
+	FILE_reserved14	= 14,
+	FILE_reserved15	= 15,
+	FILE_first_user	= 16,	/*
+				 * First user file, used as test limit for
+				 * whether to allow opening a file or not.
+				 */
+};
+
+/*
+ * These are the so far known MFT_RECORD_* flags (16-bit) which contain
+ * information about the mft record in which they are present.
+ */
+enum {
+	MFT_RECORD_IN_USE	= cpu_to_le16(0x0001),
+	MFT_RECORD_IS_DIRECTORY	= cpu_to_le16(0x0002),
+	MFT_RECORD_IS_4		= cpu_to_le16(0x0004),
+	MFT_RECORD_IS_VIEW_INDEX = cpu_to_le16(0x0008),
+	MFT_REC_SPACE_FILLER	= 0xffff,	/* Just to make flags 16-bit. */
+} __packed;
+
+/*
+ * mft references (aka file references or file record segment references) are
+ * used whenever a structure needs to refer to a record in the mft.
+ *
+ * A reference consists of a 48-bit index into the mft and a 16-bit sequence
+ * number used to detect stale references.
+ *
+ * For error reporting purposes we treat the 48-bit index as a signed quantity.
+ *
+ * The sequence number is a circular counter (skipping 0) describing how many
+ * times the referenced mft record has been (re)used. This has to match the
+ * sequence number of the mft record being referenced, otherwise the reference
+ * is considered stale and removed.
+ *
+ * If the sequence number is zero it is assumed that no sequence number
+ * consistency checking should be performed.
+ */
+
+/*
+ * Define two unpacking macros to get to the reference (MREF) and
+ * sequence number (MSEQNO) respectively.
+ * The _LE versions are to be applied on little endian MFT_REFs.
+ * Note: The _LE versions will return a CPU endian formatted value!
+ */
+#define MFT_REF_MASK_CPU	0x0000ffffffffffffULL
+#define MFT_REF_MASK_LE		cpu_to_le64(MFT_REF_MASK_CPU)
+
+#define MK_MREF(m, s)	((u64)(((u64)(s) << 48) | \
+				((u64)(m) & MFT_REF_MASK_CPU)))
+#define MK_LE_MREF(m, s) cpu_to_le64(MK_MREF(m, s))
+
+#define MREF(x)		((unsigned long)((x) & MFT_REF_MASK_CPU))
+#define MSEQNO(x)	((u16)(((x) >> 48) & 0xffff))
+#define MREF_LE(x)	((unsigned long)(le64_to_cpu(x) & MFT_REF_MASK_CPU))
+#define MREF_INO(x)	((unsigned long)MREF_LE(x))
+#define MSEQNO_LE(x)	((u16)((le64_to_cpu(x) >> 48) & 0xffff))
+
+#define IS_ERR_MREF(x)	(((x) & 0x0000800000000000ULL) ? true : false)
+#define ERR_MREF(x)	((u64)((s64)(x)))
+#define MREF_ERR(x)	((int)((s64)(x)))
+
+/*
+ * The mft record header present at the beginning of every record in the mft.
+ * This is followed by a sequence of variable length attribute records which
+ * is terminated by an attribute of type AT_END which is a truncated attribute
+ * in that it only consists of the attribute type code AT_END and none of the
+ * other members of the attribute structure are present.
+ */
+struct mft_record {
+	__le32 magic;		/* Usually the magic is "FILE". */
+	__le16 usa_ofs;		/* See ntfs_record struct definition above. */
+	__le16 usa_count;	/* See ntfs_record struct definition above.
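The packing performed by the `MK_MREF()`/`MREF()`/`MSEQNO()` macros above (sequence number in the top 16 bits, mft record index in the low 48) can be exercised directly in userspace by substituting `uint64_t` for the kernel's `u64` (the `_LE` byte-swapping variants are omitted here since they depend on `cpu_to_le64()`/`le64_to_cpu()`):

```c
#include <assert.h>
#include <stdint.h>

#define MFT_REF_MASK_CPU	0x0000ffffffffffffULL

/* Userspace mirrors of the header's pack/unpack macros for mft references:
 * a 16-bit sequence number stacked above a 48-bit mft record index. */
#define MK_MREF(m, s)	((uint64_t)(((uint64_t)(s) << 48) | \
				((uint64_t)(m) & MFT_REF_MASK_CPU)))
#define MREF(x)		((unsigned long)((x) & MFT_REF_MASK_CPU))
#define MSEQNO(x)	((uint16_t)(((x) >> 48) & 0xffff))
```

Masking with `MFT_REF_MASK_CPU` before shifting in the sequence number is what keeps the two fields from colliding: any stray high bits of the index are discarded rather than corrupting the sequence number.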
*/ + + __le64 lsn; /* + * LogFile sequence number for this record. + * Changed every time the record is modified. + */ + __le16 sequence_number; /* + * Number of times this mft record has been + * reused. (See description for MFT_REF + * above.) NOTE: The increment (skipping zero) + * is done when the file is deleted. NOTE: If + * this is zero it is left zero. + */ + __le16 link_count; /* + * Number of hard links, i.e. the number of + * directory entries referencing this record. + * NOTE: Only used in mft base records. + * NOTE: When deleting a directory entry we + * check the link_count and if it is 1 we + * delete the file. Otherwise we delete the + * struct file_name_attr being referenced by the + * directory entry from the mft record and + * decrement the link_count. + */ + __le16 attrs_offset; /* + * Byte offset to the first attribute in this + * mft record from the start of the mft record. + * NOTE: Must be aligned to 8-byte boundary. + */ + __le16 flags; /* + * Bit array of MFT_RECORD_FLAGS. When a file + * is deleted, the MFT_RECORD_IN_USE flag is + * set to zero. + */ + __le32 bytes_in_use; /* + * Number of bytes used in this mft record. + * NOTE: Must be aligned to 8-byte boundary. + */ + __le32 bytes_allocated; /* + * Number of bytes allocated for this mft + * record. This should be equal to the mft + * record size. + */ + __le64 base_mft_record; /* + * This is zero for base mft records. + * When it is not zero it is a mft reference + * pointing to the base mft record to which + * this record belongs (this is then used to + * locate the attribute list attribute present + * in the base record which describes this + * extension record and hence might need + * modification when the extension record + * itself is modified, also locating the + * attribute list also means finding the other + * potential extents, belonging to the non-base + * mft record). 
+				 */
+	__le16 next_attr_instance; /*
+				 * The instance number that will be assigned to
+				 * the next attribute added to this mft record.
+				 * NOTE: Incremented each time after it is used.
+				 * NOTE: Every time the mft record is reused
+				 * this number is set to zero. NOTE: The first
+				 * instance number is always 0.
+				 */
+/* The below fields are specific to NTFS 3.1+ (Windows XP and above): */
+	__le16 reserved;	/* Reserved/alignment. */
+	__le32 mft_record_number; /* Number of this mft record. */
+/* sizeof() = 48 bytes */
+/*
+ * When (re)using the mft record, we place the update sequence array at this
+ * offset, i.e. before we start with the attributes. This also makes sense,
+ * otherwise we could run into problems with the update sequence array
+ * containing in itself the last two bytes of a sector which would mean that
+ * multi sector transfer protection wouldn't work. As you can't protect data
+ * by overwriting it since you then can't get it back...
+ * When reading we obviously use the data from the ntfs record header.
+ */
+} __packed;
+
+/* This is the version without the NTFS 3.1+ specific fields. */
+struct mft_record_old {
+	__le32 magic;		/* Usually the magic is "FILE". */
+	__le16 usa_ofs;		/* See ntfs_record struct definition above. */
+	__le16 usa_count;	/* See ntfs_record struct definition above. */
+
+	__le64 lsn;		/*
+				 * LogFile sequence number for this record.
+				 * Changed every time the record is modified.
+				 */
+	__le16 sequence_number;	/*
+				 * Number of times this mft record has been
+				 * reused. (See description for MFT_REF
+				 * above.) NOTE: The increment (skipping zero)
+				 * is done when the file is deleted. NOTE: If
+				 * this is zero it is left zero.
+				 */
+	__le16 link_count;	/*
+				 * Number of hard links, i.e. the number of
+				 * directory entries referencing this record.
+				 * NOTE: Only used in mft base records.
+				 * NOTE: When deleting a directory entry we
+				 * check the link_count and if it is 1 we
+				 * delete the file. Otherwise we delete the
+				 * struct file_name_attr being referenced by the
+				 * directory entry from the mft record and
+				 * decrement the link_count.
+				 */
+	__le16 attrs_offset;	/*
+				 * Byte offset to the first attribute in this
+				 * mft record from the start of the mft record.
+				 * NOTE: Must be aligned to 8-byte boundary.
+				 */
+	__le16 flags;		/*
+				 * Bit array of MFT_RECORD_FLAGS. When a file
+				 * is deleted, the MFT_RECORD_IN_USE flag is
+				 * set to zero.
+				 */
+	__le32 bytes_in_use;	/*
+				 * Number of bytes used in this mft record.
+				 * NOTE: Must be aligned to 8-byte boundary.
+				 */
+	__le32 bytes_allocated;	/*
+				 * Number of bytes allocated for this mft
+				 * record. This should be equal to the mft
+				 * record size.
+				 */
+	__le64 base_mft_record;	/*
+				 * This is zero for base mft records.
+				 * When it is not zero it is a mft reference
+				 * pointing to the base mft record to which
+				 * this record belongs (this is then used to
+				 * locate the attribute list attribute present
+				 * in the base record which describes this
+				 * extension record and hence might need
+				 * modification when the extension record
+				 * itself is modified, also locating the
+				 * attribute list also means finding the other
+				 * potential extents, belonging to the non-base
+				 * mft record).
+				 */
+	__le16 next_attr_instance; /*
+				 * The instance number that will be assigned to
+				 * the next attribute added to this mft record.
+				 * NOTE: Incremented each time after it is used.
+				 * NOTE: Every time the mft record is reused
+				 * this number is set to zero. NOTE: The first
+				 * instance number is always 0.
+				 */
+/* sizeof() = 42 bytes */
+/*
+ * When (re)using the mft record, we place the update sequence array at this
+ * offset, i.e. before we start with the attributes. This also makes sense,
+ * otherwise we could run into problems with the update sequence array
+ * containing in itself the last two bytes of a sector which would mean that
+ * multi sector transfer protection wouldn't work. As you can't protect data
+ * by overwriting it since you then can't get it back...
+ * When reading we obviously use the data from the ntfs record header.
+ */
+} __packed;
+
+/*
+ * System defined attributes (32-bit). Each attribute type has a corresponding
+ * attribute name (Unicode string of maximum 64 character length) as described
+ * by the attribute definitions present in the data attribute of the $AttrDef
+ * system file. On NTFS 3.0 volumes the names are just as the types are named
+ * in the below defines exchanging AT_ for the dollar sign ($). If that is not
+ * a revealing choice of symbol I do not know what is... (-;
+ */
+enum {
+	AT_UNUSED			= cpu_to_le32(0),
+	AT_STANDARD_INFORMATION		= cpu_to_le32(0x10),
+	AT_ATTRIBUTE_LIST		= cpu_to_le32(0x20),
+	AT_FILE_NAME			= cpu_to_le32(0x30),
+	AT_OBJECT_ID			= cpu_to_le32(0x40),
+	AT_SECURITY_DESCRIPTOR		= cpu_to_le32(0x50),
+	AT_VOLUME_NAME			= cpu_to_le32(0x60),
+	AT_VOLUME_INFORMATION		= cpu_to_le32(0x70),
+	AT_DATA				= cpu_to_le32(0x80),
+	AT_INDEX_ROOT			= cpu_to_le32(0x90),
+	AT_INDEX_ALLOCATION		= cpu_to_le32(0xa0),
+	AT_BITMAP			= cpu_to_le32(0xb0),
+	AT_REPARSE_POINT		= cpu_to_le32(0xc0),
+	AT_EA_INFORMATION		= cpu_to_le32(0xd0),
+	AT_EA				= cpu_to_le32(0xe0),
+	AT_PROPERTY_SET			= cpu_to_le32(0xf0),
+	AT_LOGGED_UTILITY_STREAM	= cpu_to_le32(0x100),
+	AT_FIRST_USER_DEFINED_ATTRIBUTE	= cpu_to_le32(0x1000),
+	AT_END				= cpu_to_le32(0xffffffff)
+};
+
+/*
+ * The collation rules for sorting views/indexes/etc (32-bit).
+ *
+ * COLLATION_BINARY - Collate by binary compare where the first byte is most
+ *	significant.
+ * COLLATION_UNICODE_STRING - Collate Unicode strings by comparing their binary
+ *	Unicode values, except that when a character can be uppercased, the
+ *	upper case value collates before the lower case one.
+ * COLLATION_FILE_NAME - Collate file names as Unicode strings. The collation
+ *	is done very much like COLLATION_UNICODE_STRING. In fact I have no idea
+ *	what the difference is. Perhaps the difference is that file names
+ *	would treat some special characters in an odd way (see
+ *	unistr.c::ntfs_collate_names() and unistr.c::legal_ansi_char_array[]
+ *	for what I mean) but COLLATION_UNICODE_STRING would not give any special
+ *	treatment to any characters at all, but this is speculation.
+ * COLLATION_NTOFS_ULONG - Sorting is done according to ascending __le32 key
+ *	values. E.g. used for $SII index in FILE_Secure, which sorts by
+ *	security_id (le32).
+ * COLLATION_NTOFS_SID - Sorting is done according to ascending SID values.
+ *	E.g. used for $O index in FILE_Extend/$Quota.
+ * COLLATION_NTOFS_SECURITY_HASH - Sorting is done first by ascending hash
+ *	values and second by ascending security_id values. E.g. used for $SDH
+ *	index in FILE_Secure.
+ * COLLATION_NTOFS_ULONGS - Sorting is done according to a sequence of ascending
+ *	__le32 key values. E.g. used for $O index in FILE_Extend/$ObjId, which
+ *	sorts by object_id (16-byte), by splitting up the object_id in four
+ *	__le32 values and using them as individual keys. E.g. take the following
+ *	two object_ids, stored as follows on disk:
+ *		1st: a1 61 65 b7 65 7b d4 11 9e 3d 00 e0 81 10 42 59
+ *		2nd: 38 14 37 d2 d2 f3 d4 11 a5 21 c8 6b 79 b1 97 45
+ *	To compare them, they are split into four __le32 values each, like so:
+ *		1st: 0xb76561a1 0x11d47b65 0xe0003d9e 0x59421081
+ *		2nd: 0xd2371438 0x11d4f3d2 0x6bc821a5 0x4597b179
+ *	Now, it is apparent why the 2nd object_id collates after the 1st: the
+ *	first __le32 value of the 1st object_id is less than the first __le32 of
+ *	the 2nd object_id. If the first __le32 values of both object_ids were
+ *	equal then the second __le32 values would be compared, etc.
+ */
+enum {
+	COLLATION_BINARY		= cpu_to_le32(0x00),
+	COLLATION_FILE_NAME		= cpu_to_le32(0x01),
+	COLLATION_UNICODE_STRING	= cpu_to_le32(0x02),
+	COLLATION_NTOFS_ULONG		= cpu_to_le32(0x10),
+	COLLATION_NTOFS_SID		= cpu_to_le32(0x11),
+	COLLATION_NTOFS_SECURITY_HASH	= cpu_to_le32(0x12),
+	COLLATION_NTOFS_ULONGS		= cpu_to_le32(0x13),
+};
+
+/*
+ * The flags (32-bit) describing attribute properties in the attribute
+ * definition structure.
+ * The INDEXABLE flag is fairly certainly correct as only the file
+ * name attribute has this flag set and this is the only attribute indexed in
+ * NT4.
+ */
+enum {
+	ATTR_DEF_INDEXABLE	= cpu_to_le32(0x02), /* Attribute can be indexed. */
+	ATTR_DEF_MULTIPLE	= cpu_to_le32(0x04), /*
+				 * Attribute type can be present
+				 * multiple times in the mft records
+				 * of an inode.
+				 */
+	ATTR_DEF_NOT_ZERO	= cpu_to_le32(0x08), /*
+				 * Attribute value must contain
+				 * at least one non-zero byte.
+				 */
+	ATTR_DEF_INDEXED_UNIQUE	= cpu_to_le32(0x10), /*
+				 * Attribute must be indexed and
+				 * the attribute value must be unique
+				 * for the attribute type in all of
+				 * the mft records of an inode.
+				 */
+	ATTR_DEF_NAMED_UNIQUE	= cpu_to_le32(0x20), /*
+				 * Attribute must be named and
+				 * the name must be unique for
+				 * the attribute type in all of the mft
+				 * records of an inode.
+				 */
+	ATTR_DEF_RESIDENT	= cpu_to_le32(0x40), /* Attribute must be resident. */
+	ATTR_DEF_ALWAYS_LOG	= cpu_to_le32(0x80), /*
+				 * Always log modifications to this attribute,
+				 * regardless of whether it is resident or
+				 * non-resident. Without this, only log
+				 * modifications if the attribute is resident.
+				 */
+};
+
+/*
+ * The data attribute of FILE_AttrDef contains a sequence of attribute
+ * definitions for the NTFS volume. With this, it is supposed to be safe for an
+ * older NTFS driver to mount a volume containing a newer NTFS version without
+ * damaging it (that's the theory. In practice it's: not damaging it too much).
+ * Entries are sorted by attribute type. The flags describe whether the
+ * attribute can be resident/non-resident and possibly other things, but the
+ * actual bits are unknown.
+ */
+struct attr_def {
+	__le16 name[0x40];	/* Unicode name of the attribute. Zero terminated. */
+	__le32 type;		/* Type of the attribute. */
+	__le32 display_rule;	/* Default display rule. */
+	__le32 collation_rule;	/* Default collation rule. */
+	__le32 flags;		/* Flags describing the attribute. */
+	__le64 min_size;	/* Optional minimum attribute size. */
+	__le64 max_size;	/* Maximum size of attribute. */
+/* sizeof() = 0xa0 or 160 bytes */
+} __packed;
+
+/*
+ * Attribute flags (16-bit).
+ */
+enum {
+	ATTR_IS_COMPRESSED	= cpu_to_le16(0x0001),
+	ATTR_COMPRESSION_MASK	= cpu_to_le16(0x00ff), /*
+				 * Compression method mask.
+				 * Also, first illegal value.
+				 */
+	ATTR_IS_ENCRYPTED	= cpu_to_le16(0x4000),
+	ATTR_IS_SPARSE		= cpu_to_le16(0x8000),
+} __packed;
+
+/*
+ * Attribute compression.
+ *
+ * Only the data attribute is ever compressed in the current ntfs driver in
+ * Windows. Further, compression is only applied when the data attribute is
+ * non-resident. Finally, to use compression, the maximum allowed cluster size
+ * on a volume is 4kib.
+ *
+ * The compression method is based on independently compressing blocks of X
+ * clusters, where X is determined from the compression_unit value found in the
+ * non-resident attribute record header (more precisely: X = 2^compression_unit
+ * clusters). On Windows NT/2k, X always is 16 clusters (compression_unit = 4).
+ *
+ * There are three different cases of how a compression block of X clusters
+ * can be stored:
+ *
+ *	1) The data in the block is all zero (a sparse block):
+ *	   This is stored as a sparse block in the runlist, i.e. the runlist
+ *	   entry has length = X and lcn = -1. The mapping pairs array actually
+ *	   uses a delta_lcn value length of 0, i.e. delta_lcn is not present at
+ *	   all, which is then interpreted by the driver as lcn = -1.
+ *	   NOTE: Even uncompressed files can be sparse on NTFS 3.0 volumes, then
+ *	   the same principles apply as above, except that the length is not
+ *	   restricted to being any particular value.
+ *
+ *	2) The data in the block is not compressed:
+ *	   This happens when compression doesn't reduce the size of the block
+ *	   in clusters. I.e. if compression has a small effect so that the
+ *	   compressed data still occupies X clusters, then the uncompressed data
+ *	   is stored in the block.
+ *	   This case is recognised by the fact that the runlist entry has
+ *	   length = X and lcn >= 0. The mapping pairs array stores this as
+ *	   normal with a run length of X and some specific delta_lcn, i.e.
+ *	   delta_lcn has to be present.
+ *
+ *	3) The data in the block is compressed:
+ *	   The common case. This case is recognised by the fact that the run
+ *	   list entry has length L < X and lcn >= 0. The mapping pairs array
+ *	   stores this as normal with a run length of L and some specific
+ *	   delta_lcn, i.e. delta_lcn has to be present. This runlist entry is
+ *	   immediately followed by a sparse entry with length = X - L and
+ *	   lcn = -1. The latter entry is to make up the vcn counting to the
+ *	   full compression block size X.
+ *
+ * In fact, life is more complicated because adjacent entries of the same type
+ * can be coalesced. This means that one has to keep track of the number of
+ * clusters handled and work on a basis of X clusters at a time being one
+ * block. An example: if length L > X this means that this particular runlist
+ * entry contains a block of length X and part of one or more blocks of length
+ * L - X. Another example: if length L < X, this does not necessarily mean that
+ * the block is compressed as it might be that the lcn changes inside the block
+ * and hence the following runlist entry describes the continuation of the
+ * potentially compressed block. The block would be compressed if the
+ * following runlist entry describes at least X - L sparse clusters, thus
+ * making up the compression block length as described in point 3 above. (Of
+ * course, there can be several runlist entries with small lengths so that the
+ * sparse entry does not follow the first data containing entry with
+ * length < X.)
+ *
+ * NOTE: At the end of the compressed attribute value, there most likely is not
+ * just the right amount of data to make up a compression block, thus this data
+ * is not even attempted to be compressed. It is just stored as is, unless
+ * the number of clusters it occupies is reduced when compressed in which case
+ * it is stored as a compressed compression block, complete with sparse
+ * clusters at the end.
+ */
+
+/*
+ * Flags of resident attributes (8-bit).
+ */
+enum {
+	RESIDENT_ATTR_IS_INDEXED = 0x01, /*
+				 * Attribute is referenced in an index
+				 * (has implications for deleting and
+				 * modifying the attribute).
+				 */
+} __packed;
+
+/*
+ * Attribute record header. Always aligned to 8-byte boundary.
+ */
+struct attr_record {
+	__le32 type;		/* The (32-bit) type of the attribute. */
+	__le32 length;		/*
+				 * Byte size of the resident part of the
+				 * attribute (aligned to 8-byte boundary).
+				 * Used to get to the next attribute.
+				 */
+	u8 non_resident;	/*
+				 * If 0, attribute is resident.
+				 * If 1, attribute is non-resident.
+				 */
+	u8 name_length;		/* Unicode character size of name of attribute. 0 if unnamed. */
+	__le16 name_offset;	/*
+				 * If name_length != 0, the byte offset to the
+				 * beginning of the name from the attribute
+				 * record. Note that the name is stored as a
+				 * Unicode string. When creating, place offset
+				 * just at the end of the record header. Then,
+				 * follow with attribute value or mapping pairs
+				 * array, resident and non-resident attributes
+				 * respectively, aligning to an 8-byte
+				 * boundary.
+				 */
+	__le16 flags;		/* Flags describing the attribute. */
+	__le16 instance;	/*
+				 * The instance of this attribute record. This
+				 * number is unique within this mft record (see
+				 * MFT_RECORD/next_attribute_instance notes in
+				 * mft.h for more details).
+				 */
+	union {
+		/* Resident attributes. */
+		struct {
+			__le32 value_length; /* Byte size of attribute value. */
+			__le16 value_offset; /*
+				 * Byte offset of the attribute
+				 * value from the start of the
+				 * attribute record. When creating,
+				 * align to 8-byte boundary if we
+				 * have a name present as this might
+				 * not have a length of a multiple
+				 * of 8-bytes.
+				 */
+			u8 flags;	/* See above. */
+			s8 reserved;	/* Reserved/alignment to 8-byte boundary. */
+		} __packed resident;
+		/* Non-resident attributes. */
+		struct {
+			__le64 lowest_vcn; /*
+				 * Lowest valid virtual cluster number
+				 * for this portion of the attribute value or
+				 * 0 if this is the only extent (usually the
+				 * case). - Only when an attribute list is used
+				 * does lowest_vcn != 0 ever occur.
+				 */
+			__le64 highest_vcn; /*
+				 * Highest valid vcn of this extent of
+				 * the attribute value. - Usually there is only one
+				 * portion, so this usually equals the attribute
+				 * value size in clusters minus 1. Can be -1 for
+				 * zero length files. Can be 0 for "single extent"
+				 * attributes.
+				 */
+			__le16 mapping_pairs_offset; /*
+				 * Byte offset from the beginning of
+				 * the structure to the mapping pairs
+				 * array which contains the mappings
+				 * between the vcns and the logical cluster
+				 * numbers (lcns).
+				 * When creating, place this at the end of
+				 * this record header aligned to 8-byte
+				 * boundary.
+				 */
+			u8 compression_unit; /*
+				 * The compression unit expressed as the log
+				 * to the base 2 of the number of
+				 * clusters in a compression unit. 0 means not
+				 * compressed. (This effectively limits the
+				 * compression unit size to be a power of two
+				 * clusters.) WinNT4 only uses a value of 4.
+				 * Sparse files have this set to 0 on XPSP2.
+				 */
+			u8 reserved[5];	/* Align to 8-byte boundary. */
+/*
+ * The sizes below are only used when lowest_vcn is zero, as otherwise it would
+ * be difficult to keep them up-to-date.
+ */
+			__le64 allocated_size; /*
+				 * Byte size of disk space allocated
+				 * to hold the attribute value. Always
+				 * is a multiple of the cluster size.
+				 * When a file is compressed, this field
+				 * is a multiple of the compression block
+				 * size (2^compression_unit) and it represents
+				 * the logically allocated space rather than
+				 * the actual on disk usage. For this use
+				 * the compressed_size (see below).
+				 */
+			__le64 data_size; /*
+				 * Byte size of the attribute value. Can be
+				 * larger than allocated_size if attribute value
+				 * is compressed or sparse.
+				 */
+			__le64 initialized_size; /*
+				 * Byte size of initialized portion of
+				 * the attribute value. Usually equals data_size.
+				 */
+/* sizeof(uncompressed attr) = 64 */
+			__le64 compressed_size; /*
+				 * Byte size of the attribute value after
+				 * compression. Only present when compressed
+				 * or sparse. Always is a multiple of the cluster
+				 * size. Represents the actual amount of disk
+				 * space being used on the disk.
+				 */
+/* sizeof(compressed attr) = 72 */
+		} __packed non_resident;
+	} __packed data;
+} __packed;
+
+/*
+ * File attribute flags (32-bit) appearing in the file_attributes fields of the
+ * STANDARD_INFORMATION attribute of MFT_RECORDs and the FILENAME_ATTR
+ * attributes of MFT_RECORDs and directory index entries.
+ *
+ * All of the below flags appear in the directory index entries but only some
+ * appear in the STANDARD_INFORMATION attribute whilst only some others appear
+ * in the FILENAME_ATTR attribute of MFT_RECORDs. Unless otherwise stated the
+ * flags appear in all of the above.
+ */
+enum {
+	FILE_ATTR_READONLY	= cpu_to_le32(0x00000001),
+	FILE_ATTR_HIDDEN	= cpu_to_le32(0x00000002),
+	FILE_ATTR_SYSTEM	= cpu_to_le32(0x00000004),
+	/* Old DOS volid. Unused in NT. = cpu_to_le32(0x00000008), */
+
+	FILE_ATTR_DIRECTORY	= cpu_to_le32(0x00000010),
+	/*
+	 * Note, FILE_ATTR_DIRECTORY is not considered valid in NT. It is
+	 * reserved for the DOS SUBDIRECTORY flag.
+	 */
+	FILE_ATTR_ARCHIVE	= cpu_to_le32(0x00000020),
+	FILE_ATTR_DEVICE	= cpu_to_le32(0x00000040),
+	FILE_ATTR_NORMAL	= cpu_to_le32(0x00000080),
+
+	FILE_ATTR_TEMPORARY	= cpu_to_le32(0x00000100),
+	FILE_ATTR_SPARSE_FILE	= cpu_to_le32(0x00000200),
+	FILE_ATTR_REPARSE_POINT	= cpu_to_le32(0x00000400),
+	FILE_ATTR_COMPRESSED	= cpu_to_le32(0x00000800),
+
+	FILE_ATTR_OFFLINE	= cpu_to_le32(0x00001000),
+	FILE_ATTR_NOT_CONTENT_INDEXED = cpu_to_le32(0x00002000),
+	FILE_ATTR_ENCRYPTED	= cpu_to_le32(0x00004000),
+
+	FILE_ATTR_VALID_FLAGS	= cpu_to_le32(0x00007fb7),
+	/*
+	 * Note, FILE_ATTR_VALID_FLAGS masks out the old DOS VolId and the
+	 * FILE_ATTR_DEVICE and preserves everything else. This mask is used
+	 * to obtain all flags that are valid for reading.
+	 */
+	FILE_ATTR_VALID_SET_FLAGS = cpu_to_le32(0x000031a7),
+	/*
+	 * Note, FILE_ATTR_VALID_SET_FLAGS masks out the old DOS VolId, the
+	 * F_A_DEVICE, F_A_DIRECTORY, F_A_SPARSE_FILE, F_A_REPARSE_POINT,
+	 * F_A_COMPRESSED, and F_A_ENCRYPTED and preserves the rest. This mask
+	 * is used to obtain all flags that are valid for setting.
+	 */
+	/* Supposed to mean no data locally, possibly repurposed. */
+	FILE_ATTRIBUTE_RECALL_ON_OPEN = cpu_to_le32(0x00040000),
+	/*
+	 * The flag FILE_ATTR_DUP_FILENAME_INDEX_PRESENT is present in all
+	 * FILENAME_ATTR attributes but not in the STANDARD_INFORMATION
+	 * attribute of an mft record.
+	 */
+	FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT = cpu_to_le32(0x10000000),
+	/*
+	 * Note, this is a copy of the corresponding bit from the mft record,
+	 * telling us whether this is a directory or not, i.e. whether it has
+	 * an index root attribute or not.
+	 */
+	FILE_ATTR_DUP_VIEW_INDEX_PRESENT = cpu_to_le32(0x20000000),
+	/*
+	 * Note, this is a copy of the corresponding bit from the mft record,
+	 * telling us whether this file has a view index present (eg. object id
+	 * index, quota index, one of the security indexes or the encrypting
+	 * filesystem related indexes).
+	 */
+};
+
+/*
+ * NOTE on times in NTFS: All times are in MS standard time format, i.e. they
+ * are the number of 100-nanosecond intervals since 1st January 1601, 00:00:00
+ * universal coordinated time (UTC). (In Linux time starts 1st January 1970,
+ * 00:00:00 UTC and is stored as the number of 1-second intervals since then.)
+ */
+
+/*
+ * Attribute: Standard information (0x10).
+ *
+ * NOTE: Always resident.
+ * NOTE: Present in all base file records on a volume.
+ * NOTE: There is conflicting information about the meaning of each of the time
+ *	fields but the meaning as defined below has been verified to be
+ *	correct by practical experimentation on Windows NT4 SP6a and is hence
+ *	assumed to be the one and only correct interpretation.
+ */
+struct standard_information {
+	__le64 creation_time;	/*
+				 * Time file was created. Updated when
+				 * a filename is changed(?).
+				 */
+	__le64 last_data_change_time; /* Time the data attribute was last modified. */
+	__le64 last_mft_change_time; /* Time this mft record was last modified. */
+	__le64 last_access_time; /*
+				 * Approximate time when the file was
+				 * last accessed (obviously this is not
+				 * updated on read-only volumes). In
+				 * Windows this is only updated when
+				 * accessed if some time delta has
+				 * passed since the last update. Also,
+				 * last access time updates can be
+				 * disabled altogether for speed.
+				 */
+	__le32 file_attributes;	/* Flags describing the file. */
+	union {
+		/* NTFS 1.2 */
+		struct {
+			u8 reserved12[12]; /* Reserved/alignment to 8-byte boundary. */
+		} __packed v1;
+		/* sizeof() = 48 bytes */
+		/* NTFS 3.x */
+		struct {
+/*
+ * If a volume has been upgraded from a previous NTFS version, then these
+ * fields are present only if the file has been accessed since the upgrade.
+ * Recognize the difference by comparing the length of the resident attribute
+ * value. If it is 48, then the following fields are missing. If it is 72 then
+ * the fields are present. Maybe just check like this:
+ *	if (resident.ValueLength < sizeof(struct standard_information)) {
+ *		Assume NTFS 1.2- format.
+ *		If (volume version is 3.x)
+ *			Upgrade attribute to NTFS 3.x format.
+ *		else
+ *			Use NTFS 1.2- format for access.
+ *	} else
+ *		Use NTFS 3.x format for access.
+ * Only problem is that it might be legal to set the length of the value to
+ * arbitrarily large values thus spoiling this check. - But chkdsk probably
+ * views that as a corruption, assuming that it behaves like this for all
+ * attributes.
+ */
+			__le32 maximum_versions; /*
+				 * Maximum allowed versions for
+				 * file. Zero if version numbering
+				 * is disabled.
+				 */
+			__le32 version_number; /*
+				 * This file's version (if any).
+				 * Set to zero if maximum_versions
+				 * is zero.
+				 */
+			__le32 class_id; /*
+				 * Class id from bidirectional
+				 * class id index (?).
+				 */
+			__le32 owner_id; /*
+				 * Owner_id of the user owning
+				 * the file. Translate via $Q index
+				 * in FILE_Extend/$Quota to the quota
+				 * control entry for the user owning
+				 * the file. Zero if quotas are disabled.
+				 */
+			__le32 security_id; /*
+				 * Security_id for the file. Translate via
+				 * $SII index and $SDS data stream in
+				 * FILE_Secure to the security descriptor.
+				 */
+			__le64 quota_charged; /*
+				 * Byte size of the charge to the quota for
+				 * all streams of the file. Note: Is zero
+				 * if quotas are disabled.
+				 */
+			__le64 usn; /*
+				 * Last update sequence number of the file.
+				 * This is a direct index into the transaction
+				 * log file ($UsnJrnl). It is zero if the usn
+				 * journal is disabled or this file has not been
+				 * subject to logging yet. See usnjrnl.h
+				 * for details.
+				 */
+		} __packed v3;
+		/* sizeof() = 72 bytes (NTFS 3.x) */
+	} __packed ver;
+} __packed;
+
+/*
+ * Attribute: Attribute list (0x20).
+ *
+ * - Can be either resident or non-resident.
+ * - Value consists of a sequence of variable length, 8-byte aligned,
+ *   ATTR_LIST_ENTRY records.
+ * - The list is not terminated by anything at all! The only way to know when
+ *   the end is reached is to keep track of the current offset and compare it
+ *   to the attribute value size.
+ * - The attribute list attribute contains one entry for each attribute of
+ *   the file in which the list is located, except for the list attribute
+ *   itself. The list is sorted: first by attribute type, second by attribute
+ *   name (if present), third by instance number. The extents of one
+ *   non-resident attribute (if present) immediately follow after the initial
+ *   extent. They are ordered by lowest_vcn and have their instance set to zero.
+ *   It is not allowed to have two attributes with all sorting keys equal.
+ * - Further restrictions:
+ *	- If not resident, the vcn to lcn mapping array has to fit inside the
+ *	  base mft record.
+ *	- The attribute list attribute value has a maximum size of 256kb. This
+ *	  is imposed by the Windows cache manager.
+ * - Attribute lists are only used when the attributes of the mft record do not
+ *   fit inside the mft record despite all attributes (that can be made
+ *   non-resident) having been made non-resident. This can happen e.g. when:
+ *	- File has a large number of hard links (lots of file name
+ *	  attributes present).
+ *	- The mapping pairs array of some non-resident attribute becomes so
+ *	  large due to fragmentation that it overflows the mft record.
+ *	- The security descriptor is very complex (not applicable to
+ *	  NTFS 3.0 volumes).
+ *	- There are many named streams.
+ */
+struct attr_list_entry {
+	__le32 type;		/* Type of referenced attribute. */
+	__le16 length;		/* Byte size of this entry (8-byte aligned). */
+	u8 name_length;		/*
+				 * Size in Unicode chars of the name of the
+				 * attribute or 0 if unnamed.
+				 */
+	u8 name_offset;		/*
+				 * Byte offset to beginning of attribute name
+				 * (always set this to where the name would
+				 * start even if unnamed).
+				 */
+	__le64 lowest_vcn;	/*
+				 * Lowest virtual cluster number of this portion
+				 * of the attribute value. This is usually 0. It
+				 * is non-zero for the case where one attribute
+				 * does not fit into one mft record and thus
+				 * several mft records are allocated to hold
+				 * this attribute. In the latter case, each mft
+				 * record holds one extent of the attribute and
+				 * there is one attribute list entry for each
+				 * extent. NOTE: This is DEFINITELY a signed
+				 * value! The windows driver uses cmp, followed
+				 * by jg when comparing this, thus it treats it
+				 * as signed.
+				 */
+	__le64 mft_reference;	/*
+				 * The reference of the mft record holding
+				 * the attr record for this portion of the
+				 * attribute value.
+				 */
+	__le16 instance;	/*
+				 * If lowest_vcn = 0, the instance of the
+				 * attribute being referenced; otherwise 0.
+				 */
+	__le16 name[];		/*
+				 * Use when creating only. When reading use
+				 * name_offset to determine the location of the name.
+				 */
+/* sizeof() = 26 + (attribute_name_length * 2) bytes */
+} __packed;
+
+/*
+ * The maximum allowed length for a file name.
+ */
+#define MAXIMUM_FILE_NAME_LENGTH	255
+
+/*
+ * Possible namespaces for filenames in ntfs (8-bit).
+ */
+enum {
+	FILE_NAME_POSIX	= 0x00,
+	/*
+	 * This is the largest namespace. It is case sensitive and allows all
+	 * Unicode characters except for: '\0' and '/'. Beware that in
+	 * WinNT/2k/2003 by default files which eg have the same name except
+	 * for their case will not be distinguished by the standard utilities
+	 * and thus a "del filename" will delete both "filename" and "fileName"
+	 * without warning. However if for example Services For Unix (SFU) are
+	 * installed and the case sensitive option was enabled at installation
+	 * time, then you can create/access/delete such files.
+	 * Note that even SFU places restrictions on the filenames beyond the
+	 * '\0' and '/' and in particular the following set of characters is
+	 * not allowed: '"', '/', '<', '>', '\'. All other characters,
+	 * including the ones not allowed in WIN32 namespace are allowed.
+	 * Tested with SFU 3.5 (this is now free) running on Windows XP.
+	 */
+	FILE_NAME_WIN32	= 0x01,
+	/*
+	 * The standard WinNT/2k NTFS long filenames. Case insensitive. All
+	 * Unicode chars except: '\0', '"', '*', '/', ':', '<', '>', '?', '\',
+	 * and '|'. Further, names cannot end with a '.' or a space.
+	 */
+	FILE_NAME_DOS	= 0x02,
+	/*
+	 * The standard DOS filenames (8.3 format). Uppercase only. All 8-bit
+	 * characters greater than space, except: '"', '*', '+', ',', '/', ':',
+	 * ';', '<', '=', '>', '?', and '\'.
+	 */
+	FILE_NAME_WIN32_AND_DOS	= 0x03,
+	/*
+	 * 3 means that both the Win32 and the DOS filenames are identical and
+	 * hence have been saved in this single filename record.
+	 */
+} __packed;
+
+/*
+ * Attribute: Filename (0x30).
+ *
+ * NOTE: Always resident.
+ * NOTE: All fields, except the parent_directory, are only updated when the
+ *	filename is changed. Until then, they just become out of sync with
+ *	reality and the more up to date values are present in the standard
+ *	information attribute.
+ * NOTE: There is conflicting information about the meaning of each of the time
+ *	fields but the meaning as defined below has been verified to be
+ *	correct by practical experimentation on Windows NT4 SP6a and is hence
+ *	assumed to be the one and only correct interpretation.
+ */
+struct file_name_attr {
+/*hex ofs*/
+	__le64 parent_directory; /* Directory this filename is referenced from. */
+	__le64 creation_time;	/* Time file was created. */
+	__le64 last_data_change_time; /* Time the data attribute was last modified. */
+	__le64 last_mft_change_time; /* Time this mft record was last modified. */
+	__le64 last_access_time; /* Time this mft record was last accessed. */
+	__le64 allocated_size;	/*
+				 * Byte size of on-disk allocated space
+				 * for the unnamed data attribute. So for normal
+				 * $DATA, this is the allocated_size from
+				 * the unnamed $DATA attribute and for compressed
+				 * and/or sparse $DATA, this is the
+				 * compressed_size from the unnamed
+				 * $DATA attribute. For a directory or
+				 * other inode without an unnamed $DATA attribute,
+				 * this is always 0. NOTE: This is a multiple of
+				 * the cluster size.
+				 */
+	__le64 data_size;	/*
+				 * Byte size of actual data in unnamed
+				 * data attribute. For a directory or
+				 * other inode without an unnamed $DATA
+				 * attribute, this is always 0.
+				 */
+	__le32 file_attributes;	/* Flags describing the file. */
+	union {
+		struct {
+			__le16 packed_ea_size; /*
+				 * Size of the buffer needed to
+				 * pack the extended attributes
+				 * (EAs), if such are present.
+				 */
+			__le16 reserved; /* Reserved for alignment. */
+		} __packed ea;
+		struct {
+			__le32 reparse_point_tag; /*
+				 * Type of reparse point,
+				 * present only in reparse
+				 * points and only if there are
+				 * no EAs.
+				 */
+		} __packed rp;
+	} __packed type;
+	u8 file_name_length;	/* Length of file name in (Unicode) characters. */
+	u8 file_name_type;	/* Namespace of the file name. */
+	__le16 file_name[];	/* File name in Unicode. */
+} __packed;
+
+/*
+ * GUID structures store globally unique identifiers (GUID). A GUID is a
+ * 128-bit value consisting of one group of eight hexadecimal digits, followed
+ * by three groups of four hexadecimal digits each, followed by one group of
+ * twelve hexadecimal digits. GUIDs are Microsoft's implementation of the
+ * distributed computing environment (DCE) universally unique identifier (UUID).
+ * Example of a GUID: + * 1F010768-5A73-BC91-0010A52216A7 + */ +struct guid { + __le32 data1; /* The first eight hexadecimal digits of the GUID. */ + __le16 data2; /* The first group of four hexadecimal digits. */ + __le16 data3; /* The second group of four hexadecimal digits. */ + u8 data4[8]; /* + * The first two bytes are the third group of four + * hexadecimal digits. The remaining six bytes are the + * final 12 hexadecimal digits. + */ +} __packed; + +/* + * These relative identifiers (RIDs) are used with the above identifier + * authorities to make up universal well-known SIDs. + * + * Note: The relative identifier (RID) refers to the portion of a SID, whi= ch + * identifies a user or group in relation to the authority that issued the= SID. + * For example, the universal well-known SID Creator Owner ID (S-1-3-0) is + * made up of the identifier authority SECURITY_CREATOR_SID_AUTHORITY (3) = and + * the relative identifier SECURITY_CREATOR_OWNER_RID (0). + */ +enum { /* Identifier authority. 
 */
+	SECURITY_NULL_RID			= 0,	/* S-1-0 */
+	SECURITY_WORLD_RID			= 0,	/* S-1-1 */
+	SECURITY_LOCAL_RID			= 0,	/* S-1-2 */
+
+	SECURITY_CREATOR_OWNER_RID		= 0,	/* S-1-3 */
+	SECURITY_CREATOR_GROUP_RID		= 1,	/* S-1-3 */
+
+	SECURITY_CREATOR_OWNER_SERVER_RID	= 2,	/* S-1-3 */
+	SECURITY_CREATOR_GROUP_SERVER_RID	= 3,	/* S-1-3 */
+
+	SECURITY_DIALUP_RID			= 1,
+	SECURITY_NETWORK_RID			= 2,
+	SECURITY_BATCH_RID			= 3,
+	SECURITY_INTERACTIVE_RID		= 4,
+	SECURITY_SERVICE_RID			= 6,
+	SECURITY_ANONYMOUS_LOGON_RID		= 7,
+	SECURITY_PROXY_RID			= 8,
+	SECURITY_ENTERPRISE_CONTROLLERS_RID	= 9,
+	SECURITY_SERVER_LOGON_RID		= 9,
+	SECURITY_PRINCIPAL_SELF_RID		= 0xa,
+	SECURITY_AUTHENTICATED_USER_RID		= 0xb,
+	SECURITY_RESTRICTED_CODE_RID		= 0xc,
+	SECURITY_TERMINAL_SERVER_RID		= 0xd,
+
+	SECURITY_LOGON_IDS_RID			= 5,
+	SECURITY_LOGON_IDS_RID_COUNT		= 3,
+
+	SECURITY_LOCAL_SYSTEM_RID		= 0x12,
+
+	SECURITY_NT_NON_UNIQUE			= 0x15,
+
+	SECURITY_BUILTIN_DOMAIN_RID		= 0x20,
+
+	/*
+	 * Well-known domain relative sub-authority values (RIDs).
+	 */
+
+	/* Users. */
+	DOMAIN_USER_RID_ADMIN			= 0x1f4,
+	DOMAIN_USER_RID_GUEST			= 0x1f5,
+	DOMAIN_USER_RID_KRBTGT			= 0x1f6,
+
+	/* Groups. */
+	DOMAIN_GROUP_RID_ADMINS			= 0x200,
+	DOMAIN_GROUP_RID_USERS			= 0x201,
+	DOMAIN_GROUP_RID_GUESTS			= 0x202,
+	DOMAIN_GROUP_RID_COMPUTERS		= 0x203,
+	DOMAIN_GROUP_RID_CONTROLLERS		= 0x204,
+	DOMAIN_GROUP_RID_CERT_ADMINS		= 0x205,
+	DOMAIN_GROUP_RID_SCHEMA_ADMINS		= 0x206,
+	DOMAIN_GROUP_RID_ENTERPRISE_ADMINS	= 0x207,
+	DOMAIN_GROUP_RID_POLICY_ADMINS		= 0x208,
+
+	/* Aliases.
 */
+	DOMAIN_ALIAS_RID_ADMINS			= 0x220,
+	DOMAIN_ALIAS_RID_USERS			= 0x221,
+	DOMAIN_ALIAS_RID_GUESTS			= 0x222,
+	DOMAIN_ALIAS_RID_POWER_USERS		= 0x223,
+
+	DOMAIN_ALIAS_RID_ACCOUNT_OPS		= 0x224,
+	DOMAIN_ALIAS_RID_SYSTEM_OPS		= 0x225,
+	DOMAIN_ALIAS_RID_PRINT_OPS		= 0x226,
+	DOMAIN_ALIAS_RID_BACKUP_OPS		= 0x227,
+
+	DOMAIN_ALIAS_RID_REPLICATOR		= 0x228,
+	DOMAIN_ALIAS_RID_RAS_SERVERS		= 0x229,
+	DOMAIN_ALIAS_RID_PREW2KCOMPACCESS	= 0x22a,
+};
+
+/*
+ * The universal well-known SIDs:
+ *
+ *	NULL_SID			S-1-0-0
+ *	WORLD_SID			S-1-1-0
+ *	LOCAL_SID			S-1-2-0
+ *	CREATOR_OWNER_SID		S-1-3-0
+ *	CREATOR_GROUP_SID		S-1-3-1
+ *	CREATOR_OWNER_SERVER_SID	S-1-3-2
+ *	CREATOR_GROUP_SERVER_SID	S-1-3-3
+ *
+ *	(Non-unique IDs)		S-1-4
+ *
+ * NT well-known SIDs:
+ *
+ *	NT_AUTHORITY_SID	S-1-5
+ *	DIALUP_SID		S-1-5-1
+ *
+ *	NETWORK_SID		S-1-5-2
+ *	BATCH_SID		S-1-5-3
+ *	INTERACTIVE_SID		S-1-5-4
+ *	SERVICE_SID		S-1-5-6
+ *	ANONYMOUS_LOGON_SID	S-1-5-7		(aka null logon session)
+ *	PROXY_SID		S-1-5-8
+ *	SERVER_LOGON_SID	S-1-5-9		(aka domain controller account)
+ *	SELF_SID		S-1-5-10	(self RID)
+ *	AUTHENTICATED_USER_SID	S-1-5-11
+ *	RESTRICTED_CODE_SID	S-1-5-12	(running restricted code)
+ *	TERMINAL_SERVER_SID	S-1-5-13	(running on terminal server)
+ *
+ *	(Logon IDs)		S-1-5-5-X-Y
+ *
+ *	(NT non-unique IDs)	S-1-5-0x15-...
+ *
+ *	(Built-in domain)	S-1-5-0x20
+ */
+
+/*
+ * The SID structure is a variable-length structure used to uniquely identify
+ * users or groups. SID stands for security identifier.
+ *
+ * The standard textual representation of the SID is of the form:
+ *	S-R-I-S-S...
+ * Where:
+ *    - The first "S" is the literal character 'S' identifying the following
+ *	digits as a SID.
+ *    - R is the revision level of the SID expressed as a sequence of digits
+ *	either in decimal or hexadecimal (if the latter, prefixed by "0x").
+ *    - I is the 48-bit identifier_authority, expressed as digits as R above.
+ *    - S...
is one or more sub_authority values, expressed as digits as above.
+ *
+ * Example SID; the domain-relative SID of the local Administrators group on
+ * Windows NT/2k:
+ *	S-1-5-32-544
+ * This translates to a SID with:
+ *	revision = 1,
+ *	sub_authority_count = 2,
+ *	identifier_authority = {0,0,0,0,0,5},	// SECURITY_NT_AUTHORITY
+ *	sub_authority[0] = 32,			// SECURITY_BUILTIN_DOMAIN_RID
+ *	sub_authority[1] = 544			// DOMAIN_ALIAS_RID_ADMINS
+ */
+struct ntfs_sid {
+	u8 revision;
+	u8 sub_authority_count;
+	union {
+		struct {
+			u16 high_part;	/* High 16 bits. */
+			u32 low_part;	/* Low 32 bits. */
+		} __packed parts;
+		u8 value[6];	/* Value as individual bytes. */
+	} identifier_authority;
+	__le32 sub_authority[];	/* At least one sub_authority. */
+} __packed;
+
+/*
+ * The predefined ACE types (8-bit, see below).
+ */
+enum {
+	ACCESS_MIN_MS_ACE_TYPE		= 0,
+	ACCESS_ALLOWED_ACE_TYPE		= 0,
+	ACCESS_DENIED_ACE_TYPE		= 1,
+	SYSTEM_AUDIT_ACE_TYPE		= 2,
+	SYSTEM_ALARM_ACE_TYPE		= 3,	/* Not implemented as of Win2k. */
+	ACCESS_MAX_MS_V2_ACE_TYPE	= 3,
+
+	ACCESS_ALLOWED_COMPOUND_ACE_TYPE = 4,
+	ACCESS_MAX_MS_V3_ACE_TYPE	= 4,
+
+	/* The following are Win2k only. */
+	ACCESS_MIN_MS_OBJECT_ACE_TYPE	= 5,
+	ACCESS_ALLOWED_OBJECT_ACE_TYPE	= 5,
+	ACCESS_DENIED_OBJECT_ACE_TYPE	= 6,
+	SYSTEM_AUDIT_OBJECT_ACE_TYPE	= 7,
+	SYSTEM_ALARM_OBJECT_ACE_TYPE	= 8,
+	ACCESS_MAX_MS_OBJECT_ACE_TYPE	= 8,
+
+	ACCESS_MAX_MS_V4_ACE_TYPE	= 8,
+
+	/* This one is for WinNT/2k. */
+	ACCESS_MAX_MS_ACE_TYPE		= 8,
+} __packed;
+
+/*
+ * The ACE flags (8-bit) for audit and inheritance (see below).
+ *
+ * SUCCESSFUL_ACCESS_ACE_FLAG is only used with system audit and alarm ACE
+ * types to indicate that a message is generated (in Windows!) for successful
+ * accesses.
+ *
+ * FAILED_ACCESS_ACE_FLAG is only used with system audit and alarm ACE types
+ * to indicate that a message is generated (in Windows!) for failed accesses.
+ */
+enum {
+	/* The inheritance flags. */
+	OBJECT_INHERIT_ACE		= 0x01,
+	CONTAINER_INHERIT_ACE		= 0x02,
+	NO_PROPAGATE_INHERIT_ACE	= 0x04,
+	INHERIT_ONLY_ACE		= 0x08,
+	INHERITED_ACE			= 0x10,	/* Win2k only. */
+	VALID_INHERIT_FLAGS		= 0x1f,
+
+	/* The audit flags. */
+	SUCCESSFUL_ACCESS_ACE_FLAG	= 0x40,
+	FAILED_ACCESS_ACE_FLAG		= 0x80,
+} __packed;
+
+/*
+ * The access mask (32-bit). Defines the access rights.
+ *
+ * The specific rights (bits 0 to 15). These depend on the type of the object
+ * being secured by the ACE.
+ */
+enum {
+	/* Specific rights for files and directories are as follows: */
+
+	/* Right to read data from the file. (FILE) */
+	FILE_READ_DATA		= cpu_to_le32(0x00000001),
+	/* Right to list contents of a directory. (DIRECTORY) */
+	FILE_LIST_DIRECTORY	= cpu_to_le32(0x00000001),
+
+	/* Right to write data to the file. (FILE) */
+	FILE_WRITE_DATA		= cpu_to_le32(0x00000002),
+	/* Right to create a file in the directory. (DIRECTORY) */
+	FILE_ADD_FILE		= cpu_to_le32(0x00000002),
+
+	/* Right to append data to the file. (FILE) */
+	FILE_APPEND_DATA	= cpu_to_le32(0x00000004),
+	/* Right to create a subdirectory. (DIRECTORY) */
+	FILE_ADD_SUBDIRECTORY	= cpu_to_le32(0x00000004),
+
+	/* Right to read extended attributes. (FILE/DIRECTORY) */
+	FILE_READ_EA		= cpu_to_le32(0x00000008),
+
+	/* Right to write extended attributes. (FILE/DIRECTORY) */
+	FILE_WRITE_EA		= cpu_to_le32(0x00000010),
+
+	/* Right to execute a file. (FILE) */
+	FILE_EXECUTE		= cpu_to_le32(0x00000020),
+	/* Right to traverse the directory. (DIRECTORY) */
+	FILE_TRAVERSE		= cpu_to_le32(0x00000020),
+
+	/*
+	 * Right to delete a directory and all the files it contains (its
+	 * children), even if the files are read-only. (DIRECTORY)
+	 */
+	FILE_DELETE_CHILD	= cpu_to_le32(0x00000040),
+
+	/* Right to read file attributes. (FILE/DIRECTORY) */
+	FILE_READ_ATTRIBUTES	= cpu_to_le32(0x00000080),
+
+	/* Right to change file attributes.
(FILE/DIRECTORY) */
+	FILE_WRITE_ATTRIBUTES	= cpu_to_le32(0x00000100),
+
+	/*
+	 * The standard rights (bits 16 to 23). These are independent of the
+	 * type of object being secured.
+	 */
+
+	/* Right to delete the object. */
+	DELETE			= cpu_to_le32(0x00010000),
+
+	/*
+	 * Right to read the information in the object's security descriptor,
+	 * not including the information in the SACL, i.e. right to read the
+	 * security descriptor and owner.
+	 */
+	READ_CONTROL		= cpu_to_le32(0x00020000),
+
+	/* Right to modify the DACL in the object's security descriptor. */
+	WRITE_DAC		= cpu_to_le32(0x00040000),
+
+	/* Right to change the owner in the object's security descriptor. */
+	WRITE_OWNER		= cpu_to_le32(0x00080000),
+
+	/*
+	 * Right to use the object for synchronization. Enables a process to
+	 * wait until the object is in the signalled state. Some object types
+	 * do not support this access right.
+	 */
+	SYNCHRONIZE		= cpu_to_le32(0x00100000),
+
+	/*
+	 * The following STANDARD_RIGHTS_* are combinations of the above for
+	 * convenience and are defined by the Win32 API.
+	 */
+
+	/* These are currently defined to READ_CONTROL. */
+	STANDARD_RIGHTS_READ	= cpu_to_le32(0x00020000),
+	STANDARD_RIGHTS_WRITE	= cpu_to_le32(0x00020000),
+	STANDARD_RIGHTS_EXECUTE	= cpu_to_le32(0x00020000),
+
+	/* Combines DELETE, READ_CONTROL, WRITE_DAC, and WRITE_OWNER access. */
+	STANDARD_RIGHTS_REQUIRED = cpu_to_le32(0x000f0000),
+
+	/*
+	 * Combines DELETE, READ_CONTROL, WRITE_DAC, WRITE_OWNER, and
+	 * SYNCHRONIZE access.
+	 */
+	STANDARD_RIGHTS_ALL	= cpu_to_le32(0x001f0000),
+
+	/*
+	 * The access system ACL and maximum allowed access types (bits 24 to
+	 * 25, bits 26 to 27 are reserved).
+	 */
+	ACCESS_SYSTEM_SECURITY	= cpu_to_le32(0x01000000),
+	MAXIMUM_ALLOWED		= cpu_to_le32(0x02000000),
+
+	/*
+	 * The generic rights (bits 28 to 31). These map onto the standard and
+	 * specific rights.
+	 */
+
+	/* Read, write, and execute access.
 */
+	GENERIC_ALL		= cpu_to_le32(0x10000000),
+
+	/* Execute access. */
+	GENERIC_EXECUTE		= cpu_to_le32(0x20000000),
+
+	/*
+	 * Write access. For files, this maps onto:
+	 *	FILE_APPEND_DATA | FILE_WRITE_ATTRIBUTES | FILE_WRITE_DATA |
+	 *	FILE_WRITE_EA | STANDARD_RIGHTS_WRITE | SYNCHRONIZE
+	 * For directories, the mapping has the same numerical value. See
+	 * above for the descriptions of the rights granted.
+	 */
+	GENERIC_WRITE		= cpu_to_le32(0x40000000),
+
+	/*
+	 * Read access. For files, this maps onto:
+	 *	FILE_READ_ATTRIBUTES | FILE_READ_DATA | FILE_READ_EA |
+	 *	STANDARD_RIGHTS_READ | SYNCHRONIZE
+	 * For directories, the mapping has the same numerical value. See
+	 * above for the descriptions of the rights granted.
+	 */
+	GENERIC_READ		= cpu_to_le32(0x80000000),
+};
+
+/*
+ * The predefined ACE type structures are as defined below.
+ */
+
+struct ntfs_ace {
+	u8 type;		/* Type of the ACE. */
+	u8 flags;		/* Flags describing the ACE. */
+	__le16 size;		/* Size in bytes of the ACE. */
+	__le32 mask;		/* Access mask associated with the ACE. */
+	struct ntfs_sid sid;	/* The SID associated with the ACE. */
+} __packed;
+
+/*
+ * The object ACE flags (32-bit).
+ */
+enum {
+	ACE_OBJECT_TYPE_PRESENT			= cpu_to_le32(1),
+	ACE_INHERITED_OBJECT_TYPE_PRESENT	= cpu_to_le32(2),
+};
+
+/*
+ * An ACL is an access-control list (ACL).
+ * An ACL starts with an ACL header structure, which specifies the size of
+ * the ACL and the number of ACEs it contains. The ACL header is followed by
+ * zero or more access control entries (ACEs). The ACL as well as each ACE
+ * are aligned on 4-byte boundaries.
+ */
+struct ntfs_acl {
+	u8 revision;	/* Revision of this ACL. */
+	u8 alignment1;
+	__le16 size;	/*
+			 * Allocated space in bytes for ACL. Includes this
+			 * header, the ACEs and the remaining free space.
+			 */
+	__le16 ace_count;	/* Number of ACEs in the ACL.
 */
+	__le16 alignment2;
+/* sizeof() = 8 bytes */
+} __packed;
+
+/*
+ * The security descriptor control flags (16-bit).
+ *
+ * SE_OWNER_DEFAULTED - This boolean flag, when set, indicates that the SID
+ *	pointed to by the Owner field was provided by a defaulting mechanism
+ *	rather than explicitly provided by the original provider of the
+ *	security descriptor. This may affect the treatment of the SID with
+ *	respect to inheritance of an owner.
+ *
+ * SE_GROUP_DEFAULTED - This boolean flag, when set, indicates that the SID in
+ *	the Group field was provided by a defaulting mechanism rather than
+ *	explicitly provided by the original provider of the security
+ *	descriptor. This may affect the treatment of the SID with respect to
+ *	inheritance of a primary group.
+ *
+ * SE_DACL_PRESENT - This boolean flag, when set, indicates that the security
+ *	descriptor contains a discretionary ACL. If this flag is set and the
+ *	Dacl field of the SECURITY_DESCRIPTOR is null, then a null ACL is
+ *	explicitly being specified.
+ *
+ * SE_DACL_DEFAULTED - This boolean flag, when set, indicates that the ACL
+ *	pointed to by the Dacl field was provided by a defaulting mechanism
+ *	rather than explicitly provided by the original provider of the
+ *	security descriptor. This may affect the treatment of the ACL with
+ *	respect to inheritance of an ACL. This flag is ignored if the
+ *	DaclPresent flag is not set.
+ *
+ * SE_SACL_PRESENT - This boolean flag, when set, indicates that the security
+ *	descriptor contains a system ACL pointed to by the Sacl field. If this
+ *	flag is set and the Sacl field of the SECURITY_DESCRIPTOR is null, then
+ *	an empty (but present) ACL is being specified.
+ *
+ * SE_SACL_DEFAULTED - This boolean flag, when set, indicates that the ACL
+ *	pointed to by the Sacl field was provided by a defaulting mechanism
+ *	rather than explicitly provided by the original provider of the
+ *	security descriptor.
This may affect the treatment of the ACL with
+ *	respect to inheritance of an ACL. This flag is ignored if the
+ *	SaclPresent flag is not set.
+ *
+ * SE_SELF_RELATIVE - This boolean flag, when set, indicates that the security
+ *	descriptor is in self-relative form. In this form, all fields of the
+ *	security descriptor are contiguous in memory and all pointer fields are
+ *	expressed as offsets from the beginning of the security descriptor.
+ */
+enum {
+	SE_OWNER_DEFAULTED		= cpu_to_le16(0x0001),
+	SE_GROUP_DEFAULTED		= cpu_to_le16(0x0002),
+	SE_DACL_PRESENT			= cpu_to_le16(0x0004),
+	SE_DACL_DEFAULTED		= cpu_to_le16(0x0008),
+
+	SE_SACL_PRESENT			= cpu_to_le16(0x0010),
+	SE_SACL_DEFAULTED		= cpu_to_le16(0x0020),
+
+	SE_DACL_AUTO_INHERIT_REQ	= cpu_to_le16(0x0100),
+	SE_SACL_AUTO_INHERIT_REQ	= cpu_to_le16(0x0200),
+	SE_DACL_AUTO_INHERITED		= cpu_to_le16(0x0400),
+	SE_SACL_AUTO_INHERITED		= cpu_to_le16(0x0800),
+
+	SE_DACL_PROTECTED		= cpu_to_le16(0x1000),
+	SE_SACL_PROTECTED		= cpu_to_le16(0x2000),
+	SE_RM_CONTROL_VALID		= cpu_to_le16(0x4000),
+	SE_SELF_RELATIVE		= cpu_to_le16(0x8000)
+} __packed;
+
+/*
+ * Self-relative security descriptor. Contains the owner and group SIDs as well
+ * as the sacl and dacl ACLs inside the security descriptor itself.
+ */
+struct security_descriptor_relative {
+	u8 revision;	/* Revision level of the security descriptor. */
+	u8 alignment;
+	__le16 control;	/*
+			 * Flags qualifying the type of the descriptor as well
+			 * as the following fields.
+			 */
+	__le32 owner;	/*
+			 * Byte offset to a SID representing an object's
+			 * owner. If this is NULL, no owner SID is present in
+			 * the descriptor.
+			 */
+	__le32 group;	/*
+			 * Byte offset to a SID representing an object's
+			 * primary group. If this is NULL, no primary group
+			 * SID is present in the descriptor.
+			 */
+	__le32 sacl;	/*
+			 * Byte offset to a system ACL. Only valid, if
+			 * SE_SACL_PRESENT is set in the control field.
If + * SE_SACL_PRESENT is set but sacl is NULL, a NULL ACL + * is specified. + */ + __le32 dacl; /* + * Byte offset to a discretionary ACL. Only valid, if + * SE_DACL_PRESENT is set in the control field. If + * SE_DACL_PRESENT is set but dacl is NULL, a NULL ACL + * (unconditionally granting access) is specified. + */ +/* sizeof() =3D 0x14 bytes */ +} __packed; + +/* + * On NTFS 3.0+, all security descriptors are stored in FILE_Secure. Only = one + * referenced instance of each unique security descriptor is stored. + * + * FILE_Secure contains no unnamed data attribute, i.e. it has zero length= . It + * does, however, contain two indexes ($SDH and $SII) as well as a named d= ata + * stream ($SDS). + * + * Every unique security descriptor is assigned a unique security identifi= er + * (security_id, not to be confused with a SID). The security_id is unique= for + * the NTFS volume and is used as an index into the $SII index, which maps + * security_ids to the security descriptor's storage location within the $= SDS + * data attribute. The $SII index is sorted by ascending security_id. + * + * A simple hash is computed from each security descriptor. This hash is u= sed + * as an index into the $SDH index, which maps security descriptor hashes = to + * the security descriptor's storage location within the $SDS data attribu= te. + * The $SDH index is sorted by security descriptor hash and is stored in a= B+ + * tree. When searching $SDH (with the intent of determining whether or no= t a + * new security descriptor is already present in the $SDS data stream), if= a + * matching hash is found, but the security descriptors do not match, the + * search in the $SDH index is continued, searching for a next matching ha= sh. 
+ * When a precise match is found, the security_id corresponding to the security
+ * descriptor in the $SDS attribute is read from the found $SDH index entry and
+ * is stored in the $STANDARD_INFORMATION attribute of the file/directory to
+ * which the security descriptor is being applied. The $STANDARD_INFORMATION
+ * attribute is present in all base mft records (i.e. in all files and
+ * directories).
+ *
+ * If a match is not found, the security descriptor is assigned a new unique
+ * security_id and is added to the $SDS data attribute. Then, entries
+ * referencing this security descriptor in the $SDS data attribute are
+ * added to the $SDH and $SII indexes.
+ *
+ * Note: Entries are never deleted from FILE_Secure, even if nothing
+ * references an entry any more.
+ */
+
+/*
+ * The index entry key used in the $SII index. The collation type is
+ * COLLATION_NTOFS_ULONG.
+ */
+struct sii_index_key {
+	__le32 security_id;	/* The security_id assigned to the descriptor. */
+} __packed;
+
+/*
+ * The index entry key used in the $SDH index. The keys are sorted first by
+ * hash and then by security_id. The collation rule is
+ * COLLATION_NTOFS_SECURITY_HASH.
+ */
+struct sdh_index_key {
+	__le32 hash;		/* Hash of the security descriptor. */
+	__le32 security_id;	/* The security_id assigned to the descriptor. */
+} __packed;
+
+/*
+ * Possible flags for the volume (16-bit).
+ */
+enum {
+	VOLUME_IS_DIRTY			= cpu_to_le16(0x0001),
+	VOLUME_RESIZE_LOG_FILE		= cpu_to_le16(0x0002),
+	VOLUME_UPGRADE_ON_MOUNT		= cpu_to_le16(0x0004),
+	VOLUME_MOUNTED_ON_NT4		= cpu_to_le16(0x0008),
+
+	VOLUME_DELETE_USN_UNDERWAY	= cpu_to_le16(0x0010),
+	VOLUME_REPAIR_OBJECT_ID		= cpu_to_le16(0x0020),
+
+	VOLUME_CHKDSK_UNDERWAY		= cpu_to_le16(0x4000),
+	VOLUME_MODIFIED_BY_CHKDSK	= cpu_to_le16(0x8000),
+
+	VOLUME_FLAGS_MASK		= cpu_to_le16(0xc03f),
+
+	/* To make our life easier when checking if we must mount read-only.
 */
+	VOLUME_MUST_MOUNT_RO_MASK	= cpu_to_le16(0xc027),
+} __packed;
+
+/*
+ * Attribute: Volume information (0x70).
+ *
+ * NOTE: Always resident.
+ * NOTE: Present only in FILE_Volume.
+ * NOTE: Windows 2000 uses NTFS 3.0 while Windows NT4 service pack 6a uses
+ * NTFS 1.2. I haven't personally seen other values yet.
+ */
+struct volume_information {
+	__le64 reserved;	/* Not used (yet?). */
+	u8 major_ver;		/* Major version of the ntfs format. */
+	u8 minor_ver;		/* Minor version of the ntfs format. */
+	__le16 flags;		/* Bit array of VOLUME_* flags. */
+} __packed;
+
+/*
+ * Index header flags (8-bit).
+ */
+enum {
+	/*
+	 * When index header is in an index root attribute:
+	 */
+	SMALL_INDEX = 0,	/*
+				 * The index is small enough to fit inside the
+				 * index root attribute and there is no index
+				 * allocation attribute present.
+				 */
+	LARGE_INDEX = 1,	/*
+				 * The index is too large to fit in the index
+				 * root attribute and/or an index allocation
+				 * attribute is present.
+				 */
+	/*
+	 * When index header is in an index block, i.e. is part of index
+	 * allocation attribute:
+	 */
+	LEAF_NODE = 0,	/*
+			 * This is a leaf node, i.e. there are no more nodes
+			 * branching off it.
+			 */
+	INDEX_NODE = 1,	/*
+			 * This node indexes other nodes, i.e. it is not a leaf
+			 * node.
+			 */
+	NODE_MASK = 1,	/* Mask for accessing the *_NODE bits. */
+} __packed;
+
+/*
+ * This is the header for indexes, describing the INDEX_ENTRY records, which
+ * follow the index_header. Together the index header and the index entries
+ * make up a complete index.
+ *
+ * IMPORTANT NOTE: The offset, length and size structure members are counted
+ * relative to the start of the index header structure and not relative to the
+ * start of the index root or index allocation structures themselves.
+ */
+struct index_header {
+	__le32 entries_offset;	/*
+				 * Byte offset to first INDEX_ENTRY
+				 * aligned to 8-byte boundary.
+				 */
+	__le32 index_length;	/*
+				 * Data size of the index in bytes,
+				 * i.e.
bytes used from allocated
+				 * size, aligned to 8-byte boundary.
+				 */
+	__le32 allocated_size;	/*
+				 * Byte size of this index (block),
+				 * multiple of 8 bytes.
+				 */
+	/*
+	 * NOTE: For the index root attribute, the above two numbers are always
+	 * equal, as the attribute is resident and it is resized as needed. In
+	 * the case of the index allocation attribute the attribute is not
+	 * resident and hence the allocated_size is a fixed value and must
+	 * equal the index_block_size specified by the INDEX_ROOT attribute
+	 * corresponding to the INDEX_ALLOCATION attribute this INDEX_BLOCK
+	 * belongs to.
+	 */
+	u8 flags;	/* Bit field of INDEX_HEADER_FLAGS. */
+	u8 reserved[3];	/* Reserved/align to 8-byte boundary. */
+} __packed;
+
+/*
+ * Attribute: Index root (0x90).
+ *
+ * NOTE: Always resident.
+ *
+ * This is followed by a sequence of index entries (INDEX_ENTRY structures)
+ * as described by the index header.
+ *
+ * When a directory is small enough to fit inside the index root then this
+ * is the only attribute describing the directory. When the directory is too
+ * large to fit in the index root, on the other hand, two additional attributes
+ * are present: an index allocation attribute, containing sub-nodes of the B+
+ * directory tree (see below), and a bitmap attribute, describing which virtual
+ * cluster numbers (vcns) in the index allocation attribute are in use by an
+ * index block.
+ *
+ * NOTE: The root directory (FILE_root) contains an entry for itself. Other
+ * directories do not contain entries for themselves, though.
+ */
+struct index_root {
+	__le32 type;		/*
+				 * Type of the indexed attribute. Is
+				 * $FILE_NAME for directories, zero
+				 * for view indexes. No other values
+				 * allowed.
+				 */
+	__le32 collation_rule;	/*
+				 * Collation rule used to sort the index
+				 * entries. If type is $FILE_NAME, this
+				 * must be COLLATION_FILE_NAME.
+				 */
+	__le32 index_block_size;	/*
+					 * Size of each index block in bytes (in
+					 * the index allocation attribute).
+					 */
+	u8 clusters_per_index_block;	/*
+					 * Cluster size of each index block (in
+					 * the index allocation attribute), when
+					 * an index block is >= the cluster
+					 * size; otherwise this will be the log
+					 * of the size (like how the encoding of
+					 * the mft record size and the index
+					 * record size found in the boot sector
+					 * work). Has to be a power of 2.
+					 */
+	u8 reserved[3];		/* Reserved/align to 8-byte boundary. */
+	struct index_header index;	/* Index header describing the following index entries. */
+} __packed;
+
+/*
+ * Attribute: Index allocation (0xa0).
+ *
+ * NOTE: Always non-resident (doesn't make sense to be resident anyway!).
+ *
+ * This is an array of index blocks. Each index block starts with an
+ * index_block structure containing an index header, followed by a sequence of
+ * index entries (INDEX_ENTRY structures), as described by the struct index_header.
+ */
+struct index_block {
+	__le32 magic;		/* Magic is "INDX". */
+	__le16 usa_ofs;		/* See ntfs_record struct definition. */
+	__le16 usa_count;	/* See ntfs_record struct definition. */
+
+	__le64 lsn;		/*
+				 * LogFile sequence number of the last
+				 * modification of this index block.
+				 */
+	__le64 index_block_vcn;	/*
+				 * Virtual cluster number of the index block.
+				 * If the cluster_size on the volume is <= the
+				 * index_block_size of the directory,
+				 * index_block_vcn counts in units of clusters,
+				 * and in units of sectors otherwise.
+				 */
+	struct index_header index;	/* Describes the following index entries. */
+/* sizeof() = 40 (0x28) bytes */
+/*
+ * When creating the index block, we place the update sequence array at this
+ * offset, i.e. before we start with the index entries.
This also makes sense,
+ * otherwise we could run into problems with the update sequence array
+ * containing in itself the last two bytes of a sector which would mean that
+ * multi sector transfer protection wouldn't work. As you can't protect data
+ * by overwriting it since you then can't get it back...
+ * When reading use the data from the ntfs record header.
+ */
+} __packed;
+
+/*
+ * The system file FILE_Extend/$Reparse contains an index named $R listing
+ * all reparse points on the volume. The index entry keys are as defined
+ * below. Note, that there is no index data associated with the index entries.
+ *
+ * The index entries are sorted by the index key file_id. The collation rule is
+ * COLLATION_NTOFS_ULONGS.
+ */
+struct reparse_index_key {
+	__le32 reparse_tag;	/* Reparse point type (inc. flags). */
+	__le64 file_id;		/*
+				 * Mft record of the file containing
+				 * the reparse point attribute.
+				 */
+} __packed;
+
+/*
+ * Quota flags (32-bit).
+ *
+ * The user quota flags. Names explain meaning.
+ */
+enum {
+	QUOTA_FLAG_DEFAULT_LIMITS	= cpu_to_le32(0x00000001),
+	QUOTA_FLAG_LIMIT_REACHED	= cpu_to_le32(0x00000002),
+	QUOTA_FLAG_ID_DELETED		= cpu_to_le32(0x00000004),
+
+	QUOTA_FLAG_USER_MASK		= cpu_to_le32(0x00000007),
+	/* This is a bit mask for the user quota flags. */
+
+	/*
+	 * These flags are only present in the quota defaults index entry, i.e.
+	 * in the entry where owner_id = QUOTA_DEFAULTS_ID.
+	 */
+	QUOTA_FLAG_TRACKING_ENABLED	= cpu_to_le32(0x00000010),
+	QUOTA_FLAG_ENFORCEMENT_ENABLED	= cpu_to_le32(0x00000020),
+	QUOTA_FLAG_TRACKING_REQUESTED	= cpu_to_le32(0x00000040),
+	QUOTA_FLAG_LOG_THRESHOLD	= cpu_to_le32(0x00000080),
+
+	QUOTA_FLAG_LOG_LIMIT		= cpu_to_le32(0x00000100),
+	QUOTA_FLAG_OUT_OF_DATE		= cpu_to_le32(0x00000200),
+	QUOTA_FLAG_CORRUPT		= cpu_to_le32(0x00000400),
+	QUOTA_FLAG_PENDING_DELETES	= cpu_to_le32(0x00000800),
+};
+
+/*
+ * The system file FILE_Extend/$Quota contains two indexes $O and $Q.
Quotas
+ * are on a per volume and per user basis.
+ *
+ * The $Q index contains one entry for each existing user_id on the volume. The
+ * index key is the user_id of the user/group owning this quota control entry,
+ * i.e. the key is the owner_id. The user_id of the owner of a file, i.e. the
+ * owner_id, is found in the standard information attribute. The collation rule
+ * for $Q is COLLATION_NTOFS_ULONG.
+ *
+ * The $O index contains one entry for each user/group who has been assigned
+ * a quota on that volume. The index key holds the SID of the user_id the
+ * entry belongs to, i.e. the owner_id. The collation rule for $O is
+ * COLLATION_NTOFS_SID.
+ *
+ * The $O index entry data is the user_id of the user corresponding to the SID.
+ * This user_id is used as an index into $Q to find the quota control entry
+ * associated with the SID.
+ *
+ * The $Q index entry data is the quota control entry and is defined below.
+ */
+struct quota_control_entry {
+	__le32 version;		/* Currently equals 2. */
+	__le32 flags;		/* Flags describing this quota entry. */
+	__le64 bytes_used;	/* How many bytes of the quota are in use. */
+	__le64 change_time;	/* Last time this quota entry was changed. */
+	__le64 threshold;	/* Soft quota (-1 if not limited). */
+	__le64 limit;		/* Hard quota (-1 if not limited). */
+	__le64 exceeded_time;	/* How long the soft quota has been exceeded. */
+	struct ntfs_sid sid;	/*
+				 * The SID of the user/object associated with
+				 * this quota entry. Equals zero for the quota
+				 * defaults entry (and in fact on a WinXP
+				 * volume, it is not present at all).
+				 */
+} __packed;
+
+/*
+ * Predefined owner_id values (32-bit).
+ */
+enum {
+	QUOTA_INVALID_ID	= cpu_to_le32(0x00000000),
+	QUOTA_DEFAULTS_ID	= cpu_to_le32(0x00000001),
+	QUOTA_FIRST_USER_ID	= cpu_to_le32(0x00000100),
+};
+
+/*
+ * Current constants for quota control entries.
+ */
+enum {
+	/* Current version. */
+	QUOTA_VERSION	= 2,
+};
+
+/*
+ * Index entry flags (16-bit).
+ */
+enum {
+	INDEX_ENTRY_NODE = cpu_to_le16(1),	/*
+						 * This entry contains a
+						 * sub-node, i.e. a reference
+						 * to an index block in form of
+						 * a virtual cluster number
+						 * (see below).
+						 */
+	INDEX_ENTRY_END	= cpu_to_le16(2),	/*
+						 * This signifies the last
+						 * entry in an index block. The
+						 * index entry does not
+						 * represent a file but it can
+						 * point to a sub-node.
+						 */
+
+	INDEX_ENTRY_SPACE_FILLER = cpu_to_le16(0xffff),	/* gcc: Force enum bit width to 16-bit. */
+} __packed;
+
+/*
+ * This is the index entry header (see below).
+ */
+struct index_entry_header {
+/* 0*/	union {
+		struct {	/* Only valid when INDEX_ENTRY_END is not set. */
+			__le64 indexed_file;	/*
+						 * The mft reference of the file
+						 * described by this index
+						 * entry. Used for directory
+						 * indexes.
+						 */
+		} __packed dir;
+		struct {
+			/* Used for views/indexes to find the entry's data. */
+			__le16 data_offset;	/*
+						 * Data byte offset from this
+						 * INDEX_ENTRY. Follows the
+						 * index key.
+						 */
+			__le16 data_length;	/* Data length in bytes. */
+			__le32 reservedV;	/* Reserved (zero). */
+		} __packed vi;
+	} __packed data;
+	__le16 length;		/* Byte size of this index entry, multiple of 8-bytes. */
+	__le16 key_length;	/*
+				 * Byte size of the key value, which is in the
+				 * index entry. It follows field reserved. Not
+				 * multiple of 8-bytes.
+				 */
+	__le16 flags;		/* Bit field of INDEX_ENTRY_* flags. */
+	__le16 reserved;	/* Reserved/align to 8-byte boundary. */
+/* sizeof() = 16 bytes */
+} __packed;
+
+/*
+ * This is an index entry. A sequence of such entries follows each index_header
+ * structure. Together they make up a complete index. The index follows either
+ * an index root attribute or an index allocation attribute.
+ *
+ * NOTE: Before NTFS 3.0 only filename attributes were indexed.
+ */
+struct index_entry {
+	union {
+		struct {	/* Only valid when INDEX_ENTRY_END is not set. */
+			__le64 indexed_file;	/*
+						 * The mft reference of the file
+						 * described by this index entry.
+						 * Used for directory indexes.
+						 */
+		} __packed dir;
+		struct {	/* Used for views/indexes to find the entry's data. */
+			__le16 data_offset;	/*
+						 * Data byte offset from this
+						 * INDEX_ENTRY. Follows the
+						 * index key.
+						 */
+			__le16 data_length;	/* Data length in bytes. */
+			__le32 reservedV;	/* Reserved (zero). */
+		} __packed vi;
+	} __packed data;
+	__le16 length;		/* Byte size of this index entry, multiple of 8-bytes. */
+	__le16 key_length;	/*
+				 * Byte size of the key value, which is in the
+				 * index entry. It follows field reserved. Not
+				 * multiple of 8-bytes.
+				 */
+	__le16 flags;		/* Bit field of INDEX_ENTRY_* flags. */
+	__le16 reserved;	/* Reserved/align to 8-byte boundary. */
+
+	union {
+		/*
+		 * The key of the indexed attribute. NOTE: Only present
+		 * if INDEX_ENTRY_END bit in flags is not set. NOTE: On
+		 * NTFS versions before 3.0 the only valid key is the
+		 * struct file_name_attr. On NTFS 3.0+ the following
+		 * additional index keys are defined:
+		 */
+		struct file_name_attr file_name;	/* $I30 index in directories. */
+		struct sii_index_key sii;	/* $SII index in $Secure. */
+		struct sdh_index_key sdh;	/* $SDH index in $Secure. */
+		struct guid object_id;	/*
+					 * $O index in FILE_Extend/$ObjId: The
+					 * object_id of the mft record found in
+					 * the data part of the index.
+					 */
+		struct reparse_index_key reparse;	/* $R index in FILE_Extend/$Reparse. */
+		struct ntfs_sid sid;	/*
+					 * $O index in FILE_Extend/$Quota:
+					 * SID of the owner of the user_id.
+					 */
+		__le32 owner_id;	/*
+					 * $Q index in FILE_Extend/$Quota:
+					 * user_id of the owner of the quota
+					 * control entry in the data part of
+					 * the index.
+					 */
+	} __packed key;
+	/*
+	 * The (optional) index data is inserted here when creating.
+	 * __le64 vcn;	If INDEX_ENTRY_NODE bit in flags is set, the last
+	 *		eight bytes of this index entry contain the virtual
+	 *		cluster number of the index block that holds the
+	 *		entries immediately preceding the current entry (the
+	 *		vcn references the corresponding cluster in the data
+	 *		of the non-resident index allocation attribute). If
+	 *		the key_length is zero, then the vcn immediately
+	 *		follows the INDEX_ENTRY_HEADER. Regardless of
+	 *		key_length, the address of the 8-byte boundary
+	 *		aligned vcn of INDEX_ENTRY{_HEADER} *ie is given by
+	 *		(char*)ie + le16_to_cpu(ie->length) - sizeof(VCN),
+	 *		where sizeof(VCN) can be hardcoded as 8 if wanted.
+	 */
+} __packed;
+
+/*
+ * The reparse point tag defines the type of the reparse point. It also
+ * includes several flags, which further describe the reparse point.
+ *
+ * The reparse point tag is an unsigned 32-bit value divided into three parts:
+ *
+ * 1. The least significant 16 bits (i.e. bits 0 to 15) specify the type of
+ *    the reparse point.
+ * 2. The 12 bits after this (i.e. bits 16 to 27) are reserved for future use.
+ * 3. The most significant four bits are flags describing the reparse point.
+ *    They are defined as follows:
+ *	bit 28: Directory bit. If set, the directory is not a surrogate
+ *		and can be used the usual way.
+ *	bit 29: Name surrogate bit. If set, the filename is an alias for
+ *		another object in the system.
+ *	bit 30: High-latency bit. If set, accessing the first byte of data will
+ *		be slow. (E.g. the data is stored on a tape drive.)
+ *	bit 31: Microsoft bit. If set, the tag is owned by Microsoft. User
+ *		defined tags have to use zero here.
+ * 4. Moreover, on Windows 10:
+ *    Some flags may be used in bits 12 to 15 to further describe the
+ *    reparse point.
+ */
+enum {
+	IO_REPARSE_TAG_DIRECTORY	= cpu_to_le32(0x10000000),
+	IO_REPARSE_TAG_IS_ALIAS		= cpu_to_le32(0x20000000),
+	IO_REPARSE_TAG_IS_HIGH_LATENCY	= cpu_to_le32(0x40000000),
+	IO_REPARSE_TAG_IS_MICROSOFT	= cpu_to_le32(0x80000000),
+
+	IO_REPARSE_TAG_RESERVED_ZERO	= cpu_to_le32(0x00000000),
+	IO_REPARSE_TAG_RESERVED_ONE	= cpu_to_le32(0x00000001),
+	IO_REPARSE_TAG_RESERVED_RANGE	= cpu_to_le32(0x00000001),
+
+	IO_REPARSE_TAG_CSV		= cpu_to_le32(0x80000009),
+	IO_REPARSE_TAG_DEDUP		= cpu_to_le32(0x80000013),
+	IO_REPARSE_TAG_DFS		= cpu_to_le32(0x8000000A),
+	IO_REPARSE_TAG_DFSR		= cpu_to_le32(0x80000012),
+	IO_REPARSE_TAG_HSM		= cpu_to_le32(0xC0000004),
+	IO_REPARSE_TAG_HSM2		= cpu_to_le32(0x80000006),
+	IO_REPARSE_TAG_MOUNT_POINT	= cpu_to_le32(0xA0000003),
+	IO_REPARSE_TAG_NFS		= cpu_to_le32(0x80000014),
+	IO_REPARSE_TAG_SIS		= cpu_to_le32(0x80000007),
+	IO_REPARSE_TAG_SYMLINK		= cpu_to_le32(0xA000000C),
+	IO_REPARSE_TAG_WIM		= cpu_to_le32(0x80000008),
+	IO_REPARSE_TAG_DFM		= cpu_to_le32(0x80000016),
+	IO_REPARSE_TAG_WOF		= cpu_to_le32(0x80000017),
+	IO_REPARSE_TAG_WCI		= cpu_to_le32(0x80000018),
+	IO_REPARSE_TAG_CLOUD		= cpu_to_le32(0x9000001A),
+	IO_REPARSE_TAG_APPEXECLINK	= cpu_to_le32(0x8000001B),
+	IO_REPARSE_TAG_GVFS		= cpu_to_le32(0x9000001C),
+	IO_REPARSE_TAG_LX_SYMLINK	= cpu_to_le32(0xA000001D),
+	IO_REPARSE_TAG_AF_UNIX		= cpu_to_le32(0x80000023),
+	IO_REPARSE_TAG_LX_FIFO		= cpu_to_le32(0x80000024),
+	IO_REPARSE_TAG_LX_CHR		= cpu_to_le32(0x80000025),
+	IO_REPARSE_TAG_LX_BLK		= cpu_to_le32(0x80000026),
+
+	IO_REPARSE_TAG_VALID_VALUES	= cpu_to_le32(0xf000ffff),
+	IO_REPARSE_PLUGIN_SELECT	= cpu_to_le32(0xffff0fff),
+};
+
+/*
+ * Attribute: Reparse point (0xc0).
+ *
+ * NOTE: Can be resident or non-resident.
+ */
+struct reparse_point {
+	__le32 reparse_tag; /* Reparse point type (inc. flags). */
+	__le16 reparse_data_length; /* Byte size of reparse data. */
+	__le16 reserved; /* Align to 8-byte boundary.
+			  */
+	u8 reparse_data[0]; /* Meaning depends on reparse_tag. */
+} __packed;
+
+/*
+ * Attribute: Extended attribute (EA) information (0xd0).
+ *
+ * NOTE: Always resident. (Is this true???)
+ */
+struct ea_information {
+	__le16 ea_length; /* Byte size of the packed extended attributes. */
+	__le16 need_ea_count; /*
+			       * The number of extended attributes which have
+			       * the NEED_EA bit set.
+			       */
+	__le32 ea_query_length; /*
+				 * Byte size of the buffer required to query
+				 * the extended attributes when calling
+				 * ZwQueryEaFile() in Windows NT/2k. I.e.
+				 * the byte size of the unpacked extended attributes.
+				 */
+} __packed;
+
+/*
+ * Extended attribute flags (8-bit).
+ */
+enum {
+	NEED_EA = 0x80 /*
+			* If set the file to which the EA belongs
+			* cannot be interpreted without understanding
+			* the associated extended attributes.
+			*/
+} __packed;
+
+/*
+ * Attribute: Extended attribute (EA) (0xe0).
+ *
+ * NOTE: Can be resident or non-resident.
+ *
+ * Like the attribute list and the index buffer list, the EA attribute value is
+ * a sequence of EA_ATTR variable length records.
+ */
+struct ea_attr {
+	__le32 next_entry_offset; /* Offset to the next EA_ATTR. */
+	u8 flags; /* Flags describing the EA. */
+	u8 ea_name_length; /*
+			    * Length of the name of the EA in bytes
+			    * excluding the '\0' byte terminator.
+			    */
+	__le16 ea_value_length; /* Byte size of the EA's value. */
+	u8 ea_name[]; /*
+		       * Name of the EA. Note this is ASCII, not
+		       * Unicode and it is zero terminated.
+		       */
+	/* u8 ea_value[]; */ /* The value of the EA. Immediately follows the name. */
+} __packed;
+
+#endif /* _LINUX_NTFS_LAYOUT_H */
diff --git a/fs/ntfsplus/lcnalloc.h b/fs/ntfsplus/lcnalloc.h
new file mode 100644
index 000000000000..a1c66b8b73ac
--- /dev/null
+++ b/fs/ntfsplus/lcnalloc.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Exports for NTFS kernel cluster (de)allocation.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2004-2005 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_LCNALLOC_H
+#define _LINUX_NTFS_LCNALLOC_H
+
+#include
+
+#include "attrib.h"
+
+enum {
+	FIRST_ZONE = 0, /* For sanity checking. */
+	MFT_ZONE = 0, /* Allocate from $MFT zone. */
+	DATA_ZONE = 1, /* Allocate from $DATA zone. */
+	LAST_ZONE = 1, /* For sanity checking. */
+};
+
+struct runlist_element *ntfs_cluster_alloc(struct ntfs_volume *vol,
+		const s64 start_vcn, const s64 count, const s64 start_lcn,
+		const int zone,
+		const bool is_extension,
+		const bool is_contig,
+		const bool is_dealloc);
+s64 __ntfs_cluster_free(struct ntfs_inode *ni, const s64 start_vcn,
+		s64 count, struct ntfs_attr_search_ctx *ctx, const bool is_rollback);
+
+/**
+ * ntfs_cluster_free - free clusters on an ntfs volume
+ * @ni:		ntfs inode whose runlist describes the clusters to free
+ * @start_vcn:	vcn in the runlist of @ni at which to start freeing clusters
+ * @count:	number of clusters to free or -1 for all clusters
+ * @ctx:	active attribute search context if present or NULL if not
+ *
+ * Free @count clusters starting at the cluster @start_vcn in the runlist
+ * described by the ntfs inode @ni.
+ *
+ * If @count is -1, all clusters from @start_vcn to the end of the runlist are
+ * deallocated. Thus, to completely free all clusters in a runlist, use
+ * @start_vcn = 0 and @count = -1.
+ *
+ * If @ctx is specified, it is an active search context of @ni and its base mft
+ * record. This is needed when ntfs_cluster_free() encounters unmapped runlist
+ * fragments and allows their mapping. If you do not have the mft record
+ * mapped, you can specify @ctx as NULL and ntfs_cluster_free() will perform
+ * the necessary mapping and unmapping.
+ *
+ * Note, ntfs_cluster_free() saves the state of @ctx on entry and restores it
+ * before returning. Thus, @ctx will be left pointing to the same attribute on
+ * return as on entry.
However, the actual pointers in @ctx may point to
+ * different memory locations on return, so you must remember to reset any
+ * cached pointers from the @ctx, i.e. after the call to ntfs_cluster_free(),
+ * you will probably want to do:
+ *	m = ctx->mrec;
+ *	a = ctx->attr;
+ * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
+ * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
+ *
+ * Note, ntfs_cluster_free() does not modify the runlist, so you have to remove
+ * from the runlist or mark sparse the freed runs later.
+ *
+ * Return the number of deallocated clusters (not counting sparse ones) on
+ * success and -errno on error.
+ *
+ * WARNING: If @ctx is supplied, regardless of whether success or failure is
+ *	returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
+ *	is no longer valid, i.e. you need to either call
+ *	ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
+ *	In that case PTR_ERR(@ctx->mrec) will give you the error code for
+ *	why the mapping of the old inode failed.
+ *
+ * Locking: - The runlist described by @ni must be locked for writing on entry
+ *	      and is locked on return. Note the runlist may be modified when
+ *	      needed runlist fragments need to be mapped.
+ *	    - The volume lcn bitmap must be unlocked on entry and is unlocked
+ *	      on return.
+ *	    - This function takes the volume lcn bitmap lock for writing and
+ *	      modifies the bitmap contents.
+ *	    - If @ctx is NULL, the base mft record of @ni must not be mapped on
+ *	      entry and it will be left unmapped on return.
+ *	    - If @ctx is not NULL, the base mft record must be mapped on entry
+ *	      and it will be left mapped on return.
+ */
+static inline s64 ntfs_cluster_free(struct ntfs_inode *ni, const s64 start_vcn,
+		s64 count, struct ntfs_attr_search_ctx *ctx)
+{
+	return __ntfs_cluster_free(ni, start_vcn, count, ctx, false);
+}
+
+int ntfs_cluster_free_from_rl_nolock(struct ntfs_volume *vol,
+		const struct runlist_element *rl);
+
+/**
+ * ntfs_cluster_free_from_rl - free clusters from runlist
+ * @vol:	mounted ntfs volume on which to free the clusters
+ * @rl:		runlist describing the clusters to free
+ *
+ * Free all the clusters described by the runlist @rl on the volume @vol. In
+ * the case of an error being returned, at least some of the clusters were not
+ * freed.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - This function takes the volume lcn bitmap lock for writing and
+ *	      modifies the bitmap contents.
+ *	    - The caller must have locked the runlist @rl for reading or
+ *	      writing.
+ */
+static inline int ntfs_cluster_free_from_rl(struct ntfs_volume *vol,
+		const struct runlist_element *rl)
+{
+	int ret;
+	unsigned int memalloc_flags;
+
+	memalloc_flags = memalloc_nofs_save();
+	down_write(&vol->lcnbmp_lock);
+	ret = ntfs_cluster_free_from_rl_nolock(vol, rl);
+	up_write(&vol->lcnbmp_lock);
+	memalloc_nofs_restore(memalloc_flags);
+	return ret;
+}
+
+#endif /* defined _LINUX_NTFS_LCNALLOC_H */
diff --git a/fs/ntfsplus/logfile.h b/fs/ntfsplus/logfile.h
new file mode 100644
index 000000000000..3c7e42425503
--- /dev/null
+++ b/fs/ntfsplus/logfile.h
@@ -0,0 +1,316 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for NTFS kernel journal (LogFile) handling.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2000-2005 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_LOGFILE_H
+#define _LINUX_NTFS_LOGFILE_H
+
+#include "layout.h"
+
+/*
+ * Journal (LogFile) organization:
+ *
+ * Two restart areas present in the first two pages (restart pages, one restart
+ * area in each page).
When the volume is dismounted they should be identical,
+ * except for the update sequence array which usually has a different update
+ * sequence number.
+ *
+ * These are followed by log records organized in pages headed by a log record
+ * header going up to log file size. Not all pages contain log records when a
+ * volume is first formatted, but as the volume ages, all records will be used.
+ * When the log file fills up, the records at the beginning are purged (by
+ * modifying the oldest_lsn to a higher value presumably) and writing begins
+ * at the beginning of the file. Effectively, the log file is viewed as a
+ * circular entity.
+ *
+ * NOTE: Windows NT, 2000, and XP all use log file version 1.1 but they accept
+ * versions <= 1.x, including 0.-1. (Yes, that is a minus one in there!) We
+ * probably only want to support 1.1 as this seems to be the current version
+ * and we don't know how that differs from the older versions. The only
+ * exception is if the journal is clean as marked by the two restart pages
+ * then it doesn't matter whether we are on an earlier version. We can just
+ * reinitialize the logfile and start again with version 1.1.
+ */
+
+/* Some LogFile related constants. */
+#define MaxLogFileSize		0x100000000ULL
+#define DefaultLogPageSize	4096
+#define MinLogRecordPages	48
+
+/*
+ * Log file restart page header (begins the restart area).
+ */
+struct restart_page_header {
+	__le32 magic; /* The magic is "RSTR". */
+	__le16 usa_ofs; /*
+			 * See ntfs_record struct definition in layout.h.
+			 * When creating, set this to be immediately after
+			 * this header structure (without any alignment).
+			 */
+	__le16 usa_count; /* See ntfs_record struct definition in layout.h. */
+
+	__le64 chkdsk_lsn; /*
+			    * The last log file sequence number found by chkdsk.
+			    * Only used when the magic is changed to "CHKD".
+			    * Otherwise this is zero.
+			    */
+	__le32 system_page_size; /*
+				  * Byte size of system pages when the log file was created,
+				  * has to be >= 512 and a power of 2. Use this to calculate
+				  * the required size of the usa (usa_count) and add it to
+				  * usa_ofs. Then verify that the result is less than
+				  * the value of the restart_area_offset.
+				  */
+	__le32 log_page_size; /*
+			       * Byte size of log file pages, has to be >= 512 and
+			       * a power of 2. The default is 4096 and is used
+			       * when the system page size is between 4096 and 8192.
+			       * Otherwise this is set to the system page size instead.
+			       */
+	__le16 restart_area_offset; /*
+				     * Byte offset from the start of this header to
+				     * the RESTART_AREA. Value has to be aligned to 8-byte
+				     * boundary. When creating, set this to be after the usa.
+				     */
+	__le16 minor_ver; /* Log file minor version. Only check if major version is 1. */
+	__le16 major_ver; /* Log file major version. We only support version 1.1. */
+/* sizeof() = 30 (0x1e) bytes */
+} __packed;
+
+/*
+ * Constant for the log client indices meaning that there are no client records
+ * in this particular client array. Also inside the client records themselves,
+ * this means that there are no client records preceding or following this one.
+ */
+#define LOGFILE_NO_CLIENT	cpu_to_le16(0xffff)
+#define LOGFILE_NO_CLIENT_CPU	0xffff
+
+/*
+ * These are the so far known RESTART_AREA_* flags (16-bit) which contain
+ * information about the log file in which they are present.
+ */
+enum {
+	RESTART_VOLUME_IS_CLEAN	= cpu_to_le16(0x0002),
+	RESTART_SPACE_FILLER	= cpu_to_le16(0xffff), /* gcc: Force enum bit width to 16. */
+} __packed;
+
+/*
+ * Log file restart area record. The offset of this record is found by adding
+ * the offset of the RESTART_PAGE_HEADER to the restart_area_offset value found
+ * in it. See notes at restart_area_offset above.
+ */
+struct restart_area {
+	__le64 current_lsn; /*
+			     * The current, i.e.
+			     * last LSN inside the log
+			     * when the restart area was last written.
+			     * This happens often but what is the interval?
+			     * Is it just fixed time or is it every time a
+			     * check point is written or something else?
+			     * On create set to 0.
+			     */
+	__le16 log_clients; /*
+			     * Number of log client records in the array of
+			     * log client records which follows this
+			     * restart area. Must be 1.
+			     */
+	__le16 client_free_list; /*
+				  * The index of the first free log client record
+				  * in the array of log client records.
+				  * LOGFILE_NO_CLIENT means that there are no
+				  * free log client records in the array.
+				  * If != LOGFILE_NO_CLIENT, check that
+				  * log_clients > client_free_list. On Win2k
+				  * and presumably earlier, on a clean volume
+				  * this is != LOGFILE_NO_CLIENT, and it should
+				  * be 0, i.e. the first (and only) client
+				  * record is free and thus the logfile is
+				  * closed and hence clean. A dirty volume
+				  * would have left the logfile open and hence
+				  * this would be LOGFILE_NO_CLIENT. On WinXP
+				  * and presumably later, the logfile is always
+				  * open, even on clean shutdown so this should
+				  * always be LOGFILE_NO_CLIENT.
+				  */
+	__le16 client_in_use_list; /*
+				    * The index of the first in-use log client
+				    * record in the array of log client records.
+				    * LOGFILE_NO_CLIENT means that there are no
+				    * in-use log client records in the array. If
+				    * != LOGFILE_NO_CLIENT check that log_clients
+				    * > client_in_use_list. On Win2k and
+				    * presumably earlier, on a clean volume this
+				    * is LOGFILE_NO_CLIENT, i.e. there are no
+				    * client records in use and thus the logfile
+				    * is closed and hence clean. A dirty volume
+				    * would have left the logfile open and hence
+				    * this would be != LOGFILE_NO_CLIENT, and it
+				    * should be 0, i.e. the first (and only)
+				    * client record is in use. On WinXP and
+				    * presumably later, the logfile is always
+				    * open, even on clean shutdown so this should
+				    * always be 0.
+				    */
+	__le16 flags; /*
+		       * Flags modifying LFS behaviour.
On Win2k
+		       * and presumably earlier this is always 0. On
+		       * WinXP and presumably later, if the logfile
+		       * was shutdown cleanly, the second bit,
+		       * RESTART_VOLUME_IS_CLEAN, is set. This bit
+		       * is cleared when the volume is mounted by
+		       * WinXP and set when the volume is dismounted,
+		       * thus if the logfile is dirty, this bit is
+		       * clear. Thus we don't need to check the
+		       * Windows version to determine if the logfile
+		       * is clean. Instead if the logfile is closed,
+		       * we know it must be clean. If it is open and
+		       * this bit is set, we also know it must be
+		       * clean. If on the other hand the logfile is
+		       * open and this bit is clear, we can be almost
+		       * certain that the logfile is dirty.
+		       */
+	__le32 seq_number_bits; /*
+				 * How many bits to use for the sequence
+				 * number. This is calculated as 67 - the
+				 * number of bits required to store the logfile
+				 * size in bytes and this can be used together
+				 * with the specified file_size as a consistency
+				 * check.
+				 */
+	__le16 restart_area_length; /*
+				     * Length of the restart area including the
+				     * client array. Following checks required if
+				     * version matches. Otherwise, skip them.
+				     * restart_area_offset + restart_area_length
+				     * has to be <= system_page_size. Also,
+				     * restart_area_length has to be >=
+				     * client_array_offset + (log_clients *
+				     * sizeof(log client record)).
+				     */
+	__le16 client_array_offset; /*
+				     * Offset from the start of this record to
+				     * the first log client record if versions are
+				     * matched. When creating, set this to be
+				     * after this restart area structure, aligned
+				     * to 8-bytes boundary. If the versions do not
+				     * match, this is ignored and the offset is
+				     * assumed to be (sizeof(RESTART_AREA) + 7) &
+				     * ~7, i.e. rounded up to first 8-byte
+				     * boundary. Either way, client_array_offset
+				     * has to be aligned to an 8-byte boundary.
+				     * Also, restart_area_offset +
+				     * client_array_offset has to be <= 510.
+				     * Finally, client_array_offset + (log_clients *
+				     * sizeof(log client record)) has to be <=
+				     * system_page_size. On Win2k and presumably
+				     * earlier, this is 0x30, i.e. immediately
+				     * following this record. On WinXP and
+				     * presumably later, this is 0x40, i.e. there
+				     * are 16 extra bytes between this record and
+				     * the client array. This probably means that
+				     * the RESTART_AREA record is actually bigger
+				     * in WinXP and later.
+				     */
+	__le64 file_size; /*
+			   * Usable byte size of the log file. If the
+			   * restart_area_offset + the offset of the
+			   * file_size are > 510 then corruption has
+			   * occurred. This is the very first check when
+			   * starting with the restart_area as if it
+			   * fails it means that some of the above values
+			   * will be corrupted by the multi sector
+			   * transfer protection. The file_size has to
+			   * be rounded down to be a multiple of the
+			   * log_page_size in the RESTART_PAGE_HEADER and
+			   * then it has to be at least big enough to
+			   * store the two restart pages and 48 (0x30)
+			   * log record pages.
+			   */
+	__le32 last_lsn_data_length; /*
+				      * Length of data of last LSN, not including
+				      * the log record header. On create set to 0.
+				      */
+	__le16 log_record_header_length; /*
+					  * Byte size of the log record header.
+					  * If the version matches then check that the
+					  * value of log_record_header_length is a
+					  * multiple of 8,
+					  * i.e. (log_record_header_length + 7) & ~7 ==
+					  * log_record_header_length. When creating set
+					  * it to sizeof(LOG_RECORD_HEADER), aligned to
+					  * 8 bytes.
+					  */
+	__le16 log_page_data_offset; /*
+				      * Offset to the start of data in a log record
+				      * page. Must be a multiple of 8. On create
+				      * set it to immediately after the update sequence
+				      * array of the log record page.
+				      */
+	__le32 restart_log_open_count; /*
+					* A counter that gets incremented every time
+					* the logfile is restarted which happens at mount
+					* time when the logfile is opened. When creating
+					* set to a random value.
Win2k sets it to the low
+					* 32 bits of the current system time in NTFS format
+					* (see time.h).
+					*/
+	__le32 reserved; /* Reserved/alignment to 8-byte boundary. */
+/* sizeof() = 48 (0x30) bytes */
+} __packed;
+
+/*
+ * Log client record. The offset of this record is found by adding the offset
+ * of the RESTART_AREA to the client_array_offset value found in it.
+ */
+struct log_client_record {
+	__le64 oldest_lsn; /*
+			    * Oldest LSN needed by this client. On create
+			    * set to 0.
+			    */
+	__le64 client_restart_lsn; /*
+				    * LSN at which this client needs to restart
+				    * the volume, i.e. the current position within
+				    * the log file. At present, if clean this
+				    * should = current_lsn in restart area but it
+				    * probably also = current_lsn when dirty most
+				    * of the time. At create set to 0.
+				    */
+	__le16 prev_client; /*
+			     * The offset to the previous log client record
+			     * in the array of log client records.
+			     * LOGFILE_NO_CLIENT means there is no previous
+			     * client record, i.e. this is the first one.
+			     * This is always LOGFILE_NO_CLIENT.
+			     */
+	__le16 next_client; /*
+			     * The offset to the next log client record in
+			     * the array of log client records.
+			     * LOGFILE_NO_CLIENT means there are no next
+			     * client records, i.e. this is the last one.
+			     * This is always LOGFILE_NO_CLIENT.
+			     */
+	__le16 seq_number; /*
+			    * On Win2k and presumably earlier, this is set
+			    * to zero every time the logfile is restarted
+			    * and it is incremented when the logfile is
+			    * closed at dismount time. Thus it is 0 when
+			    * dirty and 1 when clean. On WinXP and
+			    * presumably later, this is always 0.
+			    */
+	u8 reserved[6]; /* Reserved/alignment. */
+	__le32 client_name_length; /* Length of client name in bytes. Should always be 8. */
+	__le16 client_name[64]; /*
+				 * Name of the client in Unicode.
+				 * Should always be "NTFS" with the remaining bytes
+				 * set to 0.
+				 */
+/* sizeof() = 160 (0xa0) bytes */
+} __packed;
+
+bool ntfs_check_logfile(struct inode *log_vi,
+		struct restart_page_header **rp);
+bool ntfs_empty_logfile(struct inode *log_vi);
+#endif /* _LINUX_NTFS_LOGFILE_H */
diff --git a/fs/ntfsplus/mft.h b/fs/ntfsplus/mft.h
new file mode 100644
index 000000000000..cce944242f89
--- /dev/null
+++ b/fs/ntfsplus/mft.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for mft record handling in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_MFT_H
+#define _LINUX_NTFS_MFT_H
+
+#include
+#include
+
+#include "inode.h"
+
+struct mft_record *map_mft_record(struct ntfs_inode *ni);
+void unmap_mft_record(struct ntfs_inode *ni);
+struct mft_record *map_extent_mft_record(struct ntfs_inode *base_ni, u64 mref,
+		struct ntfs_inode **ntfs_ino);
+
+static inline void unmap_extent_mft_record(struct ntfs_inode *ni)
+{
+	unmap_mft_record(ni);
+}
+
+void __mark_mft_record_dirty(struct ntfs_inode *ni);
+
+/**
+ * mark_mft_record_dirty - set the mft record and the page containing it dirty
+ * @ni:		ntfs inode describing the mapped mft record
+ *
+ * Set the mapped (extent) mft record of the (base or extent) ntfs inode @ni,
+ * as well as the page containing the mft record, dirty. Also, mark the base
+ * vfs inode dirty. This ensures that any changes to the mft record are
+ * written out to disk.
+ *
+ * NOTE: Do not do anything if the mft record is already marked dirty.
+ */
+static inline void mark_mft_record_dirty(struct ntfs_inode *ni)
+{
+	if (!NInoTestSetDirty(ni))
+		__mark_mft_record_dirty(ni);
+}
+
+int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const unsigned long mft_no,
+		struct mft_record *m);
+int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, int sync);
+
+/**
+ * write_mft_record - write out a mapped (extent) mft record
+ * @ni:		ntfs inode describing the mapped (extent) mft record
+ * @m:		mapped (extent) mft record to write
+ * @sync:	if true, wait for i/o completion
+ *
+ * This is just a wrapper for write_mft_record_nolock() (see mft.c), which
+ * locks the page for the duration of the write. This ensures that there are
+ * no race conditions between writing the mft record via the dirty inode code
+ * paths and via the page cache write back code paths or between writing
+ * neighbouring mft records residing in the same page.
+ *
+ * Locking the page also serializes us against ->read_folio() if the page is not
+ * uptodate.
+ *
+ * On success, clean the mft record and return 0. On error, leave the mft
+ * record dirty and return -errno.
+ */
+static inline int write_mft_record(struct ntfs_inode *ni, struct mft_record *m, int sync)
+{
+	struct folio *folio = ni->folio;
+	int err;
+
+	folio_lock(folio);
+	err = write_mft_record_nolock(ni, m, sync);
+	folio_unlock(folio);
+
+	return err;
+}
+
+bool ntfs_may_write_mft_record(struct ntfs_volume *vol,
+		const unsigned long mft_no, const struct mft_record *m,
+		struct ntfs_inode **locked_ni);
+int ntfs_mft_record_alloc(struct ntfs_volume *vol, const int mode,
+		struct ntfs_inode **ni, struct ntfs_inode *base_ni,
+		struct mft_record **ni_mrec);
+int ntfs_mft_record_free(struct ntfs_volume *vol, struct ntfs_inode *ni);
+int ntfs_mft_records_write(const struct ntfs_volume *vol, const u64 mref,
+		const s64 count, struct mft_record *b);
+int ntfs_mft_record_check(const struct ntfs_volume *vol, struct mft_record *m,
+		unsigned long mft_no);
+
+#endif /* _LINUX_NTFS_MFT_H */
diff --git a/fs/ntfsplus/misc.h b/fs/ntfsplus/misc.h
new file mode 100644
index 000000000000..3952c6c18bd0
--- /dev/null
+++ b/fs/ntfsplus/misc.h
@@ -0,0 +1,218 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * NTFS kernel debug support. Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_MISC_H
+#define _LINUX_NTFS_MISC_H
+
+#include
+#include
+#include
+
+#include "runlist.h"
+
+#ifdef DEBUG
+
+extern int debug_msgs;
+
+extern __printf(4, 5)
+void __ntfs_debug(const char *file, int line, const char *function,
+		const char *format, ...);
+/**
+ * ntfs_debug - write a debug level message to syslog
+ * @f:		a printf format string containing the message
+ * @...:	the variables to substitute into @f
+ *
+ * ntfs_debug() writes a DEBUG level message to the syslog but only if the
+ * driver was compiled with -DDEBUG. Otherwise, the call turns into a NOP.
+ */
+#define ntfs_debug(f, a...) \
+	__ntfs_debug(__FILE__, __LINE__, __func__, f, ##a)
+
+void ntfs_debug_dump_runlist(const struct runlist_element *rl);
+
+#else /* !DEBUG */
+
+#define ntfs_debug(fmt, ...)				\
+do {							\
+	if (0)						\
+		no_printk(fmt, ##__VA_ARGS__);		\
+} while (0)
+
+#define ntfs_debug_dump_runlist(rl)			\
+do {							\
+	if (0)						\
+		(void)rl;				\
+} while (0)
+
+#endif /* !DEBUG */
+
+extern __printf(3, 4)
+void __ntfs_warning(const char *function, const struct super_block *sb,
+		const char *fmt, ...);
+#define ntfs_warning(sb, f, a...) __ntfs_warning(__func__, sb, f, ##a)
+
+extern __printf(3, 4)
+void __ntfs_error(const char *function, struct super_block *sb,
+		const char *fmt, ...);
+#define ntfs_error(sb, f, a...) __ntfs_error(__func__, sb, f, ##a)
+
+void ntfs_handle_error(struct super_block *sb);
+
+#if defined(DEBUG) && defined(CONFIG_SYSCTL)
+int ntfs_sysctl(int add);
+#else
+/* Just return success. */
+static inline int ntfs_sysctl(int add)
+{
+	return 0;
+}
+#endif
+
+#define NTFS_TIME_OFFSET ((s64)(369 * 365 + 89) * 24 * 3600 * 10000000)
+
+/**
+ * utc2ntfs - convert Linux UTC time to NTFS time
+ * @ts:		Linux UTC time to convert to NTFS time
+ *
+ * Convert the Linux UTC time @ts to its corresponding NTFS time and return
+ * that in little endian format.
+ *
+ * Linux stores time in a struct timespec64 consisting of a time64_t tv_sec
+ * and a long tv_nsec where tv_sec is the number of 1-second intervals since
+ * 1st January 1970, 00:00:00 UTC and tv_nsec is the number of 1-nano-second
+ * intervals since the value of tv_sec.
+ *
+ * NTFS uses Microsoft's standard time format which is stored in a s64 and is
+ * measured as the number of 100-nano-second intervals since 1st January 1601,
+ * 00:00:00 UTC.
+ */
+static inline __le64 utc2ntfs(const struct timespec64 ts)
+{
+	/*
+	 * Convert the seconds to 100ns intervals, add the nano-seconds
+	 * converted to 100ns intervals, and then add the NTFS time offset.
+	 */
+	return cpu_to_le64((s64)ts.tv_sec * 10000000 + ts.tv_nsec / 100 +
+			NTFS_TIME_OFFSET);
+}
+
+/**
+ * ntfs2utc - convert NTFS time to Linux time
+ * @time:	NTFS time (little endian) to convert to Linux UTC
+ *
+ * Convert the little endian NTFS time @time to its corresponding Linux UTC
+ * time and return that in cpu format.
+ *
+ * Linux stores time in a struct timespec64 consisting of a time64_t tv_sec
+ * and a long tv_nsec where tv_sec is the number of 1-second intervals since
+ * 1st January 1970, 00:00:00 UTC and tv_nsec is the number of 1-nano-second
+ * intervals since the value of tv_sec.
+ *
+ * NTFS uses Microsoft's standard time format which is stored in a s64 and is
+ * measured as the number of 100 nano-second intervals since 1st January 1601,
+ * 00:00:00 UTC.
+ */
+static inline struct timespec64 ntfs2utc(const __le64 time)
+{
+	struct timespec64 ts;
+
+	/* Subtract the NTFS time offset. */
+	u64 t = (u64)(le64_to_cpu(time) - NTFS_TIME_OFFSET);
+	/*
+	 * Convert the time to 1-second intervals and the remainder to
+	 * 1-nano-second intervals.
+	 */
+	ts.tv_nsec = do_div(t, 10000000) * 100;
+	ts.tv_sec = t;
+	return ts;
+}
+
+/**
+ * __ntfs_malloc - allocate memory in multiples of pages
+ * @size:	number of bytes to allocate
+ * @gfp_mask:	extra flags for the allocator
+ *
+ * Internal function. You probably want ntfs_malloc_nofs()...
+ *
+ * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
+ * returns a pointer to the allocated memory.
+ *
+ * If there was insufficient memory to complete the request, return NULL.
+ * Depending on @gfp_mask the allocation may be guaranteed to succeed.
+ */
+static inline void *__ntfs_malloc(unsigned long size, gfp_t gfp_mask)
+{
+	if (likely(size <= PAGE_SIZE)) {
+		if (!size)
+			return NULL;
+		/* kmalloc() has per-CPU caches so is faster for now.
		 */
+		return kmalloc(PAGE_SIZE, gfp_mask & ~__GFP_HIGHMEM);
+		/* return (void *)__get_free_page(gfp_mask); */
+	}
+	if (likely((size >> PAGE_SHIFT) < totalram_pages()))
+		return __vmalloc(size, gfp_mask);
+	return NULL;
+}
+
+/**
+ * ntfs_malloc_nofs - allocate memory in multiples of pages
+ * @size:	number of bytes to allocate
+ *
+ * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
+ * returns a pointer to the allocated memory.
+ *
+ * If there was insufficient memory to complete the request, return NULL.
+ */
+static inline void *ntfs_malloc_nofs(unsigned long size)
+{
+	return __ntfs_malloc(size, GFP_NOFS | __GFP_HIGHMEM | __GFP_ZERO);
+}
+
+/**
+ * ntfs_malloc_nofs_nofail - allocate memory in multiples of pages
+ * @size:	number of bytes to allocate
+ *
+ * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
+ * returns a pointer to the allocated memory.
+ *
+ * This function guarantees that the allocation will succeed. It will sleep
+ * for as long as it takes to complete the allocation.
+ *
+ * If there was insufficient memory to complete the request, return NULL.
+ */ +static inline void *ntfs_malloc_nofs_nofail(unsigned long size) +{ + return __ntfs_malloc(size, GFP_NOFS | __GFP_HIGHMEM | __GFP_NOFAIL); +} + +static inline void ntfs_free(void *addr) +{ + kvfree(addr); +} + +static inline void *ntfs_realloc_nofs(void *addr, unsigned long new_size, + unsigned long cpy_size) +{ + void *pnew_addr; + + if (new_size =3D=3D 0) { + ntfs_free(addr); + return NULL; + } + + pnew_addr =3D ntfs_malloc_nofs(new_size); + if (pnew_addr =3D=3D NULL) + return NULL; + if (addr) { + cpy_size =3D min(cpy_size, new_size); + if (cpy_size) + memcpy(pnew_addr, addr, cpy_size); + ntfs_free(addr); + } + return pnew_addr; +} +#endif /* _LINUX_NTFS_MISC_H */ diff --git a/fs/ntfsplus/ntfs.h b/fs/ntfsplus/ntfs.h new file mode 100644 index 000000000000..d497101bb05a --- /dev/null +++ b/fs/ntfsplus/ntfs.h @@ -0,0 +1,180 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Defines for NTFS Linux kernel driver. + * + * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc. + * Copyright (C) 2002 Richard Russon + * Copyright (c) 2025 LG Electronics Co., Ltd. + */ + +#ifndef _LINUX_NTFS_H +#define _LINUX_NTFS_H + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "volume.h" +#include "layout.h" +#include "inode.h" + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#define NTFS_DEF_PREALLOC_SIZE (64*1024*1024) + +#define STANDARD_COMPRESSION_UNIT 4 +#define MAX_COMPRESSION_CLUSTER_SIZE 4096 + +#define UCHAR_T_SIZE_BITS 1 + +enum { + NTFS_BLOCK_SIZE =3D 512, + NTFS_BLOCK_SIZE_BITS =3D 9, + NTFS_SB_MAGIC =3D 0x5346544e, /* 'NTFS' */ + NTFS_MAX_NAME_LEN =3D 255, + NTFS_MAX_LABEL_LEN =3D 128, +}; + +enum { + CASE_SENSITIVE =3D 0, + IGNORE_CASE =3D 1, +}; + +/* Global variables. */ + +/* Slab caches (from super.c). 
 */
+extern struct kmem_cache *ntfs_name_cache;
+extern struct kmem_cache *ntfs_inode_cache;
+extern struct kmem_cache *ntfs_big_inode_cache;
+extern struct kmem_cache *ntfs_attr_ctx_cache;
+extern struct kmem_cache *ntfs_index_ctx_cache;
+
+/* The various operations structs defined throughout the driver files. */
+extern const struct address_space_operations ntfs_normal_aops;
+extern const struct address_space_operations ntfs_compressed_aops;
+extern const struct address_space_operations ntfs_mst_aops;
+
+extern const struct file_operations ntfs_file_ops;
+extern const struct inode_operations ntfs_file_inode_ops;
+extern const struct inode_operations ntfs_symlink_inode_operations;
+extern const struct inode_operations ntfsp_special_inode_operations;
+
+extern const struct file_operations ntfs_dir_ops;
+extern const struct inode_operations ntfs_dir_inode_ops;
+
+extern const struct file_operations ntfs_empty_file_ops;
+extern const struct inode_operations ntfs_empty_inode_ops;
+
+extern const struct export_operations ntfs_export_ops;
+
+/**
+ * NTFS_SB - return the ntfs volume given a vfs super block
+ * @sb:	VFS super block
+ *
+ * NTFS_SB() returns the ntfs volume associated with the VFS super block @sb.
+ */
+static inline struct ntfs_volume *NTFS_SB(struct super_block *sb)
+{
+	return sb->s_fs_info;
+}
+
+/* Declarations of functions and global variables. */
+
+/* From fs/ntfs/compress.c */
+int ntfs_read_compressed_block(struct folio *folio);
+int allocate_compression_buffers(void);
+void free_compression_buffers(void);
+int ntfs_compress_write(struct ntfs_inode *ni, loff_t pos, size_t count,
+		struct iov_iter *from);
+
+/* From fs/ntfs/super.c */
+#define default_upcase_len 0x10000
+extern struct mutex ntfs_lock;
+
+struct option_t {
+	int val;
+	char *str;
+};
+extern const struct option_t on_errors_arr[];
+int ntfs_set_volume_flags(struct ntfs_volume *vol, __le16 flags);
+int ntfs_clear_volume_flags(struct ntfs_volume *vol, __le16 flags);
+int ntfs_write_volume_label(struct ntfs_volume *vol, char *label);
+
+/* From fs/ntfs/mst.c */
+int post_read_mst_fixup(struct ntfs_record *b, const u32 size);
+int pre_write_mst_fixup(struct ntfs_record *b, const u32 size);
+void post_write_mst_fixup(struct ntfs_record *b);
+
+/* From fs/ntfs/unistr.c */
+bool ntfs_are_names_equal(const __le16 *s1, size_t s1_len,
+		const __le16 *s2, size_t s2_len,
+		const u32 ic,
+		const __le16 *upcase, const u32 upcase_size);
+int ntfs_collate_names(const __le16 *name1, const u32 name1_len,
+		const __le16 *name2, const u32 name2_len,
+		const int err_val, const u32 ic,
+		const __le16 *upcase, const u32 upcase_len);
+int ntfs_ucsncmp(const __le16 *s1, const __le16 *s2, size_t n);
+int ntfs_ucsncasecmp(const __le16 *s1, const __le16 *s2, size_t n,
+		const __le16 *upcase, const u32 upcase_size);
+int ntfs_file_compare_values(const struct file_name_attr *file_name_attr1,
+		const struct file_name_attr *file_name_attr2,
+		const int err_val, const u32 ic,
+		const __le16 *upcase, const u32 upcase_len);
+int ntfs_nlstoucs(const struct ntfs_volume *vol, const char *ins,
+		const int ins_len, __le16 **outs, int max_name_len);
+int ntfs_ucstonls(const struct ntfs_volume *vol, const __le16 *ins,
+		const int ins_len, unsigned char **outs, int outs_len);
+__le16 *ntfs_ucsndup(const __le16 *s, u32 maxlen);
+bool ntfs_names_are_equal(const __le16 *s1, size_t s1_len,
+		const __le16 *s2, size_t s2_len,
+		const u32 ic,
+		const __le16 *upcase, const u32 upcase_size);
+int ntfs_force_shutdown(struct super_block *sb, u32 flags);
+long ntfsp_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+#ifdef CONFIG_COMPAT
+long ntfsp_compat_ioctl(struct file *filp, unsigned int cmd,
+		unsigned long arg);
+#endif
+
+/* From fs/ntfs/upcase.c */
+__le16 *generate_default_upcase(void);
+
+static inline int ntfs_ffs(int x)
+{
+	int r = 1;
+
+	if (!x)
+		return 0;
+	if (!(x & 0xffff)) {
+		x >>= 16;
+		r += 16;
+	}
+	if (!(x & 0xff)) {
+		x >>= 8;
+		r += 8;
+	}
+	if (!(x & 0xf)) {
+		x >>= 4;
+		r += 4;
+	}
+	if (!(x & 3)) {
+		x >>= 2;
+		r += 2;
+	}
+	if (!(x & 1))
+		r += 1;
+	return r;
+}
+
+#endif /* _LINUX_NTFS_H */
diff --git a/fs/ntfsplus/ntfs_iomap.h b/fs/ntfsplus/ntfs_iomap.h
new file mode 100644
index 000000000000..b1a5d55fa077
--- /dev/null
+++ b/fs/ntfsplus/ntfs_iomap.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/**
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_IOMAP_H
+#define _LINUX_NTFS_IOMAP_H
+
+#include
+#include
+
+#include "volume.h"
+#include "inode.h"
+
+extern const struct iomap_ops ntfs_write_iomap_ops;
+extern const struct iomap_ops ntfs_read_iomap_ops;
+extern const struct iomap_ops ntfs_page_mkwrite_iomap_ops;
+extern const struct iomap_ops ntfs_dio_iomap_ops;
+extern const struct iomap_writeback_ops ntfs_writeback_ops;
+extern const struct iomap_write_ops ntfs_iomap_folio_ops;
+int ntfs_zeroed_clusters(struct inode *vi, s64 lcn, s64 num);
+#endif /* _LINUX_NTFS_IOMAP_H */
diff --git a/fs/ntfsplus/reparse.h b/fs/ntfsplus/reparse.h
new file mode 100644
index 000000000000..a1f3829a89da
--- /dev/null
+++ b/fs/ntfsplus/reparse.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/**
+ * Copyright (c) 2008-2021 Jean-Pierre Andre
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+extern __le16 reparse_index_name[];
+
+unsigned int ntfs_make_symlink(struct ntfs_inode *ni);
+unsigned int ntfs_reparse_tag_dt_types(struct ntfs_volume *vol, unsigned long mref);
+int ntfs_reparse_set_wsl_symlink(struct ntfs_inode *ni,
+		const __le16 *target, int target_len);
+int ntfs_reparse_set_wsl_not_symlink(struct ntfs_inode *ni, mode_t mode);
+int ntfs_delete_reparse_index(struct ntfs_inode *ni);
+int ntfs_remove_ntfs_reparse_data(struct ntfs_inode *ni);
diff --git a/fs/ntfsplus/runlist.h b/fs/ntfsplus/runlist.h
new file mode 100644
index 000000000000..c9d88116371d
--- /dev/null
+++ b/fs/ntfsplus/runlist.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for runlist handling in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2005 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_RUNLIST_H
+#define _LINUX_NTFS_RUNLIST_H
+
+#include "volume.h"
+
+/**
+ * runlist_element - in memory vcn to lcn mapping array element
+ * @vcn:	starting vcn of the current array element
+ * @lcn:	starting lcn of the current array element
+ * @length:	length in clusters of the current array element
+ *
+ * The last vcn (in fact the last vcn + 1) is reached when length == 0.
+ *
+ * When lcn == LCN_HOLE this means that the count vcns starting at vcn are not
+ * physically allocated (i.e. this is a hole / data is sparse).
+ */
+struct runlist_element {	/* In memory vcn to lcn mapping structure element. */
+	s64 vcn;	/* vcn = Starting virtual cluster number. */
+	s64 lcn;	/* lcn = Starting logical cluster number. */
+	s64 length;	/* Run length in clusters. */
+};
+
+/**
+ * runlist - in memory vcn to lcn mapping array including a read/write lock
+ * @rl:		pointer to an array of runlist elements
+ * @lock:	read/write semaphore for serializing access to @rl
+ */
+struct runlist {
+	struct runlist_element *rl;
+	struct rw_semaphore lock;
+	size_t count;
+};
+
+static inline void ntfs_init_runlist(struct runlist *rl)
+{
+	rl->rl = NULL;
+	init_rwsem(&rl->lock);
+	rl->count = 0;
+}
+
+enum {
+	LCN_DELALLOC		= -1,
+	LCN_HOLE		= -2,
+	LCN_RL_NOT_MAPPED	= -3,
+	LCN_ENOENT		= -4,
+	LCN_ENOMEM		= -5,
+	LCN_EIO			= -6,
+	LCN_EINVAL		= -7,
+};
+
+struct runlist_element *ntfs_runlists_merge(struct runlist *d_runlist,
+		struct runlist_element *srl, size_t s_rl_count,
+		size_t *new_rl_count);
+struct runlist_element *ntfs_mapping_pairs_decompress(const struct ntfs_volume *vol,
+		const struct attr_record *attr, struct runlist *old_runlist,
+		size_t *new_rl_count);
+s64 ntfs_rl_vcn_to_lcn(const struct runlist_element *rl, const s64 vcn);
+struct runlist_element *ntfs_rl_find_vcn_nolock(struct runlist_element *rl, const s64 vcn);
+int ntfs_get_size_for_mapping_pairs(const struct ntfs_volume *vol,
+		const struct runlist_element *rl, const s64 first_vcn,
+		const s64 last_vcn, int max_mp_size);
+int ntfs_mapping_pairs_build(const struct ntfs_volume *vol, s8 *dst,
+		const int dst_len, const struct runlist_element *rl,
+		const s64 first_vcn, const s64 last_vcn, s64 *const stop_vcn,
+		struct runlist_element **stop_rl, unsigned int *de_cluster_count);
+int ntfs_rl_truncate_nolock(const struct ntfs_volume *vol,
+		struct runlist *const runlist, const s64 new_length);
+int ntfs_rl_sparse(struct runlist_element *rl);
+s64 ntfs_rl_get_compressed_size(struct ntfs_volume *vol, struct runlist_element *rl);
+struct runlist_element *ntfs_rl_insert_range(struct runlist_element *dst_rl, int dst_cnt,
+		struct runlist_element *src_rl, int src_cnt, size_t *new_cnt);
+struct runlist_element *ntfs_rl_punch_hole(struct runlist_element *dst_rl, int dst_cnt,
+		s64 start_vcn, s64 len, struct runlist_element **punch_rl,
+		size_t *new_rl_cnt);
+struct runlist_element *ntfs_rl_collapse_range(struct runlist_element *dst_rl, int dst_cnt,
+		s64 start_vcn, s64 len, struct runlist_element **punch_rl,
+		size_t *new_rl_cnt);
+struct runlist_element *ntfs_rl_realloc(struct runlist_element *rl, int old_size,
+		int new_size);
+#endif /* _LINUX_NTFS_RUNLIST_H */
diff --git a/fs/ntfsplus/volume.h b/fs/ntfsplus/volume.h
new file mode 100644
index 000000000000..b934c88e5e11
--- /dev/null
+++ b/fs/ntfsplus/volume.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for volume structures in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2006 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_VOLUME_H
+#define _LINUX_NTFS_VOLUME_H
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "layout.h"
+
+#define NTFS_VOL_UID		BIT(1)
+#define NTFS_VOL_GID		BIT(2)
+
+/*
+ * The NTFS in memory super block structure.
+ */
+struct ntfs_volume {
+	/* Device specifics. */
+	struct super_block *sb;		/* Pointer back to the super_block. */
+	s64 nr_blocks;			/*
+					 * Number of sb->s_blocksize bytes
+					 * sized blocks on the device.
+					 */
+	/* Configuration provided by user at mount time. */
+	unsigned long flags;		/* Miscellaneous flags, see below. */
+	kuid_t uid;			/* uid that files will be mounted as. */
+	kgid_t gid;			/* gid that files will be mounted as. */
+	umode_t fmask;			/* The mask for file permissions. */
+	umode_t dmask;			/* The mask for directory permissions. */
+	u8 mft_zone_multiplier;		/* Initial mft zone multiplier. */
+	u8 on_errors;			/* What to do on filesystem errors. */
+	errseq_t wb_err;
+	/* NTFS bootsector provided information.
+	 */
+	u16 sector_size;		/* in bytes */
+	u8 sector_size_bits;		/* log2(sector_size) */
+	u32 cluster_size;		/* in bytes */
+	u32 cluster_size_mask;		/* cluster_size - 1 */
+	u8 cluster_size_bits;		/* log2(cluster_size) */
+	u32 mft_record_size;		/* in bytes */
+	u32 mft_record_size_mask;	/* mft_record_size - 1 */
+	u8 mft_record_size_bits;	/* log2(mft_record_size) */
+	u32 index_record_size;		/* in bytes */
+	u32 index_record_size_mask;	/* index_record_size - 1 */
+	u8 index_record_size_bits;	/* log2(index_record_size) */
+	s64 nr_clusters;		/*
+					 * Volume size in clusters == number of
+					 * bits in lcn bitmap.
+					 */
+	s64 mft_lcn;			/* Cluster location of mft data. */
+	s64 mftmirr_lcn;		/* Cluster location of copy of mft. */
+	u64 serial_no;			/* The volume serial number. */
+	/* Mount specific NTFS information. */
+	u32 upcase_len;			/* Number of entries in upcase[]. */
+	__le16 *upcase;			/* The upcase table. */
+
+	s32 attrdef_size;		/* Size of the attribute definition table in bytes. */
+	struct attr_def *attrdef;	/*
+					 * Table of attribute definitions.
+					 * Obtained from FILE_AttrDef.
+					 */
+
+	/* Variables used by the cluster and mft allocators. */
+	s64 mft_data_pos;		/*
+					 * Mft record number at which to
+					 * allocate the next mft record.
+					 */
+	s64 mft_zone_start;		/* First cluster of the mft zone. */
+	s64 mft_zone_end;		/* First cluster beyond the mft zone. */
+	s64 mft_zone_pos;		/* Current position in the mft zone. */
+	s64 data1_zone_pos;		/* Current position in the first data zone. */
+	s64 data2_zone_pos;		/* Current position in the second data zone. */
+
+	struct inode *mft_ino;		/* The VFS inode of $MFT. */
+
+	struct inode *mftbmp_ino;	/* Attribute inode for $MFT/$BITMAP. */
+	struct rw_semaphore mftbmp_lock; /*
+					  * Lock for serializing accesses to the
+					  * mft record bitmap ($MFT/$BITMAP).
+					  */
+	struct inode *mftmirr_ino;	/* The VFS inode of $MFTMirr. */
+	int mftmirr_size;		/* Size of mft mirror in mft records. */
+
+	struct inode *logfile_ino;	/* The VFS inode of LogFile.
+					 */
+
+	struct inode *lcnbmp_ino;	/* The VFS inode of $Bitmap. */
+	struct rw_semaphore lcnbmp_lock; /*
+					  * Lock for serializing accesses to the
+					  * cluster bitmap ($Bitmap/$DATA).
+					  */
+
+	struct inode *vol_ino;		/* The VFS inode of $Volume. */
+	__le16 vol_flags;		/* Volume flags. */
+	u8 major_ver;			/* Ntfs major version of volume. */
+	u8 minor_ver;			/* Ntfs minor version of volume. */
+	unsigned char *volume_label;
+
+	struct inode *root_ino;		/* The VFS inode of the root directory. */
+	struct inode *secure_ino;	/*
+					 * The VFS inode of $Secure (NTFS3.0+
+					 * only, otherwise NULL).
+					 */
+	struct inode *extend_ino;	/*
+					 * The VFS inode of $Extend (NTFS3.0+
+					 * only, otherwise NULL).
+					 */
+	/* $Quota stuff is NTFS3.0+ specific.  Unused/NULL otherwise. */
+	struct inode *quota_ino;	/* The VFS inode of $Quota. */
+	struct inode *quota_q_ino;	/* Attribute inode for $Quota/$Q. */
+	struct nls_table *nls_map;
+	bool nls_utf8;
+	wait_queue_head_t free_waitq;
+
+	atomic64_t free_clusters;	/* Track the number of free clusters. */
+	atomic64_t free_mft_records;	/* Track the free mft records. */
+	atomic64_t dirty_clusters;
+	u8 sparse_compression_unit;
+	unsigned int *lcn_empty_bits_per_page;
+	struct work_struct precalc_work;
+	loff_t preallocated_size;
+};
+
+/*
+ * Defined bits for the flags field in the ntfs_volume structure.
+ */
+enum {
+	NV_Errors,		/* 1: Volume has errors, prevent remount rw. */
+	NV_ShowSystemFiles,	/* 1: Return system files in ntfs_readdir(). */
+	NV_CaseSensitive,	/*
+				 * 1: Treat file names as case sensitive and
+				 * create filenames in the POSIX namespace.
+				 * Otherwise be case insensitive but still
+				 * create file names in POSIX namespace.
+				 */
+	NV_LogFileEmpty,	/* 1: LogFile journal is empty. */
+	NV_QuotaOutOfDate,	/* 1: Quota is out of date. */
+	NV_UsnJrnlStamped,	/* 1: UsnJrnl has been stamped. */
+	NV_ReadOnly,
+	NV_Compression,
+	NV_FreeClusterKnown,
+	NV_Shutdown,
+	NV_SysImmutable,	/* 1: Protect system files from deletion. */
+	NV_ShowHiddenFiles,	/* 1: Return hidden files in ntfs_readdir(). */
+	NV_HideDotFiles,
+	NV_CheckWindowsNames,
+	NV_Discard,
+	NV_DisableSparse,
+};
+
+/*
+ * Macro tricks to expand the NVolFoo(), NVolSetFoo(), and NVolClearFoo()
+ * functions.
+ */
+#define DEFINE_NVOL_BIT_OPS(flag)					\
+static inline int NVol##flag(struct ntfs_volume *vol)			\
+{									\
+	return test_bit(NV_##flag, &(vol)->flags);			\
+}									\
+static inline void NVolSet##flag(struct ntfs_volume *vol)		\
+{									\
+	set_bit(NV_##flag, &(vol)->flags);				\
+}									\
+static inline void NVolClear##flag(struct ntfs_volume *vol)		\
+{									\
+	clear_bit(NV_##flag, &(vol)->flags);				\
+}
+
+/* Emit the ntfs volume bitops functions. */
+DEFINE_NVOL_BIT_OPS(Errors)
+DEFINE_NVOL_BIT_OPS(ShowSystemFiles)
+DEFINE_NVOL_BIT_OPS(CaseSensitive)
+DEFINE_NVOL_BIT_OPS(LogFileEmpty)
+DEFINE_NVOL_BIT_OPS(QuotaOutOfDate)
+DEFINE_NVOL_BIT_OPS(UsnJrnlStamped)
+DEFINE_NVOL_BIT_OPS(ReadOnly)
+DEFINE_NVOL_BIT_OPS(Compression)
+DEFINE_NVOL_BIT_OPS(FreeClusterKnown)
+DEFINE_NVOL_BIT_OPS(Shutdown)
+DEFINE_NVOL_BIT_OPS(SysImmutable)
+DEFINE_NVOL_BIT_OPS(ShowHiddenFiles)
+DEFINE_NVOL_BIT_OPS(HideDotFiles)
+DEFINE_NVOL_BIT_OPS(CheckWindowsNames)
+DEFINE_NVOL_BIT_OPS(Discard)
+DEFINE_NVOL_BIT_OPS(DisableSparse)
+
+static inline void ntfs_inc_free_clusters(struct ntfs_volume *vol, s64 nr)
+{
+	if (!NVolFreeClusterKnown(vol))
+		wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+	atomic64_add(nr, &vol->free_clusters);
+}
+
+static inline void ntfs_dec_free_clusters(struct ntfs_volume *vol, s64 nr)
+{
+	if (!NVolFreeClusterKnown(vol))
+		wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+	atomic64_sub(nr, &vol->free_clusters);
+}
+
+static inline void ntfs_inc_free_mft_records(struct ntfs_volume *vol, s64 nr)
+{
+	if (!NVolFreeClusterKnown(vol))
+		return;
+
+	atomic64_add(nr, &vol->free_mft_records);
+}
+
+static inline void ntfs_dec_free_mft_records(struct ntfs_volume *vol, s64 nr)
+{
+	if (!NVolFreeClusterKnown(vol))
+		return;
+
+	atomic64_sub(nr, &vol->free_mft_records);
+}
+
+static inline void ntfs_set_lcn_empty_bits(struct ntfs_volume *vol, unsigned long index,
+		u8 val, unsigned int count)
+{
+	if (!NVolFreeClusterKnown(vol))
+		wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+
+	if (val)
+		vol->lcn_empty_bits_per_page[index] -= count;
+	else
+		vol->lcn_empty_bits_per_page[index] += count;
+}
+
+static __always_inline void ntfs_hold_dirty_clusters(struct ntfs_volume *vol, s64 nr_clusters)
+{
+	atomic64_add(nr_clusters, &vol->dirty_clusters);
+}
+
+static __always_inline void ntfs_release_dirty_clusters(struct ntfs_volume *vol, s64 nr_clusters)
+{
+	if (atomic64_read(&vol->dirty_clusters) < nr_clusters)
+		atomic64_set(&vol->dirty_clusters, 0);
+	else
+		atomic64_sub(nr_clusters, &vol->dirty_clusters);
+}
+
+s64 ntfs_available_clusters_count(struct ntfs_volume *vol, s64 nr_clusters);
+s64 get_nr_free_clusters(struct ntfs_volume *vol);
+#endif /* _LINUX_NTFS_VOLUME_H */
diff --git a/include/uapi/linux/ntfs.h b/include/uapi/linux/ntfs.h
new file mode 100644
index 000000000000..e76957285280
--- /dev/null
+++ b/include/uapi/linux/ntfs.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _UAPI_LINUX_NTFS_H
+#define _UAPI_LINUX_NTFS_H
+#include
+#include
+
+/*
+ * ntfs-specific ioctl commands
+ */
+#define NTFS_IOC_SHUTDOWN	_IOR('X', 125, __u32)
+
+/*
+ * Flags used by NTFS_IOC_SHUTDOWN
+ */
+#define NTFS_GOING_DOWN_DEFAULT		0x0	/* default with full sync */
+#define NTFS_GOING_DOWN_FULLSYNC	0x1	/* going down with full sync */
+#define NTFS_GOING_DOWN_NOSYNC		0x2	/* going down */
+
+#endif /* _UAPI_LINUX_NTFS_H */
-- 
2.25.1

From nobody Mon Dec 1 22:02:17 2025
From: Namjae Jeon
To: viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org,
	hch@lst.de, tytso@mit.edu, willy@infradead.org, jack@suse.cz,
	djwong@kernel.org, josef@toxicpanda.com, sandeen@sandeen.net,
	rgoldwyn@suse.com, xiang@kernel.org, dsterba@suse.com, pali@kernel.org,
	ebiggers@kernel.org, neil@brown.name, amir73il@gmail.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	iamjoonsoo.kim@lge.com, cheol.lee@lge.com, jay.sim@lge.com,
	gunho.lee@lge.com, Namjae Jeon
Subject: [PATCH v2 02/11] ntfsplus: add super block operations
Date: Thu, 27 Nov 2025 13:59:35 +0900
Message-Id: <20251127045944.26009-3-linkinjeon@kernel.org>
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>
References: <20251127045944.26009-1-linkinjeon@kernel.org>

This adds the implementation of superblock operations for ntfsplus.

Signed-off-by: Namjae Jeon
---
 fs/ntfsplus/super.c | 2865 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2865 insertions(+)
 create mode 100644 fs/ntfsplus/super.c

diff --git a/fs/ntfsplus/super.c b/fs/ntfsplus/super.c
new file mode 100644
index 000000000000..32d9247e6bed
--- /dev/null
+++ b/fs/ntfsplus/super.c
@@ -0,0 +1,2865 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel super block handling.  Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2001,2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include	/* For bdev_logical_block_size().
 */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "misc.h"
+#include "logfile.h"
+#include "index.h"
+#include "ntfs.h"
+#include "ea.h"
+#include "volume.h"
+
+/* A global default upcase table and a corresponding reference count. */
+static __le16 *default_upcase;
+static unsigned long ntfs_nr_upcase_users;
+
+static struct workqueue_struct *ntfs_wq;
+
+/* Error constants/strings used in inode.c::ntfs_show_options(). */
+enum {
+	/* One of these must be present, default is ON_ERRORS_CONTINUE. */
+	ON_ERRORS_PANIC		= 0x01,
+	ON_ERRORS_REMOUNT_RO	= 0x02,
+	ON_ERRORS_CONTINUE	= 0x04,
+};
+
+static const struct constant_table ntfs_param_enums[] = {
+	{ "panic",	ON_ERRORS_PANIC },
+	{ "remount-ro",	ON_ERRORS_REMOUNT_RO },
+	{ "continue",	ON_ERRORS_CONTINUE },
+	{}
+};
+
+enum {
+	Opt_uid,
+	Opt_gid,
+	Opt_umask,
+	Opt_dmask,
+	Opt_fmask,
+	Opt_errors,
+	Opt_nls,
+	Opt_charset,
+	Opt_show_sys_files,
+	Opt_show_meta,
+	Opt_case_sensitive,
+	Opt_disable_sparse,
+	Opt_sparse,
+	Opt_mft_zone_multiplier,
+	Opt_preallocated_size,
+	Opt_sys_immutable,
+	Opt_nohidden,
+	Opt_hide_dot_files,
+	Opt_check_windows_names,
+	Opt_acl,
+	Opt_discard,
+	Opt_nocase,
+};
+
+static const struct fs_parameter_spec ntfs_parameters[] = {
+	fsparam_u32("uid", Opt_uid),
+	fsparam_u32("gid", Opt_gid),
+	fsparam_u32oct("umask", Opt_umask),
+	fsparam_u32oct("dmask", Opt_dmask),
+	fsparam_u32oct("fmask", Opt_fmask),
+	fsparam_string("nls", Opt_nls),
+	fsparam_string("iocharset", Opt_charset),
+	fsparam_enum("errors", Opt_errors, ntfs_param_enums),
+	fsparam_flag("show_sys_files", Opt_show_sys_files),
+	fsparam_flag("showmeta", Opt_show_meta),
+	fsparam_flag("case_sensitive", Opt_case_sensitive),
+	fsparam_flag("disable_sparse", Opt_disable_sparse),
+	fsparam_s32("mft_zone_multiplier", Opt_mft_zone_multiplier),
+	fsparam_u64("preallocated_size", Opt_preallocated_size),
+	fsparam_flag("sys_immutable", Opt_sys_immutable),
+	fsparam_flag("nohidden", Opt_nohidden),
+	fsparam_flag("hide_dot_files", Opt_hide_dot_files),
+	fsparam_flag("windows_names", Opt_check_windows_names),
+	fsparam_flag("acl", Opt_acl),
+	fsparam_flag("discard", Opt_discard),
+	fsparam_flag("sparse", Opt_sparse),
+	fsparam_flag("nocase", Opt_nocase),
+	{}
+};
+
+static int ntfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+	struct ntfs_volume *vol = fc->s_fs_info;
+	struct fs_parse_result result;
+	int opt;
+
+	opt = fs_parse(fc, ntfs_parameters, param, &result);
+	if (opt < 0)
+		return opt;
+
+	switch (opt) {
+	case Opt_uid:
+		vol->uid = make_kuid(current_user_ns(), result.uint_32);
+		break;
+	case Opt_gid:
+		vol->gid = make_kgid(current_user_ns(), result.uint_32);
+		break;
+	case Opt_umask:
+		vol->fmask = vol->dmask = result.uint_32;
+		break;
+	case Opt_dmask:
+		vol->dmask = result.uint_32;
+		break;
+	case Opt_fmask:
+		vol->fmask = result.uint_32;
+		break;
+	case Opt_errors:
+		vol->on_errors = result.uint_32;
+		break;
+	case Opt_nls:
+	case Opt_charset:
+		if (vol->nls_map)
+			unload_nls(vol->nls_map);
+		vol->nls_map = load_nls(param->string);
+		if (!vol->nls_map) {
+			ntfs_error(vol->sb, "Failed to load NLS table '%s'.",
+				   param->string);
+			return -EINVAL;
+		}
+		break;
+	case Opt_mft_zone_multiplier:
+		if (vol->mft_zone_multiplier && vol->mft_zone_multiplier !=
+				result.int_32) {
+			ntfs_error(vol->sb, "Cannot change mft_zone_multiplier on remount.");
+			return -EINVAL;
+		}
+		if (result.int_32 < 1 || result.int_32 > 4) {
+			ntfs_error(vol->sb,
+				   "Invalid mft_zone_multiplier.  Using default value, i.e. 1.");
+			vol->mft_zone_multiplier = 1;
+		} else
+			vol->mft_zone_multiplier = result.int_32;
+		break;
+	case Opt_show_sys_files:
+	case Opt_show_meta:
+		if (result.boolean)
+			NVolSetShowSystemFiles(vol);
+		else
+			NVolClearShowSystemFiles(vol);
+		break;
+	case Opt_case_sensitive:
+		if (result.boolean)
+			NVolSetCaseSensitive(vol);
+		else
+			NVolClearCaseSensitive(vol);
+		break;
+	case Opt_nocase:
+		if (result.boolean)
+			NVolClearCaseSensitive(vol);
+		else
+			NVolSetCaseSensitive(vol);
+		break;
+	case Opt_preallocated_size:
+		vol->preallocated_size = (loff_t)result.uint_64;
+		break;
+	case Opt_sys_immutable:
+		if (result.boolean)
+			NVolSetSysImmutable(vol);
+		else
+			NVolClearSysImmutable(vol);
+		break;
+	case Opt_nohidden:
+		if (result.boolean)
+			NVolClearShowHiddenFiles(vol);
+		else
+			NVolSetShowHiddenFiles(vol);
+		break;
+	case Opt_hide_dot_files:
+		if (result.boolean)
+			NVolSetHideDotFiles(vol);
+		else
+			NVolClearHideDotFiles(vol);
+		break;
+	case Opt_check_windows_names:
+		if (result.boolean)
+			NVolSetCheckWindowsNames(vol);
+		else
+			NVolClearCheckWindowsNames(vol);
+		break;
+	case Opt_acl:
+		if (result.boolean)
+			fc->sb_flags |= SB_POSIXACL;
+		else
+			fc->sb_flags &= ~SB_POSIXACL;
+		break;
+	case Opt_discard:
+		if (result.boolean)
+			NVolSetDiscard(vol);
+		else
+			NVolClearDiscard(vol);
+		break;
+	case Opt_disable_sparse:
+		if (result.boolean)
+			NVolSetDisableSparse(vol);
+		else
+			NVolClearDisableSparse(vol);
+		break;
+	case Opt_sparse:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ * ntfs_mark_quotas_out_of_date - mark the quotas out of date on an ntfs volume
+ * @vol:	ntfs volume on which to mark the quotas out of date
+ *
+ * Mark the quotas out of date on the ntfs volume @vol and return 'true' on
+ * success and 'false' on error.
+ */ +static bool ntfs_mark_quotas_out_of_date(struct ntfs_volume *vol) +{ + struct ntfs_index_context *ictx; + struct quota_control_entry *qce; + const __le32 qid =3D QUOTA_DEFAULTS_ID; + int err; + + ntfs_debug("Entering."); + if (NVolQuotaOutOfDate(vol)) + goto done; + if (!vol->quota_ino || !vol->quota_q_ino) { + ntfs_error(vol->sb, "Quota inodes are not open."); + return false; + } + inode_lock(vol->quota_q_ino); + ictx =3D ntfs_index_ctx_get(NTFS_I(vol->quota_q_ino), I30, 4); + if (!ictx) { + ntfs_error(vol->sb, "Failed to get index context."); + goto err_out; + } + err =3D ntfs_index_lookup(&qid, sizeof(qid), ictx); + if (err) { + if (err =3D=3D -ENOENT) + ntfs_error(vol->sb, "Quota defaults entry is not present."); + else + ntfs_error(vol->sb, "Lookup of quota defaults entry failed."); + goto err_out; + } + if (ictx->data_len < offsetof(struct quota_control_entry, sid)) { + ntfs_error(vol->sb, "Quota defaults entry size is invalid. Run chkdsk."= ); + goto err_out; + } + qce =3D (struct quota_control_entry *)ictx->data; + if (le32_to_cpu(qce->version) !=3D QUOTA_VERSION) { + ntfs_error(vol->sb, + "Quota defaults entry version 0x%x is not supported.", + le32_to_cpu(qce->version)); + goto err_out; + } + ntfs_debug("Quota defaults flags =3D 0x%x.", le32_to_cpu(qce->flags)); + /* If quotas are already marked out of date, no need to do anything. */ + if (qce->flags & QUOTA_FLAG_OUT_OF_DATE) + goto set_done; + /* + * If quota tracking is neither requested, nor enabled and there are no + * pending deletes, no need to mark the quotas out of date. + */ + if (!(qce->flags & (QUOTA_FLAG_TRACKING_ENABLED | + QUOTA_FLAG_TRACKING_REQUESTED | + QUOTA_FLAG_PENDING_DELETES))) + goto set_done; + /* + * Set the QUOTA_FLAG_OUT_OF_DATE bit thus marking quotas out of date. + * This is verified on WinXP to be sufficient to cause windows to + * rescan the volume on boot and update all quota entries. 
+	 */
+	qce->flags |= QUOTA_FLAG_OUT_OF_DATE;
+	/* Ensure the modified flags are written to disk. */
+	ntfs_index_entry_flush_dcache_page(ictx);
+	ntfs_index_entry_mark_dirty(ictx);
+set_done:
+	ntfs_index_ctx_put(ictx);
+	inode_unlock(vol->quota_q_ino);
+	/*
+	 * We set the flag so we do not try to mark the quotas out of date
+	 * again on remount.
+	 */
+	NVolSetQuotaOutOfDate(vol);
+done:
+	ntfs_debug("Done.");
+	return true;
+err_out:
+	if (ictx)
+		ntfs_index_ctx_put(ictx);
+	inode_unlock(vol->quota_q_ino);
+	return false;
+}
+
+static int ntfs_reconfigure(struct fs_context *fc)
+{
+	struct super_block *sb = fc->root->d_sb;
+	struct ntfs_volume *vol = NTFS_SB(sb);
+
+	ntfs_debug("Entering with remount");
+
+	sync_filesystem(sb);
+
+	/*
+	 * If we are remounting read-write, make sure there are no volume
+	 * errors and that no unsupported volume flags are set.  Also, empty
+	 * the logfile journal as it would become stale as soon as something is
+	 * written to the volume and mark the volume dirty so that chkdsk is
+	 * run if the volume is not unmounted cleanly.  Finally, mark the
+	 * quotas out of date so Windows rescans the volume on boot and updates
+	 * them.
+	 *
+	 * When remounting read-only, mark the volume clean if no volume errors
+	 * have occurred.
+	 */
+	if (sb_rdonly(sb) && !(fc->sb_flags & SB_RDONLY)) {
+		static const char *es = ". Cannot remount read-write.";
+
+		/* Remounting read-write. */
+		if (NVolErrors(vol)) {
+			ntfs_error(sb, "Volume has errors and is read-only%s",
+					es);
+			return -EROFS;
+		}
+		if (vol->vol_flags & VOLUME_IS_DIRTY) {
+			ntfs_error(sb, "Volume is dirty and read-only%s", es);
+			return -EROFS;
+		}
+		if (vol->vol_flags & VOLUME_MODIFIED_BY_CHKDSK) {
+			ntfs_error(sb, "Volume has been modified by chkdsk and is read-only%s",
+					es);
+			return -EROFS;
+		}
+		if (vol->vol_flags & VOLUME_MUST_MOUNT_RO_MASK) {
+			ntfs_error(sb, "Volume has unsupported flags set (0x%x) and is read-only%s",
+					le16_to_cpu(vol->vol_flags), es);
+			return -EROFS;
+		}
+		if (vol->logfile_ino && !ntfs_empty_logfile(vol->logfile_ino)) {
+			ntfs_error(sb, "Failed to empty journal LogFile%s",
+					es);
+			NVolSetErrors(vol);
+			return -EROFS;
+		}
+		if (!ntfs_mark_quotas_out_of_date(vol)) {
+			ntfs_error(sb, "Failed to mark quotas out of date%s",
+					es);
+			NVolSetErrors(vol);
+			return -EROFS;
+		}
+	} else if (!sb_rdonly(sb) && (fc->sb_flags & SB_RDONLY)) {
+		/* Remounting read-only. */
+		if (!NVolErrors(vol)) {
+			if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY))
+				ntfs_warning(sb,
+					"Failed to clear dirty bit in volume information flags.  Run chkdsk.");
+		}
+	}
+
+	ntfs_debug("Done.");
+	return 0;
+}
+
+const struct option_t on_errors_arr[] = {
+	{ ON_ERRORS_PANIC,	"panic" },
+	{ ON_ERRORS_REMOUNT_RO,	"remount-ro", },
+	{ ON_ERRORS_CONTINUE,	"continue", },
+	{ 0,			NULL }
+};
+
+void ntfs_handle_error(struct super_block *sb)
+{
+	struct ntfs_volume *vol = NTFS_SB(sb);
+
+	if (sb_rdonly(sb))
+		return;
+
+	if (vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+		sb->s_flags |= SB_RDONLY;
+		pr_crit("(device %s): Filesystem has been set read-only\n",
+			sb->s_id);
+	} else if (vol->on_errors == ON_ERRORS_PANIC) {
+		panic("ntfs: (device %s): panic from previous error\n",
+			sb->s_id);
+	} else if (vol->on_errors == ON_ERRORS_CONTINUE) {
+		if (errseq_check(&sb->s_wb_err, vol->wb_err) == -ENODEV) {
+			NVolSetShutdown(vol);
+			vol->wb_err = sb->s_wb_err;
+		}
+	}
+}
+
+/**
+ * ntfs_write_volume_flags - write new flags to the volume information flags
+ * @vol:	ntfs volume on which to modify the flags
+ * @flags:	new flags value for the volume information flags
+ *
+ * Internal function.  You probably want to use ntfs_{set,clear}_volume_flags()
+ * instead (see below).
+ *
+ * Replace the volume information flags on the volume @vol with the value
+ * supplied in @flags.  Note, this overwrites the volume information flags, so
+ * make sure to combine the flags you want to modify with the old flags and use
+ * the result when calling ntfs_write_volume_flags().
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_write_volume_flags(struct ntfs_volume *vol, const __le16 flags)
+{
+	struct ntfs_inode *ni = NTFS_I(vol->vol_ino);
+	struct volume_information *vi;
+	struct ntfs_attr_search_ctx *ctx;
+	int err;
+
+	ntfs_debug("Entering, old flags = 0x%x, new flags = 0x%x.",
+			le16_to_cpu(vol->vol_flags), le16_to_cpu(flags));
+	mutex_lock(&ni->mrec_lock);
+	if (vol->vol_flags == flags)
+		goto done;
+
+	ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx) {
+		err = -ENOMEM;
+		goto put_unm_err_out;
+	}
+
+	err = ntfs_attr_lookup(AT_VOLUME_INFORMATION, NULL, 0, 0, 0, NULL, 0,
+			ctx);
+	if (err)
+		goto put_unm_err_out;
+
+	vi = (struct volume_information *)((u8 *)ctx->attr +
+			le16_to_cpu(ctx->attr->data.resident.value_offset));
+	vol->vol_flags = vi->flags = flags;
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+done:
+	mutex_unlock(&ni->mrec_lock);
+	ntfs_debug("Done.");
+	return 0;
+put_unm_err_out:
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	mutex_unlock(&ni->mrec_lock);
+	ntfs_error(vol->sb, "Failed with error code %i.", -err);
+	return err;
+}
+
+/**
+ * ntfs_set_volume_flags - set bits in the volume information flags
+ * @vol:	ntfs volume on which to modify the flags
+ * @flags:	flags to set on the volume
+ *
+ * Set the bits in @flags in the volume information flags on the volume @vol.
+ *
+ * Return 0 on success and -errno on error.
+ */
+int ntfs_set_volume_flags(struct ntfs_volume *vol, __le16 flags)
+{
+	flags &= VOLUME_FLAGS_MASK;
+	return ntfs_write_volume_flags(vol, vol->vol_flags | flags);
+}
+
+/**
+ * ntfs_clear_volume_flags - clear bits in the volume information flags
+ * @vol:	ntfs volume on which to modify the flags
+ * @flags:	flags to clear on the volume
+ *
+ * Clear the bits in @flags in the volume information flags on the volume @vol.
+ *
+ * Return 0 on success and -errno on error.
+ */
+int ntfs_clear_volume_flags(struct ntfs_volume *vol, __le16 flags)
+{
+	flags &= VOLUME_FLAGS_MASK;
+	flags = vol->vol_flags & cpu_to_le16(~le16_to_cpu(flags));
+	return ntfs_write_volume_flags(vol, flags);
+}
+
+int ntfs_write_volume_label(struct ntfs_volume *vol, char *label)
+{
+	struct ntfs_inode *vol_ni = NTFS_I(vol->vol_ino);
+	struct ntfs_attr_search_ctx *ctx;
+	__le16 *uname;
+	int uname_len, ret;
+
+	uname_len = ntfs_nlstoucs(vol, label, strlen(label),
+			&uname, FSLABEL_MAX);
+	if (uname_len < 0) {
+		ntfs_error(vol->sb,
+			"Failed to convert volume label '%s' to Unicode.",
+			label);
+		return uname_len;
+	}
+
+	if (uname_len > NTFS_MAX_LABEL_LEN) {
+		ntfs_error(vol->sb,
+			"Volume label is too long (max %d characters).",
+			NTFS_MAX_LABEL_LEN);
+		kvfree(uname);
+		return -EINVAL;
+	}
+
+	mutex_lock(&vol_ni->mrec_lock);
+	ctx = ntfs_attr_get_search_ctx(vol_ni, NULL);
+	if (!ctx) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (!ntfs_attr_lookup(AT_VOLUME_NAME, NULL, 0, 0, 0, NULL, 0,
+			ctx))
+		ntfs_attr_record_rm(ctx);
+	ntfs_attr_put_search_ctx(ctx);
+
+	ret = ntfs_resident_attr_record_add(vol_ni, AT_VOLUME_NAME, AT_UNNAMED, 0,
+			(u8 *)uname, uname_len * sizeof(__le16), 0);
+out:
+	mutex_unlock(&vol_ni->mrec_lock);
+	kvfree(uname);
+	mark_inode_dirty_sync(vol->vol_ino);
+
+	if (ret >= 0) {
+		kfree(vol->volume_label);
+		vol->volume_label = kstrdup(label, GFP_KERNEL);
+		ret = 0;
+	}
+	return ret;
+}
+
+/**
+ * is_boot_sector_ntfs - check whether a boot sector is a valid NTFS boot sector
+ * @sb:		Super block of the device to which @b belongs.
+ * @b:		Boot sector of device @sb to check.
+ * @silent:	If 'true', all output will be silenced.
+ *
+ * is_boot_sector_ntfs() checks whether the boot sector @b is a valid NTFS boot
+ * sector.  Returns 'true' if it is valid and 'false' if not.
+ *
+ * @sb is only needed for warning/error output, i.e. it can be NULL when silent
+ * is 'true'.
+ */
+static bool is_boot_sector_ntfs(const struct super_block *sb,
+		const struct ntfs_boot_sector *b, const bool silent)
+{
+	/*
+	 * Check that checksum == sum of u32 values from b to the checksum
+	 * field.  If checksum is zero, no checking is done.  We will work when
+	 * the checksum test fails, since some utilities update the boot sector
+	 * ignoring the checksum which leaves the checksum out-of-date.  We
+	 * report a warning if this is the case.
+	 */
+	if ((void *)b < (void *)&b->checksum && b->checksum && !silent) {
+		__le32 *u;
+		u32 i;
+
+		for (i = 0, u = (__le32 *)b; u < (__le32 *)(&b->checksum); ++u)
+			i += le32_to_cpup(u);
+		if (le32_to_cpu(b->checksum) != i)
+			ntfs_warning(sb, "Invalid boot sector checksum.");
+	}
+	/* Check OEM identifier is "NTFS    ". */
+	if (b->oem_id != magicNTFS)
+		goto not_ntfs;
+	/* Check bytes per sector value is between 256 and 4096. */
+	if (le16_to_cpu(b->bpb.bytes_per_sector) < 0x100 ||
+			le16_to_cpu(b->bpb.bytes_per_sector) > 0x1000)
+		goto not_ntfs;
+	/*
+	 * Check sectors per cluster value is valid and the cluster size
+	 * is not above the maximum (2MB).
+	 */
+	if (b->bpb.sectors_per_cluster > 0x80 &&
+			b->bpb.sectors_per_cluster < 0xf4)
+		goto not_ntfs;
+
+	/* Check reserved/unused fields are really zero. */
+	if (le16_to_cpu(b->bpb.reserved_sectors) ||
+			le16_to_cpu(b->bpb.root_entries) ||
+			le16_to_cpu(b->bpb.sectors) ||
+			le16_to_cpu(b->bpb.sectors_per_fat) ||
+			le32_to_cpu(b->bpb.large_sectors) || b->bpb.fats)
+		goto not_ntfs;
+	/* Check clusters per mft record value is valid. */
+	if ((u8)b->clusters_per_mft_record < 0xe1 ||
+			(u8)b->clusters_per_mft_record > 0xf7)
+		switch (b->clusters_per_mft_record) {
+		case 1: case 2: case 4: case 8: case 16: case 32: case 64:
+			break;
+		default:
+			goto not_ntfs;
+		}
+	/* Check clusters per index block value is valid. */
+	if ((u8)b->clusters_per_index_record < 0xe1 ||
+			(u8)b->clusters_per_index_record > 0xf7)
+		switch (b->clusters_per_index_record) {
+		case 1: case 2: case 4: case 8: case 16: case 32: case 64:
+			break;
+		default:
+			goto not_ntfs;
+		}
+	/*
+	 * Check for valid end of sector marker.  We will work without it, but
+	 * many BIOSes will refuse to boot from a bootsector if the magic is
+	 * incorrect, so we emit a warning.
+	 */
+	if (!silent && b->end_of_sector_marker != cpu_to_le16(0xaa55))
+		ntfs_warning(sb, "Invalid end of sector marker.");
+	return true;
+not_ntfs:
+	return false;
+}
+
+/**
+ * read_ntfs_boot_sector - read the NTFS boot sector of a device
+ * @sb:		super block of device to read the boot sector from
+ * @silent:	if true, suppress all output
+ *
+ * Reads the boot sector from the device and validates it.
+ */
+static char *read_ntfs_boot_sector(struct super_block *sb,
+		const int silent)
+{
+	char *boot_sector;
+
+	boot_sector = ntfs_malloc_nofs(PAGE_SIZE);
+	if (!boot_sector)
+		return NULL;
+
+	if (ntfs_dev_read(sb, boot_sector, 0, PAGE_SIZE)) {
+		if (!silent)
+			ntfs_error(sb, "Unable to read primary boot sector.");
+		kfree(boot_sector);
+		return NULL;
+	}
+
+	if (!is_boot_sector_ntfs(sb, (struct ntfs_boot_sector *)boot_sector,
+			silent)) {
+		if (!silent)
+			ntfs_error(sb, "Primary boot sector is invalid.");
+		kfree(boot_sector);
+		return NULL;
+	}
+
+	return boot_sector;
+}
+
+/**
+ * parse_ntfs_boot_sector - parse the boot sector and store the data in @vol
+ * @vol:	volume structure to initialise with data from boot sector
+ * @b:		boot sector to parse
+ *
+ * Parse the ntfs boot sector @b and store all important information therein in
+ * the ntfs super block @vol.  Return 'true' on success and 'false' on error.
+ */
+static bool parse_ntfs_boot_sector(struct ntfs_volume *vol,
+		const struct ntfs_boot_sector *b)
+{
+	unsigned int sectors_per_cluster, sectors_per_cluster_bits, nr_hidden_sects;
+	int clusters_per_mft_record, clusters_per_index_record;
+	s64 ll;
+
+	vol->sector_size = le16_to_cpu(b->bpb.bytes_per_sector);
+	vol->sector_size_bits = ffs(vol->sector_size) - 1;
+	ntfs_debug("vol->sector_size = %i (0x%x)", vol->sector_size,
+			vol->sector_size);
+	ntfs_debug("vol->sector_size_bits = %i (0x%x)", vol->sector_size_bits,
+			vol->sector_size_bits);
+	if (vol->sector_size < vol->sb->s_blocksize) {
+		ntfs_error(vol->sb,
+			"Sector size (%i) is smaller than the device block size (%lu).  This is not supported.",
+			vol->sector_size, vol->sb->s_blocksize);
+		return false;
+	}
+
+	if (b->bpb.sectors_per_cluster >= 0xf4)
+		sectors_per_cluster = 1U << -(s8)b->bpb.sectors_per_cluster;
+	else
+		sectors_per_cluster = b->bpb.sectors_per_cluster;
+	ntfs_debug("sectors_per_cluster = 0x%x", b->bpb.sectors_per_cluster);
+	sectors_per_cluster_bits = ffs(sectors_per_cluster) - 1;
+	ntfs_debug("sectors_per_cluster_bits = 0x%x",
+			sectors_per_cluster_bits);
+	nr_hidden_sects = le32_to_cpu(b->bpb.hidden_sectors);
+	ntfs_debug("number of hidden sectors = 0x%x", nr_hidden_sects);
+	vol->cluster_size = vol->sector_size << sectors_per_cluster_bits;
+	vol->cluster_size_mask = vol->cluster_size - 1;
+	vol->cluster_size_bits = ffs(vol->cluster_size) - 1;
+	ntfs_debug("vol->cluster_size = %i (0x%x)", vol->cluster_size,
+			vol->cluster_size);
+	ntfs_debug("vol->cluster_size_mask = 0x%x", vol->cluster_size_mask);
+	ntfs_debug("vol->cluster_size_bits = %i", vol->cluster_size_bits);
+	if (vol->cluster_size < vol->sector_size) {
+		ntfs_error(vol->sb,
+			"Cluster size (%i) is smaller than the sector size (%i).  This is not supported.",
+			vol->cluster_size, vol->sector_size);
+		return false;
+	}
+	clusters_per_mft_record = b->clusters_per_mft_record;
+	ntfs_debug("clusters_per_mft_record = %i (0x%x)",
+			clusters_per_mft_record, clusters_per_mft_record);
+	if (clusters_per_mft_record > 0)
+		vol->mft_record_size = vol->cluster_size <<
+				(ffs(clusters_per_mft_record) - 1);
+	else
+		/*
+		 * When mft_record_size < cluster_size, clusters_per_mft_record
+		 * = -log2(mft_record_size).  mft_record_size normally is
+		 * 1024 bytes, which is encoded as 0xF6 (-10 in decimal).
+		 */
+		vol->mft_record_size = 1 << -clusters_per_mft_record;
+	vol->mft_record_size_mask = vol->mft_record_size - 1;
+	vol->mft_record_size_bits = ffs(vol->mft_record_size) - 1;
+	ntfs_debug("vol->mft_record_size = %i (0x%x)", vol->mft_record_size,
+			vol->mft_record_size);
+	ntfs_debug("vol->mft_record_size_mask = 0x%x",
+			vol->mft_record_size_mask);
+	ntfs_debug("vol->mft_record_size_bits = %i (0x%x)",
+			vol->mft_record_size_bits, vol->mft_record_size_bits);
+	/*
+	 * We cannot support mft record sizes above the PAGE_SIZE since
+	 * we store $MFT/$DATA, the table of mft records, in the page cache.
+	 */
+	if (vol->mft_record_size > PAGE_SIZE) {
+		ntfs_error(vol->sb,
+			"Mft record size (%i) exceeds the PAGE_SIZE on your system (%lu).  This is not supported.",
+			vol->mft_record_size, PAGE_SIZE);
+		return false;
+	}
+	/* We cannot support mft record sizes below the sector size. */
+	if (vol->mft_record_size < vol->sector_size) {
+		ntfs_warning(vol->sb, "Mft record size (%i) is smaller than the sector size (%i).",
+			vol->mft_record_size, vol->sector_size);
+	}
+	clusters_per_index_record = b->clusters_per_index_record;
+	ntfs_debug("clusters_per_index_record = %i (0x%x)",
+			clusters_per_index_record, clusters_per_index_record);
+	if (clusters_per_index_record > 0)
+		vol->index_record_size = vol->cluster_size <<
+				(ffs(clusters_per_index_record) - 1);
+	else
+		/*
+		 * When index_record_size < cluster_size,
+		 * clusters_per_index_record = -log2(index_record_size).
+		 * index_record_size normally equals 4096 bytes, which is
+		 * encoded as 0xF4 (-12 in decimal).
+		 */
+		vol->index_record_size = 1 << -clusters_per_index_record;
+	vol->index_record_size_mask = vol->index_record_size - 1;
+	vol->index_record_size_bits = ffs(vol->index_record_size) - 1;
+	ntfs_debug("vol->index_record_size = %i (0x%x)",
+			vol->index_record_size, vol->index_record_size);
+	ntfs_debug("vol->index_record_size_mask = 0x%x",
+			vol->index_record_size_mask);
+	ntfs_debug("vol->index_record_size_bits = %i (0x%x)",
+			vol->index_record_size_bits,
+			vol->index_record_size_bits);
+	/* We cannot support index record sizes below the sector size. */
+	if (vol->index_record_size < vol->sector_size) {
+		ntfs_error(vol->sb,
+			"Index record size (%i) is smaller than the sector size (%i).  This is not supported.",
+			vol->index_record_size, vol->sector_size);
+		return false;
+	}
+	/*
+	 * Get the size of the volume in clusters and check for 64-bit-ness.
+	 * Windows currently only uses 32 bits to save the clusters so we do
+	 * the same as it is much faster on 32-bit CPUs.
+	 */
+	ll = le64_to_cpu(b->number_of_sectors) >> sectors_per_cluster_bits;
+	if ((u64)ll >= 1ULL << 32) {
+		ntfs_error(vol->sb, "Cannot handle 64-bit clusters.");
+		return false;
+	}
+	vol->nr_clusters = ll;
+	ntfs_debug("vol->nr_clusters = 0x%llx", vol->nr_clusters);
+	/*
+	 * On an architecture where unsigned long is 32 bits, we restrict the
+	 * volume size to 2TiB (2^41).  On a 64-bit architecture, the compiler
+	 * will hopefully optimize the whole check away.
+	 */
+	if (sizeof(unsigned long) < 8) {
+		if ((ll << vol->cluster_size_bits) >= (1ULL << 41)) {
+			ntfs_error(vol->sb,
+				"Volume size (%lluTiB) is too large for this architecture.  Maximum supported is 2TiB.",
+				ll >> (40 - vol->cluster_size_bits));
+			return false;
+		}
+	}
+	ll = le64_to_cpu(b->mft_lcn);
+	if (ll >= vol->nr_clusters) {
+		ntfs_error(vol->sb, "MFT LCN (%lli, 0x%llx) is beyond end of volume.  Weird.",
+				ll, ll);
+		return false;
+	}
+	vol->mft_lcn = ll;
+	ntfs_debug("vol->mft_lcn = 0x%llx", vol->mft_lcn);
+	ll = le64_to_cpu(b->mftmirr_lcn);
+	if (ll >= vol->nr_clusters) {
+		ntfs_error(vol->sb, "MFTMirr LCN (%lli, 0x%llx) is beyond end of volume.  Weird.",
+				ll, ll);
+		return false;
+	}
+	vol->mftmirr_lcn = ll;
+	ntfs_debug("vol->mftmirr_lcn = 0x%llx", vol->mftmirr_lcn);
+	/*
+	 * Work out the size of the mft mirror in number of mft records.  If
+	 * the cluster size is less than or equal to the size taken by four mft
+	 * records, the mft mirror stores the first four mft records.  If the
+	 * cluster size is bigger than the size taken by four mft records, the
+	 * mft mirror contains as many mft records as will fit into one
+	 * cluster.
+	 */
+	if (vol->cluster_size <= (4 << vol->mft_record_size_bits))
+		vol->mftmirr_size = 4;
+	else
+		vol->mftmirr_size = vol->cluster_size >>
+				vol->mft_record_size_bits;
+	ntfs_debug("vol->mftmirr_size = %i", vol->mftmirr_size);
+	vol->serial_no = le64_to_cpu(b->volume_serial_number);
+	ntfs_debug("vol->serial_no = 0x%llx", vol->serial_no);
+
+	vol->sparse_compression_unit = 4;
+	if (vol->cluster_size > 4096) {
+		switch (vol->cluster_size) {
+		case 65536:
+			vol->sparse_compression_unit = 0;
+			break;
+		case 32768:
+			vol->sparse_compression_unit = 1;
+			break;
+		case 16384:
+			vol->sparse_compression_unit = 2;
+			break;
+		case 8192:
+			vol->sparse_compression_unit = 3;
+			break;
+		}
+	}
+
+	return true;
+}
+
+/**
+ * ntfs_setup_allocators - initialize the cluster and mft allocators
+ * @vol:	volume structure for which to setup the allocators
+ *
+ * Setup the cluster (lcn) and mft allocators to the starting values.
+ */
+static void ntfs_setup_allocators(struct ntfs_volume *vol)
+{
+	s64 mft_zone_size, mft_lcn;
+
+	ntfs_debug("vol->mft_zone_multiplier = 0x%x",
+			vol->mft_zone_multiplier);
+	/* Determine the size of the MFT zone. */
+	mft_zone_size = vol->nr_clusters;
+	switch (vol->mft_zone_multiplier) {	/* % of volume size in clusters */
+	case 4:
+		mft_zone_size >>= 1;			/* 50%   */
+		break;
+	case 3:
+		mft_zone_size = (mft_zone_size +
+				(mft_zone_size >> 1)) >> 2;	/* 37.5% */
+		break;
+	case 2:
+		mft_zone_size >>= 2;			/* 25%   */
+		break;
+	/* case 1: */
+	default:
+		mft_zone_size >>= 3;			/* 12.5% */
+		break;
+	}
+	/* Setup the mft zone. */
+	vol->mft_zone_start = vol->mft_zone_pos = vol->mft_lcn;
+	ntfs_debug("vol->mft_zone_pos = 0x%llx", vol->mft_zone_pos);
+	/*
+	 * Calculate the mft_lcn for an unmodified NTFS volume (see mkntfs
+	 * source) and if the actual mft_lcn is in the expected place or even
+	 * further to the front of the volume, extend the mft_zone to cover the
+	 * beginning of the volume as well.
+	 * This is in order to protect the area reserved for the mft bitmap as
+	 * well within the mft_zone itself.  On non-standard volumes we do not
+	 * protect it as the overhead would be higher than the speed increase
+	 * we would get by doing it.
+	 */
+	mft_lcn = (8192 + 2 * vol->cluster_size - 1) >> vol->cluster_size_bits;
+	if (mft_lcn * vol->cluster_size < 16 * 1024)
+		mft_lcn = (16 * 1024 + vol->cluster_size - 1) >>
+				vol->cluster_size_bits;
+	if (vol->mft_zone_start <= mft_lcn)
+		vol->mft_zone_start = 0;
+	ntfs_debug("vol->mft_zone_start = 0x%llx", vol->mft_zone_start);
+	/*
+	 * Need to cap the mft zone on non-standard volumes so that it does
+	 * not point outside the boundaries of the volume.  We do this by
+	 * halving the zone size until we are inside the volume.
+	 */
+	vol->mft_zone_end = vol->mft_lcn + mft_zone_size;
+	while (vol->mft_zone_end >= vol->nr_clusters) {
+		mft_zone_size >>= 1;
+		vol->mft_zone_end = vol->mft_lcn + mft_zone_size;
+	}
+	ntfs_debug("vol->mft_zone_end = 0x%llx", vol->mft_zone_end);
+	/*
+	 * Set the current position within each data zone to the start of the
+	 * respective zone.
+	 */
+	vol->data1_zone_pos = vol->mft_zone_end;
+	ntfs_debug("vol->data1_zone_pos = 0x%llx", vol->data1_zone_pos);
+	vol->data2_zone_pos = 0;
+	ntfs_debug("vol->data2_zone_pos = 0x%llx", vol->data2_zone_pos);
+
+	/* Set the mft data allocation position to mft record 24. */
+	vol->mft_data_pos = 24;
+	ntfs_debug("vol->mft_data_pos = 0x%llx", vol->mft_data_pos);
+}
+
+static struct lock_class_key mftmirr_runlist_lock_key,
+			     mftmirr_mrec_lock_key;
+/**
+ * load_and_init_mft_mirror - load and setup the mft mirror inode for a volume
+ * @vol:	ntfs super block describing device whose mft mirror to load
+ *
+ * Return 'true' on success or 'false' on error.
+ */
+static bool load_and_init_mft_mirror(struct ntfs_volume *vol)
+{
+	struct inode *tmp_ino;
+	struct ntfs_inode *tmp_ni;
+
+	ntfs_debug("Entering.");
+	/* Get mft mirror inode. */
+	tmp_ino = ntfs_iget(vol->sb, FILE_MFTMirr);
+	if (IS_ERR(tmp_ino)) {
+		/* Caller will display error message. */
+		return false;
+	}
+	lockdep_set_class(&NTFS_I(tmp_ino)->runlist.lock,
+			&mftmirr_runlist_lock_key);
+	lockdep_set_class(&NTFS_I(tmp_ino)->mrec_lock,
+			&mftmirr_mrec_lock_key);
+	/*
+	 * Re-initialize some specifics about $MFTMirr's inode as
+	 * ntfs_read_inode() will have set up the default ones.
+	 */
+	/* Set uid and gid to root. */
+	tmp_ino->i_uid = GLOBAL_ROOT_UID;
+	tmp_ino->i_gid = GLOBAL_ROOT_GID;
+	/* Regular file.  No access for anyone. */
+	tmp_ino->i_mode = S_IFREG;
+	/* No VFS initiated operations allowed for $MFTMirr. */
+	tmp_ino->i_op = &ntfs_empty_inode_ops;
+	tmp_ino->i_fop = &ntfs_empty_file_ops;
+	/* Put in our special address space operations. */
+	tmp_ino->i_mapping->a_ops = &ntfs_mst_aops;
+	tmp_ni = NTFS_I(tmp_ino);
+	/* The $MFTMirr, like the $MFT, is multi sector transfer protected. */
+	NInoSetMstProtected(tmp_ni);
+	NInoSetSparseDisabled(tmp_ni);
+	/*
+	 * Set up our little cheat allowing us to reuse the async read io
+	 * completion handler for directories.
+	 */
+	tmp_ni->itype.index.block_size = vol->mft_record_size;
+	tmp_ni->itype.index.block_size_bits = vol->mft_record_size_bits;
+	vol->mftmirr_ino = tmp_ino;
+	ntfs_debug("Done.");
+	return true;
+}
+
+/**
+ * check_mft_mirror - compare contents of the mft mirror with the mft
+ * @vol:	ntfs super block describing device whose mft mirror to check
+ *
+ * Return 'true' on success or 'false' on error.
+ *
+ * Note, this function also results in the mft mirror runlist being completely
+ * mapped into memory.  The mft mirror write code requires this and will BUG()
+ * should it find an unmapped runlist element.
+ */
+static bool check_mft_mirror(struct ntfs_volume *vol)
+{
+	struct super_block *sb = vol->sb;
+	struct ntfs_inode *mirr_ni;
+	struct folio *mft_folio = NULL, *mirr_folio = NULL;
+	u8 *kmft = NULL, *kmirr = NULL;
+	struct runlist_element *rl, rl2[2];
+	pgoff_t index;
+	int mrecs_per_page, i;
+
+	ntfs_debug("Entering.");
+	/* Compare contents of $MFT and $MFTMirr. */
+	mrecs_per_page = PAGE_SIZE / vol->mft_record_size;
+	index = i = 0;
+	do {
+		u32 bytes;
+
+		/* Switch folios if necessary. */
+		if (!(i % mrecs_per_page)) {
+			if (index) {
+				ntfs_unmap_folio(mirr_folio, kmirr);
+				ntfs_unmap_folio(mft_folio, kmft);
+			}
+			/* Get the $MFT folio. */
+			mft_folio = ntfs_read_mapping_folio(vol->mft_ino->i_mapping,
+					index);
+			if (IS_ERR(mft_folio)) {
+				ntfs_error(sb, "Failed to read $MFT.");
+				return false;
+			}
+			kmft = kmap_local_folio(mft_folio, 0);
+			/* Get the $MFTMirr folio. */
+			mirr_folio = ntfs_read_mapping_folio(vol->mftmirr_ino->i_mapping,
+					index);
+			if (IS_ERR(mirr_folio)) {
+				ntfs_error(sb, "Failed to read $MFTMirr.");
+				goto mft_unmap_out;
+			}
+			kmirr = kmap_local_folio(mirr_folio, 0);
+			++index;
+		}
+
+		/* Do not check the record if it is not in use. */
+		if (((struct mft_record *)kmft)->flags & MFT_RECORD_IN_USE) {
+			/* Make sure the record is ok. */
+			if (ntfs_is_baad_recordp((__le32 *)kmft)) {
+				ntfs_error(sb,
+					"Incomplete multi sector transfer detected in mft record %i.",
+					i);
+mm_unmap_out:
+				ntfs_unmap_folio(mirr_folio, kmirr);
+mft_unmap_out:
+				ntfs_unmap_folio(mft_folio, kmft);
+				return false;
+			}
+		}
+		/* Do not check the mirror record if it is not in use. */
+		if (((struct mft_record *)kmirr)->flags & MFT_RECORD_IN_USE) {
+			if (ntfs_is_baad_recordp((__le32 *)kmirr)) {
+				ntfs_error(sb,
+					"Incomplete multi sector transfer detected in mft mirror record %i.",
+					i);
+				goto mm_unmap_out;
+			}
+		}
+		/* Get the amount of data in the current record. */
+		bytes = le32_to_cpu(((struct mft_record *)kmft)->bytes_in_use);
+		if (bytes < sizeof(struct mft_record_old) ||
+				bytes > vol->mft_record_size ||
+				ntfs_is_baad_recordp((__le32 *)kmft)) {
+			bytes = le32_to_cpu(((struct mft_record *)kmirr)->bytes_in_use);
+			if (bytes < sizeof(struct mft_record_old) ||
+					bytes > vol->mft_record_size ||
+					ntfs_is_baad_recordp((__le32 *)kmirr))
+				bytes = vol->mft_record_size;
+		}
+		kmft += vol->mft_record_size;
+		kmirr += vol->mft_record_size;
+	} while (++i < vol->mftmirr_size);
+	/* Release the last folios. */
+	ntfs_unmap_folio(mirr_folio, kmirr);
+	ntfs_unmap_folio(mft_folio, kmft);
+
+	/* Construct the mft mirror runlist by hand. */
+	rl2[0].vcn = 0;
+	rl2[0].lcn = vol->mftmirr_lcn;
+	rl2[0].length = (vol->mftmirr_size * vol->mft_record_size +
+			vol->cluster_size - 1) >> vol->cluster_size_bits;
+	rl2[1].vcn = rl2[0].length;
+	rl2[1].lcn = LCN_ENOENT;
+	rl2[1].length = 0;
+	/*
+	 * Because we have just read all of the mft mirror, we know we have
+	 * mapped the full runlist for it.
+	 */
+	mirr_ni = NTFS_I(vol->mftmirr_ino);
+	down_read(&mirr_ni->runlist.lock);
+	rl = mirr_ni->runlist.rl;
+	/* Compare the two runlists.  They must be identical. */
+	i = 0;
+	do {
+		if (rl2[i].vcn != rl[i].vcn || rl2[i].lcn != rl[i].lcn ||
+				rl2[i].length != rl[i].length) {
+			ntfs_error(sb, "$MFTMirr location mismatch.  Run chkdsk.");
+			up_read(&mirr_ni->runlist.lock);
+			return false;
+		}
+	} while (rl2[i++].length);
+	up_read(&mirr_ni->runlist.lock);
+	ntfs_debug("Done.");
+	return true;
+}
+
+/**
+ * load_and_check_logfile - load and check the logfile inode for a volume
+ * @vol:	ntfs super block describing device whose logfile to load
+ * @rp:		returned restart page header
+ *
+ * Return 0 on success or -errno on error.
+ */
+static int load_and_check_logfile(struct ntfs_volume *vol,
+		struct restart_page_header **rp)
+{
+	struct inode *tmp_ino;
+	int err = 0;
+
+	ntfs_debug("Entering.");
+	tmp_ino = ntfs_iget(vol->sb, FILE_LogFile);
+	if (IS_ERR(tmp_ino)) {
+		/* Caller will display error message. */
+		return -ENOENT;
+	}
+	if (!ntfs_check_logfile(tmp_ino, rp))
+		err = -EINVAL;
+	NInoSetSparseDisabled(NTFS_I(tmp_ino));
+	vol->logfile_ino = tmp_ino;
+	ntfs_debug("Done.");
+	return err;
+}
+
+#define NTFS_HIBERFIL_HEADER_SIZE	4096
+
+/**
+ * check_windows_hibernation_status - check if Windows is suspended on a volume
+ * @vol:	ntfs super block of device to check
+ *
+ * Check if Windows is hibernated on the ntfs volume @vol.  This is done by
+ * looking for the file hiberfil.sys in the root directory of the volume.  If
+ * the file is not present Windows is definitely not suspended.
+ *
+ * If hiberfil.sys exists and is less than 4kiB in size it means Windows is
+ * definitely suspended (this volume is not the system volume).  Caveat:  on a
+ * system with many volumes it is possible that the < 4kiB check is bogus but
+ * for now this should do fine.
+ *
+ * If hiberfil.sys exists and is larger than 4kiB in size, we need to read the
+ * hiberfil header (which is the first 4kiB).  If this begins with "hibr",
+ * Windows is definitely suspended.  If it is completely full of zeroes,
+ * Windows is definitely not hibernated.  Any other case is treated as if
+ * Windows is suspended.  This caters for the above mentioned caveat of a
+ * system with many volumes where no "hibr" magic would be present and there is
+ * no zero header.
+ *
+ * Return 0 if Windows is not hibernated on the volume, >0 if Windows is
+ * hibernated on the volume, and -errno on error.
+ */
+static int check_windows_hibernation_status(struct ntfs_volume *vol)
+{
+	static const __le16 hiberfil[13] = { cpu_to_le16('h'),
+			cpu_to_le16('i'), cpu_to_le16('b'),
+			cpu_to_le16('e'), cpu_to_le16('r'),
+			cpu_to_le16('f'), cpu_to_le16('i'),
+			cpu_to_le16('l'), cpu_to_le16('.'),
+			cpu_to_le16('s'), cpu_to_le16('y'),
+			cpu_to_le16('s'), 0 };
+	u64 mref;
+	struct inode *vi;
+	struct folio *folio;
+	u32 *kaddr, *kend, *start_addr = NULL;
+	struct ntfs_name *name = NULL;
+	int ret = 1;
+
+	ntfs_debug("Entering.");
+	/*
+	 * Find the inode number for the hibernation file by looking up the
+	 * filename hiberfil.sys in the root directory.
+	 */
+	inode_lock(vol->root_ino);
+	mref = ntfs_lookup_inode_by_name(NTFS_I(vol->root_ino), hiberfil, 12,
+			&name);
+	inode_unlock(vol->root_ino);
+	kfree(name);
+	if (IS_ERR_MREF(mref)) {
+		ret = MREF_ERR(mref);
+		/* If the file does not exist, Windows is not hibernated. */
+		if (ret == -ENOENT) {
+			ntfs_debug("hiberfil.sys not present.  Windows is not hibernated on the volume.");
+			return 0;
+		}
+		/* A real error occurred. */
+		ntfs_error(vol->sb, "Failed to find inode number for hiberfil.sys.");
+		return ret;
+	}
+	/* Get the inode. */
+	vi = ntfs_iget(vol->sb, MREF(mref));
+	if (IS_ERR(vi)) {
+		ntfs_error(vol->sb, "Failed to load hiberfil.sys.");
+		return PTR_ERR(vi);
+	}
+	if (unlikely(i_size_read(vi) < NTFS_HIBERFIL_HEADER_SIZE)) {
+		ntfs_debug("hiberfil.sys is smaller than 4kiB (0x%llx).  Windows is hibernated on the volume.  This is not the system volume.",
+				i_size_read(vi));
+		goto iput_out;
+	}
+
+	folio = ntfs_read_mapping_folio(vi->i_mapping, 0);
+	if (IS_ERR(folio)) {
+		ntfs_error(vol->sb, "Failed to read from hiberfil.sys.");
+		ret = PTR_ERR(folio);
+		goto iput_out;
+	}
+	start_addr = (u32 *)kmap_local_folio(folio, 0);
+	kaddr = start_addr;
+	if (*(__le32 *)kaddr == cpu_to_le32(0x72626968)/*'hibr'*/) {
+		ntfs_debug("Magic \"hibr\" found in hiberfil.sys.  Windows is hibernated on the volume.  This is the system volume.");
+		goto unm_iput_out;
+	}
+	kend = kaddr + NTFS_HIBERFIL_HEADER_SIZE/sizeof(*kaddr);
+	do {
+		if (unlikely(*kaddr)) {
+			ntfs_debug("hiberfil.sys is larger than 4kiB (0x%llx), does not contain the \"hibr\" magic, and does not have a zero header.  Windows is hibernated on the volume.  This is not the system volume.",
+					i_size_read(vi));
+			goto unm_iput_out;
+		}
+	} while (++kaddr < kend);
+	ntfs_debug("hiberfil.sys contains a zero header.  Windows is not hibernated on the volume.  This is the system volume.");
+	ret = 0;
+unm_iput_out:
+	ntfs_unmap_folio(folio, start_addr);
+iput_out:
+	iput(vi);
+	return ret;
+}
+
+/**
+ * load_and_init_quota - load and setup the quota file for a volume if present
+ * @vol:	ntfs super block describing device whose quota file to load
+ *
+ * Return 'true' on success or 'false' on error.  If $Quota is not present, we
+ * leave vol->quota_ino as NULL and return success.
+ */
+static bool load_and_init_quota(struct ntfs_volume *vol)
+{
+	static const __le16 Quota[7] = { cpu_to_le16('$'),
+			cpu_to_le16('Q'), cpu_to_le16('u'),
+			cpu_to_le16('o'), cpu_to_le16('t'),
+			cpu_to_le16('a'), 0 };
+	static __le16 Q[3] = { cpu_to_le16('$'),
+			cpu_to_le16('Q'), 0 };
+	struct ntfs_name *name = NULL;
+	u64 mref;
+	struct inode *tmp_ino;
+
+	ntfs_debug("Entering.");
+	/*
+	 * Find the inode number for the quota file by looking up the filename
+	 * $Quota in the extended system files directory $Extend.
+	 */
+	inode_lock(vol->extend_ino);
+	mref = ntfs_lookup_inode_by_name(NTFS_I(vol->extend_ino), Quota, 6,
+			&name);
+	inode_unlock(vol->extend_ino);
+	kfree(name);
+	if (IS_ERR_MREF(mref)) {
+		/*
+		 * If the file does not exist, quotas are disabled and have
+		 * never been enabled on this volume, just return success.
+		 */
+		if (MREF_ERR(mref) == -ENOENT) {
+			ntfs_debug("$Quota not present. Volume does not have quotas enabled.");
+			/*
+			 * No need to try to set quotas out of date if they are
+			 * not enabled.
+			 */
+			NVolSetQuotaOutOfDate(vol);
+			return true;
+		}
+		/* A real error occurred. */
+		ntfs_error(vol->sb, "Failed to find inode number for $Quota.");
+		return false;
+	}
+	/* Get the inode. */
+	tmp_ino = ntfs_iget(vol->sb, MREF(mref));
+	if (IS_ERR(tmp_ino)) {
+		ntfs_error(vol->sb, "Failed to load $Quota.");
+		return false;
+	}
+	vol->quota_ino = tmp_ino;
+	/* Get the $Q index allocation attribute. */
+	tmp_ino = ntfs_index_iget(vol->quota_ino, Q, 2);
+	if (IS_ERR(tmp_ino)) {
+		ntfs_error(vol->sb, "Failed to load $Quota/$Q index.");
+		return false;
+	}
+	vol->quota_q_ino = tmp_ino;
+	ntfs_debug("Done.");
+	return true;
+}
+
+/**
+ * load_and_init_attrdef - load the attribute definitions table for a volume
+ * @vol:	ntfs super block describing device whose attrdef to load
+ *
+ * Return 'true' on success or 'false' on error.
+ */
+static bool load_and_init_attrdef(struct ntfs_volume *vol)
+{
+	loff_t i_size;
+	struct super_block *sb = vol->sb;
+	struct inode *ino;
+	struct folio *folio;
+	u8 *addr;
+	pgoff_t index, max_index;
+	unsigned int size;
+
+	ntfs_debug("Entering.");
+	/* Read attrdef table and setup vol->attrdef and vol->attrdef_size. */
+	ino = ntfs_iget(sb, FILE_AttrDef);
+	if (IS_ERR(ino))
+		goto failed;
+	NInoSetSparseDisabled(NTFS_I(ino));
+	/* The size of FILE_AttrDef must be above 0 and fit inside 31 bits. */
+	i_size = i_size_read(ino);
+	if (i_size <= 0 || i_size > 0x7fffffff)
+		goto iput_failed;
+	vol->attrdef = (struct attr_def *)ntfs_malloc_nofs(i_size);
+	if (!vol->attrdef)
+		goto iput_failed;
+	index = 0;
+	max_index = i_size >> PAGE_SHIFT;
+	size = PAGE_SIZE;
+	while (index < max_index) {
+		/* Read the attrdef table and copy it into the linear buffer. */
+read_partial_attrdef_page:
+		folio = ntfs_read_mapping_folio(ino->i_mapping, index);
+		if (IS_ERR(folio))
+			goto free_iput_failed;
+		addr = kmap_local_folio(folio, 0);
+		memcpy((u8 *)vol->attrdef + (index++ << PAGE_SHIFT),
+				addr, size);
+		ntfs_unmap_folio(folio, addr);
+	}
+	if (size == PAGE_SIZE) {
+		size = i_size & ~PAGE_MASK;
+		if (size)
+			goto read_partial_attrdef_page;
+	}
+	vol->attrdef_size = i_size;
+	ntfs_debug("Read %llu bytes from $AttrDef.", i_size);
+	iput(ino);
+	return true;
+free_iput_failed:
+	ntfs_free(vol->attrdef);
+	vol->attrdef = NULL;
+iput_failed:
+	iput(ino);
+failed:
+	ntfs_error(sb, "Failed to initialize attribute definition table.");
+	return false;
+}
+
+/**
+ * load_and_init_upcase - load the upcase table for an ntfs volume
+ * @vol:	ntfs super block describing device whose upcase to load
+ *
+ * Return 'true' on success or 'false' on error.
+ */
+static bool load_and_init_upcase(struct ntfs_volume *vol)
+{
+	loff_t i_size;
+	struct super_block *sb = vol->sb;
+	struct inode *ino;
+	struct folio *folio;
+	u8 *addr;
+	pgoff_t index, max_index;
+	unsigned int size;
+	int i, max;
+
+	ntfs_debug("Entering.");
+	/* Read upcase table and setup vol->upcase and vol->upcase_len. */
+	ino = ntfs_iget(sb, FILE_UpCase);
+	if (IS_ERR(ino))
+		goto upcase_failed;
+	/*
+	 * The upcase size must not be above 64k Unicode characters, must not
+	 * be zero and must be a multiple of sizeof(__le16).
+	 */
+	i_size = i_size_read(ino);
+	if (!i_size || i_size & (sizeof(__le16) - 1) ||
+			i_size > 64ULL * 1024 * sizeof(__le16))
+		goto iput_upcase_failed;
+	vol->upcase = (__le16 *)ntfs_malloc_nofs(i_size);
+	if (!vol->upcase)
+		goto iput_upcase_failed;
+	index = 0;
+	max_index = i_size >> PAGE_SHIFT;
+	size = PAGE_SIZE;
+	while (index < max_index) {
+		/* Read the upcase table and copy it into the linear buffer. */
+read_partial_upcase_page:
+		folio = ntfs_read_mapping_folio(ino->i_mapping, index);
+		if (IS_ERR(folio))
+			goto iput_upcase_failed;
+		addr = kmap_local_folio(folio, 0);
+		memcpy((char *)vol->upcase + (index++ << PAGE_SHIFT),
+				addr, size);
+		ntfs_unmap_folio(folio, addr);
+	}
+	if (size == PAGE_SIZE) {
+		size = i_size & ~PAGE_MASK;
+		if (size)
+			goto read_partial_upcase_page;
+	}
+	vol->upcase_len = i_size >> UCHAR_T_SIZE_BITS;
+	ntfs_debug("Read %llu bytes from $UpCase (expected %zu bytes).",
+			i_size, 64 * 1024 * sizeof(__le16));
+	iput(ino);
+	mutex_lock(&ntfs_lock);
+	if (!default_upcase) {
+		ntfs_debug("Using volume specified $UpCase since default is not present.");
+		mutex_unlock(&ntfs_lock);
+		return true;
+	}
+	max = default_upcase_len;
+	if (max > vol->upcase_len)
+		max = vol->upcase_len;
+	for (i = 0; i < max; i++)
+		if (vol->upcase[i] != default_upcase[i])
+			break;
+	if (i == max) {
+		ntfs_free(vol->upcase);
+		vol->upcase = default_upcase;
+		vol->upcase_len = max;
+		ntfs_nr_upcase_users++;
+		mutex_unlock(&ntfs_lock);
+		ntfs_debug("Volume specified $UpCase matches default. Using default.");
+		return true;
+	}
+	mutex_unlock(&ntfs_lock);
+	ntfs_debug("Using volume specified $UpCase since it does not match the default.");
+	return true;
+iput_upcase_failed:
+	iput(ino);
+	ntfs_free(vol->upcase);
+	vol->upcase = NULL;
+upcase_failed:
+	mutex_lock(&ntfs_lock);
+	if (default_upcase) {
+		vol->upcase = default_upcase;
+		vol->upcase_len = default_upcase_len;
+		ntfs_nr_upcase_users++;
+		mutex_unlock(&ntfs_lock);
+		ntfs_error(sb, "Failed to load $UpCase from the volume. Using default.");
+		return true;
+	}
+	mutex_unlock(&ntfs_lock);
+	ntfs_error(sb, "Failed to initialize upcase table.");
+	return false;
+}
+
+/*
+ * The lcn and mft bitmap inodes are NTFS-internal inodes with
+ * their own special locking rules:
+ */
+static struct lock_class_key
+	lcnbmp_runlist_lock_key, lcnbmp_mrec_lock_key,
+	mftbmp_runlist_lock_key, mftbmp_mrec_lock_key;
+
+/**
+ * load_system_files - open the system files using normal functions
+ * @vol:	ntfs super block describing device whose system files to load
+ *
+ * Open the system files with normal access functions and complete setting up
+ * the ntfs super block @vol.
+ *
+ * Return 'true' on success or 'false' on error.
+ */
+static bool load_system_files(struct ntfs_volume *vol)
+{
+	struct super_block *sb = vol->sb;
+	struct mft_record *m;
+	struct volume_information *vi;
+	struct ntfs_attr_search_ctx *ctx;
+	struct restart_page_header *rp;
+	int err;
+
+	ntfs_debug("Entering.");
+	/* Get mft mirror inode and compare the contents of $MFT and $MFTMirr. */
+	if (!load_and_init_mft_mirror(vol) || !check_mft_mirror(vol)) {
+		/* If a read-write mount, convert it to a read-only mount. */
+		if (!sb_rdonly(sb) && vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+			static const char *es1 = "Failed to load $MFTMirr";
+			static const char *es2 = "$MFTMirr does not match $MFT";
+			static const char *es3 = ". Run ntfsck and/or chkdsk.";
+
+			sb->s_flags |= SB_RDONLY;
+			ntfs_error(sb, "%s. Mounting read-only%s",
+					!vol->mftmirr_ino ? es1 : es2, es3);
+		}
+		NVolSetErrors(vol);
+	}
+	/* Get mft bitmap attribute inode. */
+	vol->mftbmp_ino = ntfs_attr_iget(vol->mft_ino, AT_BITMAP, NULL, 0);
+	if (IS_ERR(vol->mftbmp_ino)) {
+		ntfs_error(sb, "Failed to load $MFT/$BITMAP attribute.");
+		goto iput_mirr_err_out;
+	}
+	lockdep_set_class(&NTFS_I(vol->mftbmp_ino)->runlist.lock,
+			&mftbmp_runlist_lock_key);
+	lockdep_set_class(&NTFS_I(vol->mftbmp_ino)->mrec_lock,
+			&mftbmp_mrec_lock_key);
+	/* Read upcase table and setup @vol->upcase and @vol->upcase_len. */
+	if (!load_and_init_upcase(vol))
+		goto iput_mftbmp_err_out;
+	/*
+	 * Read attribute definitions table and setup @vol->attrdef and
+	 * @vol->attrdef_size.
+	 */
+	if (!load_and_init_attrdef(vol))
+		goto iput_upcase_err_out;
+	/*
+	 * Get the cluster allocation bitmap inode and verify the size; no
+	 * locking is needed at this stage as we are the mount task and thus
+	 * are running exclusively.
+	 */
+	vol->lcnbmp_ino = ntfs_iget(sb, FILE_Bitmap);
+	if (IS_ERR(vol->lcnbmp_ino))
+		goto bitmap_failed;
+	lockdep_set_class(&NTFS_I(vol->lcnbmp_ino)->runlist.lock,
+			&lcnbmp_runlist_lock_key);
+	lockdep_set_class(&NTFS_I(vol->lcnbmp_ino)->mrec_lock,
+			&lcnbmp_mrec_lock_key);
+
+	NInoSetSparseDisabled(NTFS_I(vol->lcnbmp_ino));
+	if ((vol->nr_clusters + 7) >> 3 > i_size_read(vol->lcnbmp_ino)) {
+		iput(vol->lcnbmp_ino);
+bitmap_failed:
+		ntfs_error(sb, "Failed to load $Bitmap.");
+		goto iput_attrdef_err_out;
+	}
+	/*
+	 * Get the volume inode and setup our cache of the volume flags and
+	 * version.
+	 */
+	vol->vol_ino = ntfs_iget(sb, FILE_Volume);
+	if (IS_ERR(vol->vol_ino)) {
+volume_failed:
+		ntfs_error(sb, "Failed to load $Volume.");
+		goto iput_lcnbmp_err_out;
+	}
+	m = map_mft_record(NTFS_I(vol->vol_ino));
+	if (IS_ERR(m)) {
+iput_volume_failed:
+		iput(vol->vol_ino);
+		goto volume_failed;
+	}
+
+	ctx = ntfs_attr_get_search_ctx(NTFS_I(vol->vol_ino), m);
+	if (!ctx) {
+		ntfs_error(sb, "Failed to get attribute search context.");
+		goto get_ctx_vol_failed;
+	}
+
+	if (!ntfs_attr_lookup(AT_VOLUME_NAME, NULL, 0, 0, 0, NULL, 0, ctx) &&
+			!ctx->attr->non_resident &&
+			!(ctx->attr->flags & (ATTR_IS_SPARSE | ATTR_IS_COMPRESSED)) &&
+			le32_to_cpu(ctx->attr->data.resident.value_length) > 0) {
+		err = ntfs_ucstonls(vol, (__le16 *)((u8 *)ctx->attr +
+				le16_to_cpu(ctx->attr->data.resident.value_offset)),
+				le32_to_cpu(ctx->attr->data.resident.value_length) / 2,
+				&vol->volume_label, NTFS_MAX_LABEL_LEN);
+		if (err < 0)
+			vol->volume_label = NULL;
+	}
+
+	if (ntfs_attr_lookup(AT_VOLUME_INFORMATION, NULL, 0, 0, 0, NULL, 0,
+			ctx) || ctx->attr->non_resident || ctx->attr->flags) {
+err_put_vol:
+		ntfs_attr_put_search_ctx(ctx);
+get_ctx_vol_failed:
+		unmap_mft_record(NTFS_I(vol->vol_ino));
+		goto iput_volume_failed;
+	}
+	vi = (struct volume_information *)((char *)ctx->attr +
+			le16_to_cpu(ctx->attr->data.resident.value_offset));
+	/* Some bounds checks. */
+	if ((u8 *)vi < (u8 *)ctx->attr || (u8 *)vi +
+			le32_to_cpu(ctx->attr->data.resident.value_length) >
+			(u8 *)ctx->attr + le32_to_cpu(ctx->attr->length))
+		goto err_put_vol;
+	/* Copy the volume flags and version to the struct ntfs_volume structure. */
+	vol->vol_flags = vi->flags;
+	vol->major_ver = vi->major_ver;
+	vol->minor_ver = vi->minor_ver;
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(NTFS_I(vol->vol_ino));
+	pr_info("volume version %i.%i, dev %s, cluster size %d\n",
+			vol->major_ver, vol->minor_ver, sb->s_id, vol->cluster_size);
+
+	/* Make sure that no unsupported volume flags are set. */
+	if (vol->vol_flags & VOLUME_MUST_MOUNT_RO_MASK) {
+		static const char *es1a = "Volume is dirty";
+		static const char *es1b = "Volume has been modified by chkdsk";
+		static const char *es1c = "Volume has unsupported flags set";
+		static const char *es2a = ". Run chkdsk and mount in Windows.";
+		static const char *es2b = ". Mount in Windows.";
+		const char *es1, *es2;
+
+		es2 = es2a;
+		if (vol->vol_flags & VOLUME_IS_DIRTY)
+			es1 = es1a;
+		else if (vol->vol_flags & VOLUME_MODIFIED_BY_CHKDSK) {
+			es1 = es1b;
+			es2 = es2b;
+		} else {
+			es1 = es1c;
+			ntfs_warning(sb, "Unsupported volume flags 0x%x encountered.",
+					(unsigned int)le16_to_cpu(vol->vol_flags));
+		}
+		/* If a read-write mount, convert it to a read-only mount. */
+		if (!sb_rdonly(sb) && vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+			sb->s_flags |= SB_RDONLY;
+			ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
+		}
+		/*
+		 * Do not set NVolErrors() because ntfs_remount() re-checks the
+		 * flags which we need to do in case any flags have changed.
+		 */
+	}
+	/*
+	 * Get the inode for the logfile, check it and determine if the volume
+	 * was shutdown cleanly.
+	 */
+	rp = NULL;
+	err = load_and_check_logfile(vol, &rp);
+	if (err) {
+		/* If a read-write mount, convert it to a read-only mount. */
+		if (!sb_rdonly(sb) && vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+			sb->s_flags |= SB_RDONLY;
+			ntfs_error(sb, "Failed to load LogFile. Mounting read-only.");
+		}
+		NVolSetErrors(vol);
+	}
+
+	ntfs_free(rp);
+	/* Get the root directory inode so we can do path lookups. */
+	vol->root_ino = ntfs_iget(sb, FILE_root);
+	if (IS_ERR(vol->root_ino)) {
+		ntfs_error(sb, "Failed to load root directory.");
+		goto iput_logfile_err_out;
+	}
+	/*
+	 * Check if Windows is suspended to disk on the target volume. If it
+	 * is hibernated, we must not write *anything* to the disk so set
+	 * NVolErrors() without setting the dirty volume flag and mount
+	 * read-only. This will prevent read-write remounting and it will also
+	 * prevent all writes.
+	 */
+	err = check_windows_hibernation_status(vol);
+	if (unlikely(err)) {
+		static const char *es1a = "Failed to determine if Windows is hibernated";
+		static const char *es1b = "Windows is hibernated";
+		static const char *es2 = ". Run chkdsk.";
+		const char *es1;
+
+		es1 = err < 0 ? es1a : es1b;
+		/* If a read-write mount, convert it to a read-only mount. */
+		if (!sb_rdonly(sb) && vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+			sb->s_flags |= SB_RDONLY;
+			ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
+		}
+		NVolSetErrors(vol);
+	}
+
+	/* If (still) a read-write mount, empty the logfile. */
+	if (!sb_rdonly(sb) &&
+	    vol->logfile_ino && !ntfs_empty_logfile(vol->logfile_ino) &&
+	    vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+		static const char *es1 = "Failed to empty LogFile";
+		static const char *es2 = ". Mount in Windows.";
+
+		/* Convert to a read-only mount. */
+		ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
+		sb->s_flags |= SB_RDONLY;
+		NVolSetErrors(vol);
+	}
+	/* If on NTFS versions before 3.0, we are done. */
+	if (unlikely(vol->major_ver < 3))
+		return true;
+	/* NTFS 3.0+ specific initialization. */
+	/* Get the security descriptors inode. */
+	vol->secure_ino = ntfs_iget(sb, FILE_Secure);
+	if (IS_ERR(vol->secure_ino)) {
+		ntfs_error(sb, "Failed to load $Secure.");
+		goto iput_root_err_out;
+	}
+	/* Get the extended system files' directory inode. */
+	vol->extend_ino = ntfs_iget(sb, FILE_Extend);
+	if (IS_ERR(vol->extend_ino) ||
+	    !S_ISDIR(vol->extend_ino->i_mode)) {
+		if (!IS_ERR(vol->extend_ino))
+			iput(vol->extend_ino);
+		ntfs_error(sb, "Failed to load $Extend.");
+		goto iput_sec_err_out;
+	}
+	/* Find the quota file, load it if present, and set it up. */
+	if (!load_and_init_quota(vol) &&
+	    vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+		static const char *es1 = "Failed to load $Quota";
+		static const char *es2 = ". Run chkdsk.";
+
+		sb->s_flags |= SB_RDONLY;
+		ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
+		/* This will prevent a read-write remount. */
+		NVolSetErrors(vol);
+	}
+
+	return true;
+
+iput_sec_err_out:
+	iput(vol->secure_ino);
+iput_root_err_out:
+	iput(vol->root_ino);
+iput_logfile_err_out:
+	if (vol->logfile_ino)
+		iput(vol->logfile_ino);
+	iput(vol->vol_ino);
+iput_lcnbmp_err_out:
+	iput(vol->lcnbmp_ino);
+iput_attrdef_err_out:
+	vol->attrdef_size = 0;
+	if (vol->attrdef) {
+		ntfs_free(vol->attrdef);
+		vol->attrdef = NULL;
+	}
+iput_upcase_err_out:
+	vol->upcase_len = 0;
+	mutex_lock(&ntfs_lock);
+	if (vol->upcase == default_upcase) {
+		ntfs_nr_upcase_users--;
+		vol->upcase = NULL;
+	}
+	mutex_unlock(&ntfs_lock);
+	if (vol->upcase) {
+		ntfs_free(vol->upcase);
+		vol->upcase = NULL;
+	}
+iput_mftbmp_err_out:
+	iput(vol->mftbmp_ino);
+iput_mirr_err_out:
+	iput(vol->mftmirr_ino);
+	return false;
+}
+
+static void ntfs_volume_free(struct ntfs_volume *vol)
+{
+	/* Throw away the table of attribute definitions. */
+	vol->attrdef_size = 0;
+	if (vol->attrdef) {
+		ntfs_free(vol->attrdef);
+		vol->attrdef = NULL;
+	}
+	vol->upcase_len = 0;
+	/*
+	 * Destroy the global default upcase table if necessary. Also decrease
+	 * the number of upcase users if we are a user.
+	 */
+	mutex_lock(&ntfs_lock);
+	if (vol->upcase == default_upcase) {
+		ntfs_nr_upcase_users--;
+		vol->upcase = NULL;
+	}
+
+	if (!ntfs_nr_upcase_users && default_upcase) {
+		ntfs_free(default_upcase);
+		default_upcase = NULL;
+	}
+
+	free_compression_buffers();
+
+	mutex_unlock(&ntfs_lock);
+	if (vol->upcase) {
+		ntfs_free(vol->upcase);
+		vol->upcase = NULL;
+	}
+
+	unload_nls(vol->nls_map);
+
+	kvfree(vol->lcn_empty_bits_per_page);
+	kfree(vol->volume_label);
+	kfree(vol);
+}
+
+/**
+ * ntfs_put_super - called by the vfs to unmount a volume
+ * @sb:	vfs superblock of volume to unmount
+ */
+static void ntfs_put_super(struct super_block *sb)
+{
+	struct ntfs_volume *vol = NTFS_SB(sb);
+
+	pr_info("Entering %s, dev %s\n", __func__, sb->s_id);
+
+	cancel_work_sync(&vol->precalc_work);
+
+	/*
+	 * Commit all inodes while they are still open in case some of them
+	 * cause others to be dirtied.
+	 */
+	ntfs_commit_inode(vol->vol_ino);
+
+	/* NTFS 3.0+ specific. */
+	if (vol->major_ver >= 3) {
+		if (vol->quota_q_ino)
+			ntfs_commit_inode(vol->quota_q_ino);
+		if (vol->quota_ino)
+			ntfs_commit_inode(vol->quota_ino);
+		if (vol->extend_ino)
+			ntfs_commit_inode(vol->extend_ino);
+		if (vol->secure_ino)
+			ntfs_commit_inode(vol->secure_ino);
+	}
+
+	ntfs_commit_inode(vol->root_ino);
+
+	ntfs_commit_inode(vol->lcnbmp_ino);
+
+	/*
+	 * The GFP_NOFS scope is not needed because ntfs_commit_inode
+	 * does nothing.
+	 */
+	ntfs_commit_inode(vol->mftbmp_ino);
+
+	if (vol->logfile_ino)
+		ntfs_commit_inode(vol->logfile_ino);
+
+	if (vol->mftmirr_ino)
+		ntfs_commit_inode(vol->mftmirr_ino);
+	ntfs_commit_inode(vol->mft_ino);
+
+	/*
+	 * If a read-write mount and no volume errors have occurred, mark the
+	 * volume clean. Also, re-commit all affected inodes.
+	 */
+	if (!sb_rdonly(sb)) {
+		if (!NVolErrors(vol)) {
+			if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY))
+				ntfs_warning(sb,
+					"Failed to clear dirty bit in volume information flags. Run chkdsk.");
+			ntfs_commit_inode(vol->vol_ino);
+			ntfs_commit_inode(vol->root_ino);
+			if (vol->mftmirr_ino)
+				ntfs_commit_inode(vol->mftmirr_ino);
+			ntfs_commit_inode(vol->mft_ino);
+		} else {
+			ntfs_warning(sb,
+				"Volume has errors. Leaving volume marked dirty. Run chkdsk.");
+		}
+	}
+
+	iput(vol->vol_ino);
+	vol->vol_ino = NULL;
+
+	/* NTFS 3.0+ specific clean up. */
+	if (vol->major_ver >= 3) {
+		if (vol->quota_q_ino) {
+			iput(vol->quota_q_ino);
+			vol->quota_q_ino = NULL;
+		}
+		if (vol->quota_ino) {
+			iput(vol->quota_ino);
+			vol->quota_ino = NULL;
+		}
+		if (vol->extend_ino) {
+			iput(vol->extend_ino);
+			vol->extend_ino = NULL;
+		}
+		if (vol->secure_ino) {
+			iput(vol->secure_ino);
+			vol->secure_ino = NULL;
+		}
+	}
+
+	iput(vol->root_ino);
+	vol->root_ino = NULL;
+
+	iput(vol->lcnbmp_ino);
+	vol->lcnbmp_ino = NULL;
+
+	iput(vol->mftbmp_ino);
+	vol->mftbmp_ino = NULL;
+
+	if (vol->logfile_ino) {
+		iput(vol->logfile_ino);
+		vol->logfile_ino = NULL;
+	}
+	if (vol->mftmirr_ino) {
+		/* Re-commit the mft mirror and mft just in case. */
+		ntfs_commit_inode(vol->mftmirr_ino);
+		ntfs_commit_inode(vol->mft_ino);
+		iput(vol->mftmirr_ino);
+		vol->mftmirr_ino = NULL;
+	}
+	/*
+	 * We should have no dirty inodes left, due to
+	 * mft.c::ntfs_mft_writepage() cleaning all the dirty pages as
+	 * the underlying mft records are written out and cleaned.
+	 */
+	ntfs_commit_inode(vol->mft_ino);
+	write_inode_now(vol->mft_ino, 1);
+
+	iput(vol->mft_ino);
+	vol->mft_ino = NULL;
+
+	ntfs_volume_free(vol);
+}
+
+int ntfs_force_shutdown(struct super_block *sb, u32 flags)
+{
+	struct ntfs_volume *vol = NTFS_SB(sb);
+	int ret;
+
+	if (NVolShutdown(vol))
+		return 0;
+
+	switch (flags) {
+	case NTFS_GOING_DOWN_DEFAULT:
+	case NTFS_GOING_DOWN_FULLSYNC:
+		ret = bdev_freeze(sb->s_bdev);
+		if (ret)
+			return ret;
+		bdev_thaw(sb->s_bdev);
+		NVolSetShutdown(vol);
+		break;
+	case NTFS_GOING_DOWN_NOSYNC:
+		NVolSetShutdown(vol);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void ntfs_shutdown(struct super_block *sb)
+{
+	ntfs_force_shutdown(sb, NTFS_GOING_DOWN_NOSYNC);
+}
+
+static int ntfs_sync_fs(struct super_block *sb, int wait)
+{
+	struct ntfs_volume *vol = NTFS_SB(sb);
+	int err = 0;
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	if (!wait)
+		return 0;
+
+	/* If there are some dirty buffers in the bdev inode */
+	if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY)) {
+		ntfs_warning(sb, "Failed to clear dirty bit in volume information flags. Run chkdsk.");
+		err = -EIO;
+	}
+	sync_inodes_sb(sb);
+	sync_blockdev(sb->s_bdev);
+	blkdev_issue_flush(sb->s_bdev);
+	return err;
+}
+
+/**
+ * get_nr_free_clusters - return the number of free clusters on a volume
+ * @vol:	ntfs volume for which to obtain free cluster count
+ *
+ * Calculate the number of free clusters on the mounted NTFS volume @vol. We
+ * actually calculate the number of clusters in use instead because this
+ * allows us to not care about partial pages as these will be just zero filled
+ * and hence not be counted as allocated clusters.
+ *
+ * The only particularity is that clusters beyond the end of the logical ntfs
+ * volume will be marked as allocated to prevent errors which means we have to
+ * discount those at the end. This is important as the cluster bitmap always
+ * has a size in multiples of 8 bytes, i.e. up to 63 clusters could be outside
+ * the logical volume and marked in use when they are not as they do not exist.
+ *
+ * If any pages cannot be read we assume all clusters in the erroring pages are
+ * in use. This means we return an underestimate on errors which is better than
+ * an overestimate.
+ */
+s64 get_nr_free_clusters(struct ntfs_volume *vol)
+{
+	s64 nr_free = vol->nr_clusters;
+	u32 nr_used;
+	struct address_space *mapping = vol->lcnbmp_ino->i_mapping;
+	struct folio *folio;
+	pgoff_t index, max_index;
+	struct file_ra_state *ra;
+
+	ntfs_debug("Entering.");
+	/* Serialize accesses to the cluster bitmap. */
+
+	if (NVolFreeClusterKnown(vol))
+		return atomic64_read(&vol->free_clusters);
+
+	ra = kzalloc(sizeof(*ra), GFP_NOFS);
+	if (!ra)
+		return 0;
+
+	file_ra_state_init(ra, mapping);
+
+	/*
+	 * Convert the number of bits into bytes rounded up, then convert into
+	 * multiples of PAGE_SIZE, rounding up so that if we have one
+	 * full and one partial page max_index = 2.
+	 */
+	max_index = (((vol->nr_clusters + 7) >> 3) + PAGE_SIZE - 1) >>
+			PAGE_SHIFT;
+	/* Use multiples of 4 bytes, thus max_size is PAGE_SIZE / 4. */
+	ntfs_debug("Reading $Bitmap, max_index = 0x%lx, max_size = 0x%lx.",
+			max_index, PAGE_SIZE / 4);
+	for (index = 0; index < max_index; index++) {
+		unsigned long *kaddr;
+
+		/*
+		 * Get folio from page cache, getting it from backing store
+		 * if necessary, and increment the use count.
+		 */
+		folio = filemap_lock_folio(mapping, index);
+		if (IS_ERR(folio)) {
+			page_cache_sync_readahead(mapping, ra, NULL,
+					index, max_index - index);
+			folio = ntfs_read_mapping_folio(mapping, index);
+			if (!IS_ERR(folio))
+				folio_lock(folio);
+		}
+
+		/* Ignore pages which errored synchronously. */
+		if (IS_ERR(folio)) {
+			ntfs_debug("Skipping page (index 0x%lx).", index);
+			nr_free -= PAGE_SIZE * 8;
+			vol->lcn_empty_bits_per_page[index] = 0;
+			continue;
+		}
+
+		kaddr = kmap_local_folio(folio, 0);
+		/*
+		 * Subtract the number of set bits. If this is the last page
+		 * and it is partial we don't really care as it just means we
+		 * do a little extra work but it won't affect the result as
+		 * all out of range bytes are set to zero by ntfs_readpage().
+		 */
+		nr_used = bitmap_weight(kaddr, PAGE_SIZE * BITS_PER_BYTE);
+		nr_free -= nr_used;
+		vol->lcn_empty_bits_per_page[index] = PAGE_SIZE * BITS_PER_BYTE - nr_used;
+		kunmap_local(kaddr);
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+	ntfs_debug("Finished reading $Bitmap, last index = 0x%lx.", index - 1);
+	/*
+	 * Fixup for eventual bits outside logical ntfs volume (see function
+	 * description above).
+	 */
+	if (vol->nr_clusters & 63)
+		nr_free += 64 - (vol->nr_clusters & 63);
+
+	/* If errors occurred we may well have gone below zero, fix this. */
+	if (nr_free < 0)
+		nr_free = 0;
+	else
+		atomic64_set(&vol->free_clusters, nr_free);
+
+	kfree(ra);
+	NVolSetFreeClusterKnown(vol);
+	wake_up_all(&vol->free_waitq);
+	ntfs_debug("Exiting.");
+	return nr_free;
+}
+
+/*
+ * @nr_clusters is the number of clusters requested for allocation.
+ *
+ * Return the number of clusters available for allocation, capped at
+ * @nr_clusters and taking clusters reserved for delayed allocation into
+ * account.
+ */
+s64 ntfs_available_clusters_count(struct ntfs_volume *vol, s64 nr_clusters)
+{
+	s64 free_clusters;
+
+	/* Wait until the free cluster count is known. */
+	if (!NVolFreeClusterKnown(vol))
+		wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+
+	free_clusters = atomic64_read(&vol->free_clusters) -
+			atomic64_read(&vol->dirty_clusters);
+	if (free_clusters <= 0)
+		return -ENOSPC;
+	else if (free_clusters < nr_clusters)
+		nr_clusters = free_clusters;
+
+	return nr_clusters;
+}
+
+/**
+ * __get_nr_free_mft_records - return the number of free inodes on a volume
+ * @vol:	ntfs volume for which to obtain free inode count
+ * @nr_free:	number of mft records in filesystem
+ * @max_index:	maximum number of pages containing set bits
+ *
+ * Calculate the number of free mft records (inodes) on the mounted NTFS
+ * volume @vol. We actually calculate the number of mft records in use instead
+ * because this allows us to not care about partial pages as these will be just
+ * zero filled and hence not be counted as allocated mft records.
+ *
+ * If any pages cannot be read we assume all mft records in the erroring pages
+ * are in use. This means we return an underestimate on errors which is better
+ * than an overestimate.
+ *
+ * NOTE: Caller must hold mftbmp_lock rw_semaphore for reading or writing.
+ */
+static unsigned long __get_nr_free_mft_records(struct ntfs_volume *vol,
+		s64 nr_free, const pgoff_t max_index)
+{
+	struct address_space *mapping = vol->mftbmp_ino->i_mapping;
+	struct folio *folio;
+	pgoff_t index;
+	struct file_ra_state *ra;
+
+	ntfs_debug("Entering.");
+
+	ra = kzalloc(sizeof(*ra), GFP_NOFS);
+	if (!ra)
+		return 0;
+
+	file_ra_state_init(ra, mapping);
+
+	/* Use multiples of 4 bytes, thus max_size is PAGE_SIZE / 4. */
+	ntfs_debug("Reading $MFT/$BITMAP, max_index = 0x%lx, max_size = 0x%lx.",
+			max_index, PAGE_SIZE / 4);
+	for (index = 0; index < max_index; index++) {
+		unsigned long *kaddr;
+
+		/*
+		 * Get folio from page cache, getting it from backing store
+		 * if necessary, and increment the use count.
+		 */
+		folio = filemap_lock_folio(mapping, index);
+		if (IS_ERR(folio)) {
+			page_cache_sync_readahead(mapping, ra, NULL,
+					index, max_index - index);
+			folio = ntfs_read_mapping_folio(mapping, index);
+			if (!IS_ERR(folio))
+				folio_lock(folio);
+		}
+
+		/* Ignore pages which errored synchronously. */
+		if (IS_ERR(folio)) {
+			ntfs_debug("read_mapping_page() error. Skipping page (index 0x%lx).",
+					index);
+			nr_free -= PAGE_SIZE * 8;
+			continue;
+		}
+
+		kaddr = kmap_local_folio(folio, 0);
+		/*
+		 * Subtract the number of set bits. If this is the last page
+		 * and it is partial we don't really care as it just means we
+		 * do a little extra work but it won't affect the result as
+		 * all out of range bytes are set to zero by ntfs_readpage().
+		 */
+		nr_free -= bitmap_weight(kaddr,
+				PAGE_SIZE * BITS_PER_BYTE);
+		kunmap_local(kaddr);
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+	ntfs_debug("Finished reading $MFT/$BITMAP, last index = 0x%lx.",
+			index - 1);
+	/* If errors occurred we may well have gone below zero, fix this. */
+	if (nr_free < 0)
+		nr_free = 0;
+	else
+		atomic64_set(&vol->free_mft_records, nr_free);
+
+	kfree(ra);
+	ntfs_debug("Exiting.");
+	return nr_free;
+}
+
+/**
+ * ntfs_statfs - return information about mounted NTFS volume
+ * @dentry:	dentry from mounted volume
+ * @sfs:	statfs structure in which to return the information
+ *
+ * Return information about the mounted NTFS volume @dentry in the statfs
+ * structure pointed to by @sfs (this is initialized with zeros before
+ * ntfs_statfs is called). We interpret the values to be correct at the moment
+ * in time at which we are called. Most values are variable otherwise and this
+ * isn't just the free values but the totals as well. For example we can
+ * increase the total number of file nodes if we run out and we can keep doing
+ * this until there is no more space on the volume left at all.
+ *
+ * Called from vfs_statfs which is used to handle the statfs, fstatfs, and
+ * ustat system calls.
+ *
+ * Return 0 on success or -errno on error.
+ */
+static int ntfs_statfs(struct dentry *dentry, struct kstatfs *sfs)
+{
+	struct super_block *sb = dentry->d_sb;
+	s64 size;
+	struct ntfs_volume *vol = NTFS_SB(sb);
+	struct ntfs_inode *mft_ni = NTFS_I(vol->mft_ino);
+	unsigned long flags;
+
+	ntfs_debug("Entering.");
+	/* Type of filesystem. */
+	sfs->f_type = NTFS_SB_MAGIC;
+	/* Optimal transfer block size. */
+	sfs->f_bsize = vol->cluster_size;
+	/* Fundamental file system block size, used as the unit. */
+	sfs->f_frsize = vol->cluster_size;
+
+	/*
+	 * Total data blocks in filesystem in units of f_bsize and since
+	 * inodes are also stored in data blocks ($MFT is a file) this is just
+	 * the total clusters.
+	 */
+	sfs->f_blocks = vol->nr_clusters;
+
+	/* Wait until the free cluster count is known. */
+	if (!NVolFreeClusterKnown(vol))
+		wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+
+	/* Free data blocks in filesystem in units of f_bsize. */
+	size = atomic64_read(&vol->free_clusters) -
+			atomic64_read(&vol->dirty_clusters);
+	if (size < 0LL)
+		size = 0LL;
+
+	/* Free blocks avail to non-superuser, same as above on NTFS. */
+	sfs->f_bavail = sfs->f_bfree = size;
+
+	/* Number of inodes in filesystem (at this point in time). */
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	sfs->f_files = i_size_read(vol->mft_ino) >> vol->mft_record_size_bits;
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+
+	/* Free inodes in fs (based on current total count). */
+	sfs->f_ffree = atomic64_read(&vol->free_mft_records);
+
+	/*
+	 * File system id. This is extremely *nix flavour dependent and even
+	 * within Linux itself all fs do their own thing. I interpret this to
+	 * mean a unique id associated with the mounted fs and not the id
+	 * associated with the filesystem driver, the latter is already given
+	 * by the filesystem type in sfs->f_type. Thus we use the 64-bit
+	 * volume serial number splitting it into two 32-bit parts. We enter
+	 * the least significant 32-bits in f_fsid[0] and the most significant
+	 * 32-bits in f_fsid[1].
+	 */
+	sfs->f_fsid = u64_to_fsid(vol->serial_no);
+	/* Maximum length of filenames. */
+	sfs->f_namelen = NTFS_MAX_NAME_LEN;
+
+	return 0;
+}
+
+static int ntfs_write_inode(struct inode *vi, struct writeback_control *wbc)
+{
+	return __ntfs_write_inode(vi, wbc->sync_mode == WB_SYNC_ALL);
+}
+
+/*
+ * The complete super operations.
+ */
+static const struct super_operations ntfs_sops = {
+	.alloc_inode	= ntfs_alloc_big_inode,	/* VFS: Allocate new inode. */
+	.free_inode	= ntfs_free_big_inode,	/* VFS: Deallocate inode. */
+	.drop_inode	= ntfs_drop_big_inode,
+	.write_inode	= ntfs_write_inode,	/* VFS: Write dirty inode to disk. */
+	.put_super	= ntfs_put_super,	/* Syscall: umount. */
+	.shutdown	= ntfs_shutdown,
+	.sync_fs	= ntfs_sync_fs,		/* Syscall: sync. */
+	.statfs		= ntfs_statfs,		/* Syscall: statfs */
+	.evict_inode	= ntfs_evict_big_inode,
+	.show_options	= ntfs_show_options,	/* Show mount options in proc. */
+};
+
+static void precalc_free_clusters(struct work_struct *work)
+{
+	struct ntfs_volume *vol = container_of(work, struct ntfs_volume,
+			precalc_work);
+	s64 nr_free;
+
+	nr_free = get_nr_free_clusters(vol);
+
+	ntfs_debug("pre-calculated free clusters (%lld) using workqueue",
+			nr_free);
+}
+
+/**
+ * ntfs_fill_super - mount an ntfs filesystem
+ *
+ * ntfs_fill_super() is called by the VFS to mount the device described by @sb
+ * with the mount options in @data with the NTFS filesystem.
+ *
+ * If @silent is true, remain silent even if errors are detected. This is used
+ * during bootup, when the kernel tries to mount the root filesystem with all
+ * registered filesystems one after the other until one succeeds. This implies
+ * that all filesystems except the correct one will quite correctly and
+ * expectedly return an error, but nobody wants to see error messages when in
+ * fact this is what is supposed to happen.
+ */
+static struct lock_class_key ntfs_mft_inval_lock_key;
+
+static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
+{
+	char *boot;
+	struct inode *tmp_ino;
+	int blocksize, result;
+	pgoff_t lcn_bit_pages;
+	struct ntfs_volume *vol = NTFS_SB(sb);
+	int silent = fc->sb_flags & SB_SILENT;
+
+	vol->sb = sb;
+
+	/*
+	 * We do a pretty difficult piece of bootstrap by reading the
+	 * MFT (and other metadata) from disk into memory. We'll only
+	 * release this metadata during umount, so the locking patterns
+	 * observed during bootstrap do not count. So turn off the
+	 * observation of locking patterns (strictly for this context
+	 * only) while mounting NTFS. [The validator is still active
+	 * otherwise, even for this context: it will for example record
+	 * lock class registrations.]
+	 */
+	lockdep_off();
+	ntfs_debug("Entering.");
+
+	if (vol->nls_map && !strcmp(vol->nls_map->charset, "utf8"))
+		vol->nls_utf8 = true;
+	if (NVolDisableSparse(vol))
+		vol->preallocated_size = 0;
+
+	if (NVolDiscard(vol) && !bdev_max_discard_sectors(sb->s_bdev)) {
+		ntfs_warning(sb,
+			"Discard requested but device does not support discard. Discard disabled.");
+		NVolClearDiscard(vol);
+	}
+
+	/* We support sector sizes up to the PAGE_SIZE. */
+	if (bdev_logical_block_size(sb->s_bdev) > PAGE_SIZE) {
+		if (!silent)
+			ntfs_error(sb,
+				"Device has unsupported sector size (%i).
The maximum supported sector size on this architecture is %lu bytes.",
+				bdev_logical_block_size(sb->s_bdev),
+				PAGE_SIZE);
+		goto err_out_now;
+	}
+
+	/*
+	 * Setup the device access block size to NTFS_BLOCK_SIZE or the hard
+	 * sector size, whichever is bigger.
+	 */
+	blocksize = sb_min_blocksize(sb, NTFS_BLOCK_SIZE);
+	if (blocksize < NTFS_BLOCK_SIZE) {
+		if (!silent)
+			ntfs_error(sb, "Unable to set device block size.");
+		goto err_out_now;
+	}
+
+	ntfs_debug("Set device block size to %i bytes (block size bits %i).",
+			blocksize, sb->s_blocksize_bits);
+	/* Determine the size of the device in units of block_size bytes. */
+	if (!bdev_nr_bytes(sb->s_bdev)) {
+		if (!silent)
+			ntfs_error(sb, "Unable to determine device size.");
+		goto err_out_now;
+	}
+	vol->nr_blocks = bdev_nr_bytes(sb->s_bdev) >>
+			sb->s_blocksize_bits;
+	/* Read the boot sector and return unlocked buffer head to it. */
+	boot = read_ntfs_boot_sector(sb, silent);
+	if (!boot) {
+		if (!silent)
+			ntfs_error(sb, "Not an NTFS volume.");
+		goto err_out_now;
+	}
+	/*
+	 * Extract the data from the boot sector and setup the ntfs volume
+	 * using it.
+	 */
+	result = parse_ntfs_boot_sector(vol, (struct ntfs_boot_sector *)boot);
+	kfree(boot);
+	if (!result) {
+		if (!silent)
+			ntfs_error(sb, "Unsupported NTFS filesystem.");
+		goto err_out_now;
+	}
+
+	if (vol->sector_size > blocksize) {
+		blocksize = sb_set_blocksize(sb, vol->sector_size);
+		if (blocksize != vol->sector_size) {
+			if (!silent)
+				ntfs_error(sb,
+					"Unable to set device block size to sector size (%i).",
+					vol->sector_size);
+			goto err_out_now;
+		}
+		vol->nr_blocks = bdev_nr_bytes(sb->s_bdev) >>
+				sb->s_blocksize_bits;
+		ntfs_debug("Changed device block size to %i bytes (block size bits %i) to match volume sector size.",
+				blocksize, sb->s_blocksize_bits);
+	}
+	/* Initialize the cluster and mft allocators. */
+	ntfs_setup_allocators(vol);
+	/* Setup remaining fields in the super block.
*/
+	sb->s_magic = NTFS_SB_MAGIC;
+	/*
+	 * Ntfs allows 63 bits for the file size, i.e. correct would be:
+	 *	sb->s_maxbytes = ~0ULL >> 1;
+	 * But the kernel uses a long as the page cache page index which on
+	 * 32-bit architectures is only 32-bits. MAX_LFS_FILESIZE is kernel
+	 * defined to the maximum the page cache page index can cope with
+	 * without overflowing the index or to 2^63 - 1, whichever is smaller.
+	 */
+	sb->s_maxbytes = MAX_LFS_FILESIZE;
+	/* Ntfs measures time in 100ns intervals. */
+	sb->s_time_gran = 100;
+
+	sb->s_xattr = ntfsp_xattr_handlers;
+	/*
+	 * Now load the metadata required for the page cache and our address
+	 * space operations to function. We do this by setting up a specialised
+	 * read_inode method and then just calling the normal iget() to obtain
+	 * the inode for $MFT which is sufficient to allow our normal inode
+	 * operations and associated address space operations to function.
+	 */
+	sb->s_op = &ntfs_sops;
+	tmp_ino = new_inode(sb);
+	if (!tmp_ino) {
+		if (!silent)
+			ntfs_error(sb, "Failed to load essential metadata.");
+		goto err_out_now;
+	}
+
+	tmp_ino->i_ino = FILE_MFT;
+	insert_inode_hash(tmp_ino);
+	if (ntfs_read_inode_mount(tmp_ino) < 0) {
+		if (!silent)
+			ntfs_error(sb, "Failed to load essential metadata.");
+		goto iput_tmp_ino_err_out_now;
+	}
+	lockdep_set_class(&tmp_ino->i_mapping->invalidate_lock,
+			&ntfs_mft_inval_lock_key);
+
+	mutex_lock(&ntfs_lock);
+
+	/*
+	 * Generate the global default upcase table if necessary. Also
+	 * temporarily increment the number of upcase users to avoid race
+	 * conditions with concurrent (u)mounts.
+	 */
+	if (!default_upcase)
+		default_upcase = generate_default_upcase();
+	ntfs_nr_upcase_users++;
+	mutex_unlock(&ntfs_lock);
+
+	lcn_bit_pages = (((vol->nr_clusters + 7) >> 3) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	vol->lcn_empty_bits_per_page = kvmalloc_array(lcn_bit_pages, sizeof(unsigned int),
+			GFP_KERNEL);
+	if (!vol->lcn_empty_bits_per_page) {
+		ntfs_error(sb,
+			"Unable to allocate pages for storing LCN empty bit counts\n");
+		goto unl_upcase_iput_tmp_ino_err_out_now;
+	}
+
+	/*
+	 * From now on, ignore the @silent parameter. If we fail below this
+	 * line, it will be due to a corrupt fs or a system error, so we
+	 * report it.
+	 */
+	/*
+	 * Open the system files with normal access functions and complete
+	 * setting up the ntfs super block.
+	 */
+	if (!load_system_files(vol)) {
+		ntfs_error(sb, "Failed to load system files.");
+		goto unl_upcase_iput_tmp_ino_err_out_now;
+	}
+
+	/* We grab a reference, simulating an ntfs_iget(). */
+	ihold(vol->root_ino);
+	sb->s_root = d_make_root(vol->root_ino);
+	if (sb->s_root) {
+		s64 nr_records;
+
+		ntfs_debug("Exiting, status successful.");
+
+		/* Release the default upcase if it has no users. */
+		mutex_lock(&ntfs_lock);
+		if (!--ntfs_nr_upcase_users && default_upcase) {
+			ntfs_free(default_upcase);
+			default_upcase = NULL;
+		}
+		mutex_unlock(&ntfs_lock);
+		sb->s_export_op = &ntfs_export_ops;
+		lockdep_on();
+
+		nr_records = __get_nr_free_mft_records(vol,
+				i_size_read(vol->mft_ino) >> vol->mft_record_size_bits,
+				((((NTFS_I(vol->mft_ino)->initialized_size >>
+				vol->mft_record_size_bits) +
+				7) >> 3) + PAGE_SIZE - 1) >> PAGE_SHIFT);
+		ntfs_debug("Free mft records(%lld)", nr_records);
+
+		init_waitqueue_head(&vol->free_waitq);
+		INIT_WORK(&vol->precalc_work, precalc_free_clusters);
+		queue_work(ntfs_wq, &vol->precalc_work);
+		return 0;
+	}
+	ntfs_error(sb, "Failed to allocate root directory.");
+	/* Clean up after the successful load_system_files() call from above.
*/
+	iput(vol->vol_ino);
+	vol->vol_ino = NULL;
+	/* NTFS 3.0+ specific clean up. */
+	if (vol->major_ver >= 3) {
+		if (vol->quota_q_ino) {
+			iput(vol->quota_q_ino);
+			vol->quota_q_ino = NULL;
+		}
+		if (vol->quota_ino) {
+			iput(vol->quota_ino);
+			vol->quota_ino = NULL;
+		}
+		if (vol->extend_ino) {
+			iput(vol->extend_ino);
+			vol->extend_ino = NULL;
+		}
+		if (vol->secure_ino) {
+			iput(vol->secure_ino);
+			vol->secure_ino = NULL;
+		}
+	}
+	iput(vol->root_ino);
+	vol->root_ino = NULL;
+	iput(vol->lcnbmp_ino);
+	vol->lcnbmp_ino = NULL;
+	iput(vol->mftbmp_ino);
+	vol->mftbmp_ino = NULL;
+	if (vol->logfile_ino) {
+		iput(vol->logfile_ino);
+		vol->logfile_ino = NULL;
+	}
+	if (vol->mftmirr_ino) {
+		iput(vol->mftmirr_ino);
+		vol->mftmirr_ino = NULL;
+	}
+	/* Throw away the table of attribute definitions. */
+	vol->attrdef_size = 0;
+	if (vol->attrdef) {
+		ntfs_free(vol->attrdef);
+		vol->attrdef = NULL;
+	}
+	vol->upcase_len = 0;
+	mutex_lock(&ntfs_lock);
+	if (vol->upcase == default_upcase) {
+		ntfs_nr_upcase_users--;
+		vol->upcase = NULL;
+	}
+	mutex_unlock(&ntfs_lock);
+	if (vol->upcase) {
+		ntfs_free(vol->upcase);
+		vol->upcase = NULL;
+	}
+	if (vol->nls_map) {
+		unload_nls(vol->nls_map);
+		vol->nls_map = NULL;
+	}
+	/* Error exit code path. */
+unl_upcase_iput_tmp_ino_err_out_now:
+	if (vol->lcn_empty_bits_per_page)
+		kvfree(vol->lcn_empty_bits_per_page);
+	/*
+	 * Decrease the number of upcase users and destroy the global default
+	 * upcase table if necessary.
+	 */
+	mutex_lock(&ntfs_lock);
+	if (!--ntfs_nr_upcase_users && default_upcase) {
+		ntfs_free(default_upcase);
+		default_upcase = NULL;
+	}
+	mutex_unlock(&ntfs_lock);
+iput_tmp_ino_err_out_now:
+	iput(tmp_ino);
+	if (vol->mft_ino && vol->mft_ino != tmp_ino)
+		iput(vol->mft_ino);
+	vol->mft_ino = NULL;
+	/* Errors at this stage are irrelevant.
*/
+err_out_now:
+	sb->s_fs_info = NULL;
+	kfree(vol);
+	ntfs_debug("Failed, returning -EINVAL.");
+	lockdep_on();
+	return -EINVAL;
+}
+
+/*
+ * This is a slab cache to optimize allocations and deallocations of Unicode
+ * strings of the maximum length allowed by NTFS, which is NTFS_MAX_NAME_LEN
+ * (255) Unicode characters + a terminating NULL Unicode character.
+ */
+struct kmem_cache *ntfs_name_cache;
+
+/* Slab caches for efficient allocation/deallocation of inodes. */
+struct kmem_cache *ntfs_inode_cache;
+struct kmem_cache *ntfs_big_inode_cache;
+
+/* Init once constructor for the inode slab cache. */
+static void ntfs_big_inode_init_once(void *foo)
+{
+	struct ntfs_inode *ni = (struct ntfs_inode *)foo;
+
+	inode_init_once(VFS_I(ni));
+}
+
+/*
+ * Slab caches to optimize allocations and deallocations of attribute search
+ * contexts and index contexts, respectively.
+ */
+struct kmem_cache *ntfs_attr_ctx_cache;
+struct kmem_cache *ntfs_index_ctx_cache;
+
+/* Driver wide mutex. */
+DEFINE_MUTEX(ntfs_lock);
+
+static int ntfs_get_tree(struct fs_context *fc)
+{
+	return get_tree_bdev(fc, ntfs_fill_super);
+}
+
+static void ntfs_free_fs_context(struct fs_context *fc)
+{
+	struct ntfs_volume *vol = fc->s_fs_info;
+
+	if (vol)
+		ntfs_volume_free(vol);
+}
+
+static const struct fs_context_operations ntfs_context_ops = {
+	.parse_param = ntfs_parse_param,
+	.get_tree = ntfs_get_tree,
+	.free = ntfs_free_fs_context,
+	.reconfigure = ntfs_reconfigure,
+};
+
+static int ntfs_init_fs_context(struct fs_context *fc)
+{
+	struct ntfs_volume *vol;
+
+	/* Allocate a new struct ntfs_volume and place it in sb->s_fs_info. */
+	vol = kmalloc(sizeof(struct ntfs_volume), GFP_NOFS);
+	if (!vol)
+		return -ENOMEM;
+
+	/* Initialize the struct ntfs_volume structure.
*/
+	*vol = (struct ntfs_volume) {
+		.uid = INVALID_UID,
+		.gid = INVALID_GID,
+		.fmask = 0,
+		.dmask = 0,
+		.mft_zone_multiplier = 1,
+		.on_errors = ON_ERRORS_CONTINUE,
+		.nls_map = load_nls_default(),
+		.preallocated_size = NTFS_DEF_PREALLOC_SIZE,
+	};
+
+	NVolSetShowHiddenFiles(vol);
+	NVolSetCaseSensitive(vol);
+	init_rwsem(&vol->mftbmp_lock);
+	init_rwsem(&vol->lcnbmp_lock);
+
+	fc->s_fs_info = vol;
+	fc->ops = &ntfs_context_ops;
+	return 0;
+}
+
+static struct file_system_type ntfs_fs_type = {
+	.owner = THIS_MODULE,
+	.name = "ntfsplus",
+	.init_fs_context = ntfs_init_fs_context,
+	.parameters = ntfs_parameters,
+	.kill_sb = kill_block_super,
+	.fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+};
+MODULE_ALIAS_FS("ntfsplus");
+
+static int ntfs_workqueue_init(void)
+{
+	ntfs_wq = alloc_workqueue("ntfsplus-bg-io", 0, 0);
+	if (!ntfs_wq)
+		return -ENOMEM;
+	return 0;
+}
+
+static void ntfs_workqueue_destroy(void)
+{
+	destroy_workqueue(ntfs_wq);
+	ntfs_wq = NULL;
+}
+
+/* Stable names for the slab caches.
*/
+static const char ntfs_index_ctx_cache_name[] = "ntfs_index_ctx_cache";
+static const char ntfs_attr_ctx_cache_name[] = "ntfs_attr_ctx_cache";
+static const char ntfs_name_cache_name[] = "ntfs_name_cache";
+static const char ntfs_inode_cache_name[] = "ntfs_inode_cache";
+static const char ntfs_big_inode_cache_name[] = "ntfs_big_inode_cache";
+
+static int __init init_ntfs_fs(void)
+{
+	int err = 0;
+
+	err = ntfs_workqueue_init();
+	if (err) {
+		pr_crit("Failed to register workqueue!\n");
+		return err;
+	}
+
+	ntfs_index_ctx_cache = kmem_cache_create(ntfs_index_ctx_cache_name,
+			sizeof(struct ntfs_index_context), 0 /* offset */,
+			SLAB_HWCACHE_ALIGN, NULL /* ctor */);
+	if (!ntfs_index_ctx_cache) {
+		pr_crit("Failed to create %s!\n", ntfs_index_ctx_cache_name);
+		goto ictx_err_out;
+	}
+	ntfs_attr_ctx_cache = kmem_cache_create(ntfs_attr_ctx_cache_name,
+			sizeof(struct ntfs_attr_search_ctx), 0 /* offset */,
+			SLAB_HWCACHE_ALIGN, NULL /* ctor */);
+	if (!ntfs_attr_ctx_cache) {
+		pr_crit("ntfs+: Failed to create %s!\n",
+			ntfs_attr_ctx_cache_name);
+		goto actx_err_out;
+	}
+
+	ntfs_name_cache = kmem_cache_create(ntfs_name_cache_name,
+			(NTFS_MAX_NAME_LEN + 2) * sizeof(__le16), 0,
+			SLAB_HWCACHE_ALIGN, NULL);
+	if (!ntfs_name_cache) {
+		pr_crit("Failed to create %s!\n", ntfs_name_cache_name);
+		goto name_err_out;
+	}
+
+	ntfs_inode_cache = kmem_cache_create(ntfs_inode_cache_name,
+			sizeof(struct ntfs_inode), 0, SLAB_RECLAIM_ACCOUNT, NULL);
+	if (!ntfs_inode_cache) {
+		pr_crit("Failed to create %s!\n", ntfs_inode_cache_name);
+		goto inode_err_out;
+	}
+
+	ntfs_big_inode_cache = kmem_cache_create(ntfs_big_inode_cache_name,
+			sizeof(struct big_ntfs_inode), 0, SLAB_HWCACHE_ALIGN |
+			SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+			ntfs_big_inode_init_once);
+	if (!ntfs_big_inode_cache) {
+		pr_crit("Failed to create %s!\n", ntfs_big_inode_cache_name);
+		goto big_inode_err_out;
+	}
+
+	/* Register the ntfs sysctls.
*/
+	err = ntfs_sysctl(1);
+	if (err) {
+		pr_crit("Failed to register NTFS sysctls!\n");
+		goto sysctl_err_out;
+	}
+
+	err = register_filesystem(&ntfs_fs_type);
+	if (!err) {
+		ntfs_debug("ntfs+ driver registered successfully.");
+		return 0; /* Success! */
+	}
+	pr_crit("Failed to register ntfs+ filesystem driver!\n");
+
+	/* Unregister the ntfs sysctls. */
+	ntfs_sysctl(0);
+sysctl_err_out:
+	kmem_cache_destroy(ntfs_big_inode_cache);
+big_inode_err_out:
+	kmem_cache_destroy(ntfs_inode_cache);
+inode_err_out:
+	kmem_cache_destroy(ntfs_name_cache);
+name_err_out:
+	kmem_cache_destroy(ntfs_attr_ctx_cache);
+actx_err_out:
+	kmem_cache_destroy(ntfs_index_ctx_cache);
+ictx_err_out:
+	if (!err) {
+		pr_crit("Aborting ntfs+ filesystem driver registration...\n");
+		err = -ENOMEM;
+	}
+	return err;
+}
+
+static void __exit exit_ntfs_fs(void)
+{
+	ntfs_debug("Unregistering ntfs+ driver.");
+
+	unregister_filesystem(&ntfs_fs_type);
+
+	/*
+	 * Make sure all delayed rcu free inodes are flushed before we
+	 * destroy the caches.
+	 */
+	rcu_barrier();
+	kmem_cache_destroy(ntfs_big_inode_cache);
+	kmem_cache_destroy(ntfs_inode_cache);
+	kmem_cache_destroy(ntfs_name_cache);
+	kmem_cache_destroy(ntfs_attr_ctx_cache);
+	kmem_cache_destroy(ntfs_index_ctx_cache);
+	ntfs_workqueue_destroy();
+	/* Unregister the ntfs sysctls.
*/
+	ntfs_sysctl(0);
+}
+
+module_init(init_ntfs_fs);
+module_exit(exit_ntfs_fs);
+
+MODULE_AUTHOR("Anton Altaparmakov "); /* Original read-only NTFS driver */
+MODULE_AUTHOR("Namjae Jeon "); /* Add write, iomap and various features */
+MODULE_DESCRIPTION("NTFS+ read-write filesystem driver");
+MODULE_LICENSE("GPL");
+#ifdef DEBUG
+module_param(debug_msgs, bint, 0);
+MODULE_PARM_DESC(debug_msgs, "Enable debug messages.");
+#endif
--
2.25.1
From: Namjae Jeon
To: viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, hch@lst.de, tytso@mit.edu, willy@infradead.org, jack@suse.cz, djwong@kernel.org, josef@toxicpanda.com, sandeen@sandeen.net, rgoldwyn@suse.com, xiang@kernel.org, dsterba@suse.com, pali@kernel.org, ebiggers@kernel.org, neil@brown.name, amir73il@gmail.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, iamjoonsoo.kim@lge.com, cheol.lee@lge.com, jay.sim@lge.com, gunho.lee@lge.com, Namjae Jeon, Hyunchul Lee
Subject: [PATCH v2 03/11] ntfsplus: add inode operations
Date: Thu, 27 Nov 2025 13:59:36 +0900
Message-Id: <20251127045944.26009-4-linkinjeon@kernel.org>
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>
References: <20251127045944.26009-1-linkinjeon@kernel.org>

This adds the implementation of inode operations for ntfsplus.

Signed-off-by: Hyunchul Lee
Signed-off-by: Namjae Jeon
---
 fs/ntfsplus/inode.c | 3729 +++++++++++++++++++++++++++++++++++++++++++
 fs/ntfsplus/mft.c   | 2698 +++++++++++++++++++++++++++++++
 fs/ntfsplus/mst.c   |  195 +++
 fs/ntfsplus/namei.c | 1677 +++++++++++++++++++
 4 files changed, 8299 insertions(+)
 create mode 100644 fs/ntfsplus/inode.c
 create mode 100644 fs/ntfsplus/mft.c
 create mode 100644 fs/ntfsplus/mst.c
 create mode 100644 fs/ntfsplus/namei.c

diff --git a/fs/ntfsplus/inode.c b/fs/ntfsplus/inode.c
new file mode 100644
index 000000000000..f577ef28ba69
--- /dev/null
+++ b/fs/ntfsplus/inode.c
@@ -0,0 +1,3729 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * NTFS kernel inode handling.
+ *
+ * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include
+#include
+
+#include "lcnalloc.h"
+#include "misc.h"
+#include "ntfs.h"
+#include "index.h"
+#include "attrlist.h"
+#include "reparse.h"
+#include "ea.h"
+#include "attrib.h"
+#include "ntfs_iomap.h"
+
+/**
+ * ntfs_test_inode - compare two (possibly fake) inodes for equality
+ * @vi:		vfs inode which to test
+ * @data:	data which is being tested with
+ *
+ * Compare the ntfs attribute embedded in the ntfs specific part of the vfs
+ * inode @vi for equality with the ntfs attribute @data.
+ *
+ * If searching for the normal file/directory inode, set @na->type to AT_UNUSED.
+ * @na->name and @na->name_len are then ignored.
+ *
+ * Return 1 if the attributes match and 0 if not.
+ *
+ * NOTE: This function runs with the inode_hash_lock spin lock held so it is not
+ * allowed to sleep.
+ */
+int ntfs_test_inode(struct inode *vi, void *data)
+{
+	struct ntfs_attr *na = (struct ntfs_attr *)data;
+	struct ntfs_inode *ni;
+
+	if (vi->i_ino != na->mft_no)
+		return 0;
+
+	ni = NTFS_I(vi);
+
+	/* If !NInoAttr(ni), @vi is a normal file or directory inode. */
+	if (likely(!NInoAttr(ni))) {
+		/* If not looking for a normal inode this is a mismatch. */
+		if (unlikely(na->type != AT_UNUSED))
+			return 0;
+	} else {
+		/* A fake inode describing an attribute. */
+		if (ni->type != na->type)
+			return 0;
+		if (ni->name_len != na->name_len)
+			return 0;
+		if (na->name_len && memcmp(ni->name, na->name,
+				na->name_len * sizeof(__le16)))
+			return 0;
+		if (!ni->ext.base_ntfs_ino)
+			return 0;
+	}
+
+	/* Match! */
+	return 1;
+}
+
+/**
+ * ntfs_init_locked_inode - initialize an inode
+ * @vi:		vfs inode to initialize
+ * @data:	data which to initialize @vi to
+ *
+ * Initialize the vfs inode @vi with the values from the ntfs attribute @data in
+ * order to enable ntfs_test_inode() to do its work.
+ *
+ * If initializing the normal file/directory inode, set @na->type to AT_UNUSED.
+ * In that case, @na->name and @na->name_len should be set to NULL and 0,
+ * respectively, although that is not strictly necessary as
+ * ntfs_read_locked_inode() will fill them in later.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * NOTE: This function runs with the inode->i_lock spin lock held so it is not
+ * allowed to sleep. (Hence the GFP_ATOMIC allocation.)
+ */
+static int ntfs_init_locked_inode(struct inode *vi, void *data)
+{
+	struct ntfs_attr *na = (struct ntfs_attr *)data;
+	struct ntfs_inode *ni = NTFS_I(vi);
+
+	vi->i_ino = na->mft_no;
+
+	if (na->type == AT_INDEX_ALLOCATION)
+		NInoSetMstProtected(ni);
+	else
+		ni->type = na->type;
+
+	ni->name = na->name;
+	ni->name_len = na->name_len;
+	ni->folio = NULL;
+	atomic_set(&ni->count, 1);
+
+	/* If initializing a normal inode, we are done. */
+	if (likely(na->type == AT_UNUSED))
+		return 0;
+
+	/* It is a fake inode. */
+	NInoSetAttr(ni);
+
+	/*
+	 * We have the I30 global constant as an optimization as it is the name
+	 * in >99.9% of named attributes! The other <0.1% incur a GFP_ATOMIC
+	 * allocation but that is ok. And most attributes are unnamed anyway,
+	 * thus the fraction of named attributes with name != I30 is actually
+	 * absolutely tiny.
+	 */
+	if (na->name_len && na->name != I30) {
+		unsigned int i;
+
+		i = na->name_len * sizeof(__le16);
+		ni->name = kmalloc(i + sizeof(__le16), GFP_ATOMIC);
+		if (!ni->name)
+			return -ENOMEM;
+		memcpy(ni->name, na->name, i);
+		ni->name[na->name_len] = 0;
+	}
+	return 0;
+}
+
+static int ntfs_read_locked_inode(struct inode *vi);
+static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode *vi);
+static int ntfs_read_locked_index_inode(struct inode *base_vi,
+		struct inode *vi);
+
+/**
+ * ntfs_iget - obtain a struct inode corresponding to a specific normal inode
+ * @sb:		super block of mounted volume
+ * @mft_no:	mft record number / inode number to obtain
+ *
+ * Obtain the struct inode corresponding to a specific normal inode (i.e. a
+ * file or directory).
+ *
+ * If the inode is in the cache, it is just returned with an increased
+ * reference count. Otherwise, a new struct inode is allocated and initialized,
+ * and finally ntfs_read_locked_inode() is called to read in the inode and
+ * fill in the remainder of the inode structure.
+ *
+ * Return the struct inode on success. Check the return value with IS_ERR() and
+ * if true, the function failed and the error code is obtained from PTR_ERR().
+ */
+struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no)
+{
+	struct inode *vi;
+	int err;
+	struct ntfs_attr na;
+
+	na.mft_no = mft_no;
+	na.type = AT_UNUSED;
+	na.name = NULL;
+	na.name_len = 0;
+
+	vi = iget5_locked(sb, mft_no, ntfs_test_inode,
+			ntfs_init_locked_inode, &na);
+	if (unlikely(!vi))
+		return ERR_PTR(-ENOMEM);
+
+	err = 0;
+
+	/* If this is a freshly allocated inode, need to read it now. */
+	if (vi->i_state & I_NEW) {
+		err = ntfs_read_locked_inode(vi);
+		unlock_new_inode(vi);
+	}
+	/*
+	 * There is no point in keeping bad inodes around if the failure was
+	 * due to ENOMEM. We want to be able to retry again later.
+	 */
+	if (unlikely(err == -ENOMEM)) {
+		iput(vi);
+		vi = ERR_PTR(err);
+	}
+	return vi;
+}
+
+/**
+ * ntfs_attr_iget - obtain a struct inode corresponding to an attribute
+ * @base_vi:	vfs base inode containing the attribute
+ * @type:	attribute type
+ * @name:	Unicode name of the attribute (NULL if unnamed)
+ * @name_len:	length of @name in Unicode characters (0 if unnamed)
+ *
+ * Obtain the (fake) struct inode corresponding to the attribute specified by
+ * @type, @name, and @name_len, which is present in the base mft record
+ * specified by the vfs inode @base_vi.
+ *
+ * If the attribute inode is in the cache, it is just returned with an
+ * increased reference count. Otherwise, a new struct inode is allocated and
+ * initialized, and finally ntfs_read_locked_attr_inode() is called to read the
+ * attribute and fill in the inode structure.
+ *
+ * Note, for index allocation attributes, you need to use ntfs_index_iget()
+ * instead of ntfs_attr_iget() as working with indices is a lot more complex.
+ *
+ * Return the struct inode of the attribute inode on success. Check the return
+ * value with IS_ERR() and if true, the function failed and the error code is
+ * obtained from PTR_ERR().
+ */
+struct inode *ntfs_attr_iget(struct inode *base_vi, __le32 type,
+		__le16 *name, u32 name_len)
+{
+	struct inode *vi;
+	int err;
+	struct ntfs_attr na;
+
+	/* Make sure no one calls ntfs_attr_iget() for indices. */
+	WARN_ON(type == AT_INDEX_ALLOCATION);
+
+	na.mft_no = base_vi->i_ino;
+	na.type = type;
+	na.name = name;
+	na.name_len = name_len;
+
+	vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
+			ntfs_init_locked_inode, &na);
+	if (unlikely(!vi))
+		return ERR_PTR(-ENOMEM);
+	err = 0;
+
+	/* If this is a freshly allocated inode, need to read it now. */
+	if (vi->i_state & I_NEW) {
+		err = ntfs_read_locked_attr_inode(base_vi, vi);
+		unlock_new_inode(vi);
+	}
+	/*
+	 * There is no point in keeping bad attribute inodes around.
This also
+	 * simplifies things in that we never need to check for bad attribute
+	 * inodes elsewhere.
+	 */
+	if (unlikely(err)) {
+		iput(vi);
+		vi = ERR_PTR(err);
+	}
+	return vi;
+}
+
+/**
+ * ntfs_index_iget - obtain a struct inode corresponding to an index
+ * @base_vi:	vfs base inode containing the index related attributes
+ * @name:	Unicode name of the index
+ * @name_len:	length of @name in Unicode characters
+ *
+ * Obtain the (fake) struct inode corresponding to the index specified by @name
+ * and @name_len, which is present in the base mft record specified by the vfs
+ * inode @base_vi.
+ *
+ * If the index inode is in the cache, it is just returned with an increased
+ * reference count. Otherwise, a new struct inode is allocated and
+ * initialized, and finally ntfs_read_locked_index_inode() is called to read
+ * the index related attributes and fill in the inode structure.
+ *
+ * Return the struct inode of the index inode on success. Check the return
+ * value with IS_ERR() and if true, the function failed and the error code is
+ * obtained from PTR_ERR().
+ */
+struct inode *ntfs_index_iget(struct inode *base_vi, __le16 *name,
+		u32 name_len)
+{
+	struct inode *vi;
+	int err;
+	struct ntfs_attr na;
+
+	na.mft_no = base_vi->i_ino;
+	na.type = AT_INDEX_ALLOCATION;
+	na.name = name;
+	na.name_len = name_len;
+
+	vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
+			ntfs_init_locked_inode, &na);
+	if (unlikely(!vi))
+		return ERR_PTR(-ENOMEM);
+
+	err = 0;
+
+	/* If this is a freshly allocated inode, need to read it now. */
+	if (vi->i_state & I_NEW) {
+		err = ntfs_read_locked_index_inode(base_vi, vi);
+		unlock_new_inode(vi);
+	}
+	/*
+	 * There is no point in keeping bad index inodes around. This also
+	 * simplifies things in that we never need to check for bad index
+	 * inodes elsewhere.
+	 */
+	if (unlikely(err)) {
+		iput(vi);
+		vi = ERR_PTR(err);
+	}
+	return vi;
+}
+
+struct inode *ntfs_alloc_big_inode(struct super_block *sb)
+{
+	struct ntfs_inode *ni;
+
+	ntfs_debug("Entering.");
+	ni = alloc_inode_sb(sb, ntfs_big_inode_cache, GFP_NOFS);
+	if (likely(ni != NULL)) {
+		ni->state = 0;
+		ni->type = 0;
+		ni->mft_no = 0;
+		return VFS_I(ni);
+	}
+	ntfs_error(sb, "Allocation of NTFS big inode structure failed.");
+	return NULL;
+}
+
+void ntfs_free_big_inode(struct inode *inode)
+{
+	kmem_cache_free(ntfs_big_inode_cache, NTFS_I(inode));
+}
+
+static int ntfs_non_resident_dealloc_clusters(struct ntfs_inode *ni)
+{
+	struct super_block *sb = ni->vol->sb;
+	struct ntfs_attr_search_ctx *actx;
+	int err = 0;
+
+	actx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!actx)
+		return -ENOMEM;
+	WARN_ON(actx->mrec->link_count != 0);
+
+	/*
+	 * ntfs_truncate_vfs() cannot be called in evict() context due to
+	 * some limitations, e.g. the @ni vfs inode is marked with
+	 * I_FREEING.
+	 */
+	if (NInoRunlistDirty(ni)) {
+		err = ntfs_cluster_free_from_rl(ni->vol, ni->runlist.rl);
+		if (err)
+			ntfs_error(sb,
+				"Failed to free clusters. Leaving inconsistent metadata.\n");
+	}
+
+	while ((err = ntfs_attrs_walk(actx)) == 0) {
+		if (actx->attr->non_resident &&
+		    (!NInoRunlistDirty(ni) || actx->attr->type != AT_DATA)) {
+			struct runlist_element *rl;
+			size_t new_rl_count;
+
+			rl = ntfs_mapping_pairs_decompress(ni->vol, actx->attr, NULL,
+					&new_rl_count);
+			if (IS_ERR(rl)) {
+				err = PTR_ERR(rl);
+				ntfs_error(sb,
+					"Failed to decompress runlist. Leaving inconsistent metadata.\n");
+				continue;
+			}
+
+			err = ntfs_cluster_free_from_rl(ni->vol, rl);
+			if (err)
+				ntfs_error(sb,
+					"Failed to free attribute clusters.
Leaving inconsistent metadata.= \n"); + ntfs_free(rl); + } + } + + ntfs_release_dirty_clusters(ni->vol, ni->i_dealloc_clusters); + ntfs_attr_put_search_ctx(actx); + return err; +} + +int ntfs_drop_big_inode(struct inode *inode) +{ + struct ntfs_inode *ni =3D NTFS_I(inode); + + if (!inode_unhashed(inode) && inode->i_state & I_SYNC) { + if (ni->type =3D=3D AT_DATA || ni->type =3D=3D AT_INDEX_ALLOCATION) { + if (!inode->i_nlink) { + struct ntfs_inode *ni =3D NTFS_I(inode); + + if (ni->data_size =3D=3D 0) + return 0; + + /* To avoid evict_inode call simultaneously */ + atomic_inc(&inode->i_count); + spin_unlock(&inode->i_lock); + + truncate_setsize(VFS_I(ni), 0); + ntfs_truncate_vfs(VFS_I(ni), 0, 1); + + sb_start_intwrite(inode->i_sb); + i_size_write(inode, 0); + ni->allocated_size =3D ni->initialized_size =3D ni->data_size =3D 0; + + truncate_inode_pages_final(inode->i_mapping); + sb_end_intwrite(inode->i_sb); + + spin_lock(&inode->i_lock); + atomic_dec(&inode->i_count); + } + return 0; + } else if (ni->type =3D=3D AT_INDEX_ROOT) + return 0; + } + + return inode_generic_drop(inode); +} + +static inline struct ntfs_inode *ntfs_alloc_extent_inode(void) +{ + struct ntfs_inode *ni; + + ntfs_debug("Entering."); + ni =3D kmem_cache_alloc(ntfs_inode_cache, GFP_NOFS); + if (likely(ni !=3D NULL)) { + ni->state =3D 0; + return ni; + } + ntfs_error(NULL, "Allocation of NTFS inode structure failed."); + return NULL; +} + +static void ntfs_destroy_extent_inode(struct ntfs_inode *ni) +{ + ntfs_debug("Entering."); + + if (!atomic_dec_and_test(&ni->count)) + WARN_ON(1); + if (ni->folio) + ntfs_unmap_folio(ni->folio, NULL); + kfree(ni->mrec); + kmem_cache_free(ntfs_inode_cache, ni); +} + +static struct lock_class_key attr_inode_mrec_lock_class; +static struct lock_class_key attr_list_inode_mrec_lock_class; + +/* + * The attribute runlist lock has separate locking rules from the + * normal runlist lock, so split the two lock-classes: + */ +static struct lock_class_key 
attr_list_rl_lock_class; + +/** + * __ntfs_init_inode - initialize ntfs specific part of an inode + * @sb: super block of mounted volume + * @ni: freshly allocated ntfs inode which to initialize + * + * Initialize an ntfs inode to defaults. + * + * NOTE: ni->mft_no, ni->state, ni->type, ni->name, and ni->name_len are l= eft + * untouched. Make sure to initialize them elsewhere. + */ +void __ntfs_init_inode(struct super_block *sb, struct ntfs_inode *ni) +{ + ntfs_debug("Entering."); + rwlock_init(&ni->size_lock); + ni->initialized_size =3D ni->allocated_size =3D 0; + ni->seq_no =3D 0; + atomic_set(&ni->count, 1); + ni->vol =3D NTFS_SB(sb); + ntfs_init_runlist(&ni->runlist); + ni->lcn_seek_trunc =3D LCN_RL_NOT_MAPPED; + mutex_init(&ni->mrec_lock); + if (ni->type =3D=3D AT_ATTRIBUTE_LIST) { + lockdep_set_class(&ni->mrec_lock, + &attr_list_inode_mrec_lock_class); + lockdep_set_class(&ni->runlist.lock, + &attr_list_rl_lock_class); + } else if (NInoAttr(ni)) { + lockdep_set_class(&ni->mrec_lock, + &attr_inode_mrec_lock_class); + } + + ni->folio =3D NULL; + ni->folio_ofs =3D 0; + ni->mrec =3D NULL; + ni->attr_list_size =3D 0; + ni->attr_list =3D NULL; + ni->itype.index.block_size =3D 0; + ni->itype.index.vcn_size =3D 0; + ni->itype.index.collation_rule =3D 0; + ni->itype.index.block_size_bits =3D 0; + ni->itype.index.vcn_size_bits =3D 0; + mutex_init(&ni->extent_lock); + ni->nr_extents =3D 0; + ni->ext.base_ntfs_ino =3D NULL; + ni->flags =3D 0; + ni->mft_lcn[0] =3D LCN_RL_NOT_MAPPED; + ni->mft_lcn_count =3D 0; + ni->target =3D NULL; + ni->i_dealloc_clusters =3D 0; +} + +/* + * Extent inodes get MFT-mapped in a nested way, while the base inode + * is still mapped. 
Teach this nesting to the lock validator by creating + * a separate class for nested inode's mrec_lock's: + */ +static struct lock_class_key extent_inode_mrec_lock_key; + +inline struct ntfs_inode *ntfs_new_extent_inode(struct super_block *sb, + unsigned long mft_no) +{ + struct ntfs_inode *ni =3D ntfs_alloc_extent_inode(); + + ntfs_debug("Entering."); + if (likely(ni !=3D NULL)) { + __ntfs_init_inode(sb, ni); + lockdep_set_class(&ni->mrec_lock, &extent_inode_mrec_lock_key); + ni->mft_no =3D mft_no; + ni->type =3D AT_UNUSED; + ni->name =3D NULL; + ni->name_len =3D 0; + } + return ni; +} + +/** + * ntfs_is_extended_system_file - check if a file is in the $Extend direct= ory + * @ctx: initialized attribute search context + * + * Search all file name attributes in the inode described by the attribute + * search context @ctx and check if any of the names are in the $Extend sy= stem + * directory. + * + * Return values: + * 1: file is in $Extend directory + * 0: file is not in $Extend directory + * -errno: failed to determine if the file is in the $Extend directory + */ +static int ntfs_is_extended_system_file(struct ntfs_attr_search_ctx *ctx) +{ + int nr_links, err; + + /* Restart search. */ + ntfs_attr_reinit_search_ctx(ctx); + + /* Get number of hard links. */ + nr_links =3D le16_to_cpu(ctx->mrec->link_count); + + /* Loop through all hard links. */ + while (!(err =3D ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0, + ctx))) { + struct file_name_attr *file_name_attr; + struct attr_record *attr =3D ctx->attr; + u8 *p, *p2; + + nr_links--; + /* + * Maximum sanity checking as we are called on an inode that + * we suspect might be corrupt. + */ + p =3D (u8 *)attr + le32_to_cpu(attr->length); + if (p < (u8 *)ctx->mrec || (u8 *)p > (u8 *)ctx->mrec + + le32_to_cpu(ctx->mrec->bytes_in_use)) { +err_corrupt_attr: + ntfs_error(ctx->ntfs_ino->vol->sb, + "Corrupt file name attribute. 
You should run chkdsk."); + return -EIO; + } + if (attr->non_resident) { + ntfs_error(ctx->ntfs_ino->vol->sb, + "Non-resident file name. You should run chkdsk."); + return -EIO; + } + if (attr->flags) { + ntfs_error(ctx->ntfs_ino->vol->sb, + "File name with invalid flags. You should run chkdsk."); + return -EIO; + } + if (!(attr->data.resident.flags & RESIDENT_ATTR_IS_INDEXED)) { + ntfs_error(ctx->ntfs_ino->vol->sb, + "Unindexed file name. You should run chkdsk."); + return -EIO; + } + file_name_attr =3D (struct file_name_attr *)((u8 *)attr + + le16_to_cpu(attr->data.resident.value_offset)); + p2 =3D (u8 *)file_name_attr + le32_to_cpu(attr->data.resident.value_leng= th); + if (p2 < (u8 *)attr || p2 > p) + goto err_corrupt_attr; + /* This attribute is ok, but is it in the $Extend directory? */ + if (MREF_LE(file_name_attr->parent_directory) =3D=3D FILE_Extend) { + unsigned char *s; + + s =3D ntfs_attr_name_get(ctx->ntfs_ino->vol, + file_name_attr->file_name, + file_name_attr->file_name_length); + if (!s) + return 1; + if (!strcmp("$Reparse", s)) { + ntfs_attr_name_free(&s); + return 2; /* it's reparse point file */ + } + ntfs_attr_name_free(&s); + return 1; /* YES, it's an extended system file. */ + } + } + if (unlikely(err !=3D -ENOENT)) + return err; + if (unlikely(nr_links)) { + ntfs_error(ctx->ntfs_ino->vol->sb, + "Inode hard link count doesn't match number of name attributes. You sho= uld run chkdsk."); + return -EIO; + } + return 0; /* NO, it is not an extended system file. 
*/ +} + +static struct lock_class_key ntfs_dir_inval_lock_key; + +void ntfs_set_vfs_operations(struct inode *inode, mode_t mode, dev_t dev) +{ + if (S_ISDIR(mode)) { + if (!NInoAttr(NTFS_I(inode))) { + inode->i_op =3D &ntfs_dir_inode_ops; + inode->i_fop =3D &ntfs_dir_ops; + } + if (NInoMstProtected(NTFS_I(inode))) + inode->i_mapping->a_ops =3D &ntfs_mst_aops; + else + inode->i_mapping->a_ops =3D &ntfs_normal_aops; + lockdep_set_class(&inode->i_mapping->invalidate_lock, + &ntfs_dir_inval_lock_key); + } else if (S_ISLNK(mode)) { + inode->i_op =3D &ntfs_symlink_inode_operations; + inode->i_mapping->a_ops =3D &ntfs_normal_aops; + } else if (S_ISCHR(mode) || S_ISBLK(mode) || S_ISFIFO(mode) || S_ISSOCK(m= ode)) { + inode->i_op =3D &ntfsp_special_inode_operations; + init_special_inode(inode, inode->i_mode, dev); + } else { + if (!NInoAttr(NTFS_I(inode))) { + inode->i_op =3D &ntfs_file_inode_ops; + inode->i_fop =3D &ntfs_file_ops; + } + if (NInoMstProtected(NTFS_I(inode))) + inode->i_mapping->a_ops =3D &ntfs_mst_aops; + else if (NInoCompressed(NTFS_I(inode))) + inode->i_mapping->a_ops =3D &ntfs_compressed_aops; + else + inode->i_mapping->a_ops =3D &ntfs_normal_aops; + } +} + +__le16 R[3] =3D { cpu_to_le16('$'), cpu_to_le16('R'), 0 }; + +/** + * ntfs_read_locked_inode - read an inode from its device + * @vi: inode to read + * + * ntfs_read_locked_inode() is called from ntfs_iget() to read the inode + * described by @vi into memory from the device. + * + * The only fields in @vi that we need to/can look at when the function is + * called are i_sb, pointing to the mounted device's super block, and i_in= o, + * the number of the inode to load. + * + * ntfs_read_locked_inode() maps, pins and locks the mft record number i_i= no + * for reading and sets up the necessary @vi fields as well as initializing + * the ntfs inode. + * + * Q: What locks are held when the function is called? 
+ * A: i_state has I_NEW set, hence the inode is locked, also
+ *    i_count is set to 1, so it is not going to go away
+ *    i_flags is set to 0 and we have no business touching it. Only an ioctl()
+ *    is allowed to write to it. We should of course be honouring it but
+ *    we need to do that using the IS_* macros defined in include/linux/fs.h.
+ *    In any case ntfs_read_locked_inode() has nothing to do with i_flags.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_read_locked_inode(struct inode *vi)
+{
+	struct ntfs_volume *vol = NTFS_SB(vi->i_sb);
+	struct ntfs_inode *ni;
+	struct mft_record *m;
+	struct attr_record *a;
+	struct standard_information *si;
+	struct ntfs_attr_search_ctx *ctx;
+	int err = 0;
+	__le16 *name = I30;
+	unsigned int name_len = 4, flags = 0;
+	int extend_sys = 0;
+	dev_t dev = 0;
+	bool vol_err = true;
+
+	ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
+
+	if (uid_valid(vol->uid)) {
+		vi->i_uid = vol->uid;
+		flags |= NTFS_VOL_UID;
+	} else
+		vi->i_uid = GLOBAL_ROOT_UID;
+
+	if (gid_valid(vol->gid)) {
+		vi->i_gid = vol->gid;
+		flags |= NTFS_VOL_GID;
+	} else
+		vi->i_gid = GLOBAL_ROOT_GID;
+
+	vi->i_mode = 0777;
+
+	/*
+	 * Initialize the ntfs specific part of @vi, special casing
+	 * FILE_MFT which we need to do at mount time.
+	 */
+	if (vi->i_ino != FILE_MFT)
+		ntfs_init_big_inode(vi);
+	ni = NTFS_I(vi);
+
+	m = map_mft_record(ni);
+	if (IS_ERR(m)) {
+		err = PTR_ERR(m);
+		goto err_out;
+	}
+
+	ctx = ntfs_attr_get_search_ctx(ni, m);
+	if (!ctx) {
+		err = -ENOMEM;
+		goto unm_err_out;
+	}
+
+	if (!(m->flags & MFT_RECORD_IN_USE)) {
+		err = -ENOENT;
+		vol_err = false;
+		goto unm_err_out;
+	}
+
+	if (m->base_mft_record) {
+		ntfs_error(vi->i_sb, "Inode is an extent inode!");
+		goto unm_err_out;
+	}
+
+	/* Transfer information from mft record into vfs and ntfs inodes. */
+	vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
+
+	if (le16_to_cpu(m->link_count) < 1) {
+		ntfs_error(vi->i_sb, "Inode link count is 0!");
+		goto unm_err_out;
+	}
+	set_nlink(vi, le16_to_cpu(m->link_count));
+
+	/* If read-only, no one gets write permissions. */
+	if (IS_RDONLY(vi))
+		vi->i_mode &= ~0222;
+
+	/*
+	 * Find the standard information attribute in the mft record. At this
+	 * stage we haven't setup the attribute list stuff yet, so this could
+	 * in fact fail if the standard information is in an extent record, but
+	 * I don't think this actually ever happens.
+	 */
+	ntfs_attr_reinit_search_ctx(ctx);
+	err = ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0, 0, 0, NULL, 0,
+			ctx);
+	if (unlikely(err)) {
+		if (err == -ENOENT)
+			ntfs_error(vi->i_sb, "$STANDARD_INFORMATION attribute is missing.");
+		goto unm_err_out;
+	}
+	a = ctx->attr;
+	/* Get the standard information attribute value. */
+	if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset) +
+			le32_to_cpu(a->data.resident.value_length) >
+			(u8 *)ctx->mrec + vol->mft_record_size) {
+		ntfs_error(vi->i_sb, "Corrupt standard information attribute in inode.");
+		goto unm_err_out;
+	}
+	si = (struct standard_information *)((u8 *)a +
+			le16_to_cpu(a->data.resident.value_offset));
+
+	/* Transfer information from the standard information into vi. */
+	/*
+	 * Note: The i_?times do not quite map perfectly onto the NTFS times,
+	 * but they are close enough, and in the end it doesn't really matter
+	 * that much...
+	 */
+	/*
+	 * mtime is the last change of the data within the file. Not changed
+	 * when only metadata is changed, e.g. a rename doesn't affect mtime.
+	 */
+	ni->i_crtime = ntfs2utc(si->creation_time);
+
+	inode_set_mtime_to_ts(vi, ntfs2utc(si->last_data_change_time));
+	/*
+	 * ctime is the last change of the metadata of the file. This obviously
+	 * always changes, when mtime is changed. ctime can be changed on its
+	 * own, mtime is then not changed, e.g. when a file is renamed.
+	 */
+	inode_set_ctime_to_ts(vi, ntfs2utc(si->last_mft_change_time));
+	/*
+	 * Last access to the data within the file. Not changed during a rename
+	 * for example but changed whenever the file is written to.
+	 */
+	inode_set_atime_to_ts(vi, ntfs2utc(si->last_access_time));
+	ni->flags = si->file_attributes;
+
+	/* Find the attribute list attribute if present. */
+	ntfs_attr_reinit_search_ctx(ctx);
+	err = ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx);
+	if (err) {
+		if (unlikely(err != -ENOENT)) {
+			ntfs_error(vi->i_sb, "Failed to lookup attribute list attribute.");
+			goto unm_err_out;
+		}
+	} else {
+		if (vi->i_ino == FILE_MFT)
+			goto skip_attr_list_load;
+		ntfs_debug("Attribute list found in inode 0x%lx.", vi->i_ino);
+		NInoSetAttrList(ni);
+		a = ctx->attr;
+		if (a->flags & ATTR_COMPRESSION_MASK) {
+			ntfs_error(vi->i_sb,
+				"Attribute list attribute is compressed.");
+			goto unm_err_out;
+		}
+		if (a->flags & ATTR_IS_ENCRYPTED ||
+				a->flags & ATTR_IS_SPARSE) {
+			if (a->non_resident) {
+				ntfs_error(vi->i_sb,
+					"Non-resident attribute list attribute is encrypted/sparse.");
+				goto unm_err_out;
+			}
+			ntfs_warning(vi->i_sb,
+				"Resident attribute list attribute in inode 0x%lx is marked encrypted/sparse which is not true. However, Windows allows this and chkdsk does not detect or correct it so we will just ignore the invalid flags and pretend they are not set.",
+				vi->i_ino);
+		}
+		/* Now allocate memory for the attribute list. */
+		ni->attr_list_size = (u32)ntfs_attr_size(a);
+		if (!ni->attr_list_size) {
+			ntfs_error(vi->i_sb, "Attr_list_size is zero");
+			goto unm_err_out;
+		}
+		ni->attr_list = ntfs_malloc_nofs(ni->attr_list_size);
+		if (!ni->attr_list) {
+			ntfs_error(vi->i_sb,
+				"Not enough memory to allocate buffer for attribute list.");
+			err = -ENOMEM;
+			goto unm_err_out;
+		}
+		if (a->non_resident) {
+			NInoSetAttrListNonResident(ni);
+			if (a->data.non_resident.lowest_vcn) {
+				ntfs_error(vi->i_sb, "Attribute list has non zero lowest_vcn.");
+				goto unm_err_out;
+			}
+
+			/* Now load the attribute list. */
+			err = load_attribute_list(ni, ni->attr_list, ni->attr_list_size);
+			if (err) {
+				ntfs_error(vi->i_sb, "Failed to load attribute list attribute.");
+				goto unm_err_out;
+			}
+		} else /* if (!a->non_resident) */ {
+			if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset) +
+					le32_to_cpu(
+					a->data.resident.value_length) >
+					(u8 *)ctx->mrec + vol->mft_record_size) {
+				ntfs_error(vi->i_sb, "Corrupt attribute list in inode.");
+				goto unm_err_out;
+			}
+			/* Now copy the attribute list. */
+			memcpy(ni->attr_list, (u8 *)a + le16_to_cpu(
+					a->data.resident.value_offset),
+					le32_to_cpu(
+					a->data.resident.value_length));
+		}
+	}
+skip_attr_list_load:
+	err = ntfs_attr_lookup(AT_EA_INFORMATION, NULL, 0, 0, 0, NULL, 0, ctx);
+	if (!err)
+		NInoSetHasEA(ni);
+
+	ntfs_ea_get_wsl_inode(vi, &dev, flags);
+
+	if (m->flags & MFT_RECORD_IS_DIRECTORY) {
+		vi->i_mode |= S_IFDIR;
+		/*
+		 * Apply the directory permissions mask set in the mount
+		 * options.
+		 */
+		vi->i_mode &= ~vol->dmask;
+		/* Things break without this kludge! */
+		if (vi->i_nlink > 1)
+			set_nlink(vi, 1);
+	} else {
+		if (ni->flags & FILE_ATTR_REPARSE_POINT) {
+			unsigned int mode;
+
+			mode = ntfs_make_symlink(ni);
+			if (mode)
+				vi->i_mode |= mode;
+			else {
+				vi->i_mode &= ~S_IFLNK;
+				vi->i_mode |= S_IFREG;
+			}
+		} else
+			vi->i_mode |= S_IFREG;
+		/* Apply the file permissions mask set in the mount options. */
+		vi->i_mode &= ~vol->fmask;
+	}
+
+	/*
+	 * If an attribute list is present we now have the attribute list value
+	 * in ntfs_ino->attr_list and it is ntfs_ino->attr_list_size bytes.
+	 */
+	if (S_ISDIR(vi->i_mode)) {
+		struct index_root *ir;
+		u8 *ir_end, *index_end;
+
+view_index_meta:
+		/* It is a directory, find index root attribute. */
+		ntfs_attr_reinit_search_ctx(ctx);
+		err = ntfs_attr_lookup(AT_INDEX_ROOT, name, name_len, CASE_SENSITIVE,
+				0, NULL, 0, ctx);
+		if (unlikely(err)) {
+			if (err == -ENOENT)
+				ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is missing.");
+			goto unm_err_out;
+		}
+		a = ctx->attr;
+		/* Set up the state. */
+		if (unlikely(a->non_resident)) {
+			ntfs_error(vol->sb,
+				"$INDEX_ROOT attribute is not resident.");
+			goto unm_err_out;
+		}
+		/* Ensure the attribute name is placed before the value. */
+		if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
+				le16_to_cpu(a->data.resident.value_offset)))) {
+			ntfs_error(vol->sb,
+				"$INDEX_ROOT attribute name is placed after the attribute value.");
+			goto unm_err_out;
+		}
+		/*
+		 * Compressed/encrypted index root just means that the newly
+		 * created files in that directory should be created compressed/
+		 * encrypted. However index root cannot be both compressed and
+		 * encrypted.
+		 */
+		if (a->flags & ATTR_COMPRESSION_MASK) {
+			NInoSetCompressed(ni);
+			ni->flags |= FILE_ATTR_COMPRESSED;
+		}
+		if (a->flags & ATTR_IS_ENCRYPTED) {
+			if (a->flags & ATTR_COMPRESSION_MASK) {
+				ntfs_error(vi->i_sb, "Found encrypted and compressed attribute.");
+				goto unm_err_out;
+			}
+			NInoSetEncrypted(ni);
+			ni->flags |= FILE_ATTR_ENCRYPTED;
+		}
+		if (a->flags & ATTR_IS_SPARSE) {
+			NInoSetSparse(ni);
+			ni->flags |= FILE_ATTR_SPARSE_FILE;
+		}
+		ir = (struct index_root *)((u8 *)a +
+				le16_to_cpu(a->data.resident.value_offset));
+		ir_end = (u8 *)ir + le32_to_cpu(a->data.resident.value_length);
+		if (ir_end > (u8 *)ctx->mrec + vol->mft_record_size) {
+			ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is corrupt.");
+			goto unm_err_out;
+		}
+		index_end = (u8 *)&ir->index +
+				le32_to_cpu(ir->index.index_length);
+		if (index_end > ir_end) {
+			ntfs_error(vi->i_sb, "Directory index is corrupt.");
+			goto unm_err_out;
+		}
+
+		if (extend_sys) {
+			if (ir->type) {
+				ntfs_error(vi->i_sb, "Indexed attribute is not zero.");
+				goto unm_err_out;
+			}
+		} else {
+			if (ir->type != AT_FILE_NAME) {
+				ntfs_error(vi->i_sb, "Indexed attribute is not $FILE_NAME.");
+				goto unm_err_out;
+			}
+
+			if (ir->collation_rule != COLLATION_FILE_NAME) {
+				ntfs_error(vi->i_sb,
+					"Index collation rule is not COLLATION_FILE_NAME.");
+				goto unm_err_out;
+			}
+		}
+
+		ni->itype.index.collation_rule = ir->collation_rule;
+		ni->itype.index.block_size = le32_to_cpu(ir->index_block_size);
+		if (ni->itype.index.block_size &
+				(ni->itype.index.block_size - 1)) {
+			ntfs_error(vi->i_sb, "Index block size (%u) is not a power of two.",
+					ni->itype.index.block_size);
+			goto unm_err_out;
+		}
+		if (ni->itype.index.block_size > PAGE_SIZE) {
+			ntfs_error(vi->i_sb,
+				"Index block size (%u) > PAGE_SIZE (%ld) is not supported.",
+				ni->itype.index.block_size,
+				PAGE_SIZE);
+			err = -EOPNOTSUPP;
+			goto unm_err_out;
+		}
+		if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) {
+			ntfs_error(vi->i_sb,
+				"Index block size (%u) < NTFS_BLOCK_SIZE (%i) is not supported.",
+				ni->itype.index.block_size,
+				NTFS_BLOCK_SIZE);
+			err = -EOPNOTSUPP;
+			goto unm_err_out;
+		}
+		ni->itype.index.block_size_bits =
+				ffs(ni->itype.index.block_size) - 1;
+		/* Determine the size of a vcn in the directory index. */
+		if (vol->cluster_size <= ni->itype.index.block_size) {
+			ni->itype.index.vcn_size = vol->cluster_size;
+			ni->itype.index.vcn_size_bits = vol->cluster_size_bits;
+		} else {
+			ni->itype.index.vcn_size = vol->sector_size;
+			ni->itype.index.vcn_size_bits = vol->sector_size_bits;
+		}
+
+		/* Setup the index allocation attribute, even if not present. */
+		ni->type = AT_INDEX_ROOT;
+		ni->name = name;
+		ni->name_len = name_len;
+		vi->i_size = ni->initialized_size = ni->data_size =
+				le32_to_cpu(a->data.resident.value_length);
+		ni->allocated_size = (ni->data_size + 7) & ~7;
+		/* We are done with the mft record, so we release it. */
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(ni);
+		m = NULL;
+		ctx = NULL;
+		/* Setup the operations for this inode. */
+		ntfs_set_vfs_operations(vi, S_IFDIR, 0);
+		if (ir->index.flags & LARGE_INDEX)
+			NInoSetIndexAllocPresent(ni);
+	} else {
+		/* It is a file. */
+		ntfs_attr_reinit_search_ctx(ctx);
+
+		/* Setup the data attribute, even if not present. */
+		ni->type = AT_DATA;
+		ni->name = AT_UNNAMED;
+		ni->name_len = 0;
+
+		/* Find first extent of the unnamed data attribute. */
+		err = ntfs_attr_lookup(AT_DATA, NULL, 0, 0, 0, NULL, 0, ctx);
+		if (unlikely(err)) {
+			vi->i_size = ni->initialized_size =
+					ni->allocated_size = 0;
+			if (err != -ENOENT) {
+				ntfs_error(vi->i_sb, "Failed to lookup $DATA attribute.");
+				goto unm_err_out;
+			}
+			/*
+			 * FILE_Secure does not have an unnamed $DATA
+			 * attribute, so we special case it here.
+			 */
+			if (vi->i_ino == FILE_Secure)
+				goto no_data_attr_special_case;
+			/*
+			 * Most if not all the system files in the $Extend
+			 * system directory do not have unnamed data
+			 * attributes so we need to check if the parent
+			 * directory of the file is FILE_Extend and if it is
+			 * ignore this error. To do this we need to get the
+			 * name of this inode from the mft record as the name
+			 * contains the back reference to the parent directory.
+			 */
+			extend_sys = ntfs_is_extended_system_file(ctx);
+			if (extend_sys > 0) {
+				if (m->flags & MFT_RECORD_IS_VIEW_INDEX &&
+						extend_sys == 2) {
+					name = R;
+					name_len = 2;
+					goto view_index_meta;
+				}
+				goto no_data_attr_special_case;
+			}
+
+			err = extend_sys;
+			ntfs_error(vi->i_sb, "$DATA attribute is missing, err : %d", err);
+			goto unm_err_out;
+		}
+		a = ctx->attr;
+		/* Setup the state. */
+		if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
+			if (a->flags & ATTR_COMPRESSION_MASK) {
+				NInoSetCompressed(ni);
+				ni->flags |= FILE_ATTR_COMPRESSED;
+				if (vol->cluster_size > 4096) {
+					ntfs_error(vi->i_sb,
+						"Found compressed data but compression is disabled due to cluster size (%i) > 4kiB.",
+						vol->cluster_size);
+					goto unm_err_out;
+				}
+				if ((a->flags & ATTR_COMPRESSION_MASK)
+						!= ATTR_IS_COMPRESSED) {
+					ntfs_error(vi->i_sb,
+						"Found unknown compression method or corrupt file.");
+					goto unm_err_out;
+				}
+			}
+			if (a->flags & ATTR_IS_SPARSE) {
+				NInoSetSparse(ni);
+				ni->flags |= FILE_ATTR_SPARSE_FILE;
+			}
+		}
+		if (a->flags & ATTR_IS_ENCRYPTED) {
+			if (NInoCompressed(ni)) {
+				ntfs_error(vi->i_sb, "Found encrypted and compressed data.");
+				goto unm_err_out;
+			}
+			NInoSetEncrypted(ni);
+			ni->flags |= FILE_ATTR_ENCRYPTED;
+		}
+		if (a->non_resident) {
+			NInoSetNonResident(ni);
+			if (NInoCompressed(ni) || NInoSparse(ni)) {
+				if (NInoCompressed(ni) &&
+						a->data.non_resident.compression_unit != 4) {
+					ntfs_error(vi->i_sb,
+						"Found non-standard compression unit (%u instead of 4). Cannot handle this.",
+						a->data.non_resident.compression_unit);
+					err = -EOPNOTSUPP;
+					goto unm_err_out;
+				}
+
+				if (NInoSparse(ni) &&
+						a->data.non_resident.compression_unit &&
+						a->data.non_resident.compression_unit !=
+						vol->sparse_compression_unit) {
+					ntfs_error(vi->i_sb,
+						"Found non-standard compression unit (%u instead of 0 or %d). Cannot handle this.",
+						a->data.non_resident.compression_unit,
+						vol->sparse_compression_unit);
+					err = -EOPNOTSUPP;
+					goto unm_err_out;
+				}
+
+				if (a->data.non_resident.compression_unit) {
+					ni->itype.compressed.block_size = 1U <<
+							(a->data.non_resident.compression_unit +
+							vol->cluster_size_bits);
+					ni->itype.compressed.block_size_bits =
+							ffs(ni->itype.compressed.block_size) - 1;
+					ni->itype.compressed.block_clusters =
+							1U << a->data.non_resident.compression_unit;
+				} else {
+					ni->itype.compressed.block_size = 0;
+					ni->itype.compressed.block_size_bits = 0;
+					ni->itype.compressed.block_clusters = 0;
+				}
+				ni->itype.compressed.size = le64_to_cpu(
+						a->data.non_resident.compressed_size);
+			}
+			if (a->data.non_resident.lowest_vcn) {
+				ntfs_error(vi->i_sb,
+					"First extent of $DATA attribute has non zero lowest_vcn.");
+				goto unm_err_out;
+			}
+			vi->i_size = ni->data_size = le64_to_cpu(a->data.non_resident.data_size);
+			ni->initialized_size = le64_to_cpu(a->data.non_resident.initialized_size);
+			ni->allocated_size = le64_to_cpu(a->data.non_resident.allocated_size);
+		} else { /* Resident attribute. */
+			vi->i_size = ni->data_size = ni->initialized_size = le32_to_cpu(
+					a->data.resident.value_length);
+			ni->allocated_size = le32_to_cpu(a->length) -
+					le16_to_cpu(
+					a->data.resident.value_offset);
+			if (vi->i_size > ni->allocated_size) {
+				ntfs_error(vi->i_sb,
+					"Resident data attribute is corrupt (size exceeds allocation).");
+				goto unm_err_out;
+			}
+		}
+no_data_attr_special_case:
+		/* We are done with the mft record, so we release it. */
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(ni);
+		m = NULL;
+		ctx = NULL;
+		/* Setup the operations for this inode. */
+		ntfs_set_vfs_operations(vi, vi->i_mode, dev);
+	}
+
+	if (NVolSysImmutable(vol) && (ni->flags & FILE_ATTR_SYSTEM) &&
+	    !S_ISFIFO(vi->i_mode) && !S_ISSOCK(vi->i_mode) && !S_ISLNK(vi->i_mode))
+		vi->i_flags |= S_IMMUTABLE;
+
+	/*
+	 * The number of 512-byte blocks used on disk (for stat). This is in so
+	 * far inaccurate as it doesn't account for any named streams or other
+	 * special non-resident attributes, but that is how Windows works, too,
+	 * so we are at least consistent with Windows, if not entirely
+	 * consistent with the Linux Way. Doing it the Linux Way would cause a
+	 * significant slowdown as it would involve iterating over all
+	 * attributes in the mft record and adding the allocated/compressed
+	 * sizes of all non-resident attributes present to give us the Linux
+	 * correct size that should go into i_blocks (after division by 512).
+	 */
+	if (S_ISREG(vi->i_mode) && (NInoCompressed(ni) || NInoSparse(ni)))
+		vi->i_blocks = ni->itype.compressed.size >> 9;
+	else
+		vi->i_blocks = ni->allocated_size >> 9;
+
+	ntfs_debug("Done.");
+	return 0;
+unm_err_out:
+	if (!err)
+		err = -EIO;
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (m)
+		unmap_mft_record(ni);
+err_out:
+	if (err != -EOPNOTSUPP && err != -ENOMEM && vol_err == true) {
+		ntfs_error(vol->sb,
+			"Failed with error code %i. Marking corrupt inode 0x%lx as bad. Run chkdsk.",
+			err, vi->i_ino);
+		NVolSetErrors(vol);
+	}
+	return err;
+}
+
+/**
+ * ntfs_read_locked_attr_inode - read an attribute inode from its base inode
+ * @base_vi:	base inode
+ * @vi:		attribute inode to read
+ *
+ * ntfs_read_locked_attr_inode() is called from ntfs_attr_iget() to read the
+ * attribute inode described by @vi into memory from the base mft record
+ * described by @base_ni.
+ *
+ * ntfs_read_locked_attr_inode() maps, pins and locks the base inode for
+ * reading and looks up the attribute described by @vi before setting up the
+ * necessary fields in @vi as well as initializing the ntfs inode.
+ *
+ * Q: What locks are held when the function is called?
+ * A: i_state has I_NEW set, hence the inode is locked, also
+ *    i_count is set to 1, so it is not going to go away
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Note this cannot be called for AT_INDEX_ALLOCATION.
+ */
+static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode *vi)
+{
+	struct ntfs_volume *vol = NTFS_SB(vi->i_sb);
+	struct ntfs_inode *ni = NTFS_I(vi), *base_ni = NTFS_I(base_vi);
+	struct mft_record *m;
+	struct attr_record *a;
+	struct ntfs_attr_search_ctx *ctx;
+	int err = 0;
+
+	ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
+
+	ntfs_init_big_inode(vi);
+
+	/* Just mirror the values from the base inode. */
+	vi->i_uid = base_vi->i_uid;
+	vi->i_gid = base_vi->i_gid;
+	set_nlink(vi, base_vi->i_nlink);
+	inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
+	inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
+	inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
+	vi->i_generation = ni->seq_no = base_ni->seq_no;
+
+	/* Set inode type to zero but preserve permissions. */
+	vi->i_mode = base_vi->i_mode & ~S_IFMT;
+
+	m = map_mft_record(base_ni);
+	if (IS_ERR(m)) {
+		err = PTR_ERR(m);
+		goto err_out;
+	}
+	ctx = ntfs_attr_get_search_ctx(base_ni, m);
+	if (!ctx) {
+		err = -ENOMEM;
+		goto unm_err_out;
+	}
+	/* Find the attribute. */
+	err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err))
+		goto unm_err_out;
+	a = ctx->attr;
+	if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
+		if (a->flags & ATTR_COMPRESSION_MASK) {
+			NInoSetCompressed(ni);
+			ni->flags |= FILE_ATTR_COMPRESSED;
+			if ((ni->type != AT_DATA) || (ni->type == AT_DATA &&
+					ni->name_len)) {
+				ntfs_error(vi->i_sb,
+					"Found compressed non-data or named data attribute.");
+				goto unm_err_out;
+			}
+			if (vol->cluster_size > 4096) {
+				ntfs_error(vi->i_sb,
+					"Found compressed attribute but compression is disabled due to cluster size (%i) > 4kiB.",
+					vol->cluster_size);
+				goto unm_err_out;
+			}
+			if ((a->flags & ATTR_COMPRESSION_MASK) !=
+					ATTR_IS_COMPRESSED) {
+				ntfs_error(vi->i_sb, "Found unknown compression method.");
+				goto unm_err_out;
+			}
+		}
+		/*
+		 * The compressed/sparse flag set in an index root just means
+		 * to compress all files.
+		 */
+		if (NInoMstProtected(ni) && ni->type != AT_INDEX_ROOT) {
+			ntfs_error(vi->i_sb,
+				"Found mst protected attribute but the attribute is %s.",
+				NInoCompressed(ni) ? "compressed" : "sparse");
+			goto unm_err_out;
+		}
+		if (a->flags & ATTR_IS_SPARSE) {
+			NInoSetSparse(ni);
+			ni->flags |= FILE_ATTR_SPARSE_FILE;
+		}
+	}
+	if (a->flags & ATTR_IS_ENCRYPTED) {
+		if (NInoCompressed(ni)) {
+			ntfs_error(vi->i_sb, "Found encrypted and compressed data.");
+			goto unm_err_out;
+		}
+		/*
+		 * The encryption flag set in an index root just means to
+		 * encrypt all files.
+		 */
+		if (NInoMstProtected(ni) && ni->type != AT_INDEX_ROOT) {
+			ntfs_error(vi->i_sb,
+				"Found mst protected attribute but the attribute is encrypted.");
+			goto unm_err_out;
+		}
+		if (ni->type != AT_DATA) {
+			ntfs_error(vi->i_sb,
+				"Found encrypted non-data attribute.");
+			goto unm_err_out;
+		}
+		NInoSetEncrypted(ni);
+		ni->flags |= FILE_ATTR_ENCRYPTED;
+	}
+	if (!a->non_resident) {
+		/* Ensure the attribute name is placed before the value. */
+		if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
+				le16_to_cpu(a->data.resident.value_offset)))) {
+			ntfs_error(vol->sb,
+				"Attribute name is placed after the attribute value.");
+			goto unm_err_out;
+		}
+		if (NInoMstProtected(ni)) {
+			ntfs_error(vi->i_sb,
+				"Found mst protected attribute but the attribute is resident.");
+			goto unm_err_out;
+		}
+		vi->i_size = ni->initialized_size = ni->data_size = le32_to_cpu(
+				a->data.resident.value_length);
+		ni->allocated_size = le32_to_cpu(a->length) -
+				le16_to_cpu(a->data.resident.value_offset);
+		if (vi->i_size > ni->allocated_size) {
+			ntfs_error(vi->i_sb,
+				"Resident attribute is corrupt (size exceeds allocation).");
+			goto unm_err_out;
+		}
+	} else {
+		NInoSetNonResident(ni);
+		/*
+		 * Ensure the attribute name is placed before the mapping pairs
+		 * array.
+		 */
+		if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
+				le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset)))) {
+			ntfs_error(vol->sb,
+				"Attribute name is placed after the mapping pairs array.");
+			goto unm_err_out;
+		}
+		if (NInoCompressed(ni) || NInoSparse(ni)) {
+			if (NInoCompressed(ni) && a->data.non_resident.compression_unit != 4) {
+				ntfs_error(vi->i_sb,
+					"Found non-standard compression unit (%u instead of 4). Cannot handle this.",
+					a->data.non_resident.compression_unit);
+				err = -EOPNOTSUPP;
+				goto unm_err_out;
+			}
+			if (a->data.non_resident.compression_unit) {
+				ni->itype.compressed.block_size = 1U <<
+						(a->data.non_resident.compression_unit +
+						vol->cluster_size_bits);
+				ni->itype.compressed.block_size_bits =
+						ffs(ni->itype.compressed.block_size) - 1;
+				ni->itype.compressed.block_clusters = 1U <<
+						a->data.non_resident.compression_unit;
+			} else {
+				ni->itype.compressed.block_size = 0;
+				ni->itype.compressed.block_size_bits = 0;
+				ni->itype.compressed.block_clusters = 0;
+			}
+			ni->itype.compressed.size = le64_to_cpu(
+					a->data.non_resident.compressed_size);
+		}
+		if (a->data.non_resident.lowest_vcn) {
+			ntfs_error(vi->i_sb, "First extent of attribute has non-zero lowest_vcn.");
+			goto unm_err_out;
+		}
+		vi->i_size = ni->data_size = le64_to_cpu(a->data.non_resident.data_size);
+		ni->initialized_size = le64_to_cpu(a->data.non_resident.initialized_size);
+		ni->allocated_size = le64_to_cpu(a->data.non_resident.allocated_size);
+	}
+	vi->i_mapping->a_ops = &ntfs_normal_aops;
+	if (NInoMstProtected(ni))
+		vi->i_mapping->a_ops = &ntfs_mst_aops;
+	else if (NInoCompressed(ni))
+		vi->i_mapping->a_ops = &ntfs_compressed_aops;
+	if ((NInoCompressed(ni) || NInoSparse(ni)) && ni->type != AT_INDEX_ROOT)
+		vi->i_blocks = ni->itype.compressed.size >> 9;
+	else
+		vi->i_blocks = ni->allocated_size >> 9;
+	/*
+	 * Make sure the base inode does not go away and attach it to the
+	 * attribute inode.
+	 */
+	if (!igrab(base_vi)) {
+		err = -ENOENT;
+		goto unm_err_out;
+	}
+	ni->ext.base_ntfs_ino = base_ni;
+	ni->nr_extents = -1;
+
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(base_ni);
+
+	ntfs_debug("Done.");
+	return 0;
+
+unm_err_out:
+	if (!err)
+		err = -EIO;
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(base_ni);
+err_out:
+	if (err != -ENOENT)
+		ntfs_error(vol->sb,
+			"Failed with error code %i while reading attribute inode (mft_no 0x%lx, type 0x%x, name_len %i). Marking corrupt inode and base inode 0x%lx as bad. Run chkdsk.",
+			err, vi->i_ino, ni->type, ni->name_len,
+			base_vi->i_ino);
+	if (err != -ENOENT && err != -ENOMEM)
+		NVolSetErrors(vol);
+	return err;
+}
+
+/**
+ * ntfs_read_locked_index_inode - read an index inode from its base inode
+ * @base_vi:	base inode
+ * @vi:		index inode to read
+ *
+ * ntfs_read_locked_index_inode() is called from ntfs_index_iget() to read the
+ * index inode described by @vi into memory from the base mft record described
+ * by @base_vi.
+ *
+ * ntfs_read_locked_index_inode() maps, pins and locks the base inode for
+ * reading and looks up the attributes relating to the index described by @vi
+ * before setting up the necessary fields in @vi as well as initializing the
+ * ntfs inode.
+ *
+ * Note, index inodes are essentially attribute inodes (NInoAttr() is true)
+ * with the attribute type set to AT_INDEX_ALLOCATION.  Apart from that, they
+ * are set up like directory inodes since directories are a special case of
+ * indices so they need to be treated in much the same way.  Most importantly,
+ * for small indices the index allocation attribute might not actually exist.
+ * However, the index root attribute always exists but this does not need to
+ * have an inode associated with it and this is why we define a new inode type
+ * index.  Also, like for directories, we need to have an attribute inode for
+ * the bitmap attribute corresponding to the index allocation attribute and we
+ * can store this in the appropriate field of the inode, just like we do for
+ * normal directory inodes.
+ *
+ * Q: What locks are held when the function is called?
+ * A: i_state has I_NEW set, hence the inode is locked, also
+ *    i_count is set to 1, so it is not going to go away
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_read_locked_index_inode(struct inode *base_vi, struct inode *vi)
+{
+	loff_t bvi_size;
+	struct ntfs_volume *vol = NTFS_SB(vi->i_sb);
+	struct ntfs_inode *ni = NTFS_I(vi), *base_ni = NTFS_I(base_vi), *bni;
+	struct inode *bvi;
+	struct mft_record *m;
+	struct attr_record *a;
+	struct ntfs_attr_search_ctx *ctx;
+	struct index_root *ir;
+	u8 *ir_end, *index_end;
+	int err = 0;
+
+	ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
+	lockdep_assert_held(&base_ni->mrec_lock);
+
+	ntfs_init_big_inode(vi);
+	/* Just mirror the values from the base inode. */
+	vi->i_uid = base_vi->i_uid;
+	vi->i_gid = base_vi->i_gid;
+	set_nlink(vi, base_vi->i_nlink);
+	inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
+	inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
+	inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
+	vi->i_generation = ni->seq_no = base_ni->seq_no;
+	/* Set inode type to zero but preserve permissions. */
+	vi->i_mode = base_vi->i_mode & ~S_IFMT;
+	/* Map the mft record for the base inode. */
+	m = map_mft_record(base_ni);
+	if (IS_ERR(m)) {
+		err = PTR_ERR(m);
+		goto err_out;
+	}
+	ctx = ntfs_attr_get_search_ctx(base_ni, m);
+	if (!ctx) {
+		err = -ENOMEM;
+		goto unm_err_out;
+	}
+	/* Find the index root attribute.
*/ + err =3D ntfs_attr_lookup(AT_INDEX_ROOT, ni->name, ni->name_len, + CASE_SENSITIVE, 0, NULL, 0, ctx); + if (unlikely(err)) { + if (err =3D=3D -ENOENT) + ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is missing."); + goto unm_err_out; + } + a =3D ctx->attr; + /* Set up the state. */ + if (unlikely(a->non_resident)) { + ntfs_error(vol->sb, "$INDEX_ROOT attribute is not resident."); + goto unm_err_out; + } + /* Ensure the attribute name is placed before the value. */ + if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=3D + le16_to_cpu(a->data.resident.value_offset)))) { + ntfs_error(vol->sb, + "$INDEX_ROOT attribute name is placed after the attribute value."); + goto unm_err_out; + } + + ir =3D (struct index_root *)((u8 *)a + le16_to_cpu(a->data.resident.value= _offset)); + ir_end =3D (u8 *)ir + le32_to_cpu(a->data.resident.value_length); + if (ir_end > (u8 *)ctx->mrec + vol->mft_record_size) { + ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is corrupt."); + goto unm_err_out; + } + index_end =3D (u8 *)&ir->index + le32_to_cpu(ir->index.index_length); + if (index_end > ir_end) { + ntfs_error(vi->i_sb, "Index is corrupt."); + goto unm_err_out; + } + + ni->itype.index.collation_rule =3D ir->collation_rule; + ntfs_debug("Index collation rule is 0x%x.", + le32_to_cpu(ir->collation_rule)); + ni->itype.index.block_size =3D le32_to_cpu(ir->index_block_size); + if (!is_power_of_2(ni->itype.index.block_size)) { + ntfs_error(vi->i_sb, "Index block size (%u) is not a power of two.", + ni->itype.index.block_size); + goto unm_err_out; + } + if (ni->itype.index.block_size > PAGE_SIZE) { + ntfs_error(vi->i_sb, "Index block size (%u) > PAGE_SIZE (%ld) is not sup= ported.", + ni->itype.index.block_size, PAGE_SIZE); + err =3D -EOPNOTSUPP; + goto unm_err_out; + } + if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) { + ntfs_error(vi->i_sb, + "Index block size (%u) < NTFS_BLOCK_SIZE (%i) is not supported.", + ni->itype.index.block_size, NTFS_BLOCK_SIZE); + err =3D -EOPNOTSUPP; + 
goto unm_err_out; + } + ni->itype.index.block_size_bits =3D ffs(ni->itype.index.block_size) - 1; + /* Determine the size of a vcn in the index. */ + if (vol->cluster_size <=3D ni->itype.index.block_size) { + ni->itype.index.vcn_size =3D vol->cluster_size; + ni->itype.index.vcn_size_bits =3D vol->cluster_size_bits; + } else { + ni->itype.index.vcn_size =3D vol->sector_size; + ni->itype.index.vcn_size_bits =3D vol->sector_size_bits; + } + + /* Find index allocation attribute. */ + ntfs_attr_reinit_search_ctx(ctx); + err =3D ntfs_attr_lookup(AT_INDEX_ALLOCATION, ni->name, ni->name_len, + CASE_SENSITIVE, 0, NULL, 0, ctx); + if (unlikely(err)) { + if (err =3D=3D -ENOENT) { + /* No index allocation. */ + vi->i_size =3D ni->initialized_size =3D ni->allocated_size =3D 0; + /* We are done with the mft record, so we release it. */ + ntfs_attr_put_search_ctx(ctx); + unmap_mft_record(base_ni); + m =3D NULL; + ctx =3D NULL; + goto skip_large_index_stuff; + } else + ntfs_error(vi->i_sb, "Failed to lookup $INDEX_ALLOCATION attribute."); + goto unm_err_out; + } + NInoSetIndexAllocPresent(ni); + NInoSetNonResident(ni); + ni->type =3D AT_INDEX_ALLOCATION; + + a =3D ctx->attr; + if (!a->non_resident) { + ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is resident."); + goto unm_err_out; + } + /* + * Ensure the attribute name is placed before the mapping pairs array. 
+ */ + if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=3D + le16_to_cpu(a->data.non_resident.mapping_pairs_offset)))) { + ntfs_error(vol->sb, + "$INDEX_ALLOCATION attribute name is placed after the mapping pairs arr= ay."); + goto unm_err_out; + } + if (a->flags & ATTR_IS_ENCRYPTED) { + ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is encrypted."); + goto unm_err_out; + } + if (a->flags & ATTR_IS_SPARSE) { + ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is sparse."); + goto unm_err_out; + } + if (a->flags & ATTR_COMPRESSION_MASK) { + ntfs_error(vi->i_sb, + "$INDEX_ALLOCATION attribute is compressed."); + goto unm_err_out; + } + if (a->data.non_resident.lowest_vcn) { + ntfs_error(vi->i_sb, + "First extent of $INDEX_ALLOCATION attribute has non zero lowest_vcn."); + goto unm_err_out; + } + vi->i_size =3D ni->data_size =3D le64_to_cpu(a->data.non_resident.data_si= ze); + ni->initialized_size =3D le64_to_cpu(a->data.non_resident.initialized_siz= e); + ni->allocated_size =3D le64_to_cpu(a->data.non_resident.allocated_size); + /* + * We are done with the mft record, so we release it. Otherwise + * we would deadlock in ntfs_attr_iget(). + */ + ntfs_attr_put_search_ctx(ctx); + unmap_mft_record(base_ni); + m =3D NULL; + ctx =3D NULL; + /* Get the index bitmap attribute inode. */ + bvi =3D ntfs_attr_iget(base_vi, AT_BITMAP, ni->name, ni->name_len); + if (IS_ERR(bvi)) { + ntfs_error(vi->i_sb, "Failed to get bitmap attribute."); + err =3D PTR_ERR(bvi); + goto unm_err_out; + } + bni =3D NTFS_I(bvi); + if (NInoCompressed(bni) || NInoEncrypted(bni) || + NInoSparse(bni)) { + ntfs_error(vi->i_sb, + "$BITMAP attribute is compressed and/or encrypted and/or sparse."); + goto iput_unm_err_out; + } + /* Consistency check bitmap size vs. index allocation size. 
 */
+	bvi_size = i_size_read(bvi);
+	if ((bvi_size << 3) < (vi->i_size >> ni->itype.index.block_size_bits)) {
+		ntfs_error(vi->i_sb,
+			"Index bitmap too small (0x%llx) for index allocation (0x%llx).",
+			bvi_size << 3, vi->i_size);
+		goto iput_unm_err_out;
+	}
+	iput(bvi);
+skip_large_index_stuff:
+	/* Setup the operations for this index inode. */
+	ntfs_set_vfs_operations(vi, S_IFDIR, 0);
+	vi->i_blocks = ni->allocated_size >> 9;
+	/*
+	 * Make sure the base inode doesn't go away and attach it to the
+	 * index inode.
+	 */
+	if (!igrab(base_vi))
+		goto unm_err_out;
+	ni->ext.base_ntfs_ino = base_ni;
+	ni->nr_extents = -1;
+
+	ntfs_debug("Done.");
+	return 0;
+iput_unm_err_out:
+	iput(bvi);
+unm_err_out:
+	if (!err)
+		err = -EIO;
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (m)
+		unmap_mft_record(base_ni);
+err_out:
+	ntfs_error(vi->i_sb,
+		"Failed with error code %i while reading index inode (mft_no 0x%lx, name_len %i).",
+		err, vi->i_ino, ni->name_len);
+	if (err != -EOPNOTSUPP && err != -ENOMEM)
+		NVolSetErrors(vol);
+	return err;
+}
+
+/**
+ * load_attribute_list_mount - load an attribute list into memory
+ * @vol:		ntfs volume from which to read
+ * @rl:			runlist of the attribute list
+ * @al_start:		destination buffer
+ * @size:		size of the destination buffer in bytes
+ * @initialized_size:	initialized size of the attribute list
+ *
+ * Walk the runlist @rl and load all clusters from it, copying them into the
+ * linear buffer @al_start.  The maximum number of bytes copied to @al_start
+ * is @size bytes.  Note, @size does not need to be a multiple of the cluster
+ * size.  If @initialized_size is less than @size, the region in @al_start
+ * between @initialized_size and @size will be zeroed and not read from disk.
+ *
+ * Return 0 on success or -errno on error.
+ */
+static int load_attribute_list_mount(struct ntfs_volume *vol,
+		struct runlist_element *rl, u8 *al_start, const s64 size,
+		const s64 initialized_size)
+{
+	s64 lcn;
+	u8 *al = al_start;
+	u8 *al_end = al + initialized_size;
+	struct super_block *sb;
+	int err = 0;
+	loff_t rl_byte_off, rl_byte_len;
+
+	ntfs_debug("Entering.");
+	if (!vol || !rl || !al || size <= 0 || initialized_size < 0 ||
+			initialized_size > size)
+		return -EINVAL;
+	if (!initialized_size) {
+		memset(al, 0, size);
+		return 0;
+	}
+	sb = vol->sb;
+
+	/* Read all clusters specified by the runlist one run at a time. */
+	while (rl->length) {
+		lcn = ntfs_rl_vcn_to_lcn(rl, rl->vcn);
+		ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
+				(unsigned long long)rl->vcn,
+				(unsigned long long)lcn);
+		/* The attribute list cannot be sparse. */
+		if (lcn < 0) {
+			ntfs_error(sb, "ntfs_rl_vcn_to_lcn() failed. Cannot read attribute list.");
+			goto err_out;
+		}
+
+		rl_byte_off = lcn << vol->cluster_size_bits;
+		rl_byte_len = rl->length << vol->cluster_size_bits;
+
+		if (al + rl_byte_len > al_end)
+			rl_byte_len = al_end - al;
+
+		err = ntfs_dev_read(sb, al, rl_byte_off, rl_byte_len);
+		if (err) {
+			ntfs_error(sb, "Cannot read attribute list.");
+			goto err_out;
+		}
+
+		if (al + rl_byte_len >= al_end) {
+			if (initialized_size < size)
+				goto initialize;
+			goto done;
+		}
+
+		al += rl_byte_len;
+		rl++;
+	}
+	/* Real overflow: the runlist ended before the initialized data did. */
+	ntfs_error(sb, "Attribute list buffer overflow. Read attribute list is truncated.");
+	goto err_out;
+
+initialize:
+	memset(al_start + initialized_size, 0, size - initialized_size);
+done:
+	return err;
+err_out:
+	err = -EIO;
+	goto done;
+}
+
+/*
+ * The MFT inode has special locking, so teach the lock validator
+ * about this by splitting off the locking rules of the MFT from
+ * the locking rules of other inodes.  The MFT inode can never be
+ * accessed from the VFS side (or even internally), only by the
+ * map_mft functions.
+ */
+static struct lock_class_key mft_ni_runlist_lock_key, mft_ni_mrec_lock_key;
+
+/**
+ * ntfs_read_inode_mount - special read_inode for mount time use only
+ * @vi:		inode to read
+ *
+ * Read inode FILE_MFT at mount time, only called with super_block lock
+ * held from within the read_super() code path.
+ *
+ * This function exists because when it is called the page cache for $MFT/$DATA
+ * is not initialized and hence we cannot get at the contents of mft records
+ * by calling map_mft_record*().
+ *
+ * Further it needs to cope with the circular references problem, i.e. cannot
+ * load any attributes other than $ATTRIBUTE_LIST until $DATA is loaded, because
+ * we do not know where the other extent mft records are yet and again, because
+ * we cannot call map_mft_record*() yet.  Obviously this applies only when an
+ * attribute list is actually present in the $MFT inode.
+ *
+ * We solve these problems by starting with the $DATA attribute before anything
+ * else and iterating using ntfs_attr_lookup($DATA) over all extents.  As each
+ * extent is found, we ntfs_mapping_pairs_decompress() including the implied
+ * ntfs_runlists_merge().  Each step of the iteration necessarily provides
+ * sufficient information for the next step to complete.
+ *
+ * This should work but there are two possible pitfalls (see inline comments
+ * below), but only time will tell if they are real pits or just smoke...
+ */
+int ntfs_read_inode_mount(struct inode *vi)
+{
+	s64 next_vcn, last_vcn, highest_vcn;
+	struct super_block *sb = vi->i_sb;
+	struct ntfs_volume *vol = NTFS_SB(sb);
+	struct ntfs_inode *ni;
+	struct mft_record *m = NULL;
+	struct attr_record *a;
+	struct ntfs_attr_search_ctx *ctx;
+	unsigned int i, nr_blocks;
+	int err;
+	size_t new_rl_count;
+
+	ntfs_debug("Entering.");
+
+	/* Initialize the ntfs specific part of @vi. */
+	ntfs_init_big_inode(vi);
+
+	ni = NTFS_I(vi);
+
+	/* Setup the data attribute. It is special as it is mst protected.
*/ + NInoSetNonResident(ni); + NInoSetMstProtected(ni); + NInoSetSparseDisabled(ni); + ni->type =3D AT_DATA; + ni->name =3D AT_UNNAMED; + ni->name_len =3D 0; + /* + * This sets up our little cheat allowing us to reuse the async read io + * completion handler for directories. + */ + ni->itype.index.block_size =3D vol->mft_record_size; + ni->itype.index.block_size_bits =3D vol->mft_record_size_bits; + + /* Very important! Needed to be able to call map_mft_record*(). */ + vol->mft_ino =3D vi; + + /* Allocate enough memory to read the first mft record. */ + if (vol->mft_record_size > 64 * 1024) { + ntfs_error(sb, "Unsupported mft record size %i (max 64kiB).", + vol->mft_record_size); + goto err_out; + } + + i =3D vol->mft_record_size; + if (i < sb->s_blocksize) + i =3D sb->s_blocksize; + + m =3D (struct mft_record *)ntfs_malloc_nofs(i); + if (!m) { + ntfs_error(sb, "Failed to allocate buffer for $MFT record 0."); + goto err_out; + } + + /* Determine the first block of the $MFT/$DATA attribute. */ + nr_blocks =3D vol->mft_record_size >> sb->s_blocksize_bits; + if (!nr_blocks) + nr_blocks =3D 1; + + /* Load $MFT/$DATA's first mft record. */ + err =3D ntfs_dev_read(sb, m, vol->mft_lcn << vol->cluster_size_bits, i); + if (err) { + ntfs_error(sb, "Device read failed."); + goto err_out; + } + + if (le32_to_cpu(m->bytes_allocated) !=3D vol->mft_record_size) { + ntfs_error(sb, "Incorrect mft record size %u in superblock, should be %u= .", + le32_to_cpu(m->bytes_allocated), vol->mft_record_size); + goto err_out; + } + + /* Apply the mst fixups. */ + if (post_read_mst_fixup((struct ntfs_record *)m, vol->mft_record_size)) { + ntfs_error(sb, "MST fixup failed. $MFT is corrupt."); + goto err_out; + } + + if (ntfs_mft_record_check(vol, m, FILE_MFT)) { + ntfs_error(sb, "ntfs_mft_record_check failed. $MFT is corrupt."); + goto err_out; + } + + /* Need this to sanity check attribute list references to $MFT. 
*/ + vi->i_generation =3D ni->seq_no =3D le16_to_cpu(m->sequence_number); + + /* Provides read_folio() for map_mft_record(). */ + vi->i_mapping->a_ops =3D &ntfs_mst_aops; + + ctx =3D ntfs_attr_get_search_ctx(ni, m); + if (!ctx) { + err =3D -ENOMEM; + goto err_out; + } + + /* Find the attribute list attribute if present. */ + err =3D ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx); + if (err) { + if (unlikely(err !=3D -ENOENT)) { + ntfs_error(sb, + "Failed to lookup attribute list attribute. You should run chkdsk."); + goto put_err_out; + } + } else /* if (!err) */ { + struct attr_list_entry *al_entry, *next_al_entry; + u8 *al_end; + static const char *es =3D " Not allowed. $MFT is corrupt. You should = run chkdsk."; + + ntfs_debug("Attribute list attribute found in $MFT."); + NInoSetAttrList(ni); + a =3D ctx->attr; + if (a->flags & ATTR_COMPRESSION_MASK) { + ntfs_error(sb, + "Attribute list attribute is compressed.%s", + es); + goto put_err_out; + } + if (a->flags & ATTR_IS_ENCRYPTED || + a->flags & ATTR_IS_SPARSE) { + if (a->non_resident) { + ntfs_error(sb, + "Non-resident attribute list attribute is encrypted/sparse.%s", + es); + goto put_err_out; + } + ntfs_warning(sb, + "Resident attribute list attribute in $MFT system file is marked encry= pted/sparse which is not true. However, Windows allows this and chkdsk doe= s not detect or correct it so we will just ignore the invalid flags and pre= tend they are not set."); + } + /* Now allocate memory for the attribute list. 
*/ + ni->attr_list_size =3D (u32)ntfs_attr_size(a); + if (!ni->attr_list_size) { + ntfs_error(sb, "Attr_list_size is zero"); + goto put_err_out; + } + ni->attr_list =3D ntfs_malloc_nofs(ni->attr_list_size); + if (!ni->attr_list) { + ntfs_error(sb, "Not enough memory to allocate buffer for attribute list= ."); + goto put_err_out; + } + if (a->non_resident) { + struct runlist_element *rl; + size_t new_rl_count; + + NInoSetAttrListNonResident(ni); + if (a->data.non_resident.lowest_vcn) { + ntfs_error(sb, + "Attribute list has non zero lowest_vcn. $MFT is corrupt. You should = run chkdsk."); + goto put_err_out; + } + + rl =3D ntfs_mapping_pairs_decompress(vol, a, NULL, &new_rl_count); + if (IS_ERR(rl)) { + err =3D PTR_ERR(rl); + ntfs_error(sb, + "Mapping pairs decompression failed with error code %i.", + -err); + goto put_err_out; + } + + err =3D load_attribute_list_mount(vol, rl, ni->attr_list, ni->attr_list= _size, + le64_to_cpu(a->data.non_resident.initialized_size)); + ntfs_free(rl); + if (err) { + ntfs_error(sb, + "Failed to load attribute list with error code %i.", + -err); + goto put_err_out; + } + } else /* if (!ctx.attr->non_resident) */ { + if ((u8 *)a + le16_to_cpu( + a->data.resident.value_offset) + + le32_to_cpu(a->data.resident.value_length) > + (u8 *)ctx->mrec + vol->mft_record_size) { + ntfs_error(sb, "Corrupt attribute list attribute."); + goto put_err_out; + } + /* Now copy the attribute list. */ + memcpy(ni->attr_list, (u8 *)a + le16_to_cpu( + a->data.resident.value_offset), + le32_to_cpu(a->data.resident.value_length)); + } + /* The attribute list is now setup in memory. */ + al_entry =3D (struct attr_list_entry *)ni->attr_list; + al_end =3D (u8 *)al_entry + ni->attr_list_size; + for (;; al_entry =3D next_al_entry) { + /* Out of bounds check. */ + if ((u8 *)al_entry < ni->attr_list || + (u8 *)al_entry > al_end) + goto em_put_err_out; + /* Catch the end of the attribute list. 
*/ + if ((u8 *)al_entry =3D=3D al_end) + goto em_put_err_out; + if (!al_entry->length) + goto em_put_err_out; + if ((u8 *)al_entry + 6 > al_end || + (u8 *)al_entry + le16_to_cpu(al_entry->length) > al_end) + goto em_put_err_out; + next_al_entry =3D (struct attr_list_entry *)((u8 *)al_entry + + le16_to_cpu(al_entry->length)); + if (le32_to_cpu(al_entry->type) > le32_to_cpu(AT_DATA)) + goto em_put_err_out; + if (al_entry->type !=3D AT_DATA) + continue; + /* We want an unnamed attribute. */ + if (al_entry->name_length) + goto em_put_err_out; + /* Want the first entry, i.e. lowest_vcn =3D=3D 0. */ + if (al_entry->lowest_vcn) + goto em_put_err_out; + /* First entry has to be in the base mft record. */ + if (MREF_LE(al_entry->mft_reference) !=3D vi->i_ino) { + /* MFT references do not match, logic fails. */ + ntfs_error(sb, + "BUG: The first $DATA extent of $MFT is not in the base mft record."); + goto put_err_out; + } else { + /* Sequence numbers must match. */ + if (MSEQNO_LE(al_entry->mft_reference) !=3D + ni->seq_no) + goto em_put_err_out; + /* Got it. All is ok. We can stop now. */ + break; + } + } + } + + ntfs_attr_reinit_search_ctx(ctx); + + /* Now load all attribute extents. */ + a =3D NULL; + next_vcn =3D last_vcn =3D highest_vcn =3D 0; + while (!(err =3D ntfs_attr_lookup(AT_DATA, NULL, 0, 0, next_vcn, NULL, 0, + ctx))) { + struct runlist_element *nrl; + + /* Cache the current attribute. */ + a =3D ctx->attr; + /* $MFT must be non-resident. */ + if (!a->non_resident) { + ntfs_error(sb, + "$MFT must be non-resident but a resident extent was found. $MFT is co= rrupt. Run chkdsk."); + goto put_err_out; + } + /* $MFT must be uncompressed and unencrypted. */ + if (a->flags & ATTR_COMPRESSION_MASK || + a->flags & ATTR_IS_ENCRYPTED || + a->flags & ATTR_IS_SPARSE) { + ntfs_error(sb, + "$MFT must be uncompressed, non-sparse, and unencrypted but a compress= ed/sparse/encrypted extent was found. $MFT is corrupt. 
Run chkdsk."); + goto put_err_out; + } + /* + * Decompress the mapping pairs array of this extent and merge + * the result into the existing runlist. No need for locking + * as we have exclusive access to the inode at this time and we + * are a mount in progress task, too. + */ + nrl =3D ntfs_mapping_pairs_decompress(vol, a, &ni->runlist, + &new_rl_count); + if (IS_ERR(nrl)) { + ntfs_error(sb, + "ntfs_mapping_pairs_decompress() failed with error code %ld.", + PTR_ERR(nrl)); + goto put_err_out; + } + ni->runlist.rl =3D nrl; + ni->runlist.count =3D new_rl_count; + + /* Are we in the first extent? */ + if (!next_vcn) { + if (a->data.non_resident.lowest_vcn) { + ntfs_error(sb, + "First extent of $DATA attribute has non zero lowest_vcn. $MFT is cor= rupt. You should run chkdsk."); + goto put_err_out; + } + /* Get the last vcn in the $DATA attribute. */ + last_vcn =3D le64_to_cpu(a->data.non_resident.allocated_size) >> + vol->cluster_size_bits; + /* Fill in the inode size. */ + vi->i_size =3D le64_to_cpu(a->data.non_resident.data_size); + ni->initialized_size =3D le64_to_cpu(a->data.non_resident.initialized_s= ize); + ni->allocated_size =3D le64_to_cpu(a->data.non_resident.allocated_size); + /* + * Verify the number of mft records does not exceed + * 2^32 - 1. + */ + if ((vi->i_size >> vol->mft_record_size_bits) >=3D + (1ULL << 32)) { + ntfs_error(sb, "$MFT is too big! Aborting."); + goto put_err_out; + } + /* + * We have got the first extent of the runlist for + * $MFT which means it is now relatively safe to call + * the normal ntfs_read_inode() function. + * Complete reading the inode, this will actually + * re-read the mft record for $MFT, this time entering + * it into the page cache with which we complete the + * kick start of the volume. It should be safe to do + * this now as the first extent of $MFT/$DATA is + * already known and we would hope that we don't need + * further extents in order to find the other + * attributes belonging to $MFT. 
Only time will tell if + * this is really the case. If not we will have to play + * magic at this point, possibly duplicating a lot of + * ntfs_read_inode() at this point. We will need to + * ensure we do enough of its work to be able to call + * ntfs_read_inode() on extents of $MFT/$DATA. But lets + * hope this never happens... + */ + err =3D ntfs_read_locked_inode(vi); + if (err) { + ntfs_error(sb, "ntfs_read_inode() of $MFT failed.\n"); + ntfs_attr_put_search_ctx(ctx); + /* Revert to the safe super operations. */ + ntfs_free(m); + return -1; + } + /* + * Re-initialize some specifics about $MFT's inode as + * ntfs_read_inode() will have set up the default ones. + */ + /* Set uid and gid to root. */ + vi->i_uid =3D GLOBAL_ROOT_UID; + vi->i_gid =3D GLOBAL_ROOT_GID; + /* Regular file. No access for anyone. */ + vi->i_mode =3D S_IFREG; + /* No VFS initiated operations allowed for $MFT. */ + vi->i_op =3D &ntfs_empty_inode_ops; + vi->i_fop =3D &ntfs_empty_file_ops; + } + + /* Get the lowest vcn for the next extent. */ + highest_vcn =3D le64_to_cpu(a->data.non_resident.highest_vcn); + next_vcn =3D highest_vcn + 1; + + /* Only one extent or error, which we catch below. */ + if (next_vcn <=3D 0) + break; + + /* Avoid endless loops due to corruption. */ + if (next_vcn < le64_to_cpu(a->data.non_resident.lowest_vcn)) { + ntfs_error(sb, "$MFT has corrupt attribute list attribute. Run chkdsk."= ); + goto put_err_out; + } + } + if (err !=3D -ENOENT) { + ntfs_error(sb, "Failed to lookup $MFT/$DATA attribute extent. Run chkdsk= .\n"); + goto put_err_out; + } + if (!a) { + ntfs_error(sb, "$MFT/$DATA attribute not found. $MFT is corrupt. Run chk= dsk."); + goto put_err_out; + } + if (highest_vcn && highest_vcn !=3D last_vcn - 1) { + ntfs_error(sb, "Failed to load the complete runlist for $MFT/$DATA. 
Run chkdsk.");
+		ntfs_debug("highest_vcn = 0x%llx, last_vcn - 1 = 0x%llx",
+				(unsigned long long)highest_vcn,
+				(unsigned long long)last_vcn - 1);
+		goto put_err_out;
+	}
+	ntfs_attr_put_search_ctx(ctx);
+	ntfs_debug("Done.");
+	ntfs_free(m);
+
+	/*
+	 * Split the locking rules of the MFT inode from the
+	 * locking rules of other inodes:
+	 */
+	lockdep_set_class(&ni->runlist.lock, &mft_ni_runlist_lock_key);
+	lockdep_set_class(&ni->mrec_lock, &mft_ni_mrec_lock_key);
+
+	return 0;
+
+em_put_err_out:
+	ntfs_error(sb,
+		"Couldn't find first extent of $DATA attribute in attribute list. $MFT is corrupt. Run chkdsk.");
+put_err_out:
+	ntfs_attr_put_search_ctx(ctx);
+err_out:
+	ntfs_error(sb, "Failed. Marking inode as bad.");
+	ntfs_free(m);
+	return -1;
+}
+
+static void __ntfs_clear_inode(struct ntfs_inode *ni)
+{
+	/* Free all allocated memory. */
+	if (NInoNonResident(ni) && ni->runlist.rl) {
+		ntfs_free(ni->runlist.rl);
+		ni->runlist.rl = NULL;
+	}
+
+	if (ni->attr_list) {
+		ntfs_free(ni->attr_list);
+		ni->attr_list = NULL;
+	}
+
+	if (ni->name_len && ni->name != I30 &&
+			ni->name != reparse_index_name &&
+			ni->name != R) {
+		WARN_ON(!ni->name);
+		kfree(ni->name);
+	}
+}
+
+void ntfs_clear_extent_inode(struct ntfs_inode *ni)
+{
+	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+
+	WARN_ON(NInoAttr(ni));
+	WARN_ON(ni->nr_extents != -1);
+
+	__ntfs_clear_inode(ni);
+	ntfs_destroy_extent_inode(ni);
+}
+
+static int ntfs_delete_base_inode(struct ntfs_inode *ni)
+{
+	struct super_block *sb = ni->vol->sb;
+	int err;
+
+	if (NInoAttr(ni) || ni->nr_extents == -1)
+		return 0;
+
+	err = ntfs_non_resident_dealloc_clusters(ni);
+
+	/*
+	 * Deallocate extent mft records and free extent inodes.
+	 * No need to lock as no one else has a reference.
+	 */
+	while (ni->nr_extents) {
+		err = ntfs_mft_record_free(ni->vol, *(ni->ext.extent_ntfs_inos));
+		if (err)
+			ntfs_error(sb,
+				"Failed to free extent MFT record.
Leaving inconsistent metadata.\n"); + ntfs_inode_close(*(ni->ext.extent_ntfs_inos)); + } + + /* Deallocate base mft record */ + err =3D ntfs_mft_record_free(ni->vol, ni); + if (err) + ntfs_error(sb, "Failed to free base MFT record. Leaving inconsistent met= adata.\n"); + return err; +} + +/** + * ntfs_evict_big_inode - clean up the ntfs specific part of an inode + * @vi: vfs inode pending annihilation + * + * When the VFS is going to remove an inode from memory, ntfs_clear_big_in= ode() + * is called, which deallocates all memory belonging to the NTFS specific = part + * of the inode and returns. + * + * If the MFT record is dirty, we commit it before doing anything else. + */ +void ntfs_evict_big_inode(struct inode *vi) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + + truncate_inode_pages_final(&vi->i_data); + + if (!vi->i_nlink) { + if (!NInoAttr(ni)) { + /* Never called with extent inodes */ + WARN_ON(ni->nr_extents =3D=3D -1); + ntfs_delete_base_inode(ni); + } + goto release; + } + + if (NInoDirty(ni)) { + /* Committing the inode also commits all extent inodes. */ + ntfs_commit_inode(vi); + + if (NInoDirty(ni)) { + ntfs_debug("Failed to commit dirty inode 0x%lx. Losing data!", + vi->i_ino); + NInoClearAttrListDirty(ni); + NInoClearDirty(ni); + } + } + + /* No need to lock at this stage as no one else has a reference. */ + if (ni->nr_extents > 0) { + int i; + + for (i =3D 0; i < ni->nr_extents; i++) { + if (ni->ext.extent_ntfs_inos[i]) + ntfs_clear_extent_inode(ni->ext.extent_ntfs_inos[i]); + } + ni->nr_extents =3D 0; + ntfs_free(ni->ext.extent_ntfs_inos); + } + +release: + clear_inode(vi); + __ntfs_clear_inode(ni); + + if (NInoAttr(ni)) { + /* Release the base inode if we are holding it. 
*/ + if (ni->nr_extents =3D=3D -1) { + iput(VFS_I(ni->ext.base_ntfs_ino)); + ni->nr_extents =3D 0; + ni->ext.base_ntfs_ino =3D NULL; + } + } + + if (!atomic_dec_and_test(&ni->count)) + WARN_ON(1); + if (ni->folio) + ntfs_unmap_folio(ni->folio, NULL); + kfree(ni->mrec); + ntfs_free(ni->target); +} + +/** + * ntfs_show_options - show mount options in /proc/mounts + * @sf: seq_file in which to write our mount options + * @root: root of the mounted tree whose mount options to display + * + * Called by the VFS once for each mounted ntfs volume when someone reads + * /proc/mounts in order to display the NTFS specific mount options of each + * mount. The mount options of fs specified by @root are written to the se= q file + * @sf and success is returned. + */ +int ntfs_show_options(struct seq_file *sf, struct dentry *root) +{ + struct ntfs_volume *vol =3D NTFS_SB(root->d_sb); + int i; + + if (uid_valid(vol->uid)) + seq_printf(sf, ",uid=3D%i", from_kuid_munged(&init_user_ns, vol->uid)); + if (gid_valid(vol->gid)) + seq_printf(sf, ",gid=3D%i", from_kgid_munged(&init_user_ns, vol->gid)); + if (vol->fmask =3D=3D vol->dmask) + seq_printf(sf, ",umask=3D0%o", vol->fmask); + else { + seq_printf(sf, ",fmask=3D0%o", vol->fmask); + seq_printf(sf, ",dmask=3D0%o", vol->dmask); + } + seq_printf(sf, ",iocharset=3D%s", vol->nls_map->charset); + if (NVolCaseSensitive(vol)) + seq_puts(sf, ",case_sensitive"); + else + seq_puts(sf, ",nocase"); + if (NVolShowSystemFiles(vol)) + seq_puts(sf, ",show_sys_files,showmeta"); + for (i =3D 0; on_errors_arr[i].val; i++) { + if (on_errors_arr[i].val =3D=3D vol->on_errors) + seq_printf(sf, ",errors=3D%s", on_errors_arr[i].str); + } + seq_printf(sf, ",mft_zone_multiplier=3D%i", vol->mft_zone_multiplier); + if (NVolSysImmutable(vol)) + seq_puts(sf, ",sys_immutable"); + if (!NVolShowHiddenFiles(vol)) + seq_puts(sf, ",nohidden"); + if (NVolHideDotFiles(vol)) + seq_puts(sf, ",hide_dot_files"); + if (NVolCheckWindowsNames(vol)) + seq_puts(sf, 
",windows_names"); + if (NVolDiscard(vol)) + seq_puts(sf, ",discard"); + if (NVolDisableSparse(vol)) + seq_puts(sf, ",disable_sparse"); + if (vol->sb->s_flags & SB_POSIXACL) + seq_puts(sf, ",acl"); + return 0; +} + +int ntfs_extend_initialized_size(struct inode *vi, const loff_t offset, + const loff_t new_size) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + loff_t old_init_size; + unsigned long flags; + int err; + + read_lock_irqsave(&ni->size_lock, flags); + old_init_size =3D ni->initialized_size; + read_unlock_irqrestore(&ni->size_lock, flags); + + if (!NInoNonResident(ni)) + return -EINVAL; + if (old_init_size >=3D new_size) + return 0; + + err =3D ntfs_attr_map_whole_runlist(ni); + if (err) + return err; + + if (!NInoCompressed(ni) && old_init_size < offset) { + err =3D iomap_zero_range(vi, old_init_size, + offset - old_init_size, + NULL, &ntfs_read_iomap_ops, + &ntfs_iomap_folio_ops, NULL); + if (err) + return err; + } + + + mutex_lock(&ni->mrec_lock); + err =3D ntfs_attr_set_initialized_size(ni, new_size); + mutex_unlock(&ni->mrec_lock); + if (err) + truncate_setsize(vi, old_init_size); + return err; +} + +int ntfs_truncate_vfs(struct inode *vi, loff_t new_size, loff_t i_size) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + int err; + + mutex_lock(&ni->mrec_lock); + err =3D __ntfs_attr_truncate_vfs(ni, new_size, i_size); + mutex_unlock(&ni->mrec_lock); + if (err < 0) + return err; + + inode_set_mtime_to_ts(vi, inode_set_ctime_current(vi)); + return 0; +} + +/** + * ntfs_inode_sync_standard_information - update standard information attr= ibute + * @vi: inode to update standard information + * @m: mft record + * + * Return 0 on success or -errno on error. 
+ */ +static int ntfs_inode_sync_standard_information(struct inode *vi, struct m= ft_record *m) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + struct ntfs_attr_search_ctx *ctx; + struct standard_information *si; + __le64 nt; + int err =3D 0; + bool modified =3D false; + + /* Update the access times in the standard information attribute. */ + ctx =3D ntfs_attr_get_search_ctx(ni, m); + if (unlikely(!ctx)) + return -ENOMEM; + err =3D ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0, + CASE_SENSITIVE, 0, NULL, 0, ctx); + if (unlikely(err)) { + ntfs_attr_put_search_ctx(ctx); + return err; + } + si =3D (struct standard_information *)((u8 *)ctx->attr + + le16_to_cpu(ctx->attr->data.resident.value_offset)); + if (si->file_attributes !=3D ni->flags) { + si->file_attributes =3D ni->flags; + modified =3D true; + } + + /* Update the creation times if they have changed. */ + nt =3D utc2ntfs(ni->i_crtime); + if (si->creation_time !=3D nt) { + ntfs_debug("Updating creation time for inode 0x%lx: old =3D 0x%llx, new = =3D 0x%llx", + vi->i_ino, le64_to_cpu(si->creation_time), + le64_to_cpu(nt)); + si->creation_time =3D nt; + modified =3D true; + } + + /* Update the access times if they have changed. 
*/ + nt =3D utc2ntfs(inode_get_mtime(vi)); + if (si->last_data_change_time !=3D nt) { + ntfs_debug("Updating mtime for inode 0x%lx: old =3D 0x%llx, new =3D 0x%l= lx", + vi->i_ino, le64_to_cpu(si->last_data_change_time), + le64_to_cpu(nt)); + si->last_data_change_time =3D nt; + modified =3D true; + } + + nt =3D utc2ntfs(inode_get_ctime(vi)); + if (si->last_mft_change_time !=3D nt) { + ntfs_debug("Updating ctime for inode 0x%lx: old =3D 0x%llx, new =3D 0x%l= lx", + vi->i_ino, le64_to_cpu(si->last_mft_change_time), + le64_to_cpu(nt)); + si->last_mft_change_time =3D nt; + modified =3D true; + } + nt =3D utc2ntfs(inode_get_atime(vi)); + if (si->last_access_time !=3D nt) { + ntfs_debug("Updating atime for inode 0x%lx: old =3D 0x%llx, new =3D 0x%l= lx", + vi->i_ino, + le64_to_cpu(si->last_access_time), + le64_to_cpu(nt)); + si->last_access_time =3D nt; + modified =3D true; + } + + /* + * If we just modified the standard information attribute we need to + * mark the mft record it is in dirty. We do this manually so that + * mark_inode_dirty() is not called which would redirty the inode and + * hence result in an infinite loop of trying to write the inode. + * There is no need to mark the base inode nor the base mft record + * dirty, since we are going to write this mft record below in any case + * and the base mft record may actually not have been modified so it + * might not need to be written out. + * NOTE: It is not a problem when the inode for $MFT itself is being + * written out as mark_ntfs_record_dirty() will only set I_DIRTY_PAGES + * on the $MFT inode and hence ntfs_write_inode() will not be + * re-invoked because of it which in turn is ok since the dirtied mft + * record will be cleaned and written out to disk below, i.e. before + * this function returns. 
+ */ + if (modified) + NInoSetDirty(ctx->ntfs_ino); + ntfs_attr_put_search_ctx(ctx); + + return err; +} + +/** + * ntfs_inode_sync_filename - update FILE_NAME attributes + * @ni: ntfs inode to update FILE_NAME attributes + * + * Update all FILE_NAME attributes for inode @ni in the index. + * + * Return 0 on success or error. + */ +int ntfs_inode_sync_filename(struct ntfs_inode *ni) +{ + struct inode *index_vi; + struct super_block *sb =3D VFS_I(ni)->i_sb; + struct ntfs_attr_search_ctx *ctx =3D NULL; + struct ntfs_index_context *ictx; + struct ntfs_inode *index_ni; + struct file_name_attr *fn; + struct file_name_attr *fnx; + struct reparse_point *rpp; + __le32 reparse_tag; + int err =3D 0; + unsigned long flags; + + ntfs_debug("Entering for inode %lld\n", (long long)ni->mft_no); + + ctx =3D ntfs_attr_get_search_ctx(ni, NULL); + if (!ctx) + return -ENOMEM; + + /* Collect the reparse tag, if any */ + reparse_tag =3D cpu_to_le32(0); + if (ni->flags & FILE_ATTR_REPARSE_POINT) { + if (!ntfs_attr_lookup(AT_REPARSE_POINT, NULL, + 0, CASE_SENSITIVE, 0, NULL, 0, ctx)) { + rpp =3D (struct reparse_point *)((u8 *)ctx->attr + + le16_to_cpu(ctx->attr->data.resident.value_offset)); + reparse_tag =3D rpp->reparse_tag; + } + ntfs_attr_reinit_search_ctx(ctx); + } + + /* Walk through all FILE_NAME attributes and update them. 
*/ + while (!(err =3D ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0, c= tx))) { + fn =3D (struct file_name_attr *)((u8 *)ctx->attr + + le16_to_cpu(ctx->attr->data.resident.value_offset)); + if (MREF_LE(fn->parent_directory) =3D=3D ni->mft_no) + continue; + + index_vi =3D ntfs_iget(sb, MREF_LE(fn->parent_directory)); + if (IS_ERR(index_vi)) { + ntfs_error(sb, "Failed to open inode %lld with index", + (long long)MREF_LE(fn->parent_directory)); + continue; + } + + index_ni =3D NTFS_I(index_vi); + + mutex_lock_nested(&index_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT); + if (NInoBeingDeleted(ni)) { + iput(index_vi); + mutex_unlock(&index_ni->mrec_lock); + continue; + } + + ictx =3D ntfs_index_ctx_get(index_ni, I30, 4); + if (!ictx) { + ntfs_error(sb, "Failed to get index ctx, inode %lld", + (long long)index_ni->mft_no); + iput(index_vi); + mutex_unlock(&index_ni->mrec_lock); + continue; + } + + err =3D ntfs_index_lookup(fn, sizeof(struct file_name_attr), ictx); + if (err) { + ntfs_debug("Index lookup failed, inode %lld", + (long long)index_ni->mft_no); + ntfs_index_ctx_put(ictx); + iput(index_vi); + mutex_unlock(&index_ni->mrec_lock); + continue; + } + /* Update flags and file size. 
*/ + fnx =3D (struct file_name_attr *)ictx->data; + fnx->file_attributes =3D + (fnx->file_attributes & ~FILE_ATTR_VALID_FLAGS) | + (ni->flags & FILE_ATTR_VALID_FLAGS); + if (ctx->mrec->flags & MFT_RECORD_IS_DIRECTORY) + fnx->data_size =3D fnx->allocated_size =3D 0; + else { + read_lock_irqsave(&ni->size_lock, flags); + if (NInoSparse(ni) || NInoCompressed(ni)) + fnx->allocated_size =3D cpu_to_le64(ni->itype.compressed.size); + else + fnx->allocated_size =3D cpu_to_le64(ni->allocated_size); + fnx->data_size =3D cpu_to_le64(ni->data_size); + + /* + * The file name record has also to be fixed if some + * attribute update implied the unnamed data to be + * made non-resident + */ + fn->allocated_size =3D fnx->allocated_size; + fn->data_size =3D fnx->data_size; + read_unlock_irqrestore(&ni->size_lock, flags); + } + + /* update or clear the reparse tag in the index */ + fnx->type.rp.reparse_point_tag =3D reparse_tag; + fnx->creation_time =3D fn->creation_time; + fnx->last_data_change_time =3D fn->last_data_change_time; + fnx->last_mft_change_time =3D fn->last_mft_change_time; + fnx->last_access_time =3D fn->last_access_time; + ntfs_index_entry_mark_dirty(ictx); + ntfs_icx_ib_sync_write(ictx); + NInoSetDirty(ctx->ntfs_ino); + ntfs_index_ctx_put(ictx); + mutex_unlock(&index_ni->mrec_lock); + iput(index_vi); + } + /* Check for real error occurred. */ + if (err !=3D -ENOENT) { + ntfs_error(sb, "Attribute lookup failed, err : %d, inode %lld", err, + (long long)ni->mft_no); + } else + err =3D 0; + + ntfs_attr_put_search_ctx(ctx); + return err; +} + +/** + * __ntfs_write_inode - write out a dirty inode + * @vi: inode to write out + * @sync: if true, write out synchronously + * + * Write out a dirty inode to disk including any extent inodes if present. + * + * If @sync is true, commit the inode to disk and wait for io completion. = This + * is done using write_mft_record(). + * + * If @sync is false, just schedule the write to happen but do not wait fo= r i/o + * completion. 
+ * + * Return 0 on success and -errno on error. + */ +int __ntfs_write_inode(struct inode *vi, int sync) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + struct mft_record *m; + int err =3D 0; + bool need_iput =3D false; + + ntfs_debug("Entering for %sinode 0x%lx.", NInoAttr(ni) ? "attr " : "", + vi->i_ino); + + if (NVolShutdown(ni->vol)) + return -EIO; + + /* + * Dirty attribute inodes are written via their real inodes so just + * clean them here. Access time updates are taken care off when the + * real inode is written. + */ + if (NInoAttr(ni) || ni->nr_extents =3D=3D -1) { + NInoClearDirty(ni); + ntfs_debug("Done."); + return 0; + } + + /* igrab prevents vi from being evicted while mrec_lock is hold. */ + if (igrab(vi) !=3D NULL) + need_iput =3D true; + + mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL); + /* Map, pin, and lock the mft record belonging to the inode. */ + m =3D map_mft_record(ni); + if (IS_ERR(m)) { + mutex_unlock(&ni->mrec_lock); + err =3D PTR_ERR(m); + goto err_out; + } + + if (NInoNonResident(ni) && NInoRunlistDirty(ni)) { + down_write(&ni->runlist.lock); + err =3D ntfs_attr_update_mapping_pairs(ni, 0); + if (!err) + NInoClearRunlistDirty(ni); + up_write(&ni->runlist.lock); + } + + err =3D ntfs_inode_sync_standard_information(vi, m); + if (err) + goto unm_err_out; + + /* + * when being umounted and inodes are evicted, write_inode() + * is called with all inodes being marked with I_FREEING. + * then ntfs_inode_sync_filename() waits infinitly because + * of ntfs_iget. This situation happens only where sync_filesysem() + * from umount fails because of a disk unplug and etc. + * the absent of SB_ACTIVE means umounting. + */ + if ((vi->i_sb->s_flags & SB_ACTIVE) && NInoTestClearFileNameDirty(ni)) + ntfs_inode_sync_filename(ni); + + /* Now the access times are updated, write the base mft record. 
*/ + if (NInoDirty(ni)) { + err =3D write_mft_record(ni, m, sync); + if (err) + ntfs_error(vi->i_sb, "write_mft_record failed, err : %d\n", err); + } + unmap_mft_record(ni); + + /* Write all attached extent mft records. */ + mutex_lock(&ni->extent_lock); + if (ni->nr_extents > 0) { + struct ntfs_inode **extent_nis =3D ni->ext.extent_ntfs_inos; + int i; + + ntfs_debug("Writing %i extent inodes.", ni->nr_extents); + for (i =3D 0; i < ni->nr_extents; i++) { + struct ntfs_inode *tni =3D extent_nis[i]; + + if (NInoDirty(tni)) { + struct mft_record *tm; + int ret; + + mutex_lock(&tni->mrec_lock); + tm =3D map_mft_record(tni); + if (IS_ERR(tm)) { + mutex_unlock(&tni->mrec_lock); + if (!err || err =3D=3D -ENOMEM) + err =3D PTR_ERR(tm); + continue; + } + + ret =3D write_mft_record(tni, tm, sync); + unmap_mft_record(tni); + mutex_unlock(&tni->mrec_lock); + + if (unlikely(ret)) { + if (!err || err =3D=3D -ENOMEM) + err =3D ret; + } + } + } + } + mutex_unlock(&ni->extent_lock); + mutex_unlock(&ni->mrec_lock); + + if (unlikely(err)) + goto err_out; + if (need_iput) + iput(vi); + ntfs_debug("Done."); + return 0; +unm_err_out: + unmap_mft_record(ni); + mutex_unlock(&ni->mrec_lock); +err_out: + if (err =3D=3D -ENOMEM) + mark_inode_dirty(vi); + else { + ntfs_error(vi->i_sb, "Failed (error %i): Run chkdsk.", -err); + NVolSetErrors(ni->vol); + } + if (need_iput) + iput(vi); + return err; +} + +/** + * ntfs_extent_inode_open - load an extent inode and attach it to its base + * @base_ni: base ntfs inode + * @mref: mft reference of the extent inode to load (in little endian) + * + * First check if the extent inode @mref is already attached to the base n= tfs + * inode @base_ni, and if so, return a pointer to the attached extent inod= e. + * + * If the extent inode is not already attached to the base inode, allocate= an + * ntfs_inode structure and initialize it for the given inode @mref. 
@mref + * specifies the inode number / mft record to read, including the sequence + * number, which can be 0 if no sequence number checking is to be performe= d. + * + * Then, allocate a buffer for the mft record, read the mft record from the + * volume @base_ni->vol, and attach it to the ntfs_inode structure (->mrec= ). + * The mft record is mst deprotected and sanity checked for validity and we + * abort if deprotection or checks fail. + * + * Finally attach the ntfs inode to its base inode @base_ni and return a + * pointer to the ntfs_inode structure on success or NULL on error, with e= rrno + * set to the error code. + * + * Note, extent inodes are never closed directly. They are automatically + * disposed off by the closing of the base inode. + */ +static struct ntfs_inode *ntfs_extent_inode_open(struct ntfs_inode *base_n= i, + const __le64 mref) +{ + u64 mft_no =3D MREF_LE(mref); + struct ntfs_inode *ni =3D NULL; + struct ntfs_inode **extent_nis; + int i; + struct mft_record *ni_mrec; + struct super_block *sb; + + if (!base_ni) + return NULL; + + sb =3D base_ni->vol->sb; + ntfs_debug("Opening extent inode %lld (base mft record %lld).\n", + (unsigned long long)mft_no, + (unsigned long long)base_ni->mft_no); + + /* Is the extent inode already open and attached to the base inode? */ + if (base_ni->nr_extents > 0) { + extent_nis =3D base_ni->ext.extent_ntfs_inos; + for (i =3D 0; i < base_ni->nr_extents; i++) { + u16 seq_no; + + ni =3D extent_nis[i]; + if (mft_no !=3D ni->mft_no) + continue; + ni_mrec =3D map_mft_record(ni); + if (IS_ERR(ni_mrec)) { + ntfs_error(sb, "failed to map mft record for %lu", + ni->mft_no); + goto out; + } + /* Verify the sequence number if given. 
*/ + seq_no =3D MSEQNO_LE(mref); + if (seq_no && + seq_no !=3D le16_to_cpu(ni_mrec->sequence_number)) { + ntfs_error(sb, "Found stale extent mft reference mft=3D%lld", + (long long)ni->mft_no); + unmap_mft_record(ni); + goto out; + } + unmap_mft_record(ni); + goto out; + } + } + /* Wasn't there, we need to load the extent inode. */ + ni =3D ntfs_new_extent_inode(base_ni->vol->sb, mft_no); + if (!ni) + goto out; + + ni->seq_no =3D (u16)MSEQNO_LE(mref); + ni->nr_extents =3D -1; + ni->ext.base_ntfs_ino =3D base_ni; + /* Attach extent inode to base inode, reallocating memory if needed. */ + if (!(base_ni->nr_extents & 3)) { + i =3D (base_ni->nr_extents + 4) * sizeof(struct ntfs_inode *); + + extent_nis =3D ntfs_malloc_nofs(i); + if (!extent_nis) + goto err_out; + if (base_ni->nr_extents) { + memcpy(extent_nis, base_ni->ext.extent_ntfs_inos, + i - 4 * sizeof(struct ntfs_inode *)); + ntfs_free(base_ni->ext.extent_ntfs_inos); + } + base_ni->ext.extent_ntfs_inos =3D extent_nis; + } + base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] =3D ni; + +out: + ntfs_debug("\n"); + return ni; +err_out: + ntfs_destroy_ext_inode(ni); + ni =3D NULL; + goto out; +} + +/** + * ntfs_inode_attach_all_extents - attach all extents for target inode + * @ni: opened ntfs inode for which perform attach + * + * Return 0 on success and error. + */ +int ntfs_inode_attach_all_extents(struct ntfs_inode *ni) +{ + struct attr_list_entry *ale; + u64 prev_attached =3D 0; + + if (!ni) { + ntfs_debug("Invalid arguments.\n"); + return -EINVAL; + } + + if (NInoAttr(ni)) + ni =3D ni->ext.base_ntfs_ino; + + ntfs_debug("Entering for inode 0x%llx.\n", (long long) ni->mft_no); + + /* Inode haven't got attribute list, thus nothing to attach. */ + if (!NInoAttrList(ni)) + return 0; + + if (!ni->attr_list) { + ntfs_debug("Corrupt in-memory struct.\n"); + return -EINVAL; + } + + /* Walk through attribute list and attach all extents. 
*/ + ale =3D (struct attr_list_entry *)ni->attr_list; + while ((u8 *)ale < ni->attr_list + ni->attr_list_size) { + if (ni->mft_no !=3D MREF_LE(ale->mft_reference) && + prev_attached !=3D MREF_LE(ale->mft_reference)) { + if (!ntfs_extent_inode_open(ni, ale->mft_reference)) { + ntfs_debug("Couldn't attach extent inode.\n"); + return -1; + } + prev_attached =3D MREF_LE(ale->mft_reference); + } + ale =3D (struct attr_list_entry *)((u8 *)ale + le16_to_cpu(ale->length)); + } + return 0; +} + +/** + * ntfs_inode_add_attrlist - add attribute list to inode and fill it + * @ni: opened ntfs inode to which add attribute list + * + * Return 0 on success or error. + */ +int ntfs_inode_add_attrlist(struct ntfs_inode *ni) +{ + int err; + struct ntfs_attr_search_ctx *ctx; + u8 *al =3D NULL, *aln; + int al_len =3D 0; + struct attr_list_entry *ale =3D NULL; + struct mft_record *ni_mrec; + u32 attr_al_len; + + if (!ni) + return -EINVAL; + + ntfs_debug("inode %llu\n", (unsigned long long) ni->mft_no); + + if (NInoAttrList(ni) || ni->nr_extents) { + ntfs_error(ni->vol->sb, "Inode already has attribute list"); + return -EEXIST; + } + + ni_mrec =3D map_mft_record(ni); + if (IS_ERR(ni_mrec)) + return -EIO; + + /* Form attribute list. */ + ctx =3D ntfs_attr_get_search_ctx(ni, ni_mrec); + if (!ctx) { + err =3D -ENOMEM; + goto err_out; + } + + /* Walk through all attributes. 
*/ + while (!(err =3D ntfs_attr_lookup(AT_UNUSED, NULL, 0, 0, 0, NULL, 0, ctx)= )) { + int ale_size; + + if (ctx->attr->type =3D=3D AT_ATTRIBUTE_LIST) { + err =3D -EIO; + ntfs_error(ni->vol->sb, "Attribute list already present"); + goto put_err_out; + } + + ale_size =3D (sizeof(struct attr_list_entry) + sizeof(__le16) * + ctx->attr->name_length + 7) & ~7; + al_len +=3D ale_size; + + aln =3D ntfs_realloc_nofs(al, al_len, al_len-ale_size); + if (!aln) { + err =3D -ENOMEM; + ntfs_error(ni->vol->sb, "Failed to realloc %d bytes", al_len); + goto put_err_out; + } + ale =3D (struct attr_list_entry *)(aln + ((u8 *)ale - al)); + al =3D aln; + + memset(ale, 0, ale_size); + + /* Add attribute to attribute list. */ + ale->type =3D ctx->attr->type; + ale->length =3D cpu_to_le16((sizeof(struct attr_list_entry) + + sizeof(__le16) * ctx->attr->name_length + 7) & ~7); + ale->name_length =3D ctx->attr->name_length; + ale->name_offset =3D (u8 *)ale->name - (u8 *)ale; + if (ctx->attr->non_resident) + ale->lowest_vcn =3D + ctx->attr->data.non_resident.lowest_vcn; + else + ale->lowest_vcn =3D 0; + ale->mft_reference =3D MK_LE_MREF(ni->mft_no, + le16_to_cpu(ni_mrec->sequence_number)); + ale->instance =3D ctx->attr->instance; + memcpy(ale->name, (u8 *)ctx->attr + + le16_to_cpu(ctx->attr->name_offset), + ctx->attr->name_length * sizeof(__le16)); + ale =3D (struct attr_list_entry *)(al + al_len); + } + + /* Check for real error occurred. */ + if (err !=3D -ENOENT) { + ntfs_error(ni->vol->sb, "%s: Attribute lookup failed, inode %lld", + __func__, (long long)ni->mft_no); + goto put_err_out; + } + + /* Set in-memory attribute list. */ + ni->attr_list =3D al; + ni->attr_list_size =3D al_len; + NInoSetAttrList(ni); + + attr_al_len =3D offsetof(struct attr_record, data.resident.reserved) + 1 + + ((al_len + 7) & ~7); + /* Free space if there is not enough it for $ATTRIBUTE_LIST. 
*/ + if (le32_to_cpu(ni_mrec->bytes_allocated) - + le32_to_cpu(ni_mrec->bytes_in_use) < attr_al_len) { + if (ntfs_inode_free_space(ni, (int)attr_al_len)) { + /* Failed to free space. */ + err =3D -ENOSPC; + ntfs_error(ni->vol->sb, "Failed to free space for attrlist"); + goto rollback; + } + } + + /* Add $ATTRIBUTE_LIST to mft record. */ + err =3D ntfs_resident_attr_record_add(ni, AT_ATTRIBUTE_LIST, AT_UNNAMED, = 0, + NULL, al_len, 0); + if (err < 0) { + ntfs_error(ni->vol->sb, "Couldn't add $ATTRIBUTE_LIST to MFT"); + goto rollback; + } + + err =3D ntfs_attrlist_update(ni); + if (err < 0) + goto remove_attrlist_record; + + ntfs_attr_put_search_ctx(ctx); + unmap_mft_record(ni); + return 0; + +remove_attrlist_record: + /* Prevent ntfs_attr_recorm_rm from freeing attribute list. */ + ni->attr_list =3D NULL; + NInoClearAttrList(ni); + /* Remove $ATTRIBUTE_LIST record. */ + ntfs_attr_reinit_search_ctx(ctx); + if (!ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, + CASE_SENSITIVE, 0, NULL, 0, ctx)) { + if (ntfs_attr_record_rm(ctx)) + ntfs_error(ni->vol->sb, "Rollback failed to remove attrlist"); + } else { + ntfs_error(ni->vol->sb, "Rollback failed to find attrlist"); + } + + /* Setup back in-memory runlist. */ + ni->attr_list =3D al; + ni->attr_list_size =3D al_len; + NInoSetAttrList(ni); +rollback: + /* + * Scan attribute list for attributes that placed not in the base MFT + * record and move them to it. 
+ */ + ntfs_attr_reinit_search_ctx(ctx); + ale =3D (struct attr_list_entry *)al; + while ((u8 *)ale < al + al_len) { + if (MREF_LE(ale->mft_reference) !=3D ni->mft_no) { + if (!ntfs_attr_lookup(ale->type, ale->name, + ale->name_length, + CASE_SENSITIVE, + le64_to_cpu(ale->lowest_vcn), + NULL, 0, ctx)) { + if (ntfs_attr_record_move_to(ctx, ni)) + ntfs_error(ni->vol->sb, + "Rollback failed to move attribute"); + } else { + ntfs_error(ni->vol->sb, "Rollback failed to find attr"); + } + ntfs_attr_reinit_search_ctx(ctx); + } + ale =3D (struct attr_list_entry *)((u8 *)ale + le16_to_cpu(ale->length)); + } + + /* Remove in-memory attribute list. */ + ni->attr_list =3D NULL; + ni->attr_list_size =3D 0; + NInoClearAttrList(ni); + NInoClearAttrListDirty(ni); +put_err_out: + ntfs_attr_put_search_ctx(ctx); +err_out: + ntfs_free(al); + unmap_mft_record(ni); + return err; +} + +/** + * ntfs_inode_close - close an ntfs inode and free all associated memory + * @ni: ntfs inode to close + * + * Make sure the ntfs inode @ni is clean. + * + * If the ntfs inode @ni is a base inode, close all associated extent inod= es, + * then deallocate all memory attached to it, and finally free the ntfs in= ode + * structure itself. + * + * If it is an extent inode, we disconnect it from its base inode before we + * destroy it. + * + * It is OK to pass NULL to this function, it is just noop in this case. + * + * Return 0 on success or error. + */ +int ntfs_inode_close(struct ntfs_inode *ni) +{ + int err =3D -1; + struct ntfs_inode **tmp_nis; + struct ntfs_inode *base_ni; + s32 i; + + if (!ni) + return 0; + + ntfs_debug("Entering for inode %lld\n", (long long)ni->mft_no); + + /* Is this a base inode with mapped extent inodes? */ + /* + * If the inode is an extent inode, disconnect it from the + * base inode before destroying it. 
+ */ + base_ni =3D ni->ext.base_ntfs_ino; + for (i =3D 0; i < base_ni->nr_extents; ++i) { + tmp_nis =3D base_ni->ext.extent_ntfs_inos; + if (tmp_nis[i] !=3D ni) + continue; + /* Found it. Disconnect. */ + memmove(tmp_nis + i, tmp_nis + i + 1, + (base_ni->nr_extents - i - 1) * + sizeof(struct ntfs_inode *)); + /* Buffer should be for multiple of four extents. */ + if ((--base_ni->nr_extents) & 3) { + i =3D -1; + break; + } + /* + * ElectricFence is unhappy with realloc(x,0) as free(x) + * thus we explicitly separate these two cases. + */ + if (base_ni->nr_extents) { + /* Resize the memory buffer. */ + tmp_nis =3D ntfs_realloc_nofs(tmp_nis, base_ni->nr_extents * + sizeof(struct ntfs_inode *), base_ni->nr_extents * + sizeof(struct ntfs_inode *)); + /* Ignore errors, they don't really matter. */ + if (tmp_nis) + base_ni->ext.extent_ntfs_inos =3D tmp_nis; + } else if (tmp_nis) { + ntfs_free(tmp_nis); + base_ni->ext.extent_ntfs_inos =3D NULL; + } + /* Allow for error checking. */ + i =3D -1; + break; + } + + if (NInoDirty(ni)) + ntfs_error(ni->vol->sb, "Releasing dirty inode %lld!\n", + (long long)ni->mft_no); + if (NInoAttrList(ni) && ni->attr_list) + ntfs_free(ni->attr_list); + ntfs_destroy_ext_inode(ni); + err =3D 0; + ntfs_debug("\n"); + return err; +} + +void ntfs_destroy_ext_inode(struct ntfs_inode *ni) +{ + ntfs_debug("Entering."); + if (ni =3D=3D NULL) + return; + + ntfs_attr_close(ni); + + if (NInoDirty(ni)) + ntfs_error(ni->vol->sb, "Releasing dirty ext inode %lld!\n", + (long long)ni->mft_no); + if (NInoAttrList(ni) && ni->attr_list) + ntfs_free(ni->attr_list); + kfree(ni->mrec); + kmem_cache_free(ntfs_inode_cache, ni); +} + +static struct ntfs_inode *ntfs_inode_base(struct ntfs_inode *ni) +{ + if (ni->nr_extents =3D=3D -1) + return ni->ext.base_ntfs_ino; + return ni; +} + +static int ntfs_attr_position(__le32 type, struct ntfs_attr_search_ctx *ct= x) +{ + int err; + + err =3D ntfs_attr_lookup(type, NULL, 0, CASE_SENSITIVE, 0, NULL, + 0, ctx); + if (err) { + 
__le32 atype; + + if (err !=3D -ENOENT) + return err; + + atype =3D ctx->attr->type; + if (atype =3D=3D AT_END) + return -ENOSPC; + + /* + * if ntfs_external_attr_lookup return -ENOENT, ctx->al_entry + * could point to an attribute in an extent mft record, but + * ctx->attr and ctx->ntfs_ino always points to an attibute in + * a base mft record. + */ + if (ctx->al_entry && + MREF_LE(ctx->al_entry->mft_reference) !=3D ctx->ntfs_ino->mft_no) { + ntfs_attr_reinit_search_ctx(ctx); + err =3D ntfs_attr_lookup(atype, NULL, 0, CASE_SENSITIVE, 0, NULL, + 0, ctx); + if (err) + return err; + } + } + return 0; +} + +/** + * ntfs_inode_free_space - free space in the MFT record of inode + * @ni: ntfs inode in which MFT record free space + * @size: amount of space needed to free + * + * Return 0 on success or error. + */ +int ntfs_inode_free_space(struct ntfs_inode *ni, int size) +{ + struct ntfs_attr_search_ctx *ctx; + int freed, err; + struct mft_record *ni_mrec; + struct super_block *sb; + + if (!ni || size < 0) + return -EINVAL; + ntfs_debug("Entering for inode %lld, size %d\n", + (unsigned long long)ni->mft_no, size); + + sb =3D ni->vol->sb; + ni_mrec =3D map_mft_record(ni); + if (IS_ERR(ni_mrec)) + return -EIO; + + freed =3D (le32_to_cpu(ni_mrec->bytes_allocated) - + le32_to_cpu(ni_mrec->bytes_in_use)); + + unmap_mft_record(ni); + + if (size <=3D freed) + return 0; + + ctx =3D ntfs_attr_get_search_ctx(ni, NULL); + if (!ctx) { + ntfs_error(sb, "%s, Failed to get search context", __func__); + return -ENOMEM; + } + + /* + * Chkdsk complain if $STANDARD_INFORMATION is not in the base MFT + * record. + * + * Also we can't move $ATTRIBUTE_LIST from base MFT_RECORD, so position + * search context on first attribute after $STANDARD_INFORMATION and + * $ATTRIBUTE_LIST. + * + * Why we reposition instead of simply skip this attributes during + * enumeration? 
Because in case we have got only in-memory attribute + * list ntfs_attr_lookup will fail when it will try to find + * $ATTRIBUTE_LIST. + */ + err =3D ntfs_attr_position(AT_FILE_NAME, ctx); + if (err) + goto put_err_out; + + while (1) { + int record_size; + + /* + * Check whether attribute is from different MFT record. If so, + * find next, because we don't need such. + */ + while (ctx->ntfs_ino->mft_no !=3D ni->mft_no) { +retry: + err =3D ntfs_attr_lookup(AT_UNUSED, NULL, 0, CASE_SENSITIVE, + 0, NULL, 0, ctx); + if (err) { + if (err !=3D -ENOENT) + ntfs_error(sb, "Attr lookup failed #2"); + else if (ctx->attr->type =3D=3D AT_END) + err =3D -ENOSPC; + else + err =3D 0; + + if (err) + goto put_err_out; + } + } + + if (ntfs_inode_base(ctx->ntfs_ino)->mft_no =3D=3D FILE_MFT && + ctx->attr->type =3D=3D AT_DATA) + goto retry; + + if (ctx->attr->type =3D=3D AT_INDEX_ROOT) + goto retry; + + record_size =3D le32_to_cpu(ctx->attr->length); + + /* Move away attribute. */ + err =3D ntfs_attr_record_move_away(ctx, 0); + if (err) { + ntfs_error(sb, "Failed to move out attribute #2"); + break; + } + freed +=3D record_size; + + /* Check whether we done. */ + if (size <=3D freed) { + ntfs_attr_put_search_ctx(ctx); + return 0; + } + + /* + * Reposition to first attribute after $STANDARD_INFORMATION and + * $ATTRIBUTE_LIST (see comments upwards). 
+ */ + ntfs_attr_reinit_search_ctx(ctx); + err =3D ntfs_attr_position(AT_FILE_NAME, ctx); + if (err) + break; + } +put_err_out: + ntfs_attr_put_search_ctx(ctx); + if (err =3D=3D -ENOSPC) + ntfs_debug("No attributes left that can be moved out.\n"); + return err; +} + +s64 ntfs_inode_attr_pread(struct inode *vi, s64 pos, s64 count, u8 *buf) +{ + struct address_space *mapping =3D vi->i_mapping; + struct folio *folio; + struct ntfs_inode *ni =3D NTFS_I(vi); + s64 isize; + u32 attr_len, total =3D 0, offset; + pgoff_t index; + int err =3D 0; + + WARN_ON(!NInoAttr(ni)); + if (!count) + return 0; + + mutex_lock(&ni->mrec_lock); + isize =3D i_size_read(vi); + if (pos > isize) { + mutex_unlock(&ni->mrec_lock); + return -EINVAL; + } + if (pos + count > isize) + count =3D isize - pos; + + if (!NInoNonResident(ni)) { + struct ntfs_attr_search_ctx *ctx; + u8 *attr; + + ctx =3D ntfs_attr_get_search_ctx(ni->ext.base_ntfs_ino, NULL); + if (!ctx) { + ntfs_error(vi->i_sb, "Failed to get attr search ctx"); + err =3D -ENOMEM; + mutex_unlock(&ni->mrec_lock); + goto out; + } + + err =3D ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIV= E, + 0, NULL, 0, ctx); + if (err) { + ntfs_error(vi->i_sb, "Failed to look up attr %#x", ni->type); + ntfs_attr_put_search_ctx(ctx); + mutex_unlock(&ni->mrec_lock); + goto out; + } + + attr =3D (u8 *)ctx->attr + le16_to_cpu(ctx->attr->data.resident.value_of= fset); + memcpy(buf, (u8 *)attr + pos, count); + ntfs_attr_put_search_ctx(ctx); + mutex_unlock(&ni->mrec_lock); + return count; + } + mutex_unlock(&ni->mrec_lock); + + index =3D pos >> PAGE_SHIFT; + do { + /* Update @index and get the next folio. 
*/ + folio =3D ntfs_read_mapping_folio(mapping, index); + if (IS_ERR(folio)) + break; + + offset =3D offset_in_folio(folio, pos); + attr_len =3D min_t(size_t, (size_t)count, folio_size(folio) - offset); + + folio_lock(folio); + memcpy_from_folio(buf, folio, offset, attr_len); + folio_unlock(folio); + folio_put(folio); + + total +=3D attr_len; + buf +=3D attr_len; + pos +=3D attr_len; + count -=3D attr_len; + index++; + } while (count); +out: + return err ? (s64)err : total; +} + +static inline int ntfs_enlarge_attribute(struct inode *vi, s64 pos, s64 co= unt, + struct ntfs_attr_search_ctx *ctx) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + struct super_block *sb =3D vi->i_sb; + int ret; + + if (pos + count <=3D ni->initialized_size) + return 0; + + if (NInoEncrypted(ni) && NInoNonResident(ni)) + return -EACCES; + + if (NInoCompressed(ni)) + return -EOPNOTSUPP; + + if (pos + count > ni->data_size) { + if (ntfs_attr_truncate(ni, pos + count)) { + ntfs_debug("Failed to truncate attribute"); + return -1; + } + + ntfs_attr_reinit_search_ctx(ctx); + ret =3D ntfs_attr_lookup(ni->type, + ni->name, ni->name_len, CASE_SENSITIVE, + 0, NULL, 0, ctx); + if (ret) { + ntfs_error(sb, "Failed to look up attr %#x", ni->type); + return ret; + } + } + + if (!NInoNonResident(ni)) { + if (likely(i_size_read(vi) < ni->data_size)) + i_size_write(vi, ni->data_size); + return 0; + } + + if (pos + count > ni->initialized_size) { + ctx->attr->data.non_resident.initialized_size =3D cpu_to_le64(pos + coun= t); + mark_mft_record_dirty(ctx->ntfs_ino); + ni->initialized_size =3D pos + count; + if (i_size_read(vi) < ni->initialized_size) + i_size_write(vi, ni->initialized_size); + } + return 0; +} + +static s64 __ntfs_inode_resident_attr_pwrite(struct inode *vi, + s64 pos, s64 count, u8 *buf, + struct ntfs_attr_search_ctx *ctx) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + struct folio *folio; + struct address_space *mapping =3D vi->i_mapping; + u8 *addr; + int err =3D 0; + + 
+	WARN_ON(NInoNonResident(ni));
+	if (pos + count > PAGE_SIZE) {
+		ntfs_error(vi->i_sb, "Out of write into resident attr %#x", ni->type);
+		return -EINVAL;
+	}
+
+	/* Copy to mft record page */
+	addr = (u8 *)ctx->attr + le16_to_cpu(ctx->attr->data.resident.value_offset);
+	memcpy(addr + pos, buf, count);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+
+	/* Keep the first page clean and uptodate */
+	folio = __filemap_get_folio(mapping, 0, FGP_WRITEBEGIN | FGP_NOFS,
+			mapping_gfp_mask(mapping));
+	if (IS_ERR(folio)) {
+		err = PTR_ERR(folio);
+		ntfs_error(vi->i_sb, "Failed to read a page 0 for attr %#x: %d",
+				ni->type, err);
+		goto out;
+	}
+	if (!folio_test_uptodate(folio)) {
+		u32 len = le32_to_cpu(ctx->attr->data.resident.value_length);
+
+		memcpy_to_folio(folio, 0, addr, len);
+		folio_zero_segment(folio, offset_in_folio(folio, len),
+				folio_size(folio) - len);
+	} else {
+		memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, count);
+	}
+	folio_mark_uptodate(folio);
+	folio_unlock(folio);
+	folio_put(folio);
+out:
+	return err ? err : count;
+}
+
+static s64 __ntfs_inode_non_resident_attr_pwrite(struct inode *vi,
+		s64 pos, s64 count, u8 *buf,
+		struct ntfs_attr_search_ctx *ctx,
+		bool sync)
+{
+	struct ntfs_inode *ni = NTFS_I(vi);
+	struct address_space *mapping = vi->i_mapping;
+	struct folio *folio;
+	pgoff_t index;
+	unsigned long offset, length;
+	size_t attr_len;
+	s64 ret = 0, written = 0;
+
+	WARN_ON(!NInoNonResident(ni));
+
+	index = pos >> PAGE_SHIFT;
+	while (count) {
+		folio = ntfs_read_mapping_folio(mapping, index);
+		if (IS_ERR(folio)) {
+			ret = PTR_ERR(folio);
+			ntfs_error(vi->i_sb, "Failed to read a page %lu for attr %#x: %ld",
+					index, ni->type, PTR_ERR(folio));
+			break;
+		}
+
+		folio_lock(folio);
+		offset = offset_in_folio(folio, pos);
+		attr_len = min_t(size_t, (size_t)count, folio_size(folio) - offset);
+
+		memcpy_to_folio(folio, offset, buf, attr_len);
+
+		if (sync) {
+			struct ntfs_volume *vol = ni->vol;
+			s64 lcn, lcn_count;
+			unsigned int lcn_folio_off = 0;
+			struct bio *bio;
+			u64 rl_length = 0;
+			s64 vcn;
+			struct runlist_element *rl;
+
+			lcn_count = max_t(s64, 1, attr_len >> vol->cluster_size_bits);
+			vcn = (s64)folio->index << PAGE_SHIFT >> vol->cluster_size_bits;
+
+			do {
+				down_write(&ni->runlist.lock);
+				rl = ntfs_attr_vcn_to_rl(ni, vcn, &lcn);
+				if (IS_ERR(rl)) {
+					ret = PTR_ERR(rl);
+					up_write(&ni->runlist.lock);
+					goto err_unlock_folio;
+				}
+
+				rl_length = rl->length - (vcn - rl->vcn);
+				if (rl_length < lcn_count) {
+					lcn_count -= rl_length;
+				} else {
+					rl_length = lcn_count;
+					lcn_count = 0;
+				}
+				up_write(&ni->runlist.lock);
+
+				if (vol->cluster_size_bits > PAGE_SHIFT) {
+					lcn_folio_off = folio->index << PAGE_SHIFT;
+					lcn_folio_off &= vol->cluster_size_mask;
+				}
+
+				bio = ntfs_setup_bio(vol, REQ_OP_WRITE, lcn,
+						lcn_folio_off);
+				if (!bio) {
+					ret = -ENOMEM;
+					goto err_unlock_folio;
+				}
+
+				length = min_t(unsigned long,
+						rl_length << vol->cluster_size_bits,
+						folio_size(folio));
+				if (!bio_add_folio(bio, folio, length, offset)) {
+					ret = -EIO;
+					bio_put(bio);
+					goto err_unlock_folio;
+				}
+
+				submit_bio_wait(bio);
+				bio_put(bio);
+				vcn += rl_length;
+				offset += length;
+			} while (lcn_count != 0);
+
+			folio_mark_uptodate(folio);
+		} else
+			folio_mark_dirty(folio);
+err_unlock_folio:
+		folio_unlock(folio);
+		folio_put(folio);
+
+		if (ret)
+			break;
+
+		written += attr_len;
+		buf += attr_len;
+		pos += attr_len;
+		count -= attr_len;
+		index++;
+
+		cond_resched();
+	}
+
+	return ret ? ret : written;
+}
+
+s64 ntfs_inode_attr_pwrite(struct inode *vi, s64 pos, s64 count, u8 *buf, bool sync)
+{
+	struct ntfs_inode *ni = NTFS_I(vi);
+	struct ntfs_attr_search_ctx *ctx;
+	s64 ret;
+
+	WARN_ON(!NInoAttr(ni));
+
+	ctx = ntfs_attr_get_search_ctx(ni->ext.base_ntfs_ino, NULL);
+	if (!ctx) {
+		ntfs_error(vi->i_sb, "Failed to get attr search ctx");
+		return -ENOMEM;
+	}
+
+	ret = ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
+			0, NULL, 0, ctx);
+	if (ret) {
+		ntfs_attr_put_search_ctx(ctx);
+		ntfs_error(vi->i_sb, "Failed to look up attr %#x", ni->type);
+		return ret;
+	}
+
+	mutex_lock(&ni->mrec_lock);
+	ret = ntfs_enlarge_attribute(vi, pos, count, ctx);
+	mutex_unlock(&ni->mrec_lock);
+	if (ret)
+		goto out;
+
+	if (NInoNonResident(ni))
+		ret = __ntfs_inode_non_resident_attr_pwrite(vi, pos, count, buf, ctx, sync);
+	else
+		ret = __ntfs_inode_resident_attr_pwrite(vi, pos, count, buf, ctx);
+out:
+	ntfs_attr_put_search_ctx(ctx);
+	return ret;
+}
diff --git a/fs/ntfsplus/mft.c b/fs/ntfsplus/mft.c
new file mode 100644
index 000000000000..c390e7fb98a0
--- /dev/null
+++ b/fs/ntfsplus/mft.c
@@ -0,0 +1,2698 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * NTFS kernel mft record operations. Part of the Linux-NTFS project.
+ * Part of this file is based on code from the NTFS-3G project.
+ *
+ * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include
+
+#include "aops.h"
+#include "bitmap.h"
+#include "lcnalloc.h"
+#include "misc.h"
+#include "mft.h"
+#include "ntfs.h"
+
+/*
+ * ntfs_mft_record_check - Check the consistency of an MFT record
+ *
+ * Make sure its general fields are safe, then examine all its
+ * attributes and apply generic checks to them.
+ *
+ * Returns 0 if the checks are successful. If not, return -EIO.
+ */
+int ntfs_mft_record_check(const struct ntfs_volume *vol, struct mft_record *m,
+		unsigned long mft_no)
+{
+	struct attr_record *a;
+	struct super_block *sb = vol->sb;
+
+	if (!ntfs_is_file_record(m->magic)) {
+		ntfs_error(sb, "Record %llu has no FILE magic (0x%x)\n",
+				(unsigned long long)mft_no, le32_to_cpu(*(__le32 *)m));
+		goto err_out;
+	}
+
+	if ((m->usa_ofs & 0x1) ||
+	    (vol->mft_record_size >> NTFS_BLOCK_SIZE_BITS) + 1 != le16_to_cpu(m->usa_count) ||
+	    le16_to_cpu(m->usa_ofs) + le16_to_cpu(m->usa_count) * 2 > vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu has corrupt fix-up values fields\n",
+				(unsigned long long)mft_no);
+		goto err_out;
+	}
+
+	if (le32_to_cpu(m->bytes_allocated) != vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu has corrupt allocation size (%u <> %u)\n",
+				(unsigned long long)mft_no,
+				vol->mft_record_size,
+				le32_to_cpu(m->bytes_allocated));
+		goto err_out;
+	}
+
+	if (le32_to_cpu(m->bytes_in_use) > vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu has corrupt in-use size (%u > %u)\n",
+				(unsigned long long)mft_no,
+				le32_to_cpu(m->bytes_in_use),
+				vol->mft_record_size);
+		goto err_out;
+	}
+
+	if (le16_to_cpu(m->attrs_offset) & 7) {
+		ntfs_error(sb, "Attributes badly aligned in record %llu\n",
+				(unsigned long long)mft_no);
+		goto err_out;
+	}
+
+	a = (struct attr_record *)((char *)m + le16_to_cpu(m->attrs_offset));
+	if ((char *)a < (char *)m || (char *)a > (char *)m + vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu is corrupt\n",
+				(unsigned long long)mft_no);
+		goto err_out;
+	}
+
+	return 0;
+
+err_out:
+	return -EIO;
+}
+
+/**
+ * map_mft_record_folio - map the folio in which a specific mft record resides
+ * @ni:	ntfs inode whose mft record folio to map
+ *
+ * This maps the folio in which the mft record of the ntfs inode @ni is
+ * situated and returns a pointer to the mft record within the mapped folio.
+ *
+ * Return value needs to be checked with IS_ERR() and if that is true PTR_ERR()
+ * contains the negative error code returned.
+ */
+static inline struct mft_record *map_mft_record_folio(struct ntfs_inode *ni)
+{
+	loff_t i_size;
+	struct ntfs_volume *vol = ni->vol;
+	struct inode *mft_vi = vol->mft_ino;
+	struct folio *folio;
+	unsigned long index, end_index;
+	unsigned int ofs;
+
+	WARN_ON(ni->folio);
+	/*
+	 * The index into the page cache and the offset within the page cache
+	 * page of the wanted mft record.
+	 */
+	index = (u64)ni->mft_no << vol->mft_record_size_bits >>
+			PAGE_SHIFT;
+	ofs = (ni->mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+
+	i_size = i_size_read(mft_vi);
+	/* The maximum valid index into the page cache for $MFT's data. */
+	end_index = i_size >> PAGE_SHIFT;
+
+	/* If the wanted index is out of bounds the mft record doesn't exist. */
+	if (unlikely(index >= end_index)) {
+		if (index > end_index || (i_size & ~PAGE_MASK) < ofs +
+				vol->mft_record_size) {
+			folio = ERR_PTR(-ENOENT);
+			ntfs_error(vol->sb,
+					"Attempt to read mft record 0x%lx, which is beyond the end of the mft. This is probably a bug in the ntfs driver.",
+					ni->mft_no);
+			goto err_out;
+		}
+	}
+
+	/* Read, map, and pin the folio. */
+	folio = ntfs_read_mapping_folio(mft_vi->i_mapping, index);
+	if (!IS_ERR(folio)) {
+		u8 *addr;
+
+		ni->mrec = kmalloc(vol->mft_record_size, GFP_NOFS);
+		if (!ni->mrec) {
+			ntfs_unmap_folio(folio, NULL);
+			folio = ERR_PTR(-ENOMEM);
+			goto err_out;
+		}
+
+		addr = kmap_local_folio(folio, 0);
+		memcpy(ni->mrec, addr + ofs, vol->mft_record_size);
+		post_read_mst_fixup((struct ntfs_record *)ni->mrec, vol->mft_record_size);
+
+		/* Catch multi sector transfer fixup errors. */
+		if (!ntfs_mft_record_check(vol, (struct mft_record *)ni->mrec, ni->mft_no)) {
+			kunmap_local(addr);
+			ni->folio = folio;
+			ni->folio_ofs = ofs;
+			return ni->mrec;
+		}
+		ntfs_unmap_folio(folio, addr);
+		kfree(ni->mrec);
+		ni->mrec = NULL;
+		folio = ERR_PTR(-EIO);
+		NVolSetErrors(vol);
+	}
+err_out:
+	ni->folio = NULL;
+	ni->folio_ofs = 0;
+	return (void *)folio;
+}
+
+/**
+ * map_mft_record - map, pin and lock an mft record
+ * @ni:	ntfs inode whose MFT record to map
+ *
+ * First, take the mrec_lock mutex. We might now be sleeping, while waiting
+ * for the mutex if it was already locked by someone else.
+ *
+ * The page of the record is mapped using map_mft_record_folio() before being
+ * returned to the caller.
+ *
+ * This in turn uses ntfs_read_mapping_folio() to get the page containing the
+ * wanted mft record (it in turn calls read_cache_page() which reads it in from
+ * disk if necessary, increments the use count on the page so that it cannot
+ * disappear under us and returns a reference to the page cache page).
+ *
+ * If read_cache_page() invokes ntfs_readpage() to load the page from disk, it
+ * sets PG_locked and clears PG_uptodate on the page. Once I/O has completed
+ * and the post-read mst fixups on each mft record in the page have been
+ * performed, the page gets PG_uptodate set and PG_locked cleared (this is done
+ * in our asynchronous I/O completion handler end_buffer_read_mft_async()).
+ * ntfs_read_mapping_folio() waits for PG_locked to become clear and checks if
+ * PG_uptodate is set and returns an error code if not. This provides
+ * sufficient protection against races when reading/using the page.
+ *
+ * However there is the write mapping to think about. Doing the above described
+ * checking here will be fine, because when initiating the write we will set
+ * PG_locked and clear PG_uptodate making sure nobody is touching the page
+ * contents. Doing the locking this way means that the commit to disk code in
+ * the page cache code paths is automatically sufficiently locked with us as
+ * we will not touch a page that has been locked or is not uptodate. The only
+ * locking problem then is them locking the page while we are accessing it.
+ *
+ * So that code will end up having to own the mrec_lock of all mft
+ * records/inodes present in the page before I/O can proceed. In that case we
+ * wouldn't need to bother with PG_locked and PG_uptodate as nobody will be
+ * accessing anything without owning the mrec_lock mutex. But we do need to
+ * use them because of the read_cache_page() invocation and the code becomes so
+ * much simpler this way that it is well worth it.
+ *
+ * The mft record is now ours and we return a pointer to it. You need to check
+ * the returned pointer with IS_ERR() and if that is true, PTR_ERR() will
+ * return the error code.
+ *
+ * NOTE: Caller is responsible for setting the mft record dirty before calling
+ * unmap_mft_record(). This is obviously only necessary if the caller really
+ * modified the mft record...
+ * Q: Do we want to recycle one of the VFS inode state bits instead?
+ * A: No, the inode ones mean we want to change the mft record, not we want to
+ * write it out.
+ */
+struct mft_record *map_mft_record(struct ntfs_inode *ni)
+{
+	struct mft_record *m;
+
+	if (!ni)
+		return ERR_PTR(-EINVAL);
+
+	ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
+
+	/* Make sure the ntfs inode doesn't go away. */
+	atomic_inc(&ni->count);
+
+	if (ni->folio)
+		return (struct mft_record *)ni->mrec;
+
+	m = map_mft_record_folio(ni);
+	if (!IS_ERR(m))
+		return m;
+
+	atomic_dec(&ni->count);
+	ntfs_error(ni->vol->sb, "Failed with error code %lu.", -PTR_ERR(m));
+	return m;
+}
+
+/**
+ * unmap_mft_record - release a mapped mft record
+ * @ni:	ntfs inode whose MFT record to unmap
+ *
+ * We release the page mapping and the mrec_lock mutex which unmaps the mft
+ * record and releases it for others to get hold of. We also release the ntfs
+ * inode by decrementing the ntfs inode reference count.
+ *
+ * NOTE: If caller has modified the mft record, it is imperative to set the mft
+ * record dirty BEFORE calling unmap_mft_record().
+ */
+void unmap_mft_record(struct ntfs_inode *ni)
+{
+	struct folio *folio;
+
+	if (!ni)
+		return;
+
+	ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
+
+	folio = ni->folio;
+	if (atomic_dec_return(&ni->count) > 1)
+		return;
+	WARN_ON(!folio);
+}
+
+/**
+ * map_extent_mft_record - load an extent inode and attach it to its base
+ * @base_ni:	base ntfs inode
+ * @mref:	mft reference of the extent inode to load
+ * @ntfs_ino:	on successful return, pointer to the struct ntfs_inode structure
+ *
+ * Load the extent mft record @mref and attach it to its base inode @base_ni.
+ * Return the mapped extent mft record if IS_ERR(result) is false. Otherwise
+ * PTR_ERR(result) gives the negative error code.
+ *
+ * On successful return, @ntfs_ino contains a pointer to the ntfs_inode
+ * structure of the mapped extent inode.
+ */
+struct mft_record *map_extent_mft_record(struct ntfs_inode *base_ni, u64 mref,
+		struct ntfs_inode **ntfs_ino)
+{
+	struct mft_record *m;
+	struct ntfs_inode *ni = NULL;
+	struct ntfs_inode **extent_nis = NULL;
+	int i;
+	unsigned long mft_no = MREF(mref);
+	u16 seq_no = MSEQNO(mref);
+	bool destroy_ni = false;
+
+	ntfs_debug("Mapping extent mft record 0x%lx (base mft record 0x%lx).",
+			mft_no, base_ni->mft_no);
+	/* Make sure the base ntfs inode doesn't go away. */
+	atomic_inc(&base_ni->count);
+	/*
+	 * Check if this extent inode has already been added to the base inode,
+	 * in which case just return it. If not found, add it to the base
+	 * inode before returning it.
+	 */
+	mutex_lock(&base_ni->extent_lock);
+	if (base_ni->nr_extents > 0) {
+		extent_nis = base_ni->ext.extent_ntfs_inos;
+		for (i = 0; i < base_ni->nr_extents; i++) {
+			if (mft_no != extent_nis[i]->mft_no)
+				continue;
+			ni = extent_nis[i];
+			/* Make sure the ntfs inode doesn't go away. */
+			atomic_inc(&ni->count);
+			break;
+		}
+	}
+	if (likely(ni != NULL)) {
+		mutex_unlock(&base_ni->extent_lock);
+		atomic_dec(&base_ni->count);
+		/* We found the record; just have to map and return it. */
+		m = map_mft_record(ni);
+		/* map_mft_record() has incremented this on success. */
+		atomic_dec(&ni->count);
+		if (!IS_ERR(m)) {
+			/* Verify the sequence number. */
+			if (likely(le16_to_cpu(m->sequence_number) == seq_no)) {
+				ntfs_debug("Done 1.");
+				*ntfs_ino = ni;
+				return m;
+			}
+			unmap_mft_record(ni);
+			ntfs_error(base_ni->vol->sb,
+					"Found stale extent mft reference! Corrupt filesystem. Run chkdsk.");
+			return ERR_PTR(-EIO);
+		}
+map_err_out:
+		ntfs_error(base_ni->vol->sb,
+				"Failed to map extent mft record, error code %ld.",
+				-PTR_ERR(m));
+		return m;
+	}
+	/* Record wasn't there. Get a new ntfs inode and initialize it. */
+	ni = ntfs_new_extent_inode(base_ni->vol->sb, mft_no);
+	if (unlikely(!ni)) {
+		mutex_unlock(&base_ni->extent_lock);
+		atomic_dec(&base_ni->count);
+		return ERR_PTR(-ENOMEM);
+	}
+	ni->vol = base_ni->vol;
+	ni->seq_no = seq_no;
+	ni->nr_extents = -1;
+	ni->ext.base_ntfs_ino = base_ni;
+	/* Now map the record. */
+	m = map_mft_record(ni);
+	if (IS_ERR(m)) {
+		mutex_unlock(&base_ni->extent_lock);
+		atomic_dec(&base_ni->count);
+		ntfs_clear_extent_inode(ni);
+		goto map_err_out;
+	}
+	/* Verify the sequence number if it is present. */
+	if (seq_no && (le16_to_cpu(m->sequence_number) != seq_no)) {
+		ntfs_error(base_ni->vol->sb,
+				"Found stale extent mft reference! Corrupt filesystem. Run chkdsk.");
+		destroy_ni = true;
+		m = ERR_PTR(-EIO);
+		goto unm_err_out;
+	}
+	/* Attach extent inode to base inode, reallocating memory if needed. */
+	if (!(base_ni->nr_extents & 3)) {
+		struct ntfs_inode **tmp;
+		int new_size = (base_ni->nr_extents + 4) * sizeof(struct ntfs_inode *);
+
+		tmp = ntfs_malloc_nofs(new_size);
+		if (unlikely(!tmp)) {
+			ntfs_error(base_ni->vol->sb, "Failed to allocate internal buffer.");
+			destroy_ni = true;
+			m = ERR_PTR(-ENOMEM);
+			goto unm_err_out;
+		}
+		if (base_ni->nr_extents) {
+			WARN_ON(!base_ni->ext.extent_ntfs_inos);
+			memcpy(tmp, base_ni->ext.extent_ntfs_inos, new_size -
+					4 * sizeof(struct ntfs_inode *));
+			ntfs_free(base_ni->ext.extent_ntfs_inos);
+		}
+		base_ni->ext.extent_ntfs_inos = tmp;
+	}
+	base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] = ni;
+	mutex_unlock(&base_ni->extent_lock);
+	atomic_dec(&base_ni->count);
+	ntfs_debug("Done 2.");
+	*ntfs_ino = ni;
+	return m;
+unm_err_out:
+	unmap_mft_record(ni);
+	mutex_unlock(&base_ni->extent_lock);
+	atomic_dec(&base_ni->count);
+	/*
+	 * If the extent inode was not attached to the base inode we need to
+	 * release it or we will leak memory.
+	 */
+	if (destroy_ni)
+		ntfs_clear_extent_inode(ni);
+	return m;
+}
+
+/**
+ * __mark_mft_record_dirty - set the mft record and the page containing it dirty
+ * @ni:	ntfs inode describing the mapped mft record
+ *
+ * Internal function. Users should call mark_mft_record_dirty() instead.
+ *
+ * Set the mapped (extent) mft record of the (base or extent) ntfs inode @ni,
+ * as well as the page containing the mft record, dirty. Also, mark the base
+ * vfs inode dirty. This ensures that any changes to the mft record are
+ * written out to disk.
+ *
+ * NOTE: We only set I_DIRTY_DATASYNC (and not I_DIRTY_PAGES)
+ * on the base vfs inode, because even though file data may have been modified,
+ * it is dirty in the inode meta data rather than the data page cache of the
+ * inode, and thus there are no data pages that need writing out. Therefore, a
+ * full mark_inode_dirty() is overkill. A mark_inode_dirty_sync(), on the
+ * other hand, is not sufficient, because ->write_inode needs to be called even
+ * in case of fdatasync. This needs to happen or the file data would not
+ * necessarily hit the device synchronously, even though the vfs inode has the
+ * O_SYNC flag set. Also, I_DIRTY_DATASYNC simply "feels" better than just
+ * I_DIRTY_SYNC, since the file data has not actually hit the block device yet,
+ * which is not what I_DIRTY_SYNC on its own would suggest.
+ */
+void __mark_mft_record_dirty(struct ntfs_inode *ni)
+{
+	struct ntfs_inode *base_ni;
+
+	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+	WARN_ON(NInoAttr(ni));
+	/* Determine the base vfs inode and mark it dirty, too. */
+	if (likely(ni->nr_extents >= 0))
+		base_ni = ni;
+	else
+		base_ni = ni->ext.base_ntfs_ino;
+	__mark_inode_dirty(VFS_I(base_ni), I_DIRTY_DATASYNC);
+}
+
+/**
+ * ntfs_sync_mft_mirror - synchronize an mft record to the mft mirror
+ * @vol:	ntfs volume on which the mft record to synchronize resides
+ * @mft_no:	mft record number of mft record to synchronize
+ * @m:		mapped, mst protected (extent) mft record to synchronize
+ *
+ * Write the mapped, mst protected (extent) mft record @m with mft record
+ * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol.
+ *
+ * On success return 0. On error return -errno and set the volume errors flag
+ * in the ntfs volume @vol.
+ *
+ * NOTE: We always perform synchronous i/o and ignore the @sync parameter.
+ */
+int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const unsigned long mft_no,
+		struct mft_record *m)
+{
+	u8 *kmirr = NULL;
+	struct folio *folio;
+	unsigned int folio_ofs, lcn_folio_off = 0;
+	int err = 0;
+	struct bio *bio;
+
+	ntfs_debug("Entering for inode 0x%lx.", mft_no);
+
+	if (unlikely(!vol->mftmirr_ino)) {
+		/* This could happen during umount... */
+		err = -EIO;
+		goto err_out;
+	}
+	/* Get the page containing the mirror copy of the mft record @m. */
+	folio = ntfs_read_mapping_folio(vol->mftmirr_ino->i_mapping, mft_no >>
+			(PAGE_SHIFT - vol->mft_record_size_bits));
+	if (IS_ERR(folio)) {
+		ntfs_error(vol->sb, "Failed to map mft mirror page.");
+		err = PTR_ERR(folio);
+		goto err_out;
+	}
+
+	folio_lock(folio);
+	folio_clear_uptodate(folio);
+	/* Offset of the mft mirror record inside the page. */
+	folio_ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+	/* The address in the page of the mirror copy of the mft record @m. */
+	kmirr = kmap_local_folio(folio, 0) + folio_ofs;
+	/* Copy the mst protected mft record to the mirror. */
+	memcpy(kmirr, m, vol->mft_record_size);
+
+	if (vol->cluster_size_bits > PAGE_SHIFT) {
+		lcn_folio_off = folio->index << PAGE_SHIFT;
+		lcn_folio_off &= vol->cluster_size_mask;
+	}
+
+	bio = ntfs_setup_bio(vol, REQ_OP_WRITE, vol->mftmirr_lcn,
+			lcn_folio_off + folio_ofs);
+	if (!bio) {
+		err = -ENOMEM;
+		goto unlock_folio;
+	}
+
+	if (!bio_add_folio(bio, folio, vol->mft_record_size, folio_ofs)) {
+		err = -EIO;
+		bio_put(bio);
+		goto unlock_folio;
+	}
+
+	submit_bio_wait(bio);
+	bio_put(bio);
+	/* Current state: all buffers are clean, unlocked, and uptodate. */
+	flush_dcache_folio(folio);
+	folio_mark_uptodate(folio);
+
+unlock_folio:
+	folio_unlock(folio);
+	ntfs_unmap_folio(folio, kmirr);
+	if (likely(!err)) {
+		ntfs_debug("Done.");
+	} else {
+		ntfs_error(vol->sb, "I/O error while writing mft mirror record 0x%lx!", mft_no);
+err_out:
+		ntfs_error(vol->sb,
+				"Failed to synchronize $MFTMirr (error code %i). Volume will be left marked dirty on umount. Run chkdsk on the partition after umounting to correct this.",
+				err);
+		NVolSetErrors(vol);
+	}
+	return err;
+}
+
+/**
+ * write_mft_record_nolock - write out a mapped (extent) mft record
+ * @ni:	ntfs inode describing the mapped (extent) mft record
+ * @m:	mapped (extent) mft record to write
+ * @sync:	if true, wait for i/o completion
+ *
+ * Write the mapped (extent) mft record @m described by the (regular or extent)
+ * ntfs inode @ni to backing store. If the mft record @m has a counterpart in
+ * the mft mirror, that is also updated.
+ *
+ * We only write the mft record if the ntfs inode @ni is dirty and the first
+ * buffer belonging to its mft record is dirty, too. We ignore the dirty state
+ * of subsequent buffers because we could have raced with
+ * fs/ntfs/aops.c::mark_ntfs_record_dirty().
+ *
+ * On success, clean the mft record and return 0. On error, leave the mft
+ * record dirty and return -errno.
+ *
+ * NOTE: We always perform synchronous i/o and ignore the @sync parameter.
+ * However, if the mft record has a counterpart in the mft mirror and @sync is
+ * true, we write the mft record, wait for i/o completion, and only then write
+ * the mft mirror copy. This ensures that if the system crashes either the mft
+ * or the mft mirror will contain a self-consistent mft record @m. If @sync is
+ * false on the other hand, we start i/o on both and then wait for completion
+ * on them. This provides a speedup but no longer guarantees that you will end
+ * up with a self-consistent mft record in the case of a crash but if you asked
+ * for asynchronous writing you probably do not care about that anyway.
+ */
+int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, int sync)
+{
+	struct ntfs_volume *vol = ni->vol;
+	struct folio *folio = ni->folio;
+	int err = 0, i = 0;
+	u8 *kaddr;
+	struct mft_record *fixup_m;
+	struct bio *bio;
+	unsigned int offset = 0, folio_size;
+
+	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+
+	WARN_ON(NInoAttr(ni));
+	WARN_ON(!folio_test_locked(folio));
+
+	/*
+	 * If the struct ntfs_inode is clean no need to do anything. If it is
+	 * dirty, mark it as clean now so that it can be redirtied later on if
+	 * needed. There is no danger of races since the caller is holding the
+	 * locks for the mft record @m and the page it is in.
+	 */
+	if (!NInoTestClearDirty(ni))
+		goto done;
+
+	if (ni->mft_lcn[0] == LCN_RL_NOT_MAPPED) {
+		s64 vcn;
+		struct runlist_element *rl;
+
+		vcn = (s64)ni->mft_no << vol->mft_record_size_bits >> vol->cluster_size_bits;
+
+		down_read(&NTFS_I(vol->mft_ino)->runlist.lock);
+		rl = NTFS_I(vol->mft_ino)->runlist.rl;
+
+		/* Seek to element containing target vcn. */
+		while (rl->length && rl[1].vcn <= vcn)
+			rl++;
+		ni->mft_lcn[0] = ntfs_rl_vcn_to_lcn(rl, vcn);
+		ni->mft_lcn_count++;
+
+		if (vol->cluster_size < vol->mft_record_size &&
+		    (rl->length - (vcn - rl->vcn)) <= 1) {
+			rl++;
+			ni->mft_lcn[1] = ntfs_rl_vcn_to_lcn(rl, vcn + 1);
+			ni->mft_lcn_count++;
+		}
+		up_read(&NTFS_I(vol->mft_ino)->runlist.lock);
+	}
+
+	kaddr = kmap_local_folio(folio, 0);
+	fixup_m = (struct mft_record *)(kaddr + ni->folio_ofs);
+	memcpy(fixup_m, m, vol->mft_record_size);
+
+	/* Apply the mst protection fixups. */
+	err = pre_write_mst_fixup((struct ntfs_record *)fixup_m, vol->mft_record_size);
+	if (err) {
+		ntfs_error(vol->sb, "Failed to apply mst fixups!");
+		goto err_out;
+	}
+
+	folio_size = vol->mft_record_size / ni->mft_lcn_count;
+	while (i < ni->mft_lcn_count) {
+		unsigned int clu_off;
+
+		clu_off = (unsigned int)((s64)ni->mft_no * vol->mft_record_size + offset) &
+				vol->cluster_size_mask;
+
+		flush_dcache_folio(folio);
+
+		bio = ntfs_setup_bio(vol, REQ_OP_WRITE, ni->mft_lcn[i], clu_off);
+		if (!bio) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+
+		if (!bio_add_folio(bio, folio, folio_size,
+				ni->folio_ofs + offset)) {
+			err = -EIO;
+			goto put_bio_out;
+		}
+
+		/* Synchronize the mft mirror now if not @sync. */
+		if (!sync && ni->mft_no < vol->mftmirr_size)
+			ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
+
+		submit_bio_wait(bio);
+		bio_put(bio);
+		offset += vol->cluster_size;
+		i++;
+	}
+
+	/* If @sync, now synchronize the mft mirror. */
+	if (sync && ni->mft_no < vol->mftmirr_size)
+		ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
+	kunmap_local(kaddr);
+	if (unlikely(err)) {
+		/* I/O error during writing. This is really bad! */
+		ntfs_error(vol->sb,
+				"I/O error while writing mft record 0x%lx! Marking base inode as bad. You should unmount the volume and run chkdsk.",
+				ni->mft_no);
+		goto err_out;
+	}
+done:
+	ntfs_debug("Done.");
+	return 0;
+put_bio_out:
+	bio_put(bio);
+err_out:
+	/*
+	 * Current state: all buffers are clean, unlocked, and uptodate.
+	 * The caller should mark the base inode as bad so that no more i/o
+	 * happens. ->clear_inode() will still be invoked so all extent inodes
+	 * and other allocated memory will be freed.
+	 */
+	if (err == -ENOMEM) {
+		ntfs_error(vol->sb,
+				"Not enough memory to write mft record. Redirtying so the write is retried later.");
+		mark_mft_record_dirty(ni);
+		err = 0;
+	} else
+		NVolSetErrors(vol);
+	return err;
+}
+
+static int ntfs_test_inode_wb(struct inode *vi, unsigned long ino, void *data)
+{
+	struct ntfs_attr *na = (struct ntfs_attr *)data;
+
+	if (!ntfs_test_inode(vi, na))
+		return 0;
+
+	/*
+	 * Without this, ntfs_write_mst_block() could call iput_final(),
+	 * ntfs_evict_big_inode() could try to unlink this inode, and the
+	 * context could be blocked infinitely in map_mft_record().
+	 */
+	if (NInoBeingDeleted(NTFS_I(vi))) {
+		na->state = NI_BeingDeleted;
+		return -1;
+	}
+
+	/*
+	 * This condition can prevent ntfs_write_mst_block() from
+	 * applying/undoing fixups while ntfs_create() is being called.
+	 */
+	spin_lock(&vi->i_lock);
+	if (vi->i_state & I_CREATING) {
+		spin_unlock(&vi->i_lock);
+		na->state = NI_BeingCreated;
+		return -1;
+	}
+	spin_unlock(&vi->i_lock);
+
+	return igrab(vi) ? 1 : -1;
+}
+
+/**
+ * ntfs_may_write_mft_record - check if an mft record may be written out
+ * @vol:	[IN] ntfs volume on which the mft record to check resides
+ * @mft_no:	[IN] mft record number of the mft record to check
+ * @m:		[IN] mapped mft record to check
+ * @locked_ni:	[OUT] caller has to unlock this ntfs inode if one is returned
+ *
+ * Check if the mapped (base or extent) mft record @m with mft record number
+ * @mft_no belonging to the ntfs volume @vol may be written out. If necessary
+ * and possible the ntfs inode of the mft record is locked and the base vfs
+ * inode is pinned. The locked ntfs inode is then returned in @locked_ni. The
+ * caller is responsible for unlocking the ntfs inode and unpinning the base
+ * vfs inode.
+ *
+ * Return 'true' if the mft record may be written out and 'false' if not.
+ *
+ * The caller has locked the page and cleared the uptodate flag on it which
+ * means that we can safely write out any dirty mft records that do not have
+ * their inodes in icache as determined by ilookup5() as anyone
+ * opening/creating such an inode would block when attempting to map the mft
+ * record in read_cache_page() until we are finished with the write out.
+ *
+ * Here is a description of the tests we perform:
+ *
+ * If the inode is found in icache we know the mft record must be a base mft
+ * record. If it is dirty, we do not write it and return 'false' as the vfs
+ * inode write paths will result in the access times being updated which would
+ * cause the base mft record to be redirtied and written out again. (We know
+ * the access time update will modify the base mft record because Windows
+ * chkdsk complains if the standard information attribute is not in the base
+ * mft record.)
+ *
+ * If the inode is in icache and not dirty, we attempt to lock the mft record
+ * and if we find the lock was already taken, it is not safe to write the mft
+ * record and we return 'false'.
+ *
+ * If we manage to obtain the lock we have exclusive access to the mft record,
+ * which also allows us safe writeout of the mft record. We then set
+ * @locked_ni to the locked ntfs inode and return 'true'.
+ *
+ * Note we cannot just lock the mft record and sleep while waiting for the lock
+ * because this would deadlock due to lock reversal (normally the mft record is
+ * locked before the page is locked but we already have the page locked here
+ * when we try to lock the mft record).
+ *
+ * If the inode is not in icache we need to perform further checks.
+ *
+ * If the mft record is not a FILE record or it is a base mft record, we can
+ * safely write it and return 'true'.
+ *
+ * We now know the mft record is an extent mft record. We check if the inode
+ * corresponding to its base mft record is in icache and obtain a reference to
+ * it if it is. If it is not, we can safely write it and return 'true'.
+ *
+ * We now have the base inode for the extent mft record. We check if it has an
+ * ntfs inode for the extent mft record attached and if not it is safe to write
+ * the extent mft record and we return 'true'.
+ *
+ * The ntfs inode for the extent mft record is attached to the base inode so we
+ * attempt to lock the extent mft record and if we find the lock was already
+ * taken, it is not safe to write the extent mft record and we return 'false'.
+ *
+ * If we manage to obtain the lock we have exclusive access to the extent mft
+ * record, which also allows us safe writeout of the extent mft record. We
+ * set the ntfs inode of the extent mft record clean and then set @locked_ni to
+ * the now locked ntfs inode and return 'true'.
+ *
+ * Note, the reason for actually writing dirty mft records here and not just
+ * relying on the vfs inode dirty code paths is that we can have mft records
+ * modified without them ever having actual inodes in memory. Also we can have
+ * dirty mft records with clean ntfs inodes in memory. None of the described
+ * cases would result in the dirty mft records being written out if we only
+ * relied on the vfs inode dirty code paths. And these cases can really occur
+ * during allocation of new mft records and in particular when the
+ * initialized_size of the $MFT/$DATA attribute is extended and the new space
+ * is initialized using ntfs_mft_record_format(). The clean inode can then
+ * appear if the mft record is reused for a new inode before it got written
+ * out.
+ */
+bool ntfs_may_write_mft_record(struct ntfs_volume *vol, const unsigned long mft_no,
+		const struct mft_record *m, struct ntfs_inode **locked_ni)
+{
+	struct super_block *sb = vol->sb;
+	struct inode *mft_vi = vol->mft_ino;
+	struct inode *vi;
+	struct ntfs_inode *ni, *eni, **extent_nis;
+	int i;
+	struct ntfs_attr na = {0};
+
+	ntfs_debug("Entering for inode 0x%lx.", mft_no);
+	/*
+	 * Normally we do not return a locked inode so set @locked_ni to NULL.
+	 */
+	*locked_ni = NULL;
+	/*
+	 * Check if the inode corresponding to this mft record is in the VFS
+	 * inode cache and obtain a reference to it if it is.
+	 */
+	ntfs_debug("Looking for inode 0x%lx in icache.", mft_no);
+	na.mft_no = mft_no;
+	na.type = AT_UNUSED;
+	/*
+	 * Optimize inode 0, i.e. $MFT itself, since we have it in memory and
+	 * we get here for it rather often.
+	 */
+	if (!mft_no) {
+		/* Balance the below iput(). */
+		vi = igrab(mft_vi);
+		WARN_ON(vi != mft_vi);
+	} else {
+		/*
+		 * Have to use find_inode_nowait() since ilookup5_nowait()
+		 * waits for an inode with I_FREEING, which causes ntfs to
+		 * deadlock when inodes are unlinked concurrently.
+		 */
+		vi = find_inode_nowait(sb, mft_no, ntfs_test_inode_wb, &na);
+		if (na.state == NI_BeingDeleted || na.state == NI_BeingCreated)
+			return false;
+	}
+	if (vi) {
+		ntfs_debug("Base inode 0x%lx is in icache.", mft_no);
+		/* The inode is in icache. */
+		ni = NTFS_I(vi);
+		/* Take a reference to the ntfs inode. */
+		atomic_inc(&ni->count);
+		/* If the inode is dirty, do not write this record. */
+		if (NInoDirty(ni)) {
+			ntfs_debug("Inode 0x%lx is dirty, do not write it.",
+					mft_no);
+			atomic_dec(&ni->count);
+			iput(vi);
+			return false;
+		}
+		ntfs_debug("Inode 0x%lx is not dirty.", mft_no);
+		/* The inode is not dirty, try to take the mft record lock. */
+		if (unlikely(!mutex_trylock(&ni->mrec_lock))) {
+			ntfs_debug("Mft record 0x%lx is already locked, do not write it.", mft_no);
+			atomic_dec(&ni->count);
+			iput(vi);
+			return false;
+		}
+		ntfs_debug("Managed to lock mft record 0x%lx, write it.",
+				mft_no);
+		/*
+		 * The write has to occur while we hold the mft record lock so
+		 * return the locked ntfs inode.
+		 */
+		*locked_ni = ni;
+		return true;
+	}
+	ntfs_debug("Inode 0x%lx is not in icache.", mft_no);
+	/* The inode is not in icache. */
+	/* Write the record if it is not a mft record (type "FILE"). */
+	if (!ntfs_is_mft_record(m->magic)) {
+		ntfs_debug("Mft record 0x%lx is not a FILE record, write it.",
+				mft_no);
+		return true;
+	}
+	/* Write the mft record if it is a base inode. */
+	if (!m->base_mft_record) {
+		ntfs_debug("Mft record 0x%lx is a base record, write it.",
+				mft_no);
+		return true;
+	}
+	/*
+	 * This is an extent mft record. Check if the inode corresponding to
+	 * its base mft record is in icache and obtain a reference to it if it
+	 * is.
+	 */
+	na.mft_no = MREF_LE(m->base_mft_record);
+	na.state = 0;
+	ntfs_debug("Mft record 0x%lx is an extent record. Looking for base inode 0x%lx in icache.",
+			mft_no, na.mft_no);
+	if (!na.mft_no) {
+		/* Balance the below iput(). */
+		vi = igrab(mft_vi);
+		WARN_ON(vi != mft_vi);
+	} else {
+		vi = find_inode_nowait(sb, mft_no, ntfs_test_inode_wb, &na);
+		if (na.state == NI_BeingDeleted || na.state == NI_BeingCreated)
+			return false;
+	}
+
+	if (!vi)
+		return false;
+	ntfs_debug("Base inode 0x%lx is in icache.", na.mft_no);
+	/*
+	 * The base inode is in icache. Check if it has the extent inode
+	 * corresponding to this extent mft record attached.
+	 */
+	ni = NTFS_I(vi);
+	mutex_lock(&ni->extent_lock);
+	if (ni->nr_extents <= 0) {
+		/*
+		 * The base inode has no attached extent inodes, write this
+		 * extent mft record.
+ */ + mutex_unlock(&ni->extent_lock); + iput(vi); + ntfs_debug("Base inode 0x%lx has no attached extent inodes, write the ex= tent record.", + na.mft_no); + return true; + } + /* Iterate over the attached extent inodes. */ + extent_nis =3D ni->ext.extent_ntfs_inos; + for (eni =3D NULL, i =3D 0; i < ni->nr_extents; ++i) { + if (mft_no =3D=3D extent_nis[i]->mft_no) { + /* + * Found the extent inode corresponding to this extent + * mft record. + */ + eni =3D extent_nis[i]; + break; + } + } + /* + * If the extent inode was not attached to the base inode, write this + * extent mft record. + */ + if (!eni) { + mutex_unlock(&ni->extent_lock); + iput(vi); + ntfs_debug("Extent inode 0x%lx is not attached to its base inode 0x%lx, = write the extent record.", + mft_no, na.mft_no); + return true; + } + ntfs_debug("Extent inode 0x%lx is attached to its base inode 0x%lx.", + mft_no, na.mft_no); + /* Take a reference to the extent ntfs inode. */ + atomic_inc(&eni->count); + mutex_unlock(&ni->extent_lock); + + /* if extent inode is dirty, write_inode will write it */ + if (NInoDirty(eni)) { + atomic_dec(&eni->count); + iput(vi); + return false; + } + + /* + * Found the extent inode coresponding to this extent mft record. + * Try to take the mft record lock. + */ + if (unlikely(!mutex_trylock(&eni->mrec_lock))) { + atomic_dec(&eni->count); + iput(vi); + ntfs_debug("Extent mft record 0x%lx is already locked, do not write it.", + mft_no); + return false; + } + ntfs_debug("Managed to lock extent mft record 0x%lx, write it.", + mft_no); + /* + * The write has to occur while we hold the mft record lock so return + * the locked extent ntfs inode. + */ + *locked_ni =3D eni; + return true; +} + +static const char *es =3D " Leaving inconsistent metadata. 
Unmount and r= un chkdsk."; + +#define RESERVED_MFT_RECORDS 64 + +/** + * ntfs_mft_bitmap_find_and_alloc_free_rec_nolock - see name + * @vol: volume on which to search for a free mft record + * @base_ni: open base inode if allocating an extent mft record or NULL + * + * Search for a free mft record in the mft bitmap attribute on the ntfs vo= lume + * @vol. + * + * If @base_ni is NULL start the search at the default allocator position. + * + * If @base_ni is not NULL start the search at the mft record after the ba= se + * mft record @base_ni. + * + * Return the free mft record on success and -errno on error. An error co= de of + * -ENOSPC means that there are no free mft records in the currently + * initialized mft bitmap. + * + * Locking: Caller must hold vol->mftbmp_lock for writing. + */ +static int ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(struct ntfs_volu= me *vol, + struct ntfs_inode *base_ni) +{ + s64 pass_end, ll, data_pos, pass_start, ofs, bit; + unsigned long flags; + struct address_space *mftbmp_mapping; + u8 *buf =3D NULL, *byte; + struct folio *folio; + unsigned int folio_ofs, size; + u8 pass, b; + + ntfs_debug("Searching for free mft record in the currently initialized mf= t bitmap."); + mftbmp_mapping =3D vol->mftbmp_ino->i_mapping; + /* + * Set the end of the pass making sure we do not overflow the mft + * bitmap. 
+	 */
+	read_lock_irqsave(&NTFS_I(vol->mft_ino)->size_lock, flags);
+	pass_end = NTFS_I(vol->mft_ino)->allocated_size >>
+			vol->mft_record_size_bits;
+	read_unlock_irqrestore(&NTFS_I(vol->mft_ino)->size_lock, flags);
+	read_lock_irqsave(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
+	ll = NTFS_I(vol->mftbmp_ino)->initialized_size << 3;
+	read_unlock_irqrestore(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
+	if (pass_end > ll)
+		pass_end = ll;
+	pass = 1;
+	if (!base_ni)
+		data_pos = vol->mft_data_pos;
+	else
+		data_pos = base_ni->mft_no + 1;
+	if (data_pos < RESERVED_MFT_RECORDS)
+		data_pos = RESERVED_MFT_RECORDS;
+	if (data_pos >= pass_end) {
+		data_pos = RESERVED_MFT_RECORDS;
+		pass = 2;
+		/* This happens on a freshly formatted volume. */
+		if (data_pos >= pass_end)
+			return -ENOSPC;
+	}
+
+	if (base_ni && base_ni->mft_no == FILE_MFT) {
+		data_pos = 0;
+		pass = 2;
+	}
+
+	pass_start = data_pos;
+	ntfs_debug("Starting bitmap search: pass %u, pass_start 0x%llx, pass_end 0x%llx, data_pos 0x%llx.",
+			pass, pass_start, pass_end, data_pos);
+	/* Loop until a free mft record is found. */
+	for (; pass <= 2;) {
+		/* Cap size to pass_end. */
+		ofs = data_pos >> 3;
+		folio_ofs = ofs & ~PAGE_MASK;
+		size = PAGE_SIZE - folio_ofs;
+		ll = ((pass_end + 7) >> 3) - ofs;
+		if (size > ll)
+			size = ll;
+		size <<= 3;
+		/*
+		 * If we are still within the active pass, search the next page
+		 * for a zero bit.
+		 */
+		if (size) {
+			folio = ntfs_read_mapping_folio(mftbmp_mapping,
+					ofs >> PAGE_SHIFT);
+			if (IS_ERR(folio)) {
+				ntfs_error(vol->sb, "Failed to read mft bitmap, aborting.");
+				return PTR_ERR(folio);
+			}
+			folio_lock(folio);
+			buf = (u8 *)kmap_local_folio(folio, 0) + folio_ofs;
+			bit = data_pos & 7;
+			data_pos &= ~7ull;
+			ntfs_debug("Before inner for loop: size 0x%x, data_pos 0x%llx, bit 0x%llx",
+					size, data_pos, bit);
+			for (; bit < size && data_pos + bit < pass_end;
+					bit &= ~7ull, bit += 8) {
+				/*
+				 * If we're extending $MFT and running out of
+				 * the first mft record (base record) then give
+				 * up searching since there is no guarantee
+				 * that the found record will be accessible.
+				 */
+				if (base_ni && base_ni->mft_no == FILE_MFT && bit > 400) {
+					folio_unlock(folio);
+					ntfs_unmap_folio(folio, buf);
+					return -ENOSPC;
+				}
+
+				byte = buf + (bit >> 3);
+				if (*byte == 0xff)
+					continue;
+				b = ffz((unsigned long)*byte);
+				if (b < 8 && b >= (bit & 7)) {
+					ll = data_pos + (bit & ~7ull) + b;
+					if (unlikely(ll > (1ll << 32))) {
+						folio_unlock(folio);
+						ntfs_unmap_folio(folio, buf);
+						return -ENOSPC;
+					}
+					*byte |= 1 << b;
+					flush_dcache_folio(folio);
+					folio_mark_dirty(folio);
+					folio_unlock(folio);
+					ntfs_unmap_folio(folio, buf);
+					ntfs_debug("Done. (Found and allocated mft record 0x%llx.)",
+							ll);
+					return ll;
+				}
+			}
+			ntfs_debug("After inner for loop: size 0x%x, data_pos 0x%llx, bit 0x%llx",
+					size, data_pos, bit);
+			data_pos += size;
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, buf);
+			/*
+			 * If the end of the pass has not been reached yet,
+			 * continue searching the mft bitmap for a zero bit.
+			 */
+			if (data_pos < pass_end)
+				continue;
+		}
+		/* Do the next pass. */
+		if (++pass == 2) {
+			/*
+			 * Starting the second pass, in which we scan the first
+			 * part of the zone which we omitted earlier.
+			 */
+			pass_end = pass_start;
+			data_pos = pass_start = RESERVED_MFT_RECORDS;
+			ntfs_debug("pass %i, pass_start 0x%llx, pass_end 0x%llx.",
+					pass, pass_start, pass_end);
+			if (data_pos >= pass_end)
+				break;
+		}
+	}
+	/* No free mft records in currently initialized mft bitmap. */
+	ntfs_debug("Done. (No free mft records left in currently initialized mft bitmap.)");
+	return -ENOSPC;
+}
+
+static int ntfs_mft_attr_extend(struct ntfs_inode *ni)
+{
+	int ret = 0;
+	struct ntfs_inode *base_ni;
+
+	if (NInoAttr(ni))
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+
+	if (!NInoAttrList(base_ni)) {
+		ret = ntfs_inode_add_attrlist(base_ni);
+		if (ret) {
+			pr_err("Cannot add attrlist\n");
+			goto out;
+		} else {
+			ret = -EAGAIN;
+			goto out;
+		}
+	}
+
+	ret = ntfs_attr_update_mapping_pairs(ni, 0);
+	if (ret)
+		pr_err("MP update failed\n");
+
+out:
+	return ret;
+}
+
+/**
+ * ntfs_mft_bitmap_extend_allocation_nolock - extend mft bitmap by a cluster
+ * @vol: volume on which to extend the mft bitmap attribute
+ *
+ * Extend the mft bitmap attribute on the ntfs volume @vol by one cluster.
+ *
+ * Note: Only changes allocated_size, i.e. does not touch initialized_size or
+ * data_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - Caller must hold vol->mftbmp_lock for writing.
+ *	    - This function takes NTFS_I(vol->mftbmp_ino)->runlist.lock for
+ *	      writing and releases it before returning.
+ *	    - This function takes vol->lcnbmp_lock for writing and releases it
+ *	      before returning.
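[Not part of the patch: the byte-wise zero-bit scan used by
ntfs_mft_bitmap_find_and_alloc_free_rec_nolock() above can be exercised in
isolation. A minimal userspace sketch, with ffz() replaced by a plain bit
loop and the pass handling, folio mapping, and RESERVED_MFT_RECORDS floor
omitted; the helper name is hypothetical:]

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Find the first zero bit in a bitmap of nbits bits, set it, and return
 * its index, or -1 if every bit is already set.  Mirrors the inner loop:
 * skip fully allocated bytes (0xff), then locate the first clear bit in
 * the first non-full byte and mark it allocated.
 */
static long bitmap_find_and_set_free_bit(uint8_t *bitmap, size_t nbits)
{
	for (size_t bit = 0; bit < nbits; bit += 8) {
		uint8_t *byte = &bitmap[bit >> 3];

		if (*byte == 0xff)	/* all eight records taken, skip */
			continue;
		unsigned b = 0;
		while (*byte & (1u << b))	/* stand-in for ffz() */
			b++;
		if (bit + b >= nbits)
			return -1;
		*byte |= 1u << b;	/* allocate the record */
		return (long)(bit + b);
	}
	return -1;
}
```

With a bitmap of `{ 0xff, 0x07 }` the first free record is bit 11, and a
second call hands out bit 12, matching how consecutive mft records are
allocated from the first non-full byte.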
+ */
+static int ntfs_mft_bitmap_extend_allocation_nolock(struct ntfs_volume *vol)
+{
+	s64 lcn;
+	s64 ll;
+	unsigned long flags;
+	struct folio *folio;
+	struct ntfs_inode *mft_ni, *mftbmp_ni;
+	struct runlist_element *rl, *rl2 = NULL;
+	struct ntfs_attr_search_ctx *ctx = NULL;
+	struct mft_record *mrec;
+	struct attr_record *a = NULL;
+	int ret, mp_size;
+	u32 old_alen = 0;
+	u8 *b, tb;
+	struct {
+		u8 added_cluster:1;
+		u8 added_run:1;
+		u8 mp_rebuilt:1;
+		u8 mp_extended:1;
+	} status = { 0, 0, 0, 0 };
+	size_t new_rl_count;
+
+	ntfs_debug("Extending mft bitmap allocation.");
+	mft_ni = NTFS_I(vol->mft_ino);
+	mftbmp_ni = NTFS_I(vol->mftbmp_ino);
+	/*
+	 * Determine the last lcn of the mft bitmap. The allocated size of the
+	 * mft bitmap cannot be zero so we are ok to do this.
+	 */
+	down_write(&mftbmp_ni->runlist.lock);
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	ll = mftbmp_ni->allocated_size;
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	rl = ntfs_attr_find_vcn_nolock(mftbmp_ni,
+			(ll - 1) >> vol->cluster_size_bits, NULL);
+	if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
+		up_write(&mftbmp_ni->runlist.lock);
+		ntfs_error(vol->sb,
+			"Failed to determine last allocated cluster of mft bitmap attribute.");
+		if (!IS_ERR(rl))
+			ret = -EIO;
+		else
+			ret = PTR_ERR(rl);
+		return ret;
+	}
+	lcn = rl->lcn + rl->length;
+	ntfs_debug("Last lcn of mft bitmap attribute is 0x%llx.",
+			(long long)lcn);
+	/*
+	 * Attempt to get the cluster following the last allocated cluster by
+	 * hand as it may be in the MFT zone so the allocator would not give it
+	 * to us.
+	 */
+	ll = lcn >> 3;
+	folio = ntfs_read_mapping_folio(vol->lcnbmp_ino->i_mapping,
+			ll >> PAGE_SHIFT);
+	if (IS_ERR(folio)) {
+		up_write(&mftbmp_ni->runlist.lock);
+		ntfs_error(vol->sb, "Failed to read from lcn bitmap.");
+		return PTR_ERR(folio);
+	}
+
+	down_write(&vol->lcnbmp_lock);
+	folio_lock(folio);
+	b = (u8 *)kmap_local_folio(folio, 0) + (ll & ~PAGE_MASK);
+	tb = 1 << (lcn & 7ull);
+	if (*b != 0xff && !(*b & tb)) {
+		/* Next cluster is free, allocate it. */
+		*b |= tb;
+		flush_dcache_folio(folio);
+		folio_mark_dirty(folio);
+		folio_unlock(folio);
+		ntfs_unmap_folio(folio, b);
+		up_write(&vol->lcnbmp_lock);
+		/* Update the mft bitmap runlist. */
+		rl->length++;
+		rl[1].vcn++;
+		status.added_cluster = 1;
+		ntfs_debug("Appending one cluster to mft bitmap.");
+	} else {
+		folio_unlock(folio);
+		ntfs_unmap_folio(folio, b);
+		up_write(&vol->lcnbmp_lock);
+		/* Allocate a cluster from the DATA_ZONE. */
+		rl2 = ntfs_cluster_alloc(vol, rl[1].vcn, 1, lcn, DATA_ZONE,
+				true, false, false);
+		if (IS_ERR(rl2)) {
+			up_write(&mftbmp_ni->runlist.lock);
+			ntfs_error(vol->sb,
+				"Failed to allocate a cluster for the mft bitmap.");
+			return PTR_ERR(rl2);
+		}
+		rl = ntfs_runlists_merge(&mftbmp_ni->runlist, rl2, 0, &new_rl_count);
+		if (IS_ERR(rl)) {
+			up_write(&mftbmp_ni->runlist.lock);
+			ntfs_error(vol->sb, "Failed to merge runlists for mft bitmap.");
+			if (ntfs_cluster_free_from_rl(vol, rl2)) {
+				ntfs_error(vol->sb, "Failed to deallocate allocated cluster.%s",
+						es);
+				NVolSetErrors(vol);
+			}
+			ntfs_free(rl2);
+			return PTR_ERR(rl);
+		}
+		mftbmp_ni->runlist.rl = rl;
+		mftbmp_ni->runlist.count = new_rl_count;
+		status.added_run = 1;
+		ntfs_debug("Adding one run to mft bitmap.");
+		/* Find the last run in the new runlist. */
+		for (; rl[1].length; rl++)
+			;
+	}
+	/*
+	 * Update the attribute record as well. Note: @rl is the last
+	 * (non-terminator) runlist element of mft bitmap.
+	 */
+	mrec = map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		ret = PTR_ERR(mrec);
+		goto undo_alloc;
+	}
+	ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		ret = -ENOMEM;
+		goto undo_alloc;
+	}
+	ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
+			0, ctx);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb,
+			"Failed to find last attribute extent of mft bitmap attribute.");
+		if (ret == -ENOENT)
+			ret = -EIO;
+		goto undo_alloc;
+	}
+	a = ctx->attr;
+	ll = le64_to_cpu(a->data.non_resident.lowest_vcn);
+	/* Search back for the previous last allocated cluster of mft bitmap. */
+	for (rl2 = rl; rl2 > mftbmp_ni->runlist.rl; rl2--) {
+		if (ll >= rl2->vcn)
+			break;
+	}
+	WARN_ON(ll < rl2->vcn);
+	WARN_ON(ll >= rl2->vcn + rl2->length);
+	/* Get the size for the new mapping pairs array for this extent. */
+	mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1, -1);
+	if (unlikely(mp_size <= 0)) {
+		ntfs_error(vol->sb,
+			"Get size for mapping pairs failed for mft bitmap attribute extent.");
+		ret = mp_size;
+		if (!ret)
+			ret = -EIO;
+		goto undo_alloc;
+	}
+	/* Expand the attribute record if necessary. */
+	old_alen = le32_to_cpu(a->length);
+	ret = ntfs_attr_record_resize(ctx->mrec, a, mp_size +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
+	if (unlikely(ret)) {
+		ret = ntfs_mft_attr_extend(mftbmp_ni);
+		if (!ret)
+			goto extended_ok;
+		status.mp_extended = 1;
+		goto undo_alloc;
+	}
+	status.mp_rebuilt = 1;
+	/* Generate the mapping pairs array directly into the attr record. */
+	ret = ntfs_mapping_pairs_build(vol, (u8 *)a +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+			mp_size, rl2, ll, -1, NULL, NULL, NULL);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb,
+			"Failed to build mapping pairs array for mft bitmap attribute.");
+		goto undo_alloc;
+	}
+	/* Update the highest_vcn. */
+	a->data.non_resident.highest_vcn = cpu_to_le64(rl[1].vcn - 1);
+	/*
+	 * We now have extended the mft bitmap allocated_size by one cluster.
+	 * Reflect this in the struct ntfs_inode structure and the attribute
+	 * record.
+	 */
+	if (a->data.non_resident.lowest_vcn) {
+		/*
+		 * We are not in the first attribute extent, switch to it, but
+		 * first ensure the changes will make it to disk later.
+		 */
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		ntfs_attr_reinit_search_ctx(ctx);
+		ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+				mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL,
+				0, ctx);
+		if (unlikely(ret)) {
+			ntfs_error(vol->sb,
+				"Failed to find first attribute extent of mft bitmap attribute.");
+			goto restore_undo_alloc;
+		}
+		a = ctx->attr;
+	}
+
+extended_ok:
+	write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	mftbmp_ni->allocated_size += vol->cluster_size;
+	a->data.non_resident.allocated_size =
+			cpu_to_le64(mftbmp_ni->allocated_size);
+	write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	/* Ensure the changes make it to disk. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	up_write(&mftbmp_ni->runlist.lock);
+	ntfs_debug("Done.");
+	return 0;
+
+restore_undo_alloc:
+	ntfs_attr_reinit_search_ctx(ctx);
+	if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
+			0, ctx)) {
+		ntfs_error(vol->sb,
+			"Failed to find last attribute extent of mft bitmap attribute.%s", es);
+		write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+		mftbmp_ni->allocated_size += vol->cluster_size;
+		write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(mft_ni);
+		up_write(&mftbmp_ni->runlist.lock);
+		/*
+		 * The only thing that is now wrong is ->allocated_size of the
+		 * base attribute extent which chkdsk should be able to fix.
+		 */
+		NVolSetErrors(vol);
+		return ret;
+	}
+	a = ctx->attr;
+	a->data.non_resident.highest_vcn = cpu_to_le64(rl[1].vcn - 2);
+undo_alloc:
+	if (status.added_cluster) {
+		/* Truncate the last run in the runlist by one cluster. */
+		rl->length--;
+		rl[1].vcn--;
+	} else if (status.added_run) {
+		lcn = rl->lcn;
+		/* Remove the last run from the runlist. */
+		rl->lcn = rl[1].lcn;
+		rl->length = 0;
+		mftbmp_ni->runlist.count--;
+	}
+	/* Deallocate the cluster. */
+	down_write(&vol->lcnbmp_lock);
+	if (ntfs_bitmap_clear_bit(vol->lcnbmp_ino, lcn)) {
+		ntfs_error(vol->sb, "Failed to free allocated cluster.%s", es);
+		NVolSetErrors(vol);
+	} else
+		ntfs_inc_free_clusters(vol, 1);
+	up_write(&vol->lcnbmp_lock);
+	if (status.mp_rebuilt) {
+		if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset),
+				old_alen - le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset),
+				rl2, ll, -1, NULL, NULL, NULL)) {
+			ntfs_error(vol->sb, "Failed to restore mapping pairs array.%s", es);
+			NVolSetErrors(vol);
+		}
+		if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
+			ntfs_error(vol->sb, "Failed to restore attribute record.%s", es);
+			NVolSetErrors(vol);
+		}
+		mark_mft_record_dirty(ctx->ntfs_ino);
+	} else if (status.mp_extended && ntfs_attr_update_mapping_pairs(mftbmp_ni, 0)) {
+		ntfs_error(vol->sb, "Failed to restore mapping pairs.%s", es);
+		NVolSetErrors(vol);
+	}
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (!IS_ERR(mrec))
+		unmap_mft_record(mft_ni);
+	up_write(&mftbmp_ni->runlist.lock);
+	return ret;
+}
+
+/**
+ * ntfs_mft_bitmap_extend_initialized_nolock - extend mftbmp initialized data
+ * @vol: volume on which to extend the mft bitmap attribute
+ *
+ * Extend the initialized portion of the mft bitmap attribute on the ntfs
+ * volume @vol by 8 bytes.
+ *
+ * Note: Only changes initialized_size and data_size, i.e. requires that
+ * allocated_size is big enough to fit the new initialized_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: Caller must hold vol->mftbmp_lock for writing.
+ */
+static int ntfs_mft_bitmap_extend_initialized_nolock(struct ntfs_volume *vol)
+{
+	s64 old_data_size, old_initialized_size;
+	unsigned long flags;
+	struct inode *mftbmp_vi;
+	struct ntfs_inode *mft_ni, *mftbmp_ni;
+	struct ntfs_attr_search_ctx *ctx;
+	struct mft_record *mrec;
+	struct attr_record *a;
+	int ret;
+
+	ntfs_debug("Extending mft bitmap initialized (and data) size.");
+	mft_ni = NTFS_I(vol->mft_ino);
+	mftbmp_vi = vol->mftbmp_ino;
+	mftbmp_ni = NTFS_I(mftbmp_vi);
+	/* Get the attribute record. */
+	mrec = map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		return PTR_ERR(mrec);
+	}
+	ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		ret = -ENOMEM;
+		goto unm_err_out;
+	}
+	ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb,
+			"Failed to find first attribute extent of mft bitmap attribute.");
+		if (ret == -ENOENT)
+			ret = -EIO;
+		goto put_err_out;
+	}
+	a = ctx->attr;
+	write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	old_data_size = i_size_read(mftbmp_vi);
+	old_initialized_size = mftbmp_ni->initialized_size;
+	/*
+	 * We can simply update the initialized_size before filling the space
+	 * with zeroes because the caller is holding the mft bitmap lock for
+	 * writing which ensures that no one else is trying to access the data.
+	 */
+	mftbmp_ni->initialized_size += 8;
+	a->data.non_resident.initialized_size =
+			cpu_to_le64(mftbmp_ni->initialized_size);
+	if (mftbmp_ni->initialized_size > old_data_size) {
+		i_size_write(mftbmp_vi, mftbmp_ni->initialized_size);
+		a->data.non_resident.data_size =
+				cpu_to_le64(mftbmp_ni->initialized_size);
+	}
+	write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	/* Ensure the changes make it to disk. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	/* Initialize the mft bitmap attribute value with zeroes. */
+	ret = ntfs_attr_set(mftbmp_ni, old_initialized_size, 8, 0);
+	if (likely(!ret)) {
+		ntfs_debug("Done. (Wrote eight initialized bytes to mft bitmap.)");
+		ntfs_inc_free_mft_records(vol, 8 * 8);
+		return 0;
+	}
+	ntfs_error(vol->sb, "Failed to write to mft bitmap.");
+	/* Try to recover from the error. */
+	mrec = map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.%s", es);
+		NVolSetErrors(vol);
+		return ret;
+	}
+	ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.%s", es);
+		NVolSetErrors(vol);
+		goto unm_err_out;
+	}
+	if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx)) {
+		ntfs_error(vol->sb,
+			"Failed to find first attribute extent of mft bitmap attribute.%s", es);
+		NVolSetErrors(vol);
+put_err_out:
+		ntfs_attr_put_search_ctx(ctx);
+unm_err_out:
+		unmap_mft_record(mft_ni);
+		goto err_out;
+	}
+	a = ctx->attr;
+	write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	mftbmp_ni->initialized_size = old_initialized_size;
+	a->data.non_resident.initialized_size =
+			cpu_to_le64(old_initialized_size);
+	if (i_size_read(mftbmp_vi) != old_data_size) {
+		i_size_write(mftbmp_vi, old_data_size);
+		a->data.non_resident.data_size = cpu_to_le64(old_data_size);
+	}
+	write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+#ifdef DEBUG
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	ntfs_debug("Restored status of mftbmp: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+			mftbmp_ni->allocated_size, i_size_read(mftbmp_vi),
+			mftbmp_ni->initialized_size);
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+#endif /* DEBUG */
+err_out:
+	return ret;
+}
+
+/**
+ * ntfs_mft_data_extend_allocation_nolock - extend mft data attribute
+ * @vol: volume on which to extend the mft data attribute
+ *
+ * Extend the mft data attribute on the ntfs volume @vol by 16 mft records
+ * worth of clusters or if not enough space for this by one mft record worth
+ * of clusters.
+ *
+ * Note: Only changes allocated_size, i.e. does not touch initialized_size or
+ * data_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - Caller must hold vol->mftbmp_lock for writing.
+ *	    - This function takes NTFS_I(vol->mft_ino)->runlist.lock for
+ *	      writing and releases it before returning.
+ *	    - This function calls functions which take vol->lcnbmp_lock for
+ *	      writing and release it before returning.
+ */
+static int ntfs_mft_data_extend_allocation_nolock(struct ntfs_volume *vol)
+{
+	s64 lcn;
+	s64 old_last_vcn;
+	s64 min_nr, nr, ll;
+	unsigned long flags;
+	struct ntfs_inode *mft_ni;
+	struct runlist_element *rl, *rl2;
+	struct ntfs_attr_search_ctx *ctx = NULL;
+	struct mft_record *mrec;
+	struct attr_record *a = NULL;
+	int ret, mp_size;
+	u32 old_alen = 0;
+	bool mp_rebuilt = false, mp_extended = false;
+	size_t new_rl_count;
+
+	ntfs_debug("Extending mft data allocation.");
+	mft_ni = NTFS_I(vol->mft_ino);
+	/*
+	 * Determine the preferred allocation location, i.e. the last lcn of
+	 * the mft data attribute. The allocated size of the mft data
+	 * attribute cannot be zero so we are ok to do this.
+	 */
+	down_write(&mft_ni->runlist.lock);
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	ll = mft_ni->allocated_size;
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	rl = ntfs_attr_find_vcn_nolock(mft_ni,
+			(ll - 1) >> vol->cluster_size_bits, NULL);
+	if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
+		up_write(&mft_ni->runlist.lock);
+		ntfs_error(vol->sb,
+			"Failed to determine last allocated cluster of mft data attribute.");
+		if (!IS_ERR(rl))
+			ret = -EIO;
+		else
+			ret = PTR_ERR(rl);
+		return ret;
+	}
+	lcn = rl->lcn + rl->length;
+	ntfs_debug("Last lcn of mft data attribute is 0x%llx.", lcn);
+	/* Minimum allocation is one mft record worth of clusters. */
+	min_nr = vol->mft_record_size >> vol->cluster_size_bits;
+	if (!min_nr)
+		min_nr = 1;
+	/* Want to allocate 16 mft records worth of clusters. */
+	nr = vol->mft_record_size << 4 >> vol->cluster_size_bits;
+	if (!nr)
+		nr = min_nr;
+	/* Ensure we do not go above 2^32-1 mft records. */
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	ll = mft_ni->allocated_size;
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
+			vol->mft_record_size_bits >= (1ll << 32))) {
+		nr = min_nr;
+		if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
+				vol->mft_record_size_bits >= (1ll << 32))) {
+			ntfs_warning(vol->sb,
+				"Cannot allocate mft record because the maximum number of inodes (2^32) has already been reached.");
+			up_write(&mft_ni->runlist.lock);
+			return -ENOSPC;
+		}
+	}
+	ntfs_debug("Trying mft data allocation with %s cluster count %lli.",
+			nr > min_nr ? "default" : "minimal", (long long)nr);
+	old_last_vcn = rl[1].vcn;
+	/*
+	 * We can release the mft_ni runlist lock because this function is the
+	 * only one that extends the $MFT data attribute and is called with
+	 * mft_ni->mrec_lock held.
+	 * This is required for the lock order, vol->lcnbmp_lock =>
+	 * mft_ni->runlist.lock.
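[Not part of the patch: the min_nr/nr sizing above has two easy-to-miss
boundary cases, a record smaller than a cluster and sixteen records still
smaller than a cluster. A standalone sketch of the same arithmetic, with
hypothetical helper names:]

```c
#include <stdint.h>

/* Minimum allocation: one mft record worth of clusters, at least one. */
static int64_t mft_extend_min_clusters(uint32_t mft_record_size,
				       uint8_t cluster_size_bits)
{
	int64_t min_nr = mft_record_size >> cluster_size_bits;

	return min_nr ? min_nr : 1;
}

/* Preferred allocation: sixteen records worth, falling back to min_nr. */
static int64_t mft_extend_want_clusters(uint32_t mft_record_size,
					uint8_t cluster_size_bits)
{
	int64_t nr = ((int64_t)mft_record_size << 4) >> cluster_size_bits;

	return nr ? nr : mft_extend_min_clusters(mft_record_size,
						 cluster_size_bits);
}
```

For the common 1 KiB records on 4 KiB clusters this asks for 4 clusters
(16 KiB) with a 1-cluster minimum; with very large clusters both collapse
to a single cluster.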
+ */ + up_write(&mft_ni->runlist.lock); + + do { + rl2 =3D ntfs_cluster_alloc(vol, old_last_vcn, nr, lcn, MFT_ZONE, + true, false, false); + if (!IS_ERR(rl2)) + break; + if (PTR_ERR(rl2) !=3D -ENOSPC || nr =3D=3D min_nr) { + ntfs_error(vol->sb, + "Failed to allocate the minimal number of clusters (%lli) for the mft = data attribute.", + nr); + return PTR_ERR(rl2); + } + /* + * There is not enough space to do the allocation, but there + * might be enough space to do a minimal allocation so try that + * before failing. + */ + nr =3D min_nr; + ntfs_debug("Retrying mft data allocation with minimal cluster count %lli= .", nr); + } while (1); + + down_write(&mft_ni->runlist.lock); + rl =3D ntfs_runlists_merge(&mft_ni->runlist, rl2, 0, &new_rl_count); + if (IS_ERR(rl)) { + up_write(&mft_ni->runlist.lock); + ntfs_error(vol->sb, "Failed to merge runlists for mft data attribute."); + if (ntfs_cluster_free_from_rl(vol, rl2)) { + ntfs_error(vol->sb, + "Failed to deallocate clusters from the mft data attribute.%s", es); + NVolSetErrors(vol); + } + ntfs_free(rl2); + return PTR_ERR(rl); + } + mft_ni->runlist.rl =3D rl; + mft_ni->runlist.count =3D new_rl_count; + ntfs_debug("Allocated %lli clusters.", (long long)nr); + /* Find the last run in the new runlist. */ + for (; rl[1].length; rl++) + ; + up_write(&mft_ni->runlist.lock); + + /* Update the attribute record as well. 
*/ + mrec =3D map_mft_record(mft_ni); + if (IS_ERR(mrec)) { + ntfs_error(vol->sb, "Failed to map mft record."); + ret =3D PTR_ERR(mrec); + down_write(&mft_ni->runlist.lock); + goto undo_alloc; + } + ctx =3D ntfs_attr_get_search_ctx(mft_ni, mrec); + if (unlikely(!ctx)) { + ntfs_error(vol->sb, "Failed to get search context."); + ret =3D -ENOMEM; + goto undo_alloc; + } + ret =3D ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len, + CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx); + if (unlikely(ret)) { + ntfs_error(vol->sb, "Failed to find last attribute extent of mft data at= tribute."); + if (ret =3D=3D -ENOENT) + ret =3D -EIO; + goto undo_alloc; + } + a =3D ctx->attr; + ll =3D le64_to_cpu(a->data.non_resident.lowest_vcn); + + down_write(&mft_ni->runlist.lock); + /* Search back for the previous last allocated cluster of mft bitmap. */ + for (rl2 =3D rl; rl2 > mft_ni->runlist.rl; rl2--) { + if (ll >=3D rl2->vcn) + break; + } + WARN_ON(ll < rl2->vcn); + WARN_ON(ll >=3D rl2->vcn + rl2->length); + /* Get the size for the new mapping pairs array for this extent. */ + mp_size =3D ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1, -1); + if (unlikely(mp_size <=3D 0)) { + ntfs_error(vol->sb, + "Get size for mapping pairs failed for mft data attribute extent."); + ret =3D mp_size; + if (!ret) + ret =3D -EIO; + up_write(&mft_ni->runlist.lock); + goto undo_alloc; + } + up_write(&mft_ni->runlist.lock); + + /* Expand the attribute record if necessary. */ + old_alen =3D le32_to_cpu(a->length); + ret =3D ntfs_attr_record_resize(ctx->mrec, a, mp_size + + le16_to_cpu(a->data.non_resident.mapping_pairs_offset)); + if (unlikely(ret)) { + ret =3D ntfs_mft_attr_extend(mft_ni); + if (!ret) + goto extended_ok; + mp_extended =3D true; + goto undo_alloc; + } + mp_rebuilt =3D true; + /* Generate the mapping pairs array directly into the attr record. 
 */
+	ret = ntfs_mapping_pairs_build(vol, (u8 *)a +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+			mp_size, rl2, ll, -1, NULL, NULL, NULL);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb, "Failed to build mapping pairs array of mft data attribute.");
+		goto undo_alloc;
+	}
+	/* Update the highest_vcn. */
+	a->data.non_resident.highest_vcn = cpu_to_le64(rl[1].vcn - 1);
+	/*
+	 * We have now extended the mft data allocated_size by nr clusters.
+	 * Reflect this in the struct ntfs_inode structure and the attribute
+	 * record.  @rl is the last (non-terminator) runlist element of the
+	 * mft data attribute.
+	 */
+	if (a->data.non_resident.lowest_vcn) {
+		/*
+		 * We are not in the first attribute extent, switch to it, but
+		 * first ensure the changes will make it to disk later.
+		 */
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		ntfs_attr_reinit_search_ctx(ctx);
+		ret = ntfs_attr_lookup(mft_ni->type, mft_ni->name,
+				mft_ni->name_len, CASE_SENSITIVE, 0, NULL, 0,
+				ctx);
+		if (unlikely(ret)) {
+			ntfs_error(vol->sb,
+				"Failed to find first attribute extent of mft data attribute.");
+			goto restore_undo_alloc;
+		}
+		a = ctx->attr;
+	}
+
+extended_ok:
+	write_lock_irqsave(&mft_ni->size_lock, flags);
+	mft_ni->allocated_size += nr << vol->cluster_size_bits;
+	a->data.non_resident.allocated_size =
+			cpu_to_le64(mft_ni->allocated_size);
+	write_unlock_irqrestore(&mft_ni->size_lock, flags);
+	/* Ensure the changes make it to disk. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	ntfs_debug("Done.");
+	return 0;
+restore_undo_alloc:
+	ntfs_attr_reinit_search_ctx(ctx);
+	if (ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+			CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx)) {
+		ntfs_error(vol->sb,
+			"Failed to find last attribute extent of mft data attribute.%s", es);
+		write_lock_irqsave(&mft_ni->size_lock, flags);
+		mft_ni->allocated_size += nr << vol->cluster_size_bits;
+		write_unlock_irqrestore(&mft_ni->size_lock, flags);
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(mft_ni);
+		up_write(&mft_ni->runlist.lock);
+		/*
+		 * The only thing that is now wrong is ->allocated_size of the
+		 * base attribute extent which chkdsk should be able to fix.
+		 */
+		NVolSetErrors(vol);
+		return ret;
+	}
+	ctx->attr->data.non_resident.highest_vcn =
+			cpu_to_le64(old_last_vcn - 1);
+undo_alloc:
+	if (ntfs_cluster_free(mft_ni, old_last_vcn, -1, ctx) < 0) {
+		ntfs_error(vol->sb, "Failed to free clusters from mft data attribute.%s", es);
+		NVolSetErrors(vol);
+	}
+
+	if (ntfs_rl_truncate_nolock(vol, &mft_ni->runlist, old_last_vcn)) {
+		ntfs_error(vol->sb, "Failed to truncate mft data attribute runlist.%s", es);
+		NVolSetErrors(vol);
+	}
+	if (mp_extended && ntfs_attr_update_mapping_pairs(mft_ni, 0)) {
+		ntfs_error(vol->sb, "Failed to restore mapping pairs.%s",
+				es);
+		NVolSetErrors(vol);
+	}
+	if (ctx) {
+		a = ctx->attr;
+		if (mp_rebuilt && !IS_ERR(ctx->mrec)) {
+			if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(
+					a->data.non_resident.mapping_pairs_offset),
+					old_alen - le16_to_cpu(
+					a->data.non_resident.mapping_pairs_offset),
+					rl2, ll, -1, NULL, NULL, NULL)) {
+				ntfs_error(vol->sb, "Failed to restore mapping pairs array.%s", es);
+				NVolSetErrors(vol);
+			}
+			if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
+				ntfs_error(vol->sb, "Failed to restore attribute record.%s", es);
+				NVolSetErrors(vol);
+			}
+			mark_mft_record_dirty(ctx->ntfs_ino);
+		} else if (IS_ERR(ctx->mrec)) {
+			ntfs_error(vol->sb, "Failed to restore attribute search context.%s", es);
+			NVolSetErrors(vol);
+		}
+		ntfs_attr_put_search_ctx(ctx);
+	}
+	if (!IS_ERR(mrec))
+		unmap_mft_record(mft_ni);
+	return ret;
+}
+
+/**
+ * ntfs_mft_record_layout - layout an mft record into a memory buffer
+ * @vol:	volume to which the mft record will belong
+ * @mft_no:	mft reference specifying the mft record number
+ * @m:		destination buffer of size >= @vol->mft_record_size bytes
+ *
+ * Layout an empty, unused mft record with the mft record number @mft_no into
+ * the buffer @m.  The volume @vol is needed because the mft record structure
+ * was modified in NTFS 3.1 so we need to know which volume version this mft
+ * record will be used on.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_mft_record_layout(const struct ntfs_volume *vol, const s64 mft_no,
+		struct mft_record *m)
+{
+	struct attr_record *a;
+
+	ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
+	if (mft_no >= (1ll << 32)) {
+		ntfs_error(vol->sb, "Mft record number 0x%llx exceeds maximum of 2^32.",
+				(long long)mft_no);
+		return -ERANGE;
+	}
+	/* Start by clearing the whole mft record to give us a clean slate. */
+	memset(m, 0, vol->mft_record_size);
+	/* Aligned to 2-byte boundary. */
+	if (vol->major_ver < 3 || (vol->major_ver == 3 && !vol->minor_ver))
+		m->usa_ofs = cpu_to_le16((sizeof(struct mft_record_old) + 1) & ~1);
+	else {
+		m->usa_ofs = cpu_to_le16((sizeof(struct mft_record) + 1) & ~1);
+		/*
+		 * Set the NTFS 3.1+ specific fields while we know that the
+		 * volume version is 3.1+.
+		 */
+		m->reserved = 0;
+		m->mft_record_number = cpu_to_le32((u32)mft_no);
+	}
+	m->magic = magic_FILE;
+	if (vol->mft_record_size >= NTFS_BLOCK_SIZE)
+		m->usa_count = cpu_to_le16(vol->mft_record_size /
+				NTFS_BLOCK_SIZE + 1);
+	else {
+		m->usa_count = cpu_to_le16(1);
+		ntfs_warning(vol->sb,
+			"Sector size is bigger than mft record size.  Setting usa_count to 1.  If chkdsk reports this as corruption");
+	}
+	/* Set the update sequence number to 1. */
+	*(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs)) = cpu_to_le16(1);
+	m->lsn = 0;
+	m->sequence_number = cpu_to_le16(1);
+	m->link_count = 0;
+	/*
+	 * Place the attributes straight after the update sequence array,
+	 * aligned to 8-byte boundary.
+	 */
+	m->attrs_offset = cpu_to_le16((le16_to_cpu(m->usa_ofs) +
+			(le16_to_cpu(m->usa_count) << 1) + 7) & ~7);
+	m->flags = 0;
+	/*
+	 * Using attrs_offset plus eight bytes (for the termination attribute).
+	 * attrs_offset is already aligned to 8-byte boundary, so no need to
+	 * align again.
+	 */
+	m->bytes_in_use = cpu_to_le32(le16_to_cpu(m->attrs_offset) + 8);
+	m->bytes_allocated = cpu_to_le32(vol->mft_record_size);
+	m->base_mft_record = 0;
+	m->next_attr_instance = 0;
+	/* Add the termination attribute. */
+	a = (struct attr_record *)((u8 *)m + le16_to_cpu(m->attrs_offset));
+	a->type = AT_END;
+	a->length = 0;
+	ntfs_debug("Done.");
+	return 0;
+}
+
+/**
+ * ntfs_mft_record_format - format an mft record on an ntfs volume
+ * @vol:	volume on which to format the mft record
+ * @mft_no:	mft record number to format
+ *
+ * Format the mft record @mft_no in $MFT/$DATA, i.e. lay out an empty, unused
+ * mft record into the appropriate place of the mft data attribute.  This is
+ * used when extending the mft data attribute.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_mft_record_format(const struct ntfs_volume *vol, const s64 mft_no)
+{
+	loff_t i_size;
+	struct inode *mft_vi = vol->mft_ino;
+	struct folio *folio;
+	struct mft_record *m;
+	pgoff_t index, end_index;
+	unsigned int ofs;
+	int err;
+
+	ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
+	/*
+	 * The index into the page cache and the offset within the page cache
+	 * page of the wanted mft record.
+	 */
+	index = mft_no << vol->mft_record_size_bits >> PAGE_SHIFT;
+	ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+	/* The maximum valid index into the page cache for $MFT's data. */
+	i_size = i_size_read(mft_vi);
+	end_index = i_size >> PAGE_SHIFT;
+	if (unlikely(index >= end_index)) {
+		if (unlikely(index > end_index ||
+				ofs + vol->mft_record_size > (i_size & ~PAGE_MASK))) {
+			ntfs_error(vol->sb, "Tried to format non-existing mft record 0x%llx.",
+					(long long)mft_no);
+			return -ENOENT;
+		}
+	}
+
+	/* Read, map, and pin the folio containing the mft record. */
+	folio = ntfs_read_mapping_folio(mft_vi->i_mapping, index);
+	if (IS_ERR(folio)) {
+		ntfs_error(vol->sb, "Failed to map page containing mft record to format 0x%llx.",
+				(long long)mft_no);
+		return PTR_ERR(folio);
+	}
+	folio_lock(folio);
+	folio_clear_uptodate(folio);
+	m = (struct mft_record *)((u8 *)kmap_local_folio(folio, 0) + ofs);
+	err = ntfs_mft_record_layout(vol, mft_no, m);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to layout mft record 0x%llx.",
+				(long long)mft_no);
+		folio_mark_uptodate(folio);
+		folio_unlock(folio);
+		ntfs_unmap_folio(folio, m);
+		return err;
+	}
+	pre_write_mst_fixup((struct ntfs_record *)m, vol->mft_record_size);
+	flush_dcache_folio(folio);
+	folio_mark_uptodate(folio);
+	/*
+	 * Make sure the mft record is written out to disk.  We could use
+	 * ilookup5() to check if an inode is in icache and so on but this is
+	 * unnecessary as ntfs_writepage() will write the dirty record anyway.
+ */
+	mark_ntfs_record_dirty(folio);
+	folio_unlock(folio);
+	ntfs_unmap_folio(folio, m);
+	ntfs_debug("Done.");
+	return 0;
+}
+
+/**
+ * ntfs_mft_record_alloc - allocate an mft record on an ntfs volume
+ * @vol:	[IN]  volume on which to allocate the mft record
+ * @mode:	[IN]  mode if we want a file or directory, i.e. base inode, or 0
+ * @base_ni:	[IN]  open base inode if allocating an extent mft record or NULL
+ * @ni_mrec:	[OUT] on successful return this is the mapped mft record
+ *
+ * Allocate an mft record in $MFT/$DATA of an open ntfs volume @vol.
+ *
+ * If @base_ni is NULL make the mft record a base mft record, i.e. a file or
+ * directory inode, and allocate it at the default allocator position.  In
+ * this case @mode is the file mode as given to us by the caller.  We in
+ * particular use @mode to distinguish whether a file or a directory is being
+ * created (S_IFREG(mode) and S_IFDIR(mode), respectively).
+ *
+ * If @base_ni is not NULL make the allocated mft record an extent record,
+ * allocate it starting at the mft record after the base mft record and attach
+ * the allocated and opened ntfs inode to the base inode @base_ni.  In this
+ * case @mode must be 0 as it is meaningless for extent inodes.
+ *
+ * On success the function returns 0, *@ni is the now opened ntfs inode of
+ * the allocated mft record, and *@ni_mrec is set to the allocated, mapped,
+ * pinned, and locked mft record.  On failure the negative error code is
+ * returned and *@ni_mrec is undefined.
+ *
+ * Allocation strategy:
+ *
+ * To find a free mft record, we scan the mft bitmap for a zero bit.  To
+ * optimize this we start scanning at the place specified by @base_ni or if
+ * @base_ni is NULL we start where we last stopped and we perform wrap around
+ * when we reach the end.  Note, we do not try to allocate mft records below
+ * number 64 because numbers 0 to 15 are the defined system files anyway and 16
+ * to 64 are special in that they are used for storing extension mft records
+ * for the $DATA attribute of $MFT.  This is required to avoid the possibility
+ * of creating a runlist with a circular dependency which once written to disk
+ * can never be read in again.  Windows will only use records 16 to 24 for
+ * normal files if the volume is completely out of space.  We never use them
+ * which means that when the volume is really out of space we cannot create any
+ * more files while Windows can still create up to 8 small files.  We can start
+ * doing this at some later time, it does not matter much for now.
+ *
+ * When scanning the mft bitmap, we only search up to the last allocated mft
+ * record.  If there are no free records left in the range 64 to number of
+ * allocated mft records, then we extend the $MFT/$DATA attribute in order to
+ * create free mft records.  We extend the allocated size of $MFT/$DATA by 16
+ * records at a time or one cluster, if cluster size is above 16kiB.  If there
+ * is not sufficient space to do this, we try to extend by a single mft record
+ * or one cluster, if cluster size is above the mft record size.
+ *
+ * No matter how many mft records we allocate, we initialize only the first
+ * allocated mft record, incrementing mft data size and initialized size
+ * accordingly, open a struct ntfs_inode for it and return it to the caller,
+ * unless there are less than 64 mft records, in which case we allocate and
+ * initialize mft records until we reach record 64 which we consider as the
+ * first free mft record for use by normal files.
+ *
+ * If during any stage we overflow the initialized data in the mft bitmap, we
+ * extend the initialized size (and data size) by 8 bytes, allocating another
+ * cluster if required.  The bitmap data size has to be at least equal to the
+ * number of mft records in the mft, but it can be bigger, in which case the
+ * superfluous bits are padded with zeroes.
+ *
+ * Thus, when we return successfully (return value 0), we will have:
+ *	- initialized / extended the mft bitmap if necessary,
+ *	- initialized / extended the mft data if necessary,
+ *	- set the bit corresponding to the mft record being allocated in the
+ *	  mft bitmap,
+ *	- opened a struct ntfs_inode for the allocated mft record, and we will have
+ *	- returned the struct ntfs_inode as well as the allocated mapped, pinned,
+ *	  and locked mft record.
+ *
+ * On error, the volume will be left in a consistent state and no record will
+ * be allocated.  If rolling back a partial operation fails, we may leave some
+ * inconsistent metadata in which case we set NVolErrors() so the volume is
+ * left dirty when unmounted.
+ *
+ * Note, this function cannot make use of most of the normal functions, like
+ * for example for attribute resizing, etc, because when the run list overflows
+ * the base mft record and an attribute list is used, it is very important that
+ * the extension mft records used to store the $DATA attribute of $MFT can be
+ * reached without having to read the information contained inside them, as
+ * this would make it impossible to find them in the first place after the
+ * volume is unmounted.  $MFT/$BITMAP probably does not need to follow this
+ * rule because the bitmap is not essential for finding the mft records, but on
+ * the other hand, handling the bitmap in this special way would make life
+ * easier because otherwise there might be circular invocations of functions
+ * when reading the bitmap.
+ */
+int ntfs_mft_record_alloc(struct ntfs_volume *vol, const int mode,
+		struct ntfs_inode **ni, struct ntfs_inode *base_ni,
+		struct mft_record **ni_mrec)
+{
+	s64 ll, bit, old_data_initialized, old_data_size;
+	unsigned long flags;
+	struct folio *folio;
+	struct ntfs_inode *mft_ni, *mftbmp_ni;
+	struct ntfs_attr_search_ctx *ctx;
+	struct mft_record *m = NULL;
+	struct attr_record *a;
+	pgoff_t index;
+	unsigned int ofs;
+	int err;
+	__le16 seq_no, usn;
+	bool record_formatted = false;
+	unsigned int memalloc_flags;
+
+	if (base_ni && *ni)
+		return -EINVAL;
+
+	/* @mode and @base_ni are mutually exclusive. */
+	if (mode && base_ni)
+		return -EINVAL;
+
+	if (base_ni)
+		ntfs_debug("Entering (allocating an extent mft record for base mft record 0x%llx).",
+				(long long)base_ni->mft_no);
+	else
+		ntfs_debug("Entering (allocating a base mft record).");
+
+	memalloc_flags = memalloc_nofs_save();
+
+	mft_ni = NTFS_I(vol->mft_ino);
+	if (!base_ni || base_ni->mft_no != FILE_MFT)
+		mutex_lock(&mft_ni->mrec_lock);
+	mftbmp_ni = NTFS_I(vol->mftbmp_ino);
+search_free_rec:
+	if (!base_ni || base_ni->mft_no != FILE_MFT)
+		down_write(&vol->mftbmp_lock);
+	bit = ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(vol, base_ni);
+	if (bit >= 0) {
+		ntfs_debug("Found and allocated free record (#1), bit 0x%llx.",
+				(long long)bit);
+		goto have_alloc_rec;
+	}
+	if (bit != -ENOSPC) {
+		if (!base_ni || base_ni->mft_no != FILE_MFT) {
+			up_write(&vol->mftbmp_lock);
+			mutex_unlock(&mft_ni->mrec_lock);
+		}
+		memalloc_nofs_restore(memalloc_flags);
+		return bit;
+	}
+
+	if (base_ni && base_ni->mft_no == FILE_MFT) {
+		memalloc_nofs_restore(memalloc_flags);
+		return bit;
+	}
+
+	/*
+	 * No free mft records left.  If the mft bitmap already covers more
+	 * than the currently used mft records, the next records are all free,
+	 * so we can simply allocate the first unused mft record.
+	 * Note: We also have to make sure that the mft bitmap at least covers
+	 * the first 24 mft records as they are special and whilst they may not
+	 * be in use, we do not allocate from them.
+	 */
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	ll = mft_ni->initialized_size >> vol->mft_record_size_bits;
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	old_data_initialized = mftbmp_ni->initialized_size;
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	if (old_data_initialized << 3 > ll &&
+			old_data_initialized > RESERVED_MFT_RECORDS / 8) {
+		bit = ll;
+		if (bit < RESERVED_MFT_RECORDS)
+			bit = RESERVED_MFT_RECORDS;
+		if (unlikely(bit >= (1ll << 32)))
+			goto max_err_out;
+		ntfs_debug("Found free record (#2), bit 0x%llx.",
+				(long long)bit);
+		goto found_free_rec;
+	}
+	/*
+	 * The mft bitmap needs to be expanded until it covers the first unused
+	 * mft record that we can allocate.
+	 * Note: The smallest mft record we allocate is mft record 24.
+	 */
+	bit = old_data_initialized << 3;
+	if (unlikely(bit >= (1ll << 32)))
+		goto max_err_out;
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	old_data_size = mftbmp_ni->allocated_size;
+	ntfs_debug("Status of mftbmp before extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+			old_data_size, i_size_read(vol->mftbmp_ino),
+			old_data_initialized);
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	if (old_data_initialized + 8 > old_data_size) {
+		/* Need to extend bitmap by one more cluster. */
+		ntfs_debug("mftbmp: initialized_size + 8 > allocated_size.");
+		err = ntfs_mft_bitmap_extend_allocation_nolock(vol);
+		if (err == -EAGAIN)
+			err = ntfs_mft_bitmap_extend_allocation_nolock(vol);
+
+		if (unlikely(err)) {
+			if (!base_ni || base_ni->mft_no != FILE_MFT)
+				up_write(&vol->mftbmp_lock);
+			goto err_out;
+		}
+#ifdef DEBUG
+		read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+		ntfs_debug("Status of mftbmp after allocation extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+				mftbmp_ni->allocated_size,
+				i_size_read(vol->mftbmp_ino),
+				mftbmp_ni->initialized_size);
+		read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+#endif /* DEBUG */
+	}
+	/*
+	 * We now have sufficient allocated space, extend the initialized_size
+	 * as well as the data_size if necessary and fill the new space with
+	 * zeroes.
+	 */
+	err = ntfs_mft_bitmap_extend_initialized_nolock(vol);
+	if (unlikely(err)) {
+		if (!base_ni || base_ni->mft_no != FILE_MFT)
+			up_write(&vol->mftbmp_lock);
+		goto err_out;
+	}
+#ifdef DEBUG
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	ntfs_debug("Status of mftbmp after initialized extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+			mftbmp_ni->allocated_size,
+			i_size_read(vol->mftbmp_ino),
+			mftbmp_ni->initialized_size);
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+#endif /* DEBUG */
+	ntfs_debug("Found free record (#3), bit 0x%llx.", (long long)bit);
+found_free_rec:
+	/* @bit is the found free mft record, allocate it in the mft bitmap. */
+	ntfs_debug("At found_free_rec.");
+	err = ntfs_bitmap_set_bit(vol->mftbmp_ino, bit);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to allocate bit in mft bitmap.");
+		if (!base_ni || base_ni->mft_no != FILE_MFT)
+			up_write(&vol->mftbmp_lock);
+		goto err_out;
+	}
+	ntfs_debug("Set bit 0x%llx in mft bitmap.", (long long)bit);
+have_alloc_rec:
+	/*
+	 * The mft bitmap is now uptodate.  Deal with mft data attribute now.
+	 * Note, we keep hold of the mft bitmap lock for writing until all
+	 * modifications to the mft data attribute are complete, too, as they
+	 * will impact decisions for mft bitmap and mft record allocation done
+	 * by a parallel allocation and if the lock is not maintained a
+	 * parallel allocation could allocate the same mft record as this one.
+	 */
+	ll = (bit + 1) << vol->mft_record_size_bits;
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	old_data_initialized = mft_ni->initialized_size;
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	if (ll <= old_data_initialized) {
+		ntfs_debug("Allocated mft record already initialized.");
+		goto mft_rec_already_initialized;
+	}
+	ntfs_debug("Initializing allocated mft record.");
+	/*
+	 * The mft record is outside the initialized data.  Extend the mft data
+	 * attribute until it covers the allocated record.  The loop is only
+	 * actually traversed more than once when a freshly formatted volume is
+	 * first written to so it optimizes away nicely in the common case.
+	 */
+	if (!base_ni || base_ni->mft_no != FILE_MFT) {
+		read_lock_irqsave(&mft_ni->size_lock, flags);
+		ntfs_debug("Status of mft data before extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+				mft_ni->allocated_size, i_size_read(vol->mft_ino),
+				mft_ni->initialized_size);
+		while (ll > mft_ni->allocated_size) {
+			read_unlock_irqrestore(&mft_ni->size_lock, flags);
+			err = ntfs_mft_data_extend_allocation_nolock(vol);
+			if (err == -EAGAIN)
+				err = ntfs_mft_data_extend_allocation_nolock(vol);
+
+			if (unlikely(err)) {
+				ntfs_error(vol->sb, "Failed to extend mft data allocation.");
+				goto undo_mftbmp_alloc_nolock;
+			}
+			read_lock_irqsave(&mft_ni->size_lock, flags);
+			ntfs_debug("Status of mft data after allocation extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+					mft_ni->allocated_size, i_size_read(vol->mft_ino),
+					mft_ni->initialized_size);
+		}
+		read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	} else if (ll > mft_ni->allocated_size) {
+		err = -ENOSPC;
+		goto undo_mftbmp_alloc_nolock;
+	}
+	/*
+	 * Extend mft data initialized size (and data size of course) to reach
+	 * the allocated mft record, formatting the mft records along the way.
+	 * Note: We only modify the struct ntfs_inode structure as that is all
+	 * that is needed by ntfs_mft_record_format().  We will update the
+	 * attribute record itself in one fell swoop later on.
+	 */
+	write_lock_irqsave(&mft_ni->size_lock, flags);
+	old_data_initialized = mft_ni->initialized_size;
+	old_data_size = vol->mft_ino->i_size;
+	while (ll > mft_ni->initialized_size) {
+		s64 new_initialized_size, mft_no;
+
+		new_initialized_size = mft_ni->initialized_size +
+				vol->mft_record_size;
+		mft_no = mft_ni->initialized_size >> vol->mft_record_size_bits;
+		if (new_initialized_size > i_size_read(vol->mft_ino))
+			i_size_write(vol->mft_ino, new_initialized_size);
+		write_unlock_irqrestore(&mft_ni->size_lock, flags);
+		ntfs_debug("Initializing mft record 0x%llx.",
+				(long long)mft_no);
+		err = ntfs_mft_record_format(vol, mft_no);
+		if (unlikely(err)) {
+			ntfs_error(vol->sb, "Failed to format mft record.");
+			goto undo_data_init;
+		}
+		write_lock_irqsave(&mft_ni->size_lock, flags);
+		mft_ni->initialized_size = new_initialized_size;
+	}
+	write_unlock_irqrestore(&mft_ni->size_lock, flags);
+	record_formatted = true;
+	/* Update the mft data attribute record to reflect the new sizes. */
+	m = map_mft_record(mft_ni);
+	if (IS_ERR(m)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		err = PTR_ERR(m);
+		goto undo_data_init;
+	}
+	ctx = ntfs_attr_get_search_ctx(mft_ni, m);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		err = -ENOMEM;
+		unmap_mft_record(mft_ni);
+		goto undo_data_init;
+	}
+	err = ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to find first attribute extent of mft data attribute.");
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(mft_ni);
+		goto undo_data_init;
+	}
+	a = ctx->attr;
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	a->data.non_resident.initialized_size =
+			cpu_to_le64(mft_ni->initialized_size);
+	a->data.non_resident.data_size =
+			cpu_to_le64(i_size_read(vol->mft_ino));
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	/* Ensure the changes make it to disk. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	ntfs_debug("Status of mft data after mft record initialization: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+			mft_ni->allocated_size, i_size_read(vol->mft_ino),
+			mft_ni->initialized_size);
+	WARN_ON(i_size_read(vol->mft_ino) > mft_ni->allocated_size);
+	WARN_ON(mft_ni->initialized_size > i_size_read(vol->mft_ino));
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+mft_rec_already_initialized:
+	/*
+	 * We can finally drop the mft bitmap lock as the mft data attribute
+	 * has been fully updated.  The only disparity left is that the
+	 * allocated mft record still needs to be marked as in use to match the
+	 * set bit in the mft bitmap but this is actually not a problem since
+	 * this mft record is not referenced from anywhere yet and the fact
+	 * that it is allocated in the mft bitmap means that no-one will try to
+	 * allocate it either.
+	 */
+	if (!base_ni || base_ni->mft_no != FILE_MFT)
+		up_write(&vol->mftbmp_lock);
+	/*
+	 * We now have allocated and initialized the mft record.  Calculate the
+	 * index of and the offset within the page cache page the record is in.
+	 */
+	index = bit << vol->mft_record_size_bits >> PAGE_SHIFT;
+	ofs = (bit << vol->mft_record_size_bits) & ~PAGE_MASK;
+	/* Read, map, and pin the folio containing the mft record. */
+	folio = ntfs_read_mapping_folio(vol->mft_ino->i_mapping, index);
+	if (IS_ERR(folio)) {
+		ntfs_error(vol->sb, "Failed to map page containing allocated mft record 0x%llx.",
+				bit);
+		err = PTR_ERR(folio);
+		goto undo_mftbmp_alloc;
+	}
+	folio_lock(folio);
+	folio_clear_uptodate(folio);
+	m = (struct mft_record *)((u8 *)kmap_local_folio(folio, 0) + ofs);
+	/* If we just formatted the mft record no need to do it again. */
+	if (!record_formatted) {
+		/* Sanity check that the mft record is really not in use. */
+		if (ntfs_is_file_record(m->magic) &&
+				(m->flags & MFT_RECORD_IN_USE)) {
+			ntfs_warning(vol->sb,
+				"Mft record 0x%llx was marked free in mft bitmap but is marked used itself.  Unmount and run chkdsk.",
+				bit);
+			folio_mark_uptodate(folio);
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, m);
+			NVolSetErrors(vol);
+			goto search_free_rec;
+		}
+		/*
+		 * We need to (re-)format the mft record, preserving the
+		 * sequence number if it is not zero as well as the update
+		 * sequence number if it is not zero or -1 (0xffff).  This
+		 * means we do not need to care whether or not something went
+		 * wrong with the previous mft record.
+		 */
+		seq_no = m->sequence_number;
+		usn = *(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs));
+		err = ntfs_mft_record_layout(vol, bit, m);
+		if (unlikely(err)) {
+			ntfs_error(vol->sb, "Failed to layout allocated mft record 0x%llx.",
+					bit);
+			folio_mark_uptodate(folio);
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, m);
+			goto undo_mftbmp_alloc;
+		}
+		if (seq_no)
+			m->sequence_number = seq_no;
+		if (usn && le16_to_cpu(usn) != 0xffff)
+			*(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs)) = usn;
+		pre_write_mst_fixup((struct ntfs_record *)m, vol->mft_record_size);
+	}
+	/* Set the mft record itself in use. */
+	m->flags |= MFT_RECORD_IN_USE;
+	if (S_ISDIR(mode))
+		m->flags |= MFT_RECORD_IS_DIRECTORY;
+	flush_dcache_folio(folio);
+	folio_mark_uptodate(folio);
+	if (base_ni) {
+		struct mft_record *m_tmp;
+
+		/*
+		 * Setup the base mft record in the extent mft record.  This
+		 * completes initialization of the allocated extent mft record
+		 * and we can simply use it with map_extent_mft_record().
+		 */
+		m->base_mft_record = MK_LE_MREF(base_ni->mft_no,
+				base_ni->seq_no);
+		/*
+		 * Allocate an extent inode structure for the new mft record,
+		 * attach it to the base inode @base_ni and map, pin, and lock
+		 * its, i.e. the allocated, mft record.
+		 */
+		m_tmp = map_extent_mft_record(base_ni,
+				MK_MREF(bit, le16_to_cpu(m->sequence_number)),
+				ni);
+		if (IS_ERR(m_tmp)) {
+			ntfs_error(vol->sb, "Failed to map allocated extent mft record 0x%llx.",
+					bit);
+			err = PTR_ERR(m_tmp);
+			/* Set the mft record itself not in use. */
+			m->flags &= cpu_to_le16(
+					~le16_to_cpu(MFT_RECORD_IN_USE));
+			flush_dcache_folio(folio);
+			/* Make sure the mft record is written out to disk. */
+			mark_ntfs_record_dirty(folio);
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, m);
+			goto undo_mftbmp_alloc;
+		}
+
+		/*
+		 * Make sure the allocated mft record is written out to disk.
+		 * No need to set the inode dirty because the caller is going
+		 * to do that anyway after finishing with the new extent mft
+		 * record (e.g. at a minimum a new attribute will be added to
+		 * the mft record).
+		 */
+		mark_ntfs_record_dirty(folio);
+		folio_unlock(folio);
+		/*
+		 * Need to unmap the page since map_extent_mft_record() mapped
+		 * it as well so we have it mapped twice at the moment.
+		 */
+		ntfs_unmap_folio(folio, m);
+	} else {
+		/*
+		 * Manually map, pin, and lock the mft record as we already
+		 * have its page mapped and it is very easy to do.
+		 */
+		(*ni)->seq_no = le16_to_cpu(m->sequence_number);
+		/*
+		 * Make sure the allocated mft record is written out to disk.
+		 * NOTE: We do not set the ntfs inode dirty because this would
+		 * fail in ntfs_write_inode() because the inode does not have a
+		 * standard information attribute yet.  Also, there is no need
+		 * to set the inode dirty because the caller is going to do
+		 * that anyway after finishing with the new mft record (e.g. at
+		 * a minimum some new attributes will be added to the mft
+		 * record).
+		 */
+
+		(*ni)->mrec = kmalloc(vol->mft_record_size, GFP_NOFS);
+		if (!(*ni)->mrec) {
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, m);
+			goto undo_mftbmp_alloc;
+		}
+
+		memcpy((*ni)->mrec, m, vol->mft_record_size);
+		post_read_mst_fixup((struct ntfs_record *)(*ni)->mrec, vol->mft_record_size);
+		mark_ntfs_record_dirty(folio);
+		folio_unlock(folio);
+		(*ni)->folio = folio;
+		(*ni)->folio_ofs = ofs;
+		atomic_inc(&(*ni)->count);
+		/* Update the default mft allocation position. */
+		vol->mft_data_pos = bit + 1;
+	}
+	if (!base_ni || base_ni->mft_no != FILE_MFT)
+		mutex_unlock(&NTFS_I(vol->mft_ino)->mrec_lock);
+	memalloc_nofs_restore(memalloc_flags);
+
+	/*
+	 * Return the opened, allocated inode of the allocated mft record as
+	 * well as the mapped, pinned, and locked mft record.
+	 */
+	ntfs_debug("Returning opened, allocated %sinode 0x%llx.",
+			base_ni ? "extent " : "", bit);
+	(*ni)->mft_no = bit;
+	if (ni_mrec)
+		*ni_mrec = (*ni)->mrec;
+	ntfs_dec_free_mft_records(vol, 1);
+	return 0;
+undo_data_init:
+	write_lock_irqsave(&mft_ni->size_lock, flags);
+	mft_ni->initialized_size = old_data_initialized;
+	i_size_write(vol->mft_ino, old_data_size);
+	write_unlock_irqrestore(&mft_ni->size_lock, flags);
+	goto undo_mftbmp_alloc_nolock;
+undo_mftbmp_alloc:
+	if (!base_ni || base_ni->mft_no != FILE_MFT)
+		down_write(&vol->mftbmp_lock);
+undo_mftbmp_alloc_nolock:
+	if (ntfs_bitmap_clear_bit(vol->mftbmp_ino, bit)) {
+		ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es);
+		NVolSetErrors(vol);
+	}
+	if (!base_ni || base_ni->mft_no != FILE_MFT)
+		up_write(&vol->mftbmp_lock);
+err_out:
+	if (!base_ni || base_ni->mft_no != FILE_MFT)
+		mutex_unlock(&mft_ni->mrec_lock);
+	memalloc_nofs_restore(memalloc_flags);
+	return err;
+max_err_out:
+	ntfs_warning(vol->sb,
+		"Cannot allocate mft record because the maximum number of inodes (2^32) has already been reached.");
+	if (!base_ni || base_ni->mft_no != FILE_MFT) {
+		up_write(&vol->mftbmp_lock);
+		mutex_unlock(&NTFS_I(vol->mft_ino)->mrec_lock);
+	}
+	memalloc_nofs_restore(memalloc_flags);
+	return -ENOSPC;
+}
+
+/**
+ * ntfs_mft_record_free - free an mft record on an ntfs volume
+ * @vol:	volume on which to free the mft record
+ * @ni:		open ntfs inode of the mft record to free
+ *
+ * Free the mft record of the open inode @ni on the mounted ntfs volume @vol.
+ * Note that this function calls ntfs_inode_close() internally and hence you
+ * cannot use the pointer @ni any more after this function returns success.
+ *
+ * Return 0 on success and -errno on error.
+ */
+int ntfs_mft_record_free(struct ntfs_volume *vol, struct ntfs_inode *ni)
+{
+	u64 mft_no;
+	int err;
+	u16 seq_no;
+	__le16 old_seq_no;
+	struct mft_record *ni_mrec;
+	unsigned int memalloc_flags;
+	struct ntfs_inode *base_ni;
+
+	ntfs_debug("Entering for inode 0x%llx.\n", (long long)ni->mft_no);
+
+	if (!vol || !ni)
+		return -EINVAL;
+
+	ni_mrec = map_mft_record(ni);
+	if (IS_ERR(ni_mrec))
+		return -EIO;
+
+	/* Cache the mft reference for later. */
+	mft_no = ni->mft_no;
+
+	/* Mark the mft record as not in use. */
+	ni_mrec->flags &= ~MFT_RECORD_IN_USE;
+
+	/* Increment the sequence number, skipping zero, if it is not zero. */
+	old_seq_no = ni_mrec->sequence_number;
+	seq_no = le16_to_cpu(old_seq_no);
+	if (seq_no == 0xffff)
+		seq_no = 1;
+	else if (seq_no)
+		seq_no++;
+	ni_mrec->sequence_number = cpu_to_le16(seq_no);
+
+	/*
+	 * Set the ntfs inode dirty and write it out.  We do not need to worry
+	 * about the base inode here since whatever caused the extent mft
+	 * record to be freed is guaranteed to do it already.
+	 */
+	NInoSetDirty(ni);
+	err = write_mft_record(ni, ni_mrec, 0);
+	if (err)
+		goto sync_rollback;
+
+	if (likely(ni->nr_extents >= 0))
+		base_ni = ni;
+	else
+		base_ni = ni->ext.base_ntfs_ino;
+
+	/* Clear the bit in the $MFT/$BITMAP corresponding to this record. */
+	memalloc_flags = memalloc_nofs_save();
+	if (base_ni->mft_no != FILE_MFT)
+		down_write(&vol->mftbmp_lock);
+	err = ntfs_bitmap_clear_bit(vol->mftbmp_ino, mft_no);
+	if (base_ni->mft_no != FILE_MFT)
+		up_write(&vol->mftbmp_lock);
+	memalloc_nofs_restore(memalloc_flags);
+	if (err)
+		goto bitmap_rollback;
+
+	unmap_mft_record(ni);
+	ntfs_inc_free_mft_records(vol, 1);
+	return 0;
+
+	/* Rollback what we did... */
+bitmap_rollback:
+	memalloc_flags = memalloc_nofs_save();
+	if (base_ni->mft_no != FILE_MFT)
+		down_write(&vol->mftbmp_lock);
+	if (ntfs_bitmap_set_bit(vol->mftbmp_ino, mft_no))
+		ntfs_error(vol->sb, "ntfs_bitmap_set_bit failed in bitmap_rollback\n");
+	if (base_ni->mft_no != FILE_MFT)
+		up_write(&vol->mftbmp_lock);
+	memalloc_nofs_restore(memalloc_flags);
+sync_rollback:
+	ntfs_error(vol->sb,
+		"Eeek! Rollback failed in %s. Leaving inconsistent metadata!\n", __func__);
+	ni_mrec->flags |= MFT_RECORD_IN_USE;
+	ni_mrec->sequence_number = old_seq_no;
+	NInoSetDirty(ni);
+	write_mft_record(ni, ni_mrec, 0);
+	unmap_mft_record(ni);
+	return err;
+}
diff --git a/fs/ntfsplus/mst.c b/fs/ntfsplus/mst.c
new file mode 100644
index 000000000000..e88f52831cb8
--- /dev/null
+++ b/fs/ntfsplus/mst.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS multi sector transfer protection handling code.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ */
+
+#include
+
+#include "ntfs.h"
+
+/**
+ * post_read_mst_fixup - deprotect multi sector transfer protected data
+ * @b:		pointer to the data to deprotect
+ * @size:	size in bytes of @b
+ *
+ * Perform the necessary post read multi sector transfer fixup and detect the
+ * presence of incomplete multi sector transfers.  In that case, overwrite the
+ * magic of the ntfs record header being processed with "BAAD" (in memory only!)
+ * and abort processing.
+ *
+ * Return 0 on success and -EINVAL on error ("BAAD" magic will be present).
+ *
+ * NOTE: We consider the absence / invalidity of an update sequence array to
+ * mean that the structure is not protected at all and hence doesn't need to
+ * be fixed up.  Thus, we return success and not failure in this case.  This is
+ * in contrast to pre_write_mst_fixup(), see below.
+ */
+int post_read_mst_fixup(struct ntfs_record *b, const u32 size)
+{
+	u16 usa_ofs, usa_count, usn;
+	u16 *usa_pos, *data_pos;
+
+	/* Setup the variables. */
+	usa_ofs = le16_to_cpu(b->usa_ofs);
+	/* Decrement usa_count to get number of fixups. */
+	usa_count = le16_to_cpu(b->usa_count) - 1;
+	/* Size and alignment checks. */
+	if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1 ||
+			usa_ofs + (usa_count * 2) > size ||
+			(size >> NTFS_BLOCK_SIZE_BITS) != usa_count)
+		return 0;
+	/* Position of usn in update sequence array. */
+	usa_pos = (u16 *)b + usa_ofs/sizeof(u16);
+	/*
+	 * The update sequence number which has to be equal to each of the
+	 * u16 values before they are fixed up. Note no need to care for
+	 * endianness since we are comparing and moving data for on disk
+	 * structures which means the data is consistent. If it is
+	 * consistently the wrong endianness it doesn't make any difference.
+	 */
+	usn = *usa_pos;
+	/*
+	 * Position in protected data of first u16 that needs fixing up.
+	 */
+	data_pos = (u16 *)b + NTFS_BLOCK_SIZE/sizeof(u16) - 1;
+	/*
+	 * Check for incomplete multi sector transfer(s).
+	 */
+	while (usa_count--) {
+		if (*data_pos != usn) {
+			struct mft_record *m = (struct mft_record *)b;
+
+			pr_err_ratelimited("ntfs: Incomplete multi sector transfer detected! (Record magic : 0x%x, mft number : 0x%x, base mft number : 0x%lx, mft in use : %d, data : 0x%x, usn 0x%x)\n",
+				le32_to_cpu(m->magic), le32_to_cpu(m->mft_record_number),
+				MREF_LE(m->base_mft_record), m->flags & MFT_RECORD_IN_USE,
+				*data_pos, usn);
+			/*
+			 * Incomplete multi sector transfer detected! )-:
+			 * Set the magic to "BAAD" and return failure.
+			 * Note that magic_BAAD is already converted to le32.
+			 */
+			b->magic = magic_BAAD;
+			return -EINVAL;
+		}
+		data_pos += NTFS_BLOCK_SIZE/sizeof(u16);
+	}
+	/* Re-setup the variables. */
+	usa_count = le16_to_cpu(b->usa_count) - 1;
+	data_pos = (u16 *)b + NTFS_BLOCK_SIZE/sizeof(u16) - 1;
+	/* Fixup all sectors. */
+	while (usa_count--) {
+		/*
+		 * Increment position in usa and restore original data from
+		 * the usa into the data buffer.
+		 */
+		*data_pos = *(++usa_pos);
+		/* Increment position in data as well. */
+		data_pos += NTFS_BLOCK_SIZE/sizeof(u16);
+	}
+	return 0;
+}
+
+/**
+ * pre_write_mst_fixup - apply multi sector transfer protection
+ * @b:		pointer to the data to protect
+ * @size:	size in bytes of @b
+ *
+ * Perform the necessary pre write multi sector transfer fixup on the data
+ * pointed to by @b of @size.
+ *
+ * Return 0 if fixup applied (success) or -EINVAL if no fixup was performed
+ * (assumed not needed). This is in contrast to post_read_mst_fixup() above.
+ *
+ * NOTE: We consider the absence / invalidity of an update sequence array to
+ * mean that the structure is not subject to protection and hence doesn't need
+ * to be fixed up. This means that you have to create a valid update sequence
+ * array header in the ntfs record before calling this function, otherwise it
+ * will fail (the header needs to contain the position of the update sequence
+ * array together with the number of elements in the array). You also need to
+ * initialise the update sequence number before calling this function
+ * otherwise a random word will be used (whatever was in the record at that
+ * position at that time).
+ */
+int pre_write_mst_fixup(struct ntfs_record *b, const u32 size)
+{
+	__le16 *usa_pos, *data_pos;
+	u16 usa_ofs, usa_count, usn;
+	__le16 le_usn;
+
+	/* Sanity check + only fixup if it makes sense. */
+	if (!b || ntfs_is_baad_record(b->magic) ||
+			ntfs_is_hole_record(b->magic))
+		return -EINVAL;
+	/* Setup the variables. */
+	usa_ofs = le16_to_cpu(b->usa_ofs);
+	/* Decrement usa_count to get number of fixups. */
+	usa_count = le16_to_cpu(b->usa_count) - 1;
+	/* Size and alignment checks. */
+	if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1 ||
+			usa_ofs + (usa_count * 2) > size ||
+			(size >> NTFS_BLOCK_SIZE_BITS) != usa_count)
+		return -EINVAL;
+	/* Position of usn in update sequence array. */
+	usa_pos = (__le16 *)((u8 *)b + usa_ofs);
+	/*
+	 * Cyclically increment the update sequence number
+	 * (skipping 0 and -1, i.e. 0xffff).
+	 */
+	usn = le16_to_cpup(usa_pos) + 1;
+	if (usn == 0xffff || !usn)
+		usn = 1;
+	le_usn = cpu_to_le16(usn);
+	*usa_pos = le_usn;
+	/* Position in data of first u16 that needs fixing up. */
+	data_pos = (__le16 *)b + NTFS_BLOCK_SIZE/sizeof(__le16) - 1;
+	/* Fixup all sectors. */
+	while (usa_count--) {
+		/*
+		 * Increment the position in the usa and save the
+		 * original data from the data buffer into the usa.
+		 */
+		*(++usa_pos) = *data_pos;
+		/* Apply fixup to data. */
+		*data_pos = le_usn;
+		/* Increment position in data as well. */
+		data_pos += NTFS_BLOCK_SIZE/sizeof(__le16);
+	}
+	return 0;
+}
+
+/**
+ * post_write_mst_fixup - fast deprotect multi sector transfer protected data
+ * @b:		pointer to the data to deprotect
+ *
+ * Perform the necessary post write multi sector transfer fixup, not checking
+ * for any errors, because we assume we have just used pre_write_mst_fixup(),
+ * thus the data will be fine or we would never have gotten here.
+ */
+void post_write_mst_fixup(struct ntfs_record *b)
+{
+	__le16 *usa_pos, *data_pos;
+
+	u16 usa_ofs = le16_to_cpu(b->usa_ofs);
+	u16 usa_count = le16_to_cpu(b->usa_count) - 1;
+
+	/* Position of usn in update sequence array. */
+	usa_pos = (__le16 *)b + usa_ofs/sizeof(__le16);
+
+	/* Position in protected data of first u16 that needs fixing up. */
+	data_pos = (__le16 *)b + NTFS_BLOCK_SIZE/sizeof(__le16) - 1;
+
+	/* Fixup all sectors. */
+	while (usa_count--) {
+		/*
+		 * Increment position in usa and restore original data from
+		 * the usa into the data buffer.
+		 */
+		*data_pos = *(++usa_pos);
+
+		/* Increment position in data as well. */
+		data_pos += NTFS_BLOCK_SIZE/sizeof(__le16);
+	}
+}
diff --git a/fs/ntfsplus/namei.c b/fs/ntfsplus/namei.c
new file mode 100644
index 000000000000..911f9139c3a2
--- /dev/null
+++ b/fs/ntfsplus/namei.c
@@ -0,0 +1,1677 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel directory inode operations.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2006 Anton Altaparmakov
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include
+#include
+
+#include "ntfs.h"
+#include "misc.h"
+#include "index.h"
+#include "reparse.h"
+#include "ea.h"
+
+static const __le16 aux_name_le[3] = {
+	cpu_to_le16('A'), cpu_to_le16('U'), cpu_to_le16('X')
+};
+
+static const __le16 con_name_le[3] = {
+	cpu_to_le16('C'), cpu_to_le16('O'), cpu_to_le16('N')
+};
+
+static const __le16 com_name_le[3] = {
+	cpu_to_le16('C'), cpu_to_le16('O'), cpu_to_le16('M')
+};
+
+static const __le16 lpt_name_le[3] = {
+	cpu_to_le16('L'), cpu_to_le16('P'), cpu_to_le16('T')
+};
+
+static const __le16 nul_name_le[3] = {
+	cpu_to_le16('N'), cpu_to_le16('U'), cpu_to_le16('L')
+};
+
+static const __le16 prn_name_le[3] = {
+	cpu_to_le16('P'), cpu_to_le16('R'), cpu_to_le16('N')
+};
+
+static inline int ntfs_check_bad_char(const unsigned short *wc,
+		unsigned int wc_len)
+{
+	int i;
+
+	for (i = 0; i < wc_len; i++) {
+		if ((wc[i] < 0x0020) ||
+		    (wc[i] == 0x0022) || (wc[i] == 0x002A) || (wc[i] == 0x002F) ||
+		    (wc[i] == 0x003A) || (wc[i] == 0x003C) || (wc[i] == 0x003E) ||
+		    (wc[i] == 0x003F) || (wc[i] == 0x005C) || (wc[i] == 0x007C))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ntfs_check_bad_windows_name(struct ntfs_volume *vol,
+		const unsigned short *wc,
+		unsigned int wc_len)
+{
+	if (ntfs_check_bad_char(wc, wc_len))
+		return -EINVAL;
+
+	if (!NVolCheckWindowsNames(vol))
+		return 0;
+
+	/* Check for trailing space or dot. */
+	if (wc_len > 0 &&
+	    (wc[wc_len - 1] == cpu_to_le16(' ') ||
+	     wc[wc_len - 1] == cpu_to_le16('.')))
+		return -EINVAL;
+
+	if (wc_len == 3 || (wc_len > 3 && wc[3] == cpu_to_le16('.'))) {
+		__le16 *upcase = vol->upcase;
+		u32 size = vol->upcase_len;
+
+		if (ntfs_are_names_equal(wc, 3, aux_name_le, 3, IGNORE_CASE, upcase, size) ||
+		    ntfs_are_names_equal(wc, 3, con_name_le, 3, IGNORE_CASE, upcase, size) ||
+		    ntfs_are_names_equal(wc, 3, nul_name_le, 3, IGNORE_CASE, upcase, size) ||
+		    ntfs_are_names_equal(wc, 3, prn_name_le, 3, IGNORE_CASE, upcase, size))
+			return -EINVAL;
+	}
+
+	if (wc_len == 4 || (wc_len > 4 && wc[4] == cpu_to_le16('.'))) {
+		__le16 *upcase = vol->upcase;
+		u32 size = vol->upcase_len, port;
+
+		if (ntfs_are_names_equal(wc, 3, com_name_le, 3, IGNORE_CASE, upcase, size) ||
+		    ntfs_are_names_equal(wc, 3, lpt_name_le, 3, IGNORE_CASE, upcase, size)) {
+			port = le16_to_cpu(wc[3]);
+			if (port >= '1' && port <= '9')
+				return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/**
+ * ntfs_lookup - find the inode represented by a dentry in a directory inode
+ * @dir_ino:	directory inode in which to look for the inode
+ * @dent:	dentry representing the inode to look for
+ * @flags:	lookup flags
+ *
+ * In short, ntfs_lookup() looks for the inode represented by the dentry @dent
+ * in the directory inode @dir_ino and if found attaches the inode to the
+ * dentry @dent.
+ *
+ * In more detail, the dentry @dent specifies which inode to look for by
+ * supplying the name of the inode in @dent->d_name.name. ntfs_lookup()
+ * converts the name to Unicode and walks the contents of the directory inode
+ * @dir_ino looking for the converted Unicode name. If the name is found in the
+ * directory, the corresponding inode is loaded by calling ntfs_iget() on its
+ * inode number and the inode is associated with the dentry @dent via a call to
+ * d_splice_alias().
+ *
+ * If the name is not found in the directory, a NULL inode is inserted into the
+ * dentry @dent via a call to d_add(). The dentry is then termed a negative
+ * dentry.
+ *
+ * Only if an actual error occurs, do we return an error via ERR_PTR().
+ *
+ * In order to handle the case insensitivity issues of NTFS with regards to the
+ * dcache and the dcache requiring only one dentry per directory, we deal with
+ * dentry aliases that only differ in case in ->ntfs_lookup() while maintaining
+ * a case sensitive dcache. This means that we get the full benefit of dcache
+ * speed when the file/directory is looked up with the same case as returned by
+ * ->ntfs_readdir() but that a lookup for any other case (or for the short file
+ * name) will not find anything in dcache and will enter ->ntfs_lookup()
+ * instead, where we search the directory for a fully matching file name
+ * (including case) and if that is not found, we search for a file name that
+ * matches with different case and if that has non-POSIX semantics we return
+ * that. We actually do only one search (case sensitive) and keep tabs on
+ * whether we have found a case insensitive match in the process.
+ *
+ * To simplify matters for us, we do not treat the short vs long filenames as
+ * two hard links but instead if the lookup matches a short filename, we
+ * return the dentry for the corresponding long filename instead.
+ *
+ * There are three cases we need to distinguish here:
+ *
+ * 1) @dent perfectly matches (i.e. including case) a directory entry with a
+ *    file name in the WIN32 or POSIX namespaces. In this case
+ *    ntfs_lookup_inode_by_name() will return with name set to NULL and we
+ *    just d_splice_alias() @dent.
+ * 2) @dent matches (not including case) a directory entry with a file name in
+ *    the WIN32 namespace. In this case ntfs_lookup_inode_by_name() will return
+ *    with name set to point to a kmalloc()ed ntfs_name structure containing
+ *    the properly cased little endian Unicode name. We convert the name to the
+ *    current NLS code page, search if a dentry with this name already exists
+ *    and if so return that instead of @dent. At this point things are
+ *    complicated by the possibility of 'disconnected' dentries due to NFS
+ *    which we deal with appropriately (see the code comments). The VFS will
+ *    then destroy the old @dent and use the one we returned. If a dentry is
+ *    not found, we allocate a new one, d_splice_alias() it, and return it as
+ *    above.
+ * 3) @dent matches either perfectly or not (i.e. we don't care about case) a
+ *    directory entry with a file name in the DOS namespace. In this case
+ *    ntfs_lookup_inode_by_name() will return with name set to point to a
+ *    kmalloc()ed ntfs_name structure containing the mft reference (cpu endian)
+ *    of the inode. We use the mft reference to read the inode and to find the
+ *    file name in the WIN32 namespace corresponding to the matched short file
+ *    name. We then convert the name to the current NLS code page, and proceed
+ *    searching for a dentry with this name, etc, as in case 2), above.
+ *
+ * Locking: Caller must hold i_mutex on the directory.
+ */
+static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
+		unsigned int flags)
+{
+	struct ntfs_volume *vol = NTFS_SB(dir_ino->i_sb);
+	struct inode *dent_inode;
+	__le16 *uname;
+	struct ntfs_name *name = NULL;
+	u64 mref;
+	unsigned long dent_ino;
+	int uname_len;
+
+	ntfs_debug("Looking up %pd in directory inode 0x%lx.",
+			dent, dir_ino->i_ino);
+	/* Convert the name of the dentry to Unicode. */
+	uname_len = ntfs_nlstoucs(vol, dent->d_name.name, dent->d_name.len,
+			&uname, NTFS_MAX_NAME_LEN);
+	if (uname_len < 0) {
+		if (uname_len != -ENAMETOOLONG)
+			ntfs_debug("Failed to convert name to Unicode.");
+		return ERR_PTR(uname_len);
+	}
+	mutex_lock(&NTFS_I(dir_ino)->mrec_lock);
+	mref = ntfs_lookup_inode_by_name(NTFS_I(dir_ino), uname, uname_len,
+			&name);
+	mutex_unlock(&NTFS_I(dir_ino)->mrec_lock);
+	kmem_cache_free(ntfs_name_cache, uname);
+	if (!IS_ERR_MREF(mref)) {
+		dent_ino = MREF(mref);
+		ntfs_debug("Found inode 0x%lx. Calling ntfs_iget.", dent_ino);
+		dent_inode = ntfs_iget(vol->sb, dent_ino);
+		if (!IS_ERR(dent_inode)) {
+			/* Consistency check. */
+			if (MSEQNO(mref) == NTFS_I(dent_inode)->seq_no ||
+					dent_ino == FILE_MFT) {
+				/* Perfect WIN32/POSIX match. -- Case 1. */
+				if (!name) {
+					ntfs_debug("Done. (Case 1.)");
+					return d_splice_alias(dent_inode, dent);
+				}
+				/*
+				 * We are too indented. Handle imperfect
+				 * matches and short file names further below.
+				 */
+				goto handle_name;
+			}
+			ntfs_error(vol->sb,
+				"Found stale reference to inode 0x%lx (reference sequence number = 0x%x, inode sequence number = 0x%x), returning -EIO. Run chkdsk.",
+				dent_ino, MSEQNO(mref),
+				NTFS_I(dent_inode)->seq_no);
+			iput(dent_inode);
+			dent_inode = ERR_PTR(-EIO);
+		} else
+			ntfs_error(vol->sb, "ntfs_iget(0x%lx) failed with error code %li.",
+					dent_ino, PTR_ERR(dent_inode));
+		kfree(name);
+		/* Return the error code. */
+		return ERR_CAST(dent_inode);
+	}
+	kfree(name);
+	/* It is guaranteed that @name is no longer allocated at this point. */
+	if (MREF_ERR(mref) == -ENOENT) {
+		ntfs_debug("Entry was not found, adding negative dentry.");
+		/* The dcache will handle negative entries. */
+		d_add(dent, NULL);
+		ntfs_debug("Done.");
+		return NULL;
+	}
+	ntfs_error(vol->sb, "ntfs_lookup_inode_by_name() failed with error code %i.",
+			-MREF_ERR(mref));
+	return ERR_PTR(MREF_ERR(mref));
+handle_name:
+	{
+	struct mft_record *m;
+	struct ntfs_attr_search_ctx *ctx;
+	struct ntfs_inode *ni = NTFS_I(dent_inode);
+	int err;
+	struct qstr nls_name;
+
+	nls_name.name = NULL;
+	if (name->type != FILE_NAME_DOS) {		/* Case 2. */
+		ntfs_debug("Case 2.");
+		nls_name.len = (unsigned int)ntfs_ucstonls(vol,
+				(__le16 *)&name->name, name->len,
+				(unsigned char **)&nls_name.name, 0);
+		kfree(name);
+	} else /* if (name->type == FILE_NAME_DOS) */ {	/* Case 3. */
+		struct file_name_attr *fn;
+
+		ntfs_debug("Case 3.");
+		kfree(name);
+
+		/* Find the WIN32 name corresponding to the matched DOS name. */
+		ni = NTFS_I(dent_inode);
+		m = map_mft_record(ni);
+		if (IS_ERR(m)) {
+			err = PTR_ERR(m);
+			m = NULL;
+			ctx = NULL;
+			goto err_out;
+		}
+		ctx = ntfs_attr_get_search_ctx(ni, m);
+		if (unlikely(!ctx)) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+		do {
+			struct attr_record *a;
+			u32 val_len;
+
+			err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0,
+					NULL, 0, ctx);
+			if (unlikely(err)) {
+				ntfs_error(vol->sb,
+					"Inode corrupt: No WIN32 namespace counterpart to DOS file name. Run chkdsk.");
+				if (err == -ENOENT)
+					err = -EIO;
+				goto err_out;
+			}
+			/* Consistency checks. */
+			a = ctx->attr;
+			if (a->non_resident || a->flags)
+				goto eio_err_out;
+			val_len = le32_to_cpu(a->data.resident.value_length);
+			if (le16_to_cpu(a->data.resident.value_offset) +
+					val_len > le32_to_cpu(a->length))
+				goto eio_err_out;
+			fn = (struct file_name_attr *)((u8 *)ctx->attr + le16_to_cpu(
+					ctx->attr->data.resident.value_offset));
+			if ((u32)(fn->file_name_length * sizeof(__le16) +
+					sizeof(struct file_name_attr)) > val_len)
+				goto eio_err_out;
+		} while (fn->file_name_type != FILE_NAME_WIN32);
+
+		/* Convert the found WIN32 name to the current NLS code page. */
+		nls_name.len = (unsigned int)ntfs_ucstonls(vol,
+				(__le16 *)&fn->file_name, fn->file_name_length,
+				(unsigned char **)&nls_name.name, 0);
+
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(ni);
+	}
+	m = NULL;
+	ctx = NULL;
+
+	/* Check if a conversion error occurred. */
+	if ((int)nls_name.len < 0) {
+		err = (int)nls_name.len;
+		goto err_out;
+	}
+	nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);
+
+	dent = d_add_ci(dent, dent_inode, &nls_name);
+	kfree(nls_name.name);
+	return dent;
+
+eio_err_out:
+	ntfs_error(vol->sb, "Illegal file name attribute. Run chkdsk.");
+	err = -EIO;
+err_out:
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (m)
+		unmap_mft_record(ni);
+	iput(dent_inode);
+	ntfs_error(vol->sb, "Failed, returning error code %i.", err);
+	return ERR_PTR(err);
+	}
+}
+
+static int ntfs_sd_add_everyone(struct ntfs_inode *ni)
+{
+	struct security_descriptor_relative *sd;
+	struct ntfs_acl *acl;
+	struct ntfs_ace *ace;
+	struct ntfs_sid *sid;
+	int ret, sd_len;
+
+	/* Create SECURITY_DESCRIPTOR attribute (everyone has full access). */
+	/*
+	 * Calculate the security descriptor length. We have 2 sub-authorities
+	 * in the owner and group SIDs, so add 8 bytes to every SID.
+	 */
+	sd_len = sizeof(struct security_descriptor_relative) + 2 *
+		(sizeof(struct ntfs_sid) + 8) + sizeof(struct ntfs_acl) +
+		sizeof(struct ntfs_ace) + 4;
+	sd = ntfs_malloc_nofs(sd_len);
+	if (!sd)
+		return -ENOMEM;
+
+	sd->revision = 1;
+	sd->control = SE_DACL_PRESENT | SE_SELF_RELATIVE;
+
+	sid = (struct ntfs_sid *)((u8 *)sd + sizeof(struct security_descriptor_relative));
+	sid->revision = 1;
+	sid->sub_authority_count = 2;
+	sid->sub_authority[0] = cpu_to_le32(SECURITY_BUILTIN_DOMAIN_RID);
+	sid->sub_authority[1] = cpu_to_le32(DOMAIN_ALIAS_RID_ADMINS);
+	sid->identifier_authority.value[5] = 5;
+	sd->owner = cpu_to_le32((u8 *)sid - (u8 *)sd);
+
+	sid = (struct ntfs_sid *)((u8 *)sid + sizeof(struct ntfs_sid) + 8);
+	sid->revision = 1;
+	sid->sub_authority_count = 2;
+	sid->sub_authority[0] = cpu_to_le32(SECURITY_BUILTIN_DOMAIN_RID);
+	sid->sub_authority[1] = cpu_to_le32(DOMAIN_ALIAS_RID_ADMINS);
+	sid->identifier_authority.value[5] = 5;
+	sd->group = cpu_to_le32((u8 *)sid - (u8 *)sd);
+
+	acl = (struct ntfs_acl *)((u8 *)sid + sizeof(struct ntfs_sid) + 8);
+	acl->revision = 2;
+	acl->size = cpu_to_le16(sizeof(struct ntfs_acl) + sizeof(struct ntfs_ace) + 4);
+	acl->ace_count = cpu_to_le16(1);
+	sd->dacl = cpu_to_le32((u8 *)acl - (u8 *)sd);
+
+	ace = (struct ntfs_ace *)((u8 *)acl + sizeof(struct ntfs_acl));
+	ace->type = ACCESS_ALLOWED_ACE_TYPE;
+	ace->flags = OBJECT_INHERIT_ACE | CONTAINER_INHERIT_ACE;
+	ace->size = cpu_to_le16(sizeof(struct ntfs_ace) + 4);
+	ace->mask = cpu_to_le32(0x1f01ff);
+	ace->sid.revision = 1;
+	ace->sid.sub_authority_count = 1;
+	ace->sid.sub_authority[0] = 0;
+	ace->sid.identifier_authority.value[5] = 1;
+
+	ret = ntfs_attr_add(ni, AT_SECURITY_DESCRIPTOR, AT_UNNAMED, 0, (u8 *)sd,
+			sd_len);
+	if (ret)
+		ntfs_error(ni->vol->sb, "Failed to add SECURITY_DESCRIPTOR\n");
+
+	ntfs_free(sd);
+	return ret;
+}
+
+static struct ntfs_inode *__ntfs_create(struct mnt_idmap *idmap, struct inode *dir,
+		__le16 *name, u8 name_len, mode_t mode, dev_t dev,
+		__le16 *target, int target_len)
+{
+	struct ntfs_inode *dir_ni = NTFS_I(dir);
+	struct ntfs_volume *vol = dir_ni->vol;
+	struct ntfs_inode *ni;
+	bool rollback_data = false, rollback_sd = false, rollback_reparse = false;
+	struct file_name_attr *fn = NULL;
+	struct standard_information *si = NULL;
+	int err = 0, fn_len, si_len;
+	struct inode *vi;
+	struct mft_record *ni_mrec, *dni_mrec;
+	struct super_block *sb = dir_ni->vol->sb;
+	__le64 parent_mft_ref;
+	u64 child_mft_ref;
+	__le16 ea_size;
+
+	vi = new_inode(vol->sb);
+	if (!vi)
+		return ERR_PTR(-ENOMEM);
+
+	ntfs_init_big_inode(vi);
+	ni = NTFS_I(vi);
+	ni->vol = dir_ni->vol;
+	ni->name_len = 0;
+	ni->name = NULL;
+
+	/*
+	 * Set the appropriate mode, attribute type, and name. For
+	 * directories, also setup the index values to the defaults.
+	 */
+	if (S_ISDIR(mode)) {
+		mode &= ~vol->dmask;
+
+		NInoSetMstProtected(ni);
+		ni->itype.index.block_size = 4096;
+		ni->itype.index.block_size_bits = ntfs_ffs(4096) - 1;
+		ni->itype.index.collation_rule = COLLATION_FILE_NAME;
+		if (vol->cluster_size <= ni->itype.index.block_size) {
+			ni->itype.index.vcn_size = vol->cluster_size;
+			ni->itype.index.vcn_size_bits =
+					vol->cluster_size_bits;
+		} else {
+			ni->itype.index.vcn_size = vol->sector_size;
+			ni->itype.index.vcn_size_bits =
+					vol->sector_size_bits;
+		}
+	} else {
+		mode &= ~vol->fmask;
+	}
+
+	if (IS_RDONLY(vi))
+		mode &= ~0222;
+
+	inode_init_owner(idmap, vi, dir, mode);
+
+	if (uid_valid(vol->uid))
+		vi->i_uid = vol->uid;
+
+	if (gid_valid(vol->gid))
+		vi->i_gid = vol->gid;
+
+	/*
+	 * Set the file size to 0, the ntfs inode sizes are set to 0 by
+	 * the call to ntfs_init_big_inode() above.
+	 */
+	vi->i_size = 0;
+	vi->i_blocks = 0;
+
+	inode_inc_iversion(vi);
+
+	simple_inode_init_ts(vi);
+	ni->i_crtime = inode_get_ctime(vi);
+
+	inode_set_mtime_to_ts(dir, ni->i_crtime);
+	inode_set_ctime_to_ts(dir, ni->i_crtime);
+	mark_inode_dirty(dir);
+
+	err = ntfs_mft_record_alloc(dir_ni->vol, mode, &ni, NULL,
+			&ni_mrec);
+	if (err) {
+		iput(vi);
+		return ERR_PTR(err);
+	}
+
+	/*
+	 * Prevent iget and writeback from finding this inode.
+	 * Caller must call d_instantiate_new instead of d_instantiate.
+	 */
+	spin_lock(&vi->i_lock);
+	vi->i_state = I_NEW | I_CREATING;
+	spin_unlock(&vi->i_lock);
+
+	/* Add the inode to the inode hash for the superblock. */
+	vi->i_ino = ni->mft_no;
+	inode_set_iversion(vi, 1);
+	insert_inode_hash(vi);
+
+	mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+	mutex_lock_nested(&dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+	if (NInoBeingDeleted(dir_ni)) {
+		err = -ENOENT;
+		goto err_out;
+	}
+
+	dni_mrec = map_mft_record(dir_ni);
+	if (IS_ERR(dni_mrec)) {
+		ntfs_error(dir_ni->vol->sb, "failed to map mft record for file %lu.\n",
+				dir_ni->mft_no);
+		err = -EIO;
+		goto err_out;
+	}
+	parent_mft_ref = MK_LE_MREF(dir_ni->mft_no,
+			le16_to_cpu(dni_mrec->sequence_number));
+	unmap_mft_record(dir_ni);
+
+	/*
+	 * Create STANDARD_INFORMATION attribute. Write STANDARD_INFORMATION
+	 * version 1.2, windows will upgrade it to version 3 if needed.
+	 */
+	si_len = offsetof(struct standard_information, file_attributes) +
+			sizeof(__le32) + 12;
+	si = ntfs_malloc_nofs(si_len);
+	if (!si) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	si->creation_time = si->last_data_change_time = utc2ntfs(ni->i_crtime);
+	si->last_mft_change_time = si->last_access_time = si->creation_time;
+
+	if (!S_ISREG(mode) && !S_ISDIR(mode))
+		si->file_attributes = FILE_ATTR_SYSTEM;
+
+	/* Add STANDARD_INFORMATION to inode. */
+	err = ntfs_attr_add(ni, AT_STANDARD_INFORMATION, AT_UNNAMED, 0, (u8 *)si,
+			si_len);
+	if (err) {
+		ntfs_error(sb, "Failed to add STANDARD_INFORMATION attribute.\n");
+		goto err_out;
+	}
+
+	err = ntfs_sd_add_everyone(ni);
+	if (err)
+		goto err_out;
+	rollback_sd = true;
+
+	if (S_ISDIR(mode)) {
+		struct index_root *ir = NULL;
+		struct index_entry *ie;
+		int ir_len, index_len;
+
+		/* Create struct index_root attribute. */
+		index_len = sizeof(struct index_header) + sizeof(struct index_entry_header);
+		ir_len = offsetof(struct index_root, index) + index_len;
+		ir = ntfs_malloc_nofs(ir_len);
+		if (!ir) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+		ir->type = AT_FILE_NAME;
+		ir->collation_rule = COLLATION_FILE_NAME;
+		ir->index_block_size = cpu_to_le32(ni->vol->index_record_size);
+		if (ni->vol->cluster_size <= ni->vol->index_record_size)
+			ir->clusters_per_index_block =
+				ni->vol->index_record_size >> ni->vol->cluster_size_bits;
+		else
+			ir->clusters_per_index_block =
+				ni->vol->index_record_size >> ni->vol->sector_size_bits;
+		ir->index.entries_offset = cpu_to_le32(sizeof(struct index_header));
+		ir->index.index_length = cpu_to_le32(index_len);
+		ir->index.allocated_size = cpu_to_le32(index_len);
+		ie = (struct index_entry *)((u8 *)ir + sizeof(struct index_root));
+		ie->length = cpu_to_le16(sizeof(struct index_entry_header));
+		ie->key_length = 0;
+		ie->flags = INDEX_ENTRY_END;
+
+		/* Add struct index_root attribute to inode. */
+		err = ntfs_attr_add(ni, AT_INDEX_ROOT, I30, 4, (u8 *)ir, ir_len);
+		if (err) {
+			ntfs_free(ir);
+			ntfs_error(vi->i_sb, "Failed to add struct index_root attribute.\n");
+			goto err_out;
+		}
+		ntfs_free(ir);
+		err = ntfs_attr_open(ni, AT_INDEX_ROOT, I30, 4);
+		if (err)
+			goto err_out;
+	} else {
+		/* Add DATA attribute to inode. */
+		err = ntfs_attr_add(ni, AT_DATA, AT_UNNAMED, 0, NULL, 0);
+		if (err) {
+			ntfs_error(dir_ni->vol->sb, "Failed to add DATA attribute.\n");
+			goto err_out;
+		}
+		rollback_data = true;
+
+		err = ntfs_attr_open(ni, AT_DATA, AT_UNNAMED, 0);
+		if (err)
+			goto err_out;
+
+		if (S_ISLNK(mode)) {
+			err = ntfs_reparse_set_wsl_symlink(ni, target, target_len);
+			if (!err)
+				rollback_reparse = true;
+		} else if (S_ISBLK(mode) || S_ISCHR(mode) || S_ISSOCK(mode) ||
+				S_ISFIFO(mode)) {
+			si->file_attributes = FILE_ATTRIBUTE_RECALL_ON_OPEN;
+			ni->flags = FILE_ATTRIBUTE_RECALL_ON_OPEN;
+			err = ntfs_reparse_set_wsl_not_symlink(ni, mode);
+			if (!err)
+				rollback_reparse = true;
+		}
+		if (err)
+			goto err_out;
+	}
+
+	err = ntfs_ea_set_wsl_inode(vi, dev, &ea_size,
+			NTFS_EA_UID | NTFS_EA_GID | NTFS_EA_MODE);
+	if (err)
+		goto err_out;
+
+	/* Create FILE_NAME attribute. */
+	fn_len = sizeof(struct file_name_attr) + name_len * sizeof(__le16);
+	fn = ntfs_malloc_nofs(fn_len);
+	if (!fn) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	fn->file_attributes |= ni->flags;
+	fn->parent_directory = parent_mft_ref;
+	fn->file_name_length = name_len;
+	fn->file_name_type = FILE_NAME_POSIX;
+	fn->type.ea.packed_ea_size = ea_size;
+	if (S_ISDIR(mode)) {
+		fn->file_attributes = FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT;
+		fn->allocated_size = fn->data_size = 0;
+	} else {
+		fn->data_size = cpu_to_le64(ni->data_size);
+		fn->allocated_size = cpu_to_le64(ni->allocated_size);
+	}
+	if (!S_ISREG(mode) && !S_ISDIR(mode))
+		fn->file_attributes |= FILE_ATTR_SYSTEM;
+	if (NVolHideDotFiles(vol) && (name_len > 0 && name[0] == '.'))
+		fn->file_attributes |= FILE_ATTR_HIDDEN;
+	fn->creation_time = fn->last_data_change_time = utc2ntfs(ni->i_crtime);
+	fn->last_mft_change_time = fn->last_access_time = fn->creation_time;
+	memcpy(fn->file_name, name, name_len * sizeof(__le16));
+
+	/* Add FILE_NAME attribute to inode. */
+	err = ntfs_attr_add(ni, AT_FILE_NAME, AT_UNNAMED, 0, (u8 *)fn, fn_len);
+	if (err) {
+		ntfs_error(sb, "Failed to add FILE_NAME attribute.\n");
+		goto err_out;
+	}
+
+	child_mft_ref = MK_MREF(ni->mft_no,
+			le16_to_cpu(ni_mrec->sequence_number));
+	/* Set hard links count and directory flag. */
+	ni_mrec->link_count = cpu_to_le16(1);
+	mark_mft_record_dirty(ni);
+
+	/* Add FILE_NAME attribute to index. */
+	err = ntfs_index_add_filename(dir_ni, fn, child_mft_ref);
+	if (err) {
+		ntfs_debug("Failed to add entry to the index");
+		goto err_out;
+	}
+
+	unmap_mft_record(ni);
+	mutex_unlock(&dir_ni->mrec_lock);
+	mutex_unlock(&ni->mrec_lock);
+
+	ni->flags = fn->file_attributes;
+	/* Set the sequence number. */
+	vi->i_generation = ni->seq_no;
+	set_nlink(vi, 1);
+	ntfs_set_vfs_operations(vi, mode, dev);
+
+#ifdef CONFIG_NTFSPLUS_FS_POSIX_ACL
+	if (!S_ISLNK(mode) && (sb->s_flags & SB_POSIXACL)) {
+		err = ntfsp_init_acl(idmap, vi, dir);
+		if (err)
+			goto err_out;
+	} else
+#endif
+	{
+		vi->i_flags |= S_NOSEC;
+	}
+
+	/* Done! */
+	ntfs_free(fn);
+	ntfs_free(si);
+	ntfs_debug("Done.\n");
+	return ni;
+
+err_out:
+	if (rollback_sd)
+		ntfs_attr_remove(ni, AT_SECURITY_DESCRIPTOR, AT_UNNAMED, 0);
+
+	if (rollback_data)
+		ntfs_attr_remove(ni, AT_DATA, AT_UNNAMED, 0);
+
+	if (rollback_reparse)
+		ntfs_delete_reparse_index(ni);
+	/*
+	 * Free extent MFT records (none should exist with the current
+	 * ntfs_create implementation, but handle them in case this changes
+	 * in the future).
+	 */
+	while (ni->nr_extents != 0) {
+		int err2;
+
+		err2 = ntfs_mft_record_free(ni->vol, *(ni->ext.extent_ntfs_inos));
+		if (err2)
+			ntfs_error(sb,
+				"Failed to free extent MFT record. Leaving inconsistent metadata.\n");
+		ntfs_inode_close(*(ni->ext.extent_ntfs_inos));
+	}
+	if (ntfs_mft_record_free(ni->vol, ni))
+		ntfs_error(sb,
+			"Failed to free MFT record. Leaving inconsistent metadata. Run chkdsk.\n");
+	unmap_mft_record(ni);
+	ntfs_free(fn);
+	ntfs_free(si);
+
+	mutex_unlock(&dir_ni->mrec_lock);
+	mutex_unlock(&ni->mrec_lock);
+
+	remove_inode_hash(vi);
+	discard_new_inode(vi);
+	return ERR_PTR(err);
+}
+
+static int ntfs_create(struct mnt_idmap *idmap, struct inode *dir,
+		struct dentry *dentry, umode_t mode, bool excl)
+{
+	struct ntfs_volume *vol = NTFS_SB(dir->i_sb);
+	struct ntfs_inode *ni;
+	__le16 *uname;
+	int uname_len, err;
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	uname_len = ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len,
+			&uname, NTFS_MAX_NAME_LEN);
+	if (uname_len < 0) {
+		if (uname_len != -ENAMETOOLONG)
+			ntfs_error(vol->sb, "Failed to convert name to unicode.");
+		return uname_len;
+	}
+
+	err = ntfs_check_bad_windows_name(vol, uname, uname_len);
+	if (err) {
+		kmem_cache_free(ntfs_name_cache, uname);
+		return err;
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	ni = __ntfs_create(idmap, dir, uname, uname_len, S_IFREG | mode, 0, NULL, 0);
+	kmem_cache_free(ntfs_name_cache, uname);
+	if (IS_ERR(ni))
+		return PTR_ERR(ni);
+
+	d_instantiate_new(dentry, VFS_I(ni));
+
+	return 0;
+}
+
+static int ntfs_check_unlinkable_dir(struct ntfs_attr_search_ctx *ctx, struct file_name_attr *fn)
+{
+	int link_count;
+	int ret;
+	struct ntfs_inode *ni = ctx->base_ntfs_ino ? ctx->base_ntfs_ino : ctx->ntfs_ino;
+	struct mft_record *ni_mrec = ctx->base_mrec ? ctx->base_mrec : ctx->mrec;
+
+	ret = ntfs_check_empty_dir(ni, ni_mrec);
+	if (!ret || ret != -ENOTEMPTY)
+		return ret;
+
+	link_count = le16_to_cpu(ni_mrec->link_count);
+	/*
+	 * Directory is non-empty, so we can unlink only if there is more than
+	 * one "real" hard link, i.e. links that aren't just the DOS and WIN32
+	 * names of the same entry.
+	 */
+	if ((link_count == 1) ||
+	    (link_count == 2 && fn->file_name_type == FILE_NAME_DOS)) {
+		ret = -ENOTEMPTY;
+		ntfs_debug("Non-empty directory without hard links\n");
+		goto no_hardlink;
+	}
+
+	ret = 0;
+no_hardlink:
+	return ret;
+}
+
+static int ntfs_test_inode_attr(struct inode *vi, void *data)
+{
+	struct ntfs_inode *ni = NTFS_I(vi);
+	unsigned long mft_no = (unsigned long)data;
+
+	if (ni->mft_no != mft_no)
+		return 0;
+	if (NInoAttr(ni) || ni->nr_extents == -1)
+		return 1;
+	else
+		return 0;
+}
+
+/**
+ * ntfs_delete - delete a file or directory from an ntfs volume
+ * @ni:		ntfs inode for the object to delete
+ * @dir_ni:	ntfs inode for the directory from which to delete the object
+ * @name:	unicode name of the object to delete
+ * @name_len:	length of the name in unicode characters
+ * @need_lock:	whether the mrec lock is needed or not
+ *
+ * @ni is always closed after the call to this function (even if it failed),
+ * the caller does not need to call ntfs_inode_close itself.
+ */
+static int ntfs_delete(struct ntfs_inode *ni, struct ntfs_inode *dir_ni,
+		__le16 *name, u8 name_len, bool need_lock)
+{
+	struct ntfs_attr_search_ctx *actx = NULL;
+	struct file_name_attr *fn = NULL;
+	bool looking_for_dos_name = false, looking_for_win32_name = false;
+	bool case_sensitive_match = true;
+	int err = 0;
+	struct mft_record *ni_mrec;
+	struct super_block *sb;
+	bool link_count_zero = false;
+
+	ntfs_debug("Entering.\n");
+
+	if (need_lock == true) {
+		mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+		mutex_lock_nested(&dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+	}
+
+	sb = dir_ni->vol->sb;
+
+	if (ni->nr_extents == -1)
+		ni = ni->ext.base_ntfs_ino;
+	if (dir_ni->nr_extents == -1)
+		dir_ni = dir_ni->ext.base_ntfs_ino;
+	/*
+	 * Search for a FILE_NAME attribute with this name. If it is in the
+	 * POSIX or WIN32_AND_DOS namespace, then simply remove it from the
+	 * index and inode.
+ * If filename in DOS or in WIN32 namespace, then remove DOS name first, + * only then remove WIN32 name. + */ + actx =3D ntfs_attr_get_search_ctx(ni, NULL); + if (!actx) { + ntfs_error(sb, "%s, Failed to get search context", __func__); + mutex_unlock(&dir_ni->mrec_lock); + mutex_unlock(&ni->mrec_lock); + return -ENOMEM; + } +search: + while ((err =3D ntfs_attr_lookup(AT_FILE_NAME, AT_UNNAMED, 0, CASE_SENSIT= IVE, + 0, NULL, 0, actx)) =3D=3D 0) { +#ifdef DEBUG + unsigned char *s; +#endif + bool case_sensitive =3D IGNORE_CASE; + + fn =3D (struct file_name_attr *)((u8 *)actx->attr + + le16_to_cpu(actx->attr->data.resident.value_offset)); +#ifdef DEBUG + s =3D ntfs_attr_name_get(ni->vol, fn->file_name, fn->file_name_length); + ntfs_debug("name: '%s' type: %d dos: %d win32: %d case: %d\n", + s, fn->file_name_type, + looking_for_dos_name, looking_for_win32_name, + case_sensitive_match); + ntfs_attr_name_free(&s); +#endif + if (looking_for_dos_name) { + if (fn->file_name_type =3D=3D FILE_NAME_DOS) + break; + continue; + } + if (looking_for_win32_name) { + if (fn->file_name_type =3D=3D FILE_NAME_WIN32) + break; + continue; + } + + /* Ignore hard links from other directories */ + if (dir_ni->mft_no !=3D MREF_LE(fn->parent_directory)) { + ntfs_debug("MFT record numbers don't match (%lu !=3D %lu)\n", + dir_ni->mft_no, + MREF_LE(fn->parent_directory)); + continue; + } + + if (fn->file_name_type =3D=3D FILE_NAME_POSIX || case_sensitive_match) + case_sensitive =3D CASE_SENSITIVE; + + if (ntfs_names_are_equal(fn->file_name, fn->file_name_length, + name, name_len, case_sensitive, + ni->vol->upcase, ni->vol->upcase_len)) { + if (fn->file_name_type =3D=3D FILE_NAME_WIN32) { + looking_for_dos_name =3D true; + ntfs_attr_reinit_search_ctx(actx); + continue; + } + if (fn->file_name_type =3D=3D FILE_NAME_DOS) + looking_for_dos_name =3D true; + break; + } + } + if (err) { + /* + * If case sensitive search failed, then try once again + * ignoring case. 
+		 */
+		if (err == -ENOENT && case_sensitive_match) {
+			case_sensitive_match = false;
+			ntfs_attr_reinit_search_ctx(actx);
+			goto search;
+		}
+		goto err_out;
+	}
+
+	err = ntfs_check_unlinkable_dir(actx, fn);
+	if (err)
+		goto err_out;
+
+	err = ntfs_index_remove(dir_ni, fn, le32_to_cpu(actx->attr->data.resident.value_length));
+	if (err)
+		goto err_out;
+
+	err = ntfs_attr_record_rm(actx);
+	if (err)
+		goto err_out;
+
+	ni_mrec = actx->base_mrec ? actx->base_mrec : actx->mrec;
+	ni_mrec->link_count = cpu_to_le16(le16_to_cpu(ni_mrec->link_count) - 1);
+	drop_nlink(VFS_I(ni));
+
+	mark_mft_record_dirty(ni);
+	if (looking_for_dos_name) {
+		looking_for_dos_name = false;
+		looking_for_win32_name = true;
+		ntfs_attr_reinit_search_ctx(actx);
+		goto search;
+	}
+
+	/*
+	 * If the hard link count is not zero, we are done. Otherwise no
+	 * references to this inode are left, so we must free all non-resident
+	 * attributes and mark all of its MFT records as not in use.
+	 */
+	if (ni_mrec->link_count == 0) {
+		NInoSetBeingDeleted(ni);
+		ntfs_delete_reparse_index(ni);
+		link_count_zero = true;
+	}
+
+	ntfs_attr_put_search_ctx(actx);
+	if (need_lock) {
+		mutex_unlock(&dir_ni->mrec_lock);
+		mutex_unlock(&ni->mrec_lock);
+	}
+
+	/*
+	 * With the last link gone, evict any attribute inodes still cached
+	 * for this MFT record.
+ */ + if (link_count_zero =3D=3D true) { + struct inode *attr_vi; + + while ((attr_vi =3D ilookup5(sb, ni->mft_no, ntfs_test_inode_attr, + (void *)ni->mft_no)) !=3D NULL) { + clear_nlink(attr_vi); + iput(attr_vi); + } + } + ntfs_debug("Done.\n"); + return 0; +err_out: + ntfs_attr_put_search_ctx(actx); + mutex_unlock(&dir_ni->mrec_lock); + mutex_unlock(&ni->mrec_lock); + return err; +} + +static int ntfs_unlink(struct inode *dir, struct dentry *dentry) +{ + struct inode *vi =3D dentry->d_inode; + struct super_block *sb =3D dir->i_sb; + struct ntfs_volume *vol =3D NTFS_SB(sb); + int err =3D 0; + struct ntfs_inode *ni =3D NTFS_I(vi); + __le16 *uname =3D NULL; + int uname_len; + + if (NVolShutdown(vol)) + return -EIO; + + uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len, + &uname, NTFS_MAX_NAME_LEN); + if (uname_len < 0) { + if (uname_len !=3D -ENAMETOOLONG) + ntfs_error(sb, "Failed to convert name to Unicode."); + return -ENOMEM; + } + + err =3D ntfs_check_bad_windows_name(vol, uname, uname_len); + if (err) { + kmem_cache_free(ntfs_name_cache, uname); + return err; + } + + if (!(vol->vol_flags & VOLUME_IS_DIRTY)) + ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY); + + err =3D ntfs_delete(ni, NTFS_I(dir), uname, uname_len, true); + if (err) + goto out; + + inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir)); + mark_inode_dirty(dir); + inode_set_ctime_to_ts(vi, inode_get_ctime(dir)); + if (vi->i_nlink) + mark_inode_dirty(vi); +out: + kmem_cache_free(ntfs_name_cache, uname); + return err; +} + +static struct dentry *ntfs_mkdir(struct mnt_idmap *idmap, struct inode *di= r, + struct dentry *dentry, umode_t mode) +{ + struct super_block *sb =3D dir->i_sb; + struct ntfs_volume *vol =3D NTFS_SB(sb); + int err =3D 0; + struct ntfs_inode *ni; + __le16 *uname; + int uname_len; + + if (NVolShutdown(vol)) + return ERR_PTR(-EIO); + + uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len, + &uname, NTFS_MAX_NAME_LEN); + if (uname_len < 0) 
{ + if (uname_len !=3D -ENAMETOOLONG) + ntfs_error(sb, "Failed to convert name to unicode."); + return ERR_PTR(-ENOMEM); + } + + err =3D ntfs_check_bad_windows_name(vol, uname, uname_len); + if (err) { + kmem_cache_free(ntfs_name_cache, uname); + return ERR_PTR(err); + } + + if (!(vol->vol_flags & VOLUME_IS_DIRTY)) + ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY); + + ni =3D __ntfs_create(idmap, dir, uname, uname_len, S_IFDIR | mode, 0, NUL= L, 0); + kmem_cache_free(ntfs_name_cache, uname); + if (IS_ERR(ni)) { + err =3D PTR_ERR(ni); + return ERR_PTR(err); + } + + d_instantiate_new(dentry, VFS_I(ni)); + return ERR_PTR(err); +} + +static int ntfs_rmdir(struct inode *dir, struct dentry *dentry) +{ + struct inode *vi =3D dentry->d_inode; + struct super_block *sb =3D dir->i_sb; + struct ntfs_volume *vol =3D NTFS_SB(sb); + int err =3D 0; + struct ntfs_inode *ni; + __le16 *uname =3D NULL; + int uname_len; + + if (NVolShutdown(vol)) + return -EIO; + + ni =3D NTFS_I(vi); + uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len, + &uname, NTFS_MAX_NAME_LEN); + if (uname_len < 0) { + if (uname_len !=3D -ENAMETOOLONG) + ntfs_error(sb, "Failed to convert name to unicode."); + return -ENOMEM; + } + + err =3D ntfs_check_bad_windows_name(vol, uname, uname_len); + if (err) { + kmem_cache_free(ntfs_name_cache, uname); + return err; + } + + if (!(vol->vol_flags & VOLUME_IS_DIRTY)) + ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY); + + err =3D ntfs_delete(ni, NTFS_I(dir), uname, uname_len, true); + if (err) + goto out; + + inode_set_mtime_to_ts(vi, inode_set_atime_to_ts(vi, current_time(vi))); +out: + kmem_cache_free(ntfs_name_cache, uname); + return err; +} + +/** + * __ntfs_link - create hard link for file or directory + * @ni: ntfs inode for object to create hard link + * @dir_ni: ntfs inode for directory in which new link should be placed + * @name: unicode name of the new link + * @name_len: length of the name in unicode characters + * + * NOTE: At present we allow 
creating hard links to directories; we use them
+ * in a temporary state during rename. But it is definitely a bad idea to
+ * end up with hard links to directories as the result of an operation.
+ */
+static int __ntfs_link(struct ntfs_inode *ni, struct ntfs_inode *dir_ni,
+		__le16 *name, u8 name_len)
+{
+	struct super_block *sb;
+	struct inode *vi = VFS_I(ni);
+	struct file_name_attr *fn = NULL;
+	int fn_len, err = 0;
+	struct mft_record *dir_mrec = NULL, *ni_mrec = NULL;
+
+	ntfs_debug("Entering.\n");
+
+	sb = dir_ni->vol->sb;
+	if (NInoBeingDeleted(dir_ni) || NInoBeingDeleted(ni))
+		return -ENOENT;
+
+	ni_mrec = map_mft_record(ni);
+	if (IS_ERR(ni_mrec)) {
+		err = -EIO;
+		goto err_out;
+	}
+
+	if (le16_to_cpu(ni_mrec->link_count) == 0) {
+		err = -ENOENT;
+		goto err_out;
+	}
+
+	/* Create the FILE_NAME attribute. */
+	fn_len = sizeof(struct file_name_attr) + name_len * sizeof(__le16);
+	fn = ntfs_malloc_nofs(fn_len);
+	if (!fn) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	dir_mrec = map_mft_record(dir_ni);
+	if (IS_ERR(dir_mrec)) {
+		err = -EIO;
+		goto err_out;
+	}
+
+	fn->parent_directory = MK_LE_MREF(dir_ni->mft_no,
+			le16_to_cpu(dir_mrec->sequence_number));
+	unmap_mft_record(dir_ni);
+	fn->file_name_length = name_len;
+	fn->file_name_type = FILE_NAME_POSIX;
+	fn->file_attributes = ni->flags;
+	if (ni_mrec->flags & MFT_RECORD_IS_DIRECTORY) {
+		fn->file_attributes |= FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT;
+		fn->allocated_size = fn->data_size = 0;
+	} else {
+		if (NInoSparse(ni) || NInoCompressed(ni))
+			fn->allocated_size =
+				cpu_to_le64(ni->itype.compressed.size);
+		else
+			fn->allocated_size = cpu_to_le64(ni->allocated_size);
+		fn->data_size = cpu_to_le64(ni->data_size);
+	}
+	if (NVolHideDotFiles(dir_ni->vol) && (name_len > 0 && name[0] == '.'))
+		fn->file_attributes |= FILE_ATTR_HIDDEN;
+
+	fn->creation_time = utc2ntfs(ni->i_crtime);
+	fn->last_data_change_time = utc2ntfs(inode_get_mtime(vi));
+
fn->last_mft_change_time =3D utc2ntfs(inode_get_ctime(vi)); + fn->last_access_time =3D utc2ntfs(inode_get_atime(vi)); + memcpy(fn->file_name, name, name_len * sizeof(__le16)); + + /* Add FILE_NAME attribute to index. */ + err =3D ntfs_index_add_filename(dir_ni, fn, MK_MREF(ni->mft_no, + le16_to_cpu(ni_mrec->sequence_number))); + if (err) { + ntfs_error(sb, "Failed to add filename to the index"); + goto err_out; + } + /* Add FILE_NAME attribute to inode. */ + err =3D ntfs_attr_add(ni, AT_FILE_NAME, AT_UNNAMED, 0, (u8 *)fn, fn_len); + if (err) { + ntfs_error(sb, "Failed to add FILE_NAME attribute.\n"); + /* Try to remove just added attribute from index. */ + if (ntfs_index_remove(dir_ni, fn, fn_len)) + goto rollback_failed; + goto err_out; + } + /* Increment hard links count. */ + ni_mrec->link_count =3D cpu_to_le16(le16_to_cpu(ni_mrec->link_count) + 1); + inc_nlink(VFS_I(ni)); + + /* Done! */ + mark_mft_record_dirty(ni); + ntfs_free(fn); + unmap_mft_record(ni); + + ntfs_debug("Done.\n"); + + return 0; +rollback_failed: + ntfs_error(sb, "Rollback failed. 
Leaving inconsistent metadata.\n"); +err_out: + ntfs_free(fn); + if (!IS_ERR_OR_NULL(ni_mrec)) + unmap_mft_record(ni); + return err; +} + +static int ntfs_rename(struct mnt_idmap *idmap, struct inode *old_dir, + struct dentry *old_dentry, struct inode *new_dir, + struct dentry *new_dentry, unsigned int flags) +{ + struct inode *old_inode, *new_inode =3D NULL; + int err =3D 0; + int is_dir; + struct super_block *sb =3D old_dir->i_sb; + __le16 *uname_new =3D NULL; + __le16 *uname_old =3D NULL; + int new_name_len; + int old_name_len; + struct ntfs_volume *vol =3D NTFS_SB(sb); + struct ntfs_inode *old_ni, *new_ni =3D NULL; + struct ntfs_inode *old_dir_ni =3D NTFS_I(old_dir), *new_dir_ni =3D NTFS_I= (new_dir); + + if (NVolShutdown(old_dir_ni->vol)) + return -EIO; + + if (flags & (RENAME_EXCHANGE | RENAME_WHITEOUT)) + return -EINVAL; + + new_name_len =3D ntfs_nlstoucs(NTFS_I(new_dir)->vol, new_dentry->d_name.n= ame, + new_dentry->d_name.len, &uname_new, + NTFS_MAX_NAME_LEN); + if (new_name_len < 0) { + if (new_name_len !=3D -ENAMETOOLONG) + ntfs_error(sb, "Failed to convert name to unicode."); + return -ENOMEM; + } + + err =3D ntfs_check_bad_windows_name(vol, uname_new, new_name_len); + if (err) { + kmem_cache_free(ntfs_name_cache, uname_new); + return err; + } + + old_name_len =3D ntfs_nlstoucs(NTFS_I(old_dir)->vol, old_dentry->d_name.n= ame, + old_dentry->d_name.len, &uname_old, + NTFS_MAX_NAME_LEN); + if (old_name_len < 0) { + kmem_cache_free(ntfs_name_cache, uname_new); + if (old_name_len !=3D -ENAMETOOLONG) + ntfs_error(sb, "Failed to convert name to unicode."); + return -ENOMEM; + } + + old_inode =3D old_dentry->d_inode; + new_inode =3D new_dentry->d_inode; + old_ni =3D NTFS_I(old_inode); + + if (!(vol->vol_flags & VOLUME_IS_DIRTY)) + ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY); + + mutex_lock_nested(&old_ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL); + mutex_lock_nested(&old_dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT); + + if (NInoBeingDeleted(old_ni) || 
NInoBeingDeleted(old_dir_ni)) { + err =3D -ENOENT; + goto unlock_old; + } + + is_dir =3D S_ISDIR(old_inode->i_mode); + + if (new_inode) { + new_ni =3D NTFS_I(new_inode); + mutex_lock_nested(&new_ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL_2); + if (old_dir !=3D new_dir) { + mutex_lock_nested(&new_dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT_2); + if (NInoBeingDeleted(new_dir_ni)) { + err =3D -ENOENT; + goto err_out; + } + } + + if (NInoBeingDeleted(new_ni)) { + err =3D -ENOENT; + goto err_out; + } + + if (is_dir) { + struct mft_record *ni_mrec; + + ni_mrec =3D map_mft_record(NTFS_I(new_inode)); + if (IS_ERR(ni_mrec)) { + err =3D -EIO; + goto err_out; + } + err =3D ntfs_check_empty_dir(NTFS_I(new_inode), ni_mrec); + unmap_mft_record(NTFS_I(new_inode)); + if (err) + goto err_out; + } + + err =3D ntfs_delete(new_ni, new_dir_ni, uname_new, new_name_len, false); + if (err) + goto err_out; + } else { + if (old_dir !=3D new_dir) { + mutex_lock_nested(&new_dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT_2); + if (NInoBeingDeleted(new_dir_ni)) { + err =3D -ENOENT; + goto err_out; + } + } + } + + err =3D __ntfs_link(old_ni, new_dir_ni, uname_new, new_name_len); + if (err) + goto err_out; + + err =3D ntfs_delete(old_ni, old_dir_ni, uname_old, old_name_len, false); + if (err) { + int err2; + + ntfs_error(sb, "Failed to delete old ntfs inode(%ld) in old dir, err : %= d\n", + old_ni->mft_no, err); + err2 =3D ntfs_delete(old_ni, new_dir_ni, uname_new, new_name_len, false); + if (err2) + ntfs_error(sb, "Failed to delete old ntfs inode in new dir, err : %d\n", + err2); + goto err_out; + } + + simple_rename_timestamp(old_dir, old_dentry, new_dir, new_dentry); + mark_inode_dirty(old_inode); + mark_inode_dirty(old_dir); + if (old_dir !=3D new_dir) + mark_inode_dirty(new_dir); + if (new_inode) + mark_inode_dirty(old_inode); + + inode_inc_iversion(new_dir); + +err_out: + if (old_dir !=3D new_dir) + mutex_unlock(&new_dir_ni->mrec_lock); + if (new_inode) + mutex_unlock(&new_ni->mrec_lock); + 
+unlock_old: + mutex_unlock(&old_dir_ni->mrec_lock); + mutex_unlock(&old_ni->mrec_lock); + if (uname_new) + kmem_cache_free(ntfs_name_cache, uname_new); + if (uname_old) + kmem_cache_free(ntfs_name_cache, uname_old); + + return err; +} + +static int ntfs_symlink(struct mnt_idmap *idmap, struct inode *dir, + struct dentry *dentry, const char *symname) +{ + struct super_block *sb =3D dir->i_sb; + struct ntfs_volume *vol =3D NTFS_SB(sb); + struct inode *vi; + int err =3D 0; + struct ntfs_inode *ni; + __le16 *usrc; + __le16 *utarget; + int usrc_len; + int utarget_len; + int symlen =3D strlen(symname); + + if (NVolShutdown(vol)) + return -EIO; + + usrc_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, + dentry->d_name.len, &usrc, NTFS_MAX_NAME_LEN); + if (usrc_len < 0) { + if (usrc_len !=3D -ENAMETOOLONG) + ntfs_error(sb, "Failed to convert name to Unicode."); + err =3D -ENOMEM; + goto out; + } + + err =3D ntfs_check_bad_windows_name(vol, usrc, usrc_len); + if (err) { + kmem_cache_free(ntfs_name_cache, usrc); + goto out; + } + + utarget_len =3D ntfs_nlstoucs(vol, symname, symlen, &utarget, + PATH_MAX); + if (utarget_len < 0) { + if (utarget_len !=3D -ENAMETOOLONG) + ntfs_error(sb, "Failed to convert target name to Unicode."); + err =3D -ENOMEM; + kmem_cache_free(ntfs_name_cache, usrc); + goto out; + } + + if (!(vol->vol_flags & VOLUME_IS_DIRTY)) + ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY); + + ni =3D __ntfs_create(idmap, dir, usrc, usrc_len, S_IFLNK | 0777, 0, + utarget, utarget_len); + kmem_cache_free(ntfs_name_cache, usrc); + kvfree(utarget); + if (IS_ERR(ni)) { + err =3D PTR_ERR(ni); + goto out; + } + + vi =3D VFS_I(ni); + vi->i_size =3D symlen; + d_instantiate_new(dentry, vi); +out: + return err; +} + +static int ntfs_mknod(struct mnt_idmap *idmap, struct inode *dir, + struct dentry *dentry, umode_t mode, dev_t rdev) +{ + struct super_block *sb =3D dir->i_sb; + struct ntfs_volume *vol =3D NTFS_SB(sb); + int err =3D 0; + struct ntfs_inode *ni; + __le16 *uname =3D 
NULL; + int uname_len; + + if (NVolShutdown(vol)) + return -EIO; + + uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, + dentry->d_name.len, &uname, NTFS_MAX_NAME_LEN); + if (uname_len < 0) { + if (uname_len !=3D -ENAMETOOLONG) + ntfs_error(sb, "Failed to convert name to Unicode."); + return -ENOMEM; + } + + err =3D ntfs_check_bad_windows_name(vol, uname, uname_len); + if (err) { + kmem_cache_free(ntfs_name_cache, uname); + return err; + } + + if (!(vol->vol_flags & VOLUME_IS_DIRTY)) + ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY); + + switch (mode & S_IFMT) { + case S_IFCHR: + case S_IFBLK: + ni =3D __ntfs_create(idmap, dir, uname, uname_len, + mode, rdev, NULL, 0); + break; + default: + ni =3D __ntfs_create(idmap, dir, uname, uname_len, + mode, 0, NULL, 0); + } + + kmem_cache_free(ntfs_name_cache, uname); + if (IS_ERR(ni)) { + err =3D PTR_ERR(ni); + goto out; + } + + d_instantiate_new(dentry, VFS_I(ni)); +out: + return err; +} + +static int ntfs_link(struct dentry *old_dentry, struct inode *dir, + struct dentry *dentry) +{ + struct inode *vi =3D old_dentry->d_inode; + struct super_block *sb =3D vi->i_sb; + struct ntfs_volume *vol =3D NTFS_SB(sb); + __le16 *uname =3D NULL; + int uname_len; + int err; + struct ntfs_inode *ni =3D NTFS_I(vi), *dir_ni =3D NTFS_I(dir); + + if (NVolShutdown(vol)) + return -EIO; + + uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, + dentry->d_name.len, &uname, NTFS_MAX_NAME_LEN); + if (uname_len < 0) { + if (uname_len !=3D -ENAMETOOLONG) + ntfs_error(sb, "Failed to convert name to unicode."); + err =3D -ENOMEM; + goto out; + } + + if (!(vol->vol_flags & VOLUME_IS_DIRTY)) + ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY); + + ihold(vi); + mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL); + mutex_lock_nested(&dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT); + err =3D __ntfs_link(NTFS_I(vi), NTFS_I(dir), uname, uname_len); + if (err) { + mutex_unlock(&dir_ni->mrec_lock); + mutex_unlock(&ni->mrec_lock); + iput(vi); + pr_err("failed 
to create link, err = %d\n", err);
+		goto out;
+	}
+
+	inode_inc_iversion(dir);
+	simple_inode_init_ts(dir);
+
+	inode_inc_iversion(vi);
+	simple_inode_init_ts(vi);
+
+	/* The timestamps are already written, so mark_inode_dirty() is not needed. */
+	d_instantiate(dentry, vi);
+	mutex_unlock(&dir_ni->mrec_lock);
+	mutex_unlock(&ni->mrec_lock);
+
+out:
+	ntfs_free(uname);
+	return err;
+}
+
+/**
+ * Inode operations for directories.
+ */
+const struct inode_operations ntfs_dir_inode_ops = {
+	.lookup = ntfs_lookup, /* VFS: Lookup directory. */
+	.create = ntfs_create,
+	.unlink = ntfs_unlink,
+	.mkdir = ntfs_mkdir,
+	.rmdir = ntfs_rmdir,
+	.rename = ntfs_rename,
+	.get_acl = ntfsp_get_acl,
+	.set_acl = ntfsp_set_acl,
+	.listxattr = ntfsp_listxattr,
+	.setattr = ntfsp_setattr,
+	.getattr = ntfsp_getattr,
+	.symlink = ntfs_symlink,
+	.mknod = ntfs_mknod,
+	.link = ntfs_link,
+};
+
+/**
+ * ntfs_get_parent - find the dentry of the parent of a given directory dentry
+ * @child_dent: dentry of the directory whose parent directory to find
+ *
+ * Find the dentry for the parent directory of the directory specified by the
+ * dentry @child_dent. This function is called from
+ * fs/exportfs/expfs.c::find_exported_dentry() which in turn is called from the
+ * default ->decode_fh() which is export_decode_fh() in the same file.
+ *
+ * Note: ntfs_get_parent() is called with @d_inode(child_dent)->i_mutex down.
+ *
+ * Return the dentry of the parent directory on success or the error code on
+ * error (IS_ERR() is true).
+ */ +static struct dentry *ntfs_get_parent(struct dentry *child_dent) +{ + struct inode *vi =3D d_inode(child_dent); + struct ntfs_inode *ni =3D NTFS_I(vi); + struct mft_record *mrec; + struct ntfs_attr_search_ctx *ctx; + struct attr_record *attr; + struct file_name_attr *fn; + unsigned long parent_ino; + int err; + + ntfs_debug("Entering for inode 0x%lx.", vi->i_ino); + /* Get the mft record of the inode belonging to the child dentry. */ + mrec =3D map_mft_record(ni); + if (IS_ERR(mrec)) + return ERR_CAST(mrec); + /* Find the first file name attribute in the mft record. */ + ctx =3D ntfs_attr_get_search_ctx(ni, mrec); + if (unlikely(!ctx)) { + unmap_mft_record(ni); + return ERR_PTR(-ENOMEM); + } +try_next: + err =3D ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, CASE_SENSITIVE, 0, NULL, + 0, ctx); + if (unlikely(err)) { + ntfs_attr_put_search_ctx(ctx); + unmap_mft_record(ni); + if (err =3D=3D -ENOENT) + ntfs_error(vi->i_sb, + "Inode 0x%lx does not have a file name attribute. Run chkdsk.", + vi->i_ino); + return ERR_PTR(err); + } + attr =3D ctx->attr; + if (unlikely(attr->non_resident)) + goto try_next; + fn =3D (struct file_name_attr *)((u8 *)attr + + le16_to_cpu(attr->data.resident.value_offset)); + if (unlikely((u8 *)fn + le32_to_cpu(attr->data.resident.value_length) > + (u8 *)attr + le32_to_cpu(attr->length))) + goto try_next; + /* Get the inode number of the parent directory. */ + parent_ino =3D MREF_LE(fn->parent_directory); + /* Release the search context and the mft record of the child. 
*/
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(ni);
+
+	return d_obtain_alias(ntfs_iget(vi->i_sb, parent_ino));
+}
+
+static struct inode *ntfs_nfs_get_inode(struct super_block *sb,
+		u64 ino, u32 generation)
+{
+	struct inode *inode;
+
+	inode = ntfs_iget(sb, ino);
+	if (!IS_ERR(inode)) {
+		if (inode->i_generation != generation) {
+			iput(inode);
+			inode = ERR_PTR(-ESTALE);
+		}
+	}
+
+	return inode;
+}
+
+static struct dentry *ntfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
+		int fh_len, int fh_type)
+{
+	return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+			ntfs_nfs_get_inode);
+}
+
+static struct dentry *ntfs_fh_to_parent(struct super_block *sb, struct fid *fid,
+		int fh_len, int fh_type)
+{
+	return generic_fh_to_parent(sb, fid, fh_len, fh_type,
+			ntfs_nfs_get_inode);
+}
+
+/**
+ * Export operations allowing NFS exporting of mounted NTFS partitions.
+ */
+const struct export_operations ntfs_export_ops = {
+	.encode_fh = generic_encode_ino32_fh,
+	.get_parent = ntfs_get_parent, /* Find the parent of a given directory. */
+	.fh_to_dentry = ntfs_fh_to_dentry,
+	.fh_to_parent = ntfs_fh_to_parent,
+};
-- 
2.25.1

From nobody Mon Dec 1 22:02:17 2025
From: Namjae Jeon
Subject: [PATCH v2 04/11] ntfsplus: add directory operations
Date: Thu, 27 Nov 2025 13:59:37 +0900
Message-Id: <20251127045944.26009-5-linkinjeon@kernel.org>
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>

This adds the implementation of directory operations for ntfsplus.
Signed-off-by: Namjae Jeon --- fs/ntfsplus/dir.c | 1230 +++++++++++++++++++++++++ fs/ntfsplus/index.c | 2112 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 3342 insertions(+) create mode 100644 fs/ntfsplus/dir.c create mode 100644 fs/ntfsplus/index.c diff --git a/fs/ntfsplus/dir.c b/fs/ntfsplus/dir.c new file mode 100644 index 000000000000..4ce9295882dc --- /dev/null +++ b/fs/ntfsplus/dir.c @@ -0,0 +1,1230 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/** + * NTFS kernel directory operations. Part of the Linux-NTFS project. + * + * Copyright (c) 2001-2007 Anton Altaparmakov + * Copyright (c) 2002 Richard Russon + * Copyright (c) 2025 LG Electronics Co., Ltd. + */ + +#include + +#include "dir.h" +#include "mft.h" +#include "ntfs.h" +#include "index.h" +#include "reparse.h" + +/** + * The little endian Unicode string $I30 as a global constant. + */ +__le16 I30[5] =3D { cpu_to_le16('$'), cpu_to_le16('I'), + cpu_to_le16('3'), cpu_to_le16('0'), 0 }; + +/** + * ntfs_lookup_inode_by_name - find an inode in a directory given its name + * @dir_ni: ntfs inode of the directory in which to search for the name + * @uname: Unicode name for which to search in the directory + * @uname_len: length of the name @uname in Unicode characters + * @res: return the found file name if necessary (see below) + * + * Look for an inode with name @uname in the directory with inode @dir_ni. + * ntfs_lookup_inode_by_name() walks the contents of the directory looking= for + * the Unicode name. If the name is found in the directory, the correspond= ing + * inode number (>=3D 0) is returned as a mft reference in cpu format, i.e= . it + * is a 64-bit number containing the sequence number. + * + * On error, a negative value is returned corresponding to the error code.= In + * particular if the inode is not found -ENOENT is returned. 
Note that you + * can't just check the return value for being negative, you have to check= the + * inode number for being negative which you can extract using MREC(return + * value). + * + * Note, @uname_len does not include the (optional) terminating NULL chara= cter. + * + * Note, we look for a case sensitive match first but we also look for a c= ase + * insensitive match at the same time. If we find a case insensitive match= , we + * save that for the case that we don't find an exact match, where we retu= rn + * the case insensitive match and setup @res (which we allocate!) with the= mft + * reference, the file name type, length and with a copy of the little end= ian + * Unicode file name itself. If we match a file name which is in the DOS n= ame + * space, we only return the mft reference and file name type in @res. + * ntfs_lookup() then uses this to find the long file name in the inode it= self. + * This is to avoid polluting the dcache with short file names. We want th= em to + * work but we don't care for how quickly one can access them. This also f= ixes + * the dcache aliasing issues. + * + * Locking: - Caller must hold i_mutex on the directory. + * - Each page cache page in the index allocation mapping must be + * locked whilst being accessed otherwise we may find a corrupt + * page due to it being under ->writepage at the moment which + * applies the mst protection fixups before writing out and then + * removes them again after the write is complete after which it + * unlocks the page. 
+ */
+u64 ntfs_lookup_inode_by_name(struct ntfs_inode *dir_ni, const __le16 *uname,
+		const int uname_len, struct ntfs_name **res)
+{
+	struct ntfs_volume *vol = dir_ni->vol;
+	struct super_block *sb = vol->sb;
+	struct inode *ia_vi = NULL;
+	struct mft_record *m;
+	struct index_root *ir;
+	struct index_entry *ie;
+	struct index_block *ia;
+	u8 *index_end;
+	u64 mref;
+	struct ntfs_attr_search_ctx *ctx;
+	int err, rc;
+	s64 vcn, old_vcn;
+	struct address_space *ia_mapping;
+	struct folio *folio;
+	u8 *kaddr = NULL;
+	struct ntfs_name *name = NULL;
+
+	/* Get hold of the mft record for the directory. */
+	m = map_mft_record(dir_ni);
+	if (IS_ERR(m)) {
+		ntfs_error(sb, "map_mft_record() failed with error code %ld.",
+				-PTR_ERR(m));
+		return ERR_MREF(PTR_ERR(m));
+	}
+	ctx = ntfs_attr_get_search_ctx(dir_ni, m);
+	if (unlikely(!ctx)) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	/* Find the index root attribute in the mft record. */
+	err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
+			0, ctx);
+	if (unlikely(err)) {
+		if (err == -ENOENT) {
+			ntfs_error(sb,
+				"Index root attribute missing in directory inode 0x%lx.",
+				dir_ni->mft_no);
+			err = -EIO;
+		}
+		goto err_out;
+	}
+	/* Get to the index root value (it's been verified in read_inode). */
+	ir = (struct index_root *)((u8 *)ctx->attr +
+			le16_to_cpu(ctx->attr->data.resident.value_offset));
+	index_end = (u8 *)&ir->index + le32_to_cpu(ir->index.index_length);
+	/* The first index entry. */
+	ie = (struct index_entry *)((u8 *)&ir->index +
+			le32_to_cpu(ir->index.entries_offset));
+	/*
+	 * Loop until we exceed valid memory (corruption case) or until we
+	 * reach the last entry.
+	 */
+	for (;; ie = (struct index_entry *)((u8 *)ie + le16_to_cpu(ie->length))) {
+		/* Bounds checks. */
+		if ((u8 *)ie < (u8 *)ctx->mrec ||
+		    (u8 *)ie + sizeof(struct index_entry_header) > index_end ||
+		    (u8 *)ie + sizeof(struct index_entry_header) + le16_to_cpu(ie->key_length) >
+		    index_end || (u8 *)ie + le16_to_cpu(ie->length) > index_end)
+			goto dir_err_out;
+		/*
+		 * The last entry cannot contain a name. It can however contain
+		 * a pointer to a child node in the B+tree so we just break out.
+		 */
+		if (ie->flags & INDEX_ENTRY_END)
+			break;
+		/* Key length should not be zero if it is not last entry. */
+		if (!ie->key_length)
+			goto dir_err_out;
+		/* Check the consistency of an index entry */
+		if (ntfs_index_entry_inconsistent(NULL, vol, ie, COLLATION_FILE_NAME,
+				dir_ni->mft_no))
+			goto dir_err_out;
+		/*
+		 * We perform a case sensitive comparison and if that matches
+		 * we are done and return the mft reference of the inode (i.e.
+		 * the inode number together with the sequence number for
+		 * consistency checking). We convert it to cpu format before
+		 * returning.
+		 */
+		if (ntfs_are_names_equal(uname, uname_len,
+				(__le16 *)&ie->key.file_name.file_name,
+				ie->key.file_name.file_name_length,
+				CASE_SENSITIVE, vol->upcase, vol->upcase_len)) {
+found_it:
+			/*
+			 * We have a perfect match, so we don't need to care
+			 * about having matched imperfectly before, so we can
+			 * free name and set *res to NULL.
+			 * However, if the perfect match is a short file name,
+			 * we need to signal this through *res, so that
+			 * ntfs_lookup() can fix dcache aliasing issues.
+			 * As an optimization we just reuse an existing
+			 * allocation of *res.
+			 */
+			if (ie->key.file_name.file_name_type == FILE_NAME_DOS) {
+				if (!name) {
+					name = kmalloc(sizeof(struct ntfs_name),
+							GFP_NOFS);
+					if (!name) {
+						err = -ENOMEM;
+						goto err_out;
+					}
+				}
+				name->mref = le64_to_cpu(
+						ie->data.dir.indexed_file);
+				name->type = FILE_NAME_DOS;
+				name->len = 0;
+				*res = name;
+			} else {
+				kfree(name);
+				*res = NULL;
+			}
+			mref = le64_to_cpu(ie->data.dir.indexed_file);
+			ntfs_attr_put_search_ctx(ctx);
+			unmap_mft_record(dir_ni);
+			return mref;
+		}
+		/*
+		 * For a case insensitive mount, we also perform a case
+		 * insensitive comparison (provided the file name is not in the
+		 * POSIX namespace). If the comparison matches, and the name is
+		 * in the WIN32 namespace, we cache the filename in *res so
+		 * that the caller, ntfs_lookup(), can work on it. If the
+		 * comparison matches, and the name is in the DOS namespace, we
+		 * only cache the mft reference and the file name type (we set
+		 * the name length to zero for simplicity).
+		 */
+		if ((!NVolCaseSensitive(vol) ||
+		    ie->key.file_name.file_name_type == FILE_NAME_DOS) &&
+		    ntfs_are_names_equal(uname, uname_len,
+				(__le16 *)&ie->key.file_name.file_name,
+				ie->key.file_name.file_name_length,
+				IGNORE_CASE, vol->upcase,
+				vol->upcase_len)) {
+			int name_size = sizeof(struct ntfs_name);
+			u8 type = ie->key.file_name.file_name_type;
+			u8 len = ie->key.file_name.file_name_length;
+
+			/* Only one case insensitive matching name allowed. */
+			if (name) {
+				ntfs_error(sb,
+					"Found already allocated name in phase 1. Please run chkdsk");
+				goto dir_err_out;
+			}
+
+			if (type != FILE_NAME_DOS)
+				name_size += len * sizeof(__le16);
+			name = kmalloc(name_size, GFP_NOFS);
+			if (!name) {
+				err = -ENOMEM;
+				goto err_out;
+			}
+			name->mref = le64_to_cpu(ie->data.dir.indexed_file);
+			name->type = type;
+			if (type != FILE_NAME_DOS) {
+				name->len = len;
+				memcpy(name->name, ie->key.file_name.file_name,
+						len * sizeof(__le16));
+			} else
+				name->len = 0;
+			*res = name;
+		}
+		/*
+		 * Not a perfect match, need to do full blown collation so we
+		 * know which way in the B+tree we have to go.
+		 */
+		rc = ntfs_collate_names(uname, uname_len,
+				(__le16 *)&ie->key.file_name.file_name,
+				ie->key.file_name.file_name_length, 1,
+				IGNORE_CASE, vol->upcase, vol->upcase_len);
+		/*
+		 * If uname collates before the name of the current entry, there
+		 * is definitely no such name in this index but we might need to
+		 * descend into the B+tree so we just break out of the loop.
+		 */
+		if (rc == -1)
+			break;
+		/* The names are not equal, continue the search. */
+		if (rc)
+			continue;
+		/*
+		 * Names match with case insensitive comparison, now try the
+		 * case sensitive comparison, which is required for proper
+		 * collation.
+		 */
+		rc = ntfs_collate_names(uname, uname_len,
+				(__le16 *)&ie->key.file_name.file_name,
+				ie->key.file_name.file_name_length, 1,
+				CASE_SENSITIVE, vol->upcase, vol->upcase_len);
+		if (rc == -1)
+			break;
+		if (rc)
+			continue;
+		/*
+		 * Perfect match, this will never happen as the
+		 * ntfs_are_names_equal() call will have gotten a match but we
+		 * still treat it correctly.
+		 */
+		goto found_it;
+	}
+	/*
+	 * We have finished with this index without success. Check for the
+	 * presence of a child node and if not present return -ENOENT, unless
+	 * we have got a matching name cached in name in which case return the
+	 * mft reference associated with it.
+	 */
+	if (!(ie->flags & INDEX_ENTRY_NODE)) {
+		if (name) {
+			ntfs_attr_put_search_ctx(ctx);
+			unmap_mft_record(dir_ni);
+			return name->mref;
+		}
+		ntfs_debug("Entry not found.");
+		err = -ENOENT;
+		goto err_out;
+	} /* Child node present, descend into it. */
+
+	/* Get the starting vcn of the index_block holding the child node. */
+	vcn = le64_to_cpup((__le64 *)((u8 *)ie + le16_to_cpu(ie->length) - 8));
+
+	/*
+	 * We are done with the index root and the mft record. Release them,
+	 * otherwise we deadlock with ntfs_read_mapping_folio().
+	 */
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(dir_ni);
+	m = NULL;
+	ctx = NULL;
+
+	ia_vi = ntfs_index_iget(VFS_I(dir_ni), I30, 4);
+	if (IS_ERR(ia_vi)) {
+		err = PTR_ERR(ia_vi);
+		goto err_out;
+	}
+
+	ia_mapping = ia_vi->i_mapping;
+descend_into_child_node:
+	/*
+	 * Convert vcn to index into the index allocation attribute in units
+	 * of PAGE_SIZE and map the page cache page, reading it from
+	 * disk if necessary.
+	 */
+	folio = ntfs_read_mapping_folio(ia_mapping, vcn <<
+			dir_ni->itype.index.vcn_size_bits >> PAGE_SHIFT);
+	if (IS_ERR(folio)) {
+		ntfs_error(sb, "Failed to map directory index page, error %ld.",
+				-PTR_ERR(folio));
+		err = PTR_ERR(folio);
+		goto err_out;
+	}
+
+	folio_lock(folio);
+	kaddr = kmalloc(PAGE_SIZE, GFP_NOFS);
+	if (!kaddr) {
+		err = -ENOMEM;
+		folio_unlock(folio);
+		folio_put(folio);
+		goto unm_err_out;
+	}
+
+	memcpy_from_folio(kaddr, folio, 0, PAGE_SIZE);
+	post_read_mst_fixup((struct ntfs_record *)kaddr, PAGE_SIZE);
+	folio_unlock(folio);
+	folio_put(folio);
+fast_descend_into_child_node:
+	/* Get to the index allocation block. */
+	ia = (struct index_block *)(kaddr + ((vcn <<
+			dir_ni->itype.index.vcn_size_bits) & ~PAGE_MASK));
+	/* Bounds checks. */
+	if ((u8 *)ia < kaddr || (u8 *)ia > kaddr + PAGE_SIZE) {
+		ntfs_error(sb,
+			"Out of bounds check failed. Corrupt directory inode 0x%lx or driver bug.",
+			dir_ni->mft_no);
+		goto unm_err_out;
+	}
+	/* Catch multi sector transfer fixup errors. */
+	if (unlikely(!ntfs_is_indx_record(ia->magic))) {
+		ntfs_error(sb,
+			"Directory index record with vcn 0x%llx is corrupt. Corrupt inode 0x%lx. Run chkdsk.",
+			(unsigned long long)vcn, dir_ni->mft_no);
+		goto unm_err_out;
+	}
+	if (le64_to_cpu(ia->index_block_vcn) != vcn) {
+		ntfs_error(sb,
+			"Actual VCN (0x%llx) of index buffer is different from expected VCN (0x%llx). Directory inode 0x%lx is corrupt or driver bug.",
+			(unsigned long long)le64_to_cpu(ia->index_block_vcn),
+			(unsigned long long)vcn, dir_ni->mft_no);
+		goto unm_err_out;
+	}
+	if (le32_to_cpu(ia->index.allocated_size) + 0x18 !=
+			dir_ni->itype.index.block_size) {
+		ntfs_error(sb,
+			"Index buffer (VCN 0x%llx) of directory inode 0x%lx has a size (%u) differing from the directory specified size (%u). Directory inode is corrupt or driver bug.",
+			(unsigned long long)vcn, dir_ni->mft_no,
+			le32_to_cpu(ia->index.allocated_size) + 0x18,
+			dir_ni->itype.index.block_size);
+		goto unm_err_out;
+	}
+	index_end = (u8 *)ia + dir_ni->itype.index.block_size;
+	if (index_end > kaddr + PAGE_SIZE) {
+		ntfs_error(sb,
+			"Index buffer (VCN 0x%llx) of directory inode 0x%lx crosses page boundary. Impossible! Cannot access! This is probably a bug in the driver.",
+			(unsigned long long)vcn, dir_ni->mft_no);
+		goto unm_err_out;
+	}
+	index_end = (u8 *)&ia->index + le32_to_cpu(ia->index.index_length);
+	if (index_end > (u8 *)ia + dir_ni->itype.index.block_size) {
+		ntfs_error(sb,
+			"Size of index buffer (VCN 0x%llx) of directory inode 0x%lx exceeds maximum size.",
+			(unsigned long long)vcn, dir_ni->mft_no);
+		goto unm_err_out;
+	}
+	/* The first index entry. */
+	ie = (struct index_entry *)((u8 *)&ia->index +
+			le32_to_cpu(ia->index.entries_offset));
+	/*
+	 * Iterate similar to above big loop but applied to index buffer, thus
+	 * loop until we exceed valid memory (corruption case) or until we
+	 * reach the last entry.
+	 */
+	for (;; ie = (struct index_entry *)((u8 *)ie + le16_to_cpu(ie->length))) {
+		/* Bounds checks. */
+		if ((u8 *)ie < (u8 *)ia ||
+		    (u8 *)ie + sizeof(struct index_entry_header) > index_end ||
+		    (u8 *)ie + sizeof(struct index_entry_header) + le16_to_cpu(ie->key_length) >
+		    index_end || (u8 *)ie + le16_to_cpu(ie->length) > index_end) {
+			ntfs_error(sb, "Index entry out of bounds in directory inode 0x%lx.",
+					dir_ni->mft_no);
+			goto unm_err_out;
+		}
+		/*
+		 * The last entry cannot contain a name. It can however contain
+		 * a pointer to a child node in the B+tree so we just break out.
+		 */
+		if (ie->flags & INDEX_ENTRY_END)
+			break;
+		/* Key length should not be zero if it is not last entry. */
+		if (!ie->key_length)
+			goto unm_err_out;
+		/* Check the consistency of an index entry */
+		if (ntfs_index_entry_inconsistent(NULL, vol, ie, COLLATION_FILE_NAME,
+				dir_ni->mft_no))
+			goto unm_err_out;
+		/*
+		 * We perform a case sensitive comparison and if that matches
+		 * we are done and return the mft reference of the inode (i.e.
+		 * the inode number together with the sequence number for
+		 * consistency checking). We convert it to cpu format before
+		 * returning.
+		 */
+		if (ntfs_are_names_equal(uname, uname_len,
+				(__le16 *)&ie->key.file_name.file_name,
+				ie->key.file_name.file_name_length,
+				CASE_SENSITIVE, vol->upcase, vol->upcase_len)) {
+found_it2:
+			/*
+			 * We have a perfect match, so we don't need to care
+			 * about having matched imperfectly before, so we can
+			 * free name and set *res to NULL.
+			 * However, if the perfect match is a short file name,
+			 * we need to signal this through *res, so that
+			 * ntfs_lookup() can fix dcache aliasing issues.
+			 * As an optimization we just reuse an existing
+			 * allocation of *res.
+			 */
+			if (ie->key.file_name.file_name_type == FILE_NAME_DOS) {
+				if (!name) {
+					name = kmalloc(sizeof(struct ntfs_name),
+							GFP_NOFS);
+					if (!name) {
+						err = -ENOMEM;
+						goto unm_err_out;
+					}
+				}
+				name->mref = le64_to_cpu(
+						ie->data.dir.indexed_file);
+				name->type = FILE_NAME_DOS;
+				name->len = 0;
+				*res = name;
+			} else {
+				kfree(name);
+				*res = NULL;
+			}
+			mref = le64_to_cpu(ie->data.dir.indexed_file);
+			kfree(kaddr);
+			iput(ia_vi);
+			return mref;
+		}
+		/*
+		 * For a case insensitive mount, we also perform a case
+		 * insensitive comparison (provided the file name is not in the
+		 * POSIX namespace). If the comparison matches, and the name is
+		 * in the WIN32 namespace, we cache the filename in *res so
+		 * that the caller, ntfs_lookup(), can work on it. If the
+		 * comparison matches, and the name is in the DOS namespace, we
+		 * only cache the mft reference and the file name type (we set
+		 * the name length to zero for simplicity).
+		 */
+		if ((!NVolCaseSensitive(vol) ||
+		    ie->key.file_name.file_name_type == FILE_NAME_DOS) &&
+		    ntfs_are_names_equal(uname, uname_len,
+				(__le16 *)&ie->key.file_name.file_name,
+				ie->key.file_name.file_name_length,
+				IGNORE_CASE, vol->upcase,
+				vol->upcase_len)) {
+			int name_size = sizeof(struct ntfs_name);
+			u8 type = ie->key.file_name.file_name_type;
+			u8 len = ie->key.file_name.file_name_length;
+
+			/* Only one case insensitive matching name allowed. */
+			if (name) {
+				ntfs_error(sb,
+					"Found already allocated name in phase 2. Please run chkdsk");
+				kfree(kaddr);
+				goto dir_err_out;
+			}
+
+			if (type != FILE_NAME_DOS)
+				name_size += len * sizeof(__le16);
+			name = kmalloc(name_size, GFP_NOFS);
+			if (!name) {
+				err = -ENOMEM;
+				goto unm_err_out;
+			}
+			name->mref = le64_to_cpu(ie->data.dir.indexed_file);
+			name->type = type;
+			if (type != FILE_NAME_DOS) {
+				name->len = len;
+				memcpy(name->name, ie->key.file_name.file_name,
+						len * sizeof(__le16));
+			} else
+				name->len = 0;
+			*res = name;
+		}
+		/*
+		 * Not a perfect match, need to do full blown collation so we
+		 * know which way in the B+tree we have to go.
+		 */
+		rc = ntfs_collate_names(uname, uname_len,
+				(__le16 *)&ie->key.file_name.file_name,
+				ie->key.file_name.file_name_length, 1,
+				IGNORE_CASE, vol->upcase, vol->upcase_len);
+		/*
+		 * If uname collates before the name of the current entry, there
+		 * is definitely no such name in this index but we might need to
+		 * descend into the B+tree so we just break out of the loop.
+		 */
+		if (rc == -1)
+			break;
+		/* The names are not equal, continue the search. */
+		if (rc)
+			continue;
+		/*
+		 * Names match with case insensitive comparison, now try the
+		 * case sensitive comparison, which is required for proper
+		 * collation.
+		 */
+		rc = ntfs_collate_names(uname, uname_len,
+				(__le16 *)&ie->key.file_name.file_name,
+				ie->key.file_name.file_name_length, 1,
+				CASE_SENSITIVE, vol->upcase, vol->upcase_len);
+		if (rc == -1)
+			break;
+		if (rc)
+			continue;
+		/*
+		 * Perfect match, this will never happen as the
+		 * ntfs_are_names_equal() call will have gotten a match but we
+		 * still treat it correctly.
+		 */
+		goto found_it2;
+	}
+	/*
+	 * We have finished with this index buffer without success. Check for
+	 * the presence of a child node.
+	 */
+	if (ie->flags & INDEX_ENTRY_NODE) {
+		if ((ia->index.flags & NODE_MASK) == LEAF_NODE) {
+			ntfs_error(sb,
+				"Index entry with child node found in a leaf node in directory inode 0x%lx.",
+				dir_ni->mft_no);
+			goto unm_err_out;
+		}
+		/* Child node present, descend into it. */
+		old_vcn = vcn;
+		vcn = le64_to_cpup((__le64 *)((u8 *)ie +
+				le16_to_cpu(ie->length) - 8));
+		if (vcn >= 0) {
+			/*
+			 * If vcn is in the same page cache page as old_vcn we
+			 * recycle the mapped page.
+			 */
+			if ((old_vcn << vol->cluster_size_bits >> PAGE_SHIFT) ==
+			    (vcn << vol->cluster_size_bits >> PAGE_SHIFT))
+				goto fast_descend_into_child_node;
+			kfree(kaddr);
+			kaddr = NULL;
+			goto descend_into_child_node;
+		}
+		ntfs_error(sb, "Negative child node vcn in directory inode 0x%lx.",
+				dir_ni->mft_no);
+		goto unm_err_out;
+	}
+	/*
+	 * No child node present, return -ENOENT, unless we have got a matching
+	 * name cached in name in which case return the mft reference
+	 * associated with it.
+	 */
+	if (name) {
+		kfree(kaddr);
+		iput(ia_vi);
+		return name->mref;
+	}
+	ntfs_debug("Entry not found.");
+	err = -ENOENT;
+unm_err_out:
+	kfree(kaddr);
+err_out:
+	if (!err)
+		err = -EIO;
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (m)
+		unmap_mft_record(dir_ni);
+	kfree(name);
+	*res = NULL;
+	if (ia_vi && !IS_ERR(ia_vi))
+		iput(ia_vi);
+	return ERR_MREF(err);
+dir_err_out:
+	ntfs_error(sb, "Corrupt directory. Aborting lookup.");
+	goto err_out;
+}
+
+/**
+ * ntfs_filldir - ntfs specific filldir method
+ * @vol:	current ntfs volume
+ * @ndir:	ntfs inode of current directory
+ * @ia_page:	page in which the index allocation buffer @ie resides
+ * @ie:		current index entry
+ * @name:	buffer to use for the converted name
+ * @actor:	what to feed the entries to
+ *
+ * Convert the Unicode @name to the loaded NLS and pass it to the @filldir
+ * callback.
+ *
+ * If @ia_page is not NULL it is the locked page containing the index
+ * allocation block containing the index entry @ie.
+ *
+ * Note, we drop (and then reacquire) the page lock on @ia_page across the
+ * @filldir() call otherwise we would deadlock with NFSd when it calls ->lookup
+ * since ntfs_lookup() will lock the same page. As an optimization, we do not
+ * retake the lock if we are returning a non-zero value as ntfs_readdir()
+ * would need to drop the lock immediately anyway.
+ */
+static inline int ntfs_filldir(struct ntfs_volume *vol,
+		struct ntfs_inode *ndir, struct page *ia_page, struct index_entry *ie,
+		u8 *name, struct dir_context *actor)
+{
+	unsigned long mref;
+	int name_len;
+	unsigned int dt_type;
+	u8 name_type;
+
+	name_type = ie->key.file_name.file_name_type;
+	if (name_type == FILE_NAME_DOS) {
+		ntfs_debug("Skipping DOS name space entry.");
+		return 0;
+	}
+	if (MREF_LE(ie->data.dir.indexed_file) == FILE_root) {
+		ntfs_debug("Skipping root directory self reference entry.");
+		return 0;
+	}
+	if (MREF_LE(ie->data.dir.indexed_file) < FILE_first_user &&
+	    !NVolShowSystemFiles(vol)) {
+		ntfs_debug("Skipping system file.");
+		return 0;
+	}
+	if (!NVolShowHiddenFiles(vol) &&
+	    (ie->key.file_name.file_attributes & FILE_ATTR_HIDDEN)) {
+		ntfs_debug("Skipping hidden file.");
+		return 0;
+	}
+
+	name_len = ntfs_ucstonls(vol, (__le16 *)&ie->key.file_name.file_name,
+			ie->key.file_name.file_name_length, &name,
+			NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1);
+	if (name_len <= 0) {
+		ntfs_warning(vol->sb, "Skipping unrepresentable inode 0x%llx.",
+				(long long)MREF_LE(ie->data.dir.indexed_file));
+		return 0;
+	}
+
+	mref = MREF_LE(ie->data.dir.indexed_file);
+	if (ie->key.file_name.file_attributes &
+	    FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT)
+		dt_type = DT_DIR;
+	else if (ie->key.file_name.file_attributes & FILE_ATTR_REPARSE_POINT)
+		dt_type = ntfs_reparse_tag_dt_types(vol, mref);
+	else
+		dt_type = DT_REG;
+
+	/*
+	 * Drop the page lock otherwise we deadlock with NFS when it calls
+	 * ->lookup since ntfs_lookup() will lock the same page.
+	 */
+	if (ia_page)
+		unlock_page(ia_page);
+	ntfs_debug("Calling filldir for %s with len %i, fpos 0x%llx, inode 0x%lx, DT_%s.",
+			name, name_len, actor->pos, mref,
+			dt_type == DT_DIR ? "DIR" : "REG");
+	if (!dir_emit(actor, name, name_len, mref, dt_type))
+		return 1;
+	/* Relock the page but not if we are aborting ->readdir. */
+	if (ia_page)
+		lock_page(ia_page);
+	return 0;
+}
+
+struct ntfs_file_private {
+	void *key;
+	__le16 key_length;
+	bool end_in_iterate;
+	loff_t curr_pos;
+};
+
+struct ntfs_index_ra {
+	unsigned long start_index;
+	unsigned int count;
+	struct rb_node rb_node;
+};
+
+static void ntfs_insert_rb(struct ntfs_index_ra *nir, struct rb_root *root)
+{
+	struct rb_node **new = &root->rb_node, *parent = NULL;
+	struct ntfs_index_ra *cnir;
+
+	while (*new) {
+		parent = *new;
+		cnir = rb_entry(parent, struct ntfs_index_ra, rb_node);
+		if (nir->start_index < cnir->start_index)
+			new = &parent->rb_left;
+		else if (nir->start_index >= cnir->start_index + cnir->count)
+			new = &parent->rb_right;
+		else {
+			pr_err("nir start index : %lu, count : %u, cnir start_index : %lu, count : %u\n",
+				nir->start_index, nir->count, cnir->start_index, cnir->count);
+			return;
+		}
+	}
+
+	rb_link_node(&nir->rb_node, parent, new);
+	rb_insert_color(&nir->rb_node, root);
+}
+
+static int ntfs_ia_blocks_readahead(struct ntfs_inode *ia_ni, loff_t pos)
+{
+	unsigned long dir_start_index, dir_end_index;
+	struct inode *ia_vi = VFS_I(ia_ni);
+	struct file_ra_state *dir_ra;
+
+	dir_end_index = (i_size_read(ia_vi) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	dir_start_index = (pos + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+	if (dir_start_index >= dir_end_index)
+		return 0;
+
+	dir_ra = kzalloc(sizeof(*dir_ra), GFP_NOFS);
+	if (!dir_ra)
+		return -ENOMEM;
+
+	file_ra_state_init(dir_ra, ia_vi->i_mapping);
+	dir_ra->ra_pages = dir_end_index - dir_start_index;
+	page_cache_sync_readahead(ia_vi->i_mapping, dir_ra, NULL,
+			dir_start_index, dir_end_index - dir_start_index);
+	kfree(dir_ra);
+
+	return 0;
+}
+
+static int ntfs_readdir(struct file *file, struct dir_context *actor)
+{
+	struct inode *vdir = file_inode(file);
+	struct super_block *sb = vdir->i_sb;
+	struct ntfs_inode *ndir = NTFS_I(vdir);
+	struct ntfs_volume *vol = NTFS_SB(sb);
+	struct ntfs_attr_search_ctx *ctx = NULL;
+	struct ntfs_index_context *ictx = NULL;
+	u8 *name;
+	struct index_root *ir;
+	struct index_entry *next = NULL;
+	struct ntfs_file_private *private = NULL;
+	int err = 0;
+	loff_t ie_pos = 2;	/* initialize it with dot and dotdot size */
+	struct ntfs_index_ra *nir = NULL;
+	unsigned long index;
+	struct rb_root ra_root = RB_ROOT;
+	struct file_ra_state *ra;
+
+	ntfs_debug("Entering for inode 0x%lx, fpos 0x%llx.",
+			vdir->i_ino, actor->pos);
+
+	if (file->private_data) {
+		private = file->private_data;
+
+		if (actor->pos != private->curr_pos) {
+			/*
+			 * If actor->pos differs from the previously passed
+			 * one, discard private->key and fill the dirent
+			 * buffer with a linear lookup.
+			 */
+			kfree(private->key);
+			private->key = NULL;
+			private->end_in_iterate = false;
+		} else if (private->end_in_iterate) {
+			kfree(private->key);
+			kfree(file->private_data);
+			file->private_data = NULL;
+			return 0;
+		}
+	}
+
+	/* Emulate . and .. for all directories. */
+	if (!dir_emit_dots(file, actor))
+		return 0;
+
+	/*
+	 * Allocate a buffer to store the current name being processed
+	 * converted to format determined by current NLS.
+	 */
+	name = kmalloc(NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1, GFP_NOFS);
+	if (unlikely(!name))
+		return -ENOMEM;
+
+	mutex_lock_nested(&ndir->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+	ictx = ntfs_index_ctx_get(ndir, I30, 4);
+	if (!ictx) {
+		kfree(name);
+		mutex_unlock(&ndir->mrec_lock);
+		return -ENOMEM;
+	}
+
+	ra = kzalloc(sizeof(struct file_ra_state), GFP_NOFS);
+	if (!ra) {
+		kfree(name);
+		ntfs_index_ctx_put(ictx);
+		mutex_unlock(&ndir->mrec_lock);
+		return -ENOMEM;
+	}
+	file_ra_state_init(ra, vol->mft_ino->i_mapping);
+
+	if (private && private->key) {
+		/*
+		 * Find the index with private->key using ntfs_index_lookup()
+		 * instead of a linear index lookup.
+		 */
+		err = ntfs_index_lookup(private->key,
+				le16_to_cpu(private->key_length),
+				ictx);
+		if (!err) {
+			next = ictx->entry;
+			/*
+			 * Update ie_pos with private->curr_pos
+			 * to make the next d_off of the dirent correct.
+			 */
+			ie_pos = private->curr_pos;
+
+			if (actor->pos > vol->mft_record_size && ictx->ia_ni) {
+				err = ntfs_ia_blocks_readahead(ictx->ia_ni, actor->pos);
+				if (err)
+					goto out;
+			}
+
+			goto nextdir;
+		} else {
+			goto out;
+		}
+	} else if (!private) {
+		private = kzalloc(sizeof(struct ntfs_file_private), GFP_KERNEL);
+		if (!private) {
+			err = -ENOMEM;
+			goto out;
+		}
+		file->private_data = private;
+	}
+
+	ctx = ntfs_attr_get_search_ctx(ndir, NULL);
+	if (!ctx) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* Find the index root attribute in the mft record. */
+	if (ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL, 0,
+			ctx)) {
+		ntfs_error(sb, "Index root attribute missing in directory inode %ld",
+				ndir->mft_no);
+		ntfs_attr_put_search_ctx(ctx);
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* Get to the index root value. */
+	ir = (struct index_root *)((u8 *)ctx->attr +
+			le16_to_cpu(ctx->attr->data.resident.value_offset));
+
+	ictx->ir = ir;
+	ictx->actx = ctx;
+	ictx->parent_vcn[ictx->pindex] = VCN_INDEX_ROOT_PARENT;
+	ictx->is_in_root = true;
+	ictx->parent_pos[ictx->pindex] = 0;
+
+	ictx->block_size = le32_to_cpu(ir->index_block_size);
+	if (ictx->block_size < NTFS_BLOCK_SIZE) {
+		ntfs_error(sb, "Index block size (%d) is smaller than the sector size (%d)",
+				ictx->block_size, NTFS_BLOCK_SIZE);
+		err = -EIO;
+		goto out;
+	}
+
+	if (vol->cluster_size <= ictx->block_size)
+		ictx->vcn_size_bits = vol->cluster_size_bits;
+	else
+		ictx->vcn_size_bits = NTFS_BLOCK_SIZE_BITS;
+
+	/* The first index entry. */
+	next = (struct index_entry *)((u8 *)&ir->index +
+			le32_to_cpu(ir->index.entries_offset));
+
+	if (next->flags & INDEX_ENTRY_NODE) {
+		ictx->ia_ni = ntfs_ia_open(ictx, ictx->idx_ni);
+		if (!ictx->ia_ni) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		err = ntfs_ia_blocks_readahead(ictx->ia_ni, actor->pos);
+		if (err)
+			goto out;
+	}
+
+	if (next->flags & INDEX_ENTRY_NODE) {
+		next = ntfs_index_walk_down(next, ictx);
+		if (!next) {
+			err = -EIO;
+			goto out;
+		}
+	}
+
+	if (next && !(next->flags & INDEX_ENTRY_END))
+		goto nextdir;
+
+	while ((next = ntfs_index_next(next, ictx)) != NULL) {
+nextdir:
+		/* Check the consistency of an index entry */
+		if (ntfs_index_entry_inconsistent(ictx, vol, next, COLLATION_FILE_NAME,
+				ndir->mft_no)) {
+			err = -EIO;
+			goto out;
+		}
+
+		if (ie_pos < actor->pos) {
+			ie_pos += next->length;
+			continue;
+		}
+
+		actor->pos = ie_pos;
+
+		index = (MREF_LE(next->data.dir.indexed_file) <<
+				vol->mft_record_size_bits) >> PAGE_SHIFT;
+		if (nir) {
+			struct ntfs_index_ra *cnir;
+			struct rb_node *node = ra_root.rb_node;
+
+			if (nir->start_index <= index &&
+			    index < nir->start_index + nir->count) {
+				/* Already covered by the current range. */
+				goto filldir;
+			}
+
+			while (node) {
+				cnir = rb_entry(node, struct ntfs_index_ra, rb_node);
+				if (cnir->start_index <= index &&
+				    index < cnir->start_index + cnir->count) {
+					goto filldir;
+				} else if (cnir->start_index + cnir->count == index) {
+					cnir->count++;
+					goto filldir;
+				} else if (cnir->start_index && cnir->start_index - 1 == index) {
+					cnir->start_index = index;
+					cnir->count++;
+					goto filldir;
+				}
+
+				if (index < cnir->start_index)
+					node = node->rb_left;
+				else if (index >= cnir->start_index + cnir->count)
+					node = node->rb_right;
+			}
+
+			if (nir->start_index + nir->count == index) {
+				nir->count++;
+			} else if (nir->start_index && nir->start_index - 1 == index) {
+				nir->start_index = index;
+				nir->count++;
+			} else if (nir->count > 2) {
+				ntfs_insert_rb(nir, &ra_root);
+				nir = NULL;
+			} else {
+				nir->start_index = index;
+				nir->count = 1;
+			}
+		}
+
+		if (!nir) {
+			nir = kzalloc(sizeof(struct ntfs_index_ra), GFP_KERNEL);
+			if (nir) {
+				nir->start_index = index;
+				nir->count = 1;
+			}
+		}
+
+filldir:
+		/* Submit the name to the filldir callback. */
+		err = ntfs_filldir(vol, ndir, NULL, next, name, actor);
+		if (err) {
+			/*
+			 * Store index key value to file private_data to start
+			 * from current index offset on next round.
+			 */
+			private = file->private_data;
+			kfree(private->key);
+			private->key = kmalloc(le16_to_cpu(next->key_length), GFP_KERNEL);
+			if (!private->key) {
+				err = -ENOMEM;
+				goto out;
+			}
+
+			memcpy(private->key, &next->key.file_name, le16_to_cpu(next->key_length));
+			private->key_length = next->key_length;
+			break;
+		}
+		ie_pos += next->length;
+	}
+
+	if (!err)
+		private->end_in_iterate = true;
+	else
+		err = 0;
+
+	private->curr_pos = actor->pos = ie_pos;
+out:
+	while (!RB_EMPTY_ROOT(&ra_root)) {
+		struct ntfs_index_ra *cnir;
+		struct rb_node *node;
+
+		node = rb_first(&ra_root);
+		cnir = rb_entry(node, struct ntfs_index_ra, rb_node);
+		ra->ra_pages = cnir->count;
+		page_cache_sync_readahead(vol->mft_ino->i_mapping, ra, NULL,
+				cnir->start_index, cnir->count);
+		rb_erase(node, &ra_root);
+		kfree(cnir);
+	}
+
+	if (err) {
+		private->curr_pos = actor->pos;
+		private->end_in_iterate = true;
+		err = 0;
+	}
+	ntfs_index_ctx_put(ictx);
+	kfree(name);
+	kfree(nir);
+	kfree(ra);
+	mutex_unlock(&ndir->mrec_lock);
+	return err;
+}
+
+int ntfs_check_empty_dir(struct ntfs_inode *ni, struct mft_record *ni_mrec)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	int ret = 0;
+
+	if (!(ni_mrec->flags & MFT_RECORD_IS_DIRECTORY))
+		return 0;
+
+	ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx) {
+		ntfs_error(ni->vol->sb, "Failed to get search context");
+		return -ENOMEM;
+	}
+
+	/* Find the index root attribute in the mft record. */
+	ret = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
+			0, ctx);
+	if (ret) {
+		ntfs_error(ni->vol->sb, "Index root attribute missing in directory inode %lld",
+				(unsigned long long)ni->mft_no);
+		ntfs_attr_put_search_ctx(ctx);
+		return ret;
+	}
+
+	/* Non-empty directory? */
+	if (ctx->attr->data.resident.value_length !=
+	    sizeof(struct index_root) + sizeof(struct index_entry_header)) {
+		/* Both ENOTEMPTY and EEXIST are ok. We use the more common. */
+		ret = -ENOTEMPTY;
+		ntfs_debug("Directory is not empty\n");
+	}
+
+	ntfs_attr_put_search_ctx(ctx);
+
+	return ret;
+}
+
+/**
+ * ntfs_dir_open - called when an inode is about to be opened
+ * @vi:		inode to be opened
+ * @filp:	file structure describing the inode
+ *
+ * Limit directory size to the page cache limit on architectures where unsigned
+ * long is 32-bits. This is the most we can do for now without overflowing the
+ * page cache page index. Doing it this way means we don't run into problems
+ * because of existing too large directories. It would be better to allow the
+ * user to read the accessible part of the directory but I doubt very much
+ * anyone is going to hit this check on a 32-bit architecture, so there is no
+ * point in adding the extra complexity required to support this.
+ *
+ * On 64-bit architectures, the check is hopefully optimized away by the
+ * compiler.
+ */
+static int ntfs_dir_open(struct inode *vi, struct file *filp)
+{
+	if (sizeof(unsigned long) < 8) {
+		if (i_size_read(vi) > MAX_LFS_FILESIZE)
+			return -EFBIG;
+	}
+	return 0;
+}
+
+static int ntfs_dir_release(struct inode *vi, struct file *filp)
+{
+	if (filp->private_data) {
+		kfree(((struct ntfs_file_private *)filp->private_data)->key);
+		kfree(filp->private_data);
+		filp->private_data = NULL;
+	}
+	return 0;
+}
+
+/**
+ * ntfs_dir_fsync - sync a directory to disk
+ * @filp:	file describing the directory to be synced
+ * @start:	start offset to be synced
+ * @end:	end offset to be synced
+ * @datasync:	if non-zero only flush user data and not metadata
+ *
+ * Data integrity sync of a directory to disk. Used for fsync, fdatasync, and
+ * msync system calls. This function is based on file.c::ntfs_file_fsync().
+ *
+ * Write the mft record and all associated extent mft records as well as the
+ * $INDEX_ALLOCATION and $BITMAP attributes and then sync the block device.
+ * + * If @datasync is true, we do not wait on the inode(s) to be written out + * but we always wait on the page cache pages to be written out. + * + * Note: In the past @filp could be NULL so we ignore it as we don't need = it + * anyway. + * + * Locking: Caller must hold i_mutex on the inode. + */ +static int ntfs_dir_fsync(struct file *filp, loff_t start, loff_t end, + int datasync) +{ + struct inode *bmp_vi, *vi =3D filp->f_mapping->host; + struct ntfs_volume *vol =3D NTFS_I(vi)->vol; + struct ntfs_inode *ni =3D NTFS_I(vi); + struct ntfs_attr_search_ctx *ctx; + struct inode *parent_vi, *ia_vi; + int err, ret; + struct ntfs_attr na; + + ntfs_debug("Entering for inode 0x%lx.", vi->i_ino); + + if (NVolShutdown(vol)) + return -EIO; + + ctx =3D ntfs_attr_get_search_ctx(ni, NULL); + if (!ctx) + return -ENOMEM; + + mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL_2); + while (!(err =3D ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0, c= tx))) { + struct file_name_attr *fn =3D (struct file_name_attr *)((u8 *)ctx->attr + + le16_to_cpu(ctx->attr->data.resident.value_offset)); + + parent_vi =3D ntfs_iget(vi->i_sb, MREF_LE(fn->parent_directory)); + if (IS_ERR(parent_vi)) + continue; + mutex_lock_nested(&NTFS_I(parent_vi)->mrec_lock, NTFS_INODE_MUTEX_PARENT= _2); + ia_vi =3D ntfs_index_iget(parent_vi, I30, 4); + mutex_unlock(&NTFS_I(parent_vi)->mrec_lock); + if (IS_ERR(ia_vi)) { + iput(parent_vi); + continue; + } + write_inode_now(ia_vi, 1); + iput(ia_vi); + write_inode_now(parent_vi, 1); + iput(parent_vi); + } + mutex_unlock(&ni->mrec_lock); + ntfs_attr_put_search_ctx(ctx); + + err =3D file_write_and_wait_range(filp, start, end); + if (err) + return err; + inode_lock(vi); + + /* If the bitmap attribute inode is in memory sync it, too. 
 */
+	na.mft_no = vi->i_ino;
+	na.type = AT_BITMAP;
+	na.name = I30;
+	na.name_len = 4;
+	bmp_vi = ilookup5(vi->i_sb, vi->i_ino, ntfs_test_inode, &na);
+	if (bmp_vi) {
+		write_inode_now(bmp_vi, !datasync);
+		iput(bmp_vi);
+	}
+	ret = __ntfs_write_inode(vi, 1);
+
+	write_inode_now(vi, !datasync);
+
+	write_inode_now(vol->mftbmp_ino, 1);
+	down_write(&vol->lcnbmp_lock);
+	write_inode_now(vol->lcnbmp_ino, 1);
+	up_write(&vol->lcnbmp_lock);
+	write_inode_now(vol->mft_ino, 1);
+
+	err = sync_blockdev(vi->i_sb->s_bdev);
+	if (unlikely(err && !ret))
+		ret = err;
+	if (likely(!ret))
+		ntfs_debug("Done.");
+	else
+		ntfs_warning(vi->i_sb,
+			     "Failed to f%ssync inode 0x%lx. Error %u.",
+			     datasync ? "data" : "", vi->i_ino, -ret);
+	inode_unlock(vi);
+	return ret;
+}
+
+const struct file_operations ntfs_dir_ops = {
+	.llseek		= generic_file_llseek,	/* Seek inside directory. */
+	.read		= generic_read_dir,	/* Return -EISDIR. */
+	.iterate_shared	= ntfs_readdir,		/* Read directory contents. */
+	.fsync		= ntfs_dir_fsync,	/* Sync a directory to disk. */
+	.open		= ntfs_dir_open,	/* Open directory. */
+	.release	= ntfs_dir_release,
+	.unlocked_ioctl = ntfsp_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= ntfsp_compat_ioctl,
+#endif
+};
diff --git a/fs/ntfsplus/index.c b/fs/ntfsplus/index.c
new file mode 100644
index 000000000000..9258a2c59c9f
--- /dev/null
+++ b/fs/ntfsplus/index.c
@@ -0,0 +1,2112 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel index handling. Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2004-2005 Anton Altaparmakov
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ *
+ * Part of this file is based on code from the NTFS-3G project.
+ * and is copyrighted by the respective authors below:
+ * Copyright (c) 2004-2005 Anton Altaparmakov
+ * Copyright (c) 2004-2005 Richard Russon
+ * Copyright (c) 2005-2006 Yura Pakhuchiy
+ * Copyright (c) 2005-2008 Szabolcs Szakacsits
+ * Copyright (c) 2007-2021 Jean-Pierre Andre
+ */
+
+#include "collate.h"
+#include "index.h"
+#include "ntfs.h"
+#include "misc.h"
+#include "attrlist.h"
+
+/*
+ * ntfs_index_entry_inconsistent - Check the consistency of an index entry
+ *
+ * Make sure data and key do not overflow from entry.
+ * As a side effect, an entry with zero length is rejected.
+ * This entry must be a full one (no INDEX_ENTRY_END flag), and its
+ * length must have been checked beforehand to not overflow from the
+ * index record.
+ */
+int ntfs_index_entry_inconsistent(struct ntfs_index_context *icx,
+		struct ntfs_volume *vol, const struct index_entry *ie,
+		__le32 collation_rule, u64 inum)
+{
+	if (icx) {
+		struct index_header *ih;
+		u8 *ie_start, *ie_end;
+
+		if (icx->is_in_root)
+			ih = &icx->ir->index;
+		else
+			ih = &icx->ib->index;
+
+		if ((le32_to_cpu(ih->index_length) > le32_to_cpu(ih->allocated_size)) ||
+		    (le32_to_cpu(ih->index_length) > icx->block_size)) {
+			ntfs_error(vol->sb, "%s Index entry(0x%p)'s length is too big.",
+					icx->is_in_root ? "Index root" : "Index block",
+					(u8 *)icx->entry);
+			return -EINVAL;
+		}
+
+		ie_start = (u8 *)ih + le32_to_cpu(ih->entries_offset);
+		ie_end = (u8 *)ih + le32_to_cpu(ih->index_length);
+
+		if (ie_start > (u8 *)ie ||
+		    ie_end <= ((u8 *)ie + ie->length) ||
+		    ie->length > le32_to_cpu(ih->allocated_size) ||
+		    ie->length > icx->block_size) {
+			ntfs_error(vol->sb, "Index entry(0x%p) is out of range from %s",
+					(u8 *)icx->entry,
+					icx->is_in_root ?
					"index root" : "index block");
+			return -EIO;
+		}
+	}
+
+	if (ie->key_length &&
+	    ((le16_to_cpu(ie->key_length) + offsetof(struct index_entry, key)) >
+	     le16_to_cpu(ie->length))) {
+		ntfs_error(vol->sb, "Overflow from index entry in inode %lld\n",
+				(long long)inum);
+		return -EIO;
+
+	} else {
+		if (collation_rule == COLLATION_FILE_NAME) {
+			if ((offsetof(struct index_entry, key.file_name.file_name) +
+			     ie->key.file_name.file_name_length * sizeof(__le16)) >
+			    le16_to_cpu(ie->length)) {
+				ntfs_error(vol->sb,
+						"File name overflow from index entry in inode %lld\n",
+						(long long)inum);
+				return -EIO;
+			}
+		} else {
+			if (ie->data.vi.data_length &&
+			    ((le16_to_cpu(ie->data.vi.data_offset) +
+			      le16_to_cpu(ie->data.vi.data_length)) >
+			     le16_to_cpu(ie->length))) {
+				ntfs_error(vol->sb,
+						"Data overflow from index entry in inode %lld\n",
+						(long long)inum);
+				return -EIO;
+			}
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * ntfs_index_entry_mark_dirty - mark an index entry dirty
+ * @ictx: ntfs index context describing the index entry
+ *
+ * Mark the index entry described by the index entry context @ictx dirty.
+ *
+ * If the index entry is in the index root attribute, simply mark the inode
+ * containing the index root attribute dirty. This ensures the mft record, and
+ * hence the index root attribute, will be written out to disk later.
+ *
+ * If the index entry is in an index block belonging to the index allocation
+ * attribute, set ib_dirty to true, so the index block will be updated during
+ * ntfs_index_ctx_put().
+ */
+void ntfs_index_entry_mark_dirty(struct ntfs_index_context *ictx)
+{
+	if (ictx->is_in_root)
+		mark_mft_record_dirty(ictx->actx->ntfs_ino);
+	else if (ictx->ib)
+		ictx->ib_dirty = true;
+}
+
+static s64 ntfs_ib_vcn_to_pos(struct ntfs_index_context *icx, s64 vcn)
+{
+	return vcn << icx->vcn_size_bits;
+}
+
+static s64 ntfs_ib_pos_to_vcn(struct ntfs_index_context *icx, s64 pos)
+{
+	return pos >> icx->vcn_size_bits;
+}
+
+static int ntfs_ib_write(struct ntfs_index_context *icx, struct index_block *ib)
+{
+	s64 ret, vcn = le64_to_cpu(ib->index_block_vcn);
+
+	ntfs_debug("vcn: %lld\n", vcn);
+
+	ret = pre_write_mst_fixup((struct ntfs_record *)ib, icx->block_size);
+	if (ret)
+		return -EIO;
+
+	ret = ntfs_inode_attr_pwrite(VFS_I(icx->ia_ni),
+			ntfs_ib_vcn_to_pos(icx, vcn), icx->block_size,
+			(u8 *)ib, icx->sync_write);
+	if (ret != icx->block_size) {
+		ntfs_debug("Failed to write index block %lld, inode %llu",
+				vcn, (unsigned long long)icx->idx_ni->mft_no);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int ntfs_icx_ib_write(struct ntfs_index_context *icx)
+{
+	int err;
+
+	err = ntfs_ib_write(icx, icx->ib);
+	if (err)
+		return err;
+
+	icx->ib_dirty = false;
+
+	return 0;
+}
+
+int ntfs_icx_ib_sync_write(struct ntfs_index_context *icx)
+{
+	int ret;
+
+	if (icx->ib_dirty == false)
+		return 0;
+
+	icx->sync_write = true;
+
+	ret = ntfs_ib_write(icx, icx->ib);
+	if (!ret) {
+		ntfs_free(icx->ib);
+		icx->ib = NULL;
+		icx->ib_dirty = false;
+	} else {
+		post_write_mst_fixup((struct ntfs_record *)icx->ib);
+		icx->sync_write = false;
+	}
+
+	return ret;
+}
+
+/**
+ * ntfs_index_ctx_get - allocate and initialize a new index context
+ * @ni: ntfs inode with which to initialize the context
+ * @name: name of the index which the context describes
+ * @name_len: length of the index name
+ *
+ * Allocate a new index context, initialize it with @ni and return it.
+ * Return NULL if allocation failed.
+ */
+struct ntfs_index_context *ntfs_index_ctx_get(struct ntfs_inode *ni,
+		__le16 *name, u32 name_len)
+{
+	struct ntfs_index_context *icx;
+
+	ntfs_debug("Entering\n");
+
+	if (!ni)
+		return NULL;
+
+	if (ni->nr_extents == -1)
+		ni = ni->ext.base_ntfs_ino;
+
+	icx = kmem_cache_alloc(ntfs_index_ctx_cache, GFP_NOFS);
+	if (icx)
+		*icx = (struct ntfs_index_context) {
+			.idx_ni = ni,
+			.name = name,
+			.name_len = name_len,
+		};
+	return icx;
+}
+
+static void ntfs_index_ctx_free(struct ntfs_index_context *icx)
+{
+	ntfs_debug("Entering\n");
+
+	if (icx->actx) {
+		ntfs_attr_put_search_ctx(icx->actx);
+		icx->actx = NULL;
+	}
+
+	if (!icx->is_in_root) {
+		if (icx->ib_dirty)
+			ntfs_ib_write(icx, icx->ib);
+		ntfs_free(icx->ib);
+		icx->ib = NULL;
+	}
+
+	if (icx->ia_ni) {
+		iput(VFS_I(icx->ia_ni));
+		icx->ia_ni = NULL;
+	}
+}
+
+/**
+ * ntfs_index_ctx_put - release an index context
+ * @icx: index context to free
+ *
+ * Release the index context @icx, releasing all associated resources.
+ */
+void ntfs_index_ctx_put(struct ntfs_index_context *icx)
+{
+	ntfs_index_ctx_free(icx);
+	kmem_cache_free(ntfs_index_ctx_cache, icx);
+}
+
+/**
+ * ntfs_index_ctx_reinit - reinitialize an index context
+ * @icx: index context to reinitialize
+ *
+ * Reinitialize the index context @icx so it can be used for ntfs_index_lookup.
+ */
+void ntfs_index_ctx_reinit(struct ntfs_index_context *icx)
+{
+	ntfs_debug("Entering\n");
+
+	ntfs_index_ctx_free(icx);
+
+	*icx = (struct ntfs_index_context) {
+		.idx_ni = icx->idx_ni,
+		.name = icx->name,
+		.name_len = icx->name_len,
+	};
+}
+
+static __le64 *ntfs_ie_get_vcn_addr(struct index_entry *ie)
+{
+	return (__le64 *)((u8 *)ie + le16_to_cpu(ie->length) - sizeof(s64));
+}
+
+/**
+ * Get the subnode vcn to which the index entry refers.
+ */
+static s64 ntfs_ie_get_vcn(struct index_entry *ie)
+{
+	return le64_to_cpup(ntfs_ie_get_vcn_addr(ie));
+}
+
+static struct index_entry *ntfs_ie_get_first(struct index_header *ih)
+{
+	return (struct index_entry *)((u8 *)ih + le32_to_cpu(ih->entries_offset));
+}
+
+static struct index_entry *ntfs_ie_get_next(struct index_entry *ie)
+{
+	return (struct index_entry *)((char *)ie + le16_to_cpu(ie->length));
+}
+
+static u8 *ntfs_ie_get_end(struct index_header *ih)
+{
+	return (u8 *)ih + le32_to_cpu(ih->index_length);
+}
+
+static int ntfs_ie_end(struct index_entry *ie)
+{
+	return ie->flags & INDEX_ENTRY_END || !ie->length;
+}
+
+/**
+ * Find the last entry in the index block
+ */
+static struct index_entry *ntfs_ie_get_last(struct index_entry *ie, char *ies_end)
+{
+	ntfs_debug("Entering\n");
+
+	while ((char *)ie < ies_end && !ntfs_ie_end(ie))
+		ie = ntfs_ie_get_next(ie);
+
+	return ie;
+}
+
+static struct index_entry *ntfs_ie_get_by_pos(struct index_header *ih, int pos)
+{
+	struct index_entry *ie;
+
+	ntfs_debug("pos: %d\n", pos);
+
+	ie = ntfs_ie_get_first(ih);
+
+	while (pos-- > 0)
+		ie = ntfs_ie_get_next(ie);
+
+	return ie;
+}
+
+static struct index_entry *ntfs_ie_prev(struct index_header *ih, struct index_entry *ie)
+{
+	struct index_entry *ie_prev = NULL;
+	struct index_entry *tmp;
+
+	ntfs_debug("Entering\n");
+
+	tmp = ntfs_ie_get_first(ih);
+
+	while (tmp != ie) {
+		ie_prev = tmp;
+		tmp = ntfs_ie_get_next(tmp);
+	}
+
+	return ie_prev;
+}
+
+static int ntfs_ih_numof_entries(struct index_header *ih)
+{
+	int n;
+	struct index_entry *ie;
+	u8 *end;
+
+	ntfs_debug("Entering\n");
+
+	end = ntfs_ie_get_end(ih);
+	ie = ntfs_ie_get_first(ih);
+	for (n = 0; !ntfs_ie_end(ie) && (u8 *)ie < end; n++)
+		ie = ntfs_ie_get_next(ie);
+	return n;
+}
+
+static int ntfs_ih_one_entry(struct index_header *ih)
+{
+	return (ntfs_ih_numof_entries(ih) == 1);
+}
+
+static int ntfs_ih_zero_entry(struct index_header *ih)
+{
+	return
		(ntfs_ih_numof_entries(ih) == 0);
+}
+
+static void ntfs_ie_delete(struct index_header *ih, struct index_entry *ie)
+{
+	u32 new_size;
+
+	ntfs_debug("Entering\n");
+
+	new_size = le32_to_cpu(ih->index_length) - le16_to_cpu(ie->length);
+	ih->index_length = cpu_to_le32(new_size);
+	memmove(ie, (u8 *)ie + le16_to_cpu(ie->length),
+			new_size - ((u8 *)ie - (u8 *)ih));
+}
+
+static void ntfs_ie_set_vcn(struct index_entry *ie, s64 vcn)
+{
+	*ntfs_ie_get_vcn_addr(ie) = cpu_to_le64(vcn);
+}
+
+/**
+ * Insert @ie index entry at @pos entry. Used @ih values should be ok already.
+ */
+static void ntfs_ie_insert(struct index_header *ih, struct index_entry *ie,
+		struct index_entry *pos)
+{
+	int ie_size = le16_to_cpu(ie->length);
+
+	ntfs_debug("Entering\n");
+
+	ih->index_length = cpu_to_le32(le32_to_cpu(ih->index_length) + ie_size);
+	memmove((u8 *)pos + ie_size, pos,
+			le32_to_cpu(ih->index_length) - ((u8 *)pos - (u8 *)ih) - ie_size);
+	memcpy(pos, ie, ie_size);
+}
+
+static struct index_entry *ntfs_ie_dup(struct index_entry *ie)
+{
+	struct index_entry *dup;
+
+	ntfs_debug("Entering\n");
+
+	dup = ntfs_malloc_nofs(le16_to_cpu(ie->length));
+	if (dup)
+		memcpy(dup, ie, le16_to_cpu(ie->length));
+
+	return dup;
+}
+
+static struct index_entry *ntfs_ie_dup_novcn(struct index_entry *ie)
+{
+	struct index_entry *dup;
+	int size = le16_to_cpu(ie->length);
+
+	ntfs_debug("Entering\n");
+
+	if (ie->flags & INDEX_ENTRY_NODE)
+		size -= sizeof(s64);
+
+	dup = ntfs_malloc_nofs(size);
+	if (dup) {
+		memcpy(dup, ie, size);
+		dup->flags &= ~INDEX_ENTRY_NODE;
+		dup->length = cpu_to_le16(size);
+	}
+	return dup;
+}
+
+/*
+ * Check the consistency of an index block
+ *
+ * Make sure the index block does not overflow from the index record.
+ * The size of block is assumed to have been checked to be what is
+ * defined in the index root.
+ *
+ * Return 0 if no error was found, -1 otherwise.
+ *
+ * |<--->|  offsetof(struct index_block, index)
+ * |     |<--->|  sizeof(struct index_header)
+ * |     |     |
+ * |     |     | seq          index entries         unused
+ * |=====|=====|=====|===========================|==============|
+ * |     |           |                           |              |
+ * |     |<--------->| entries_offset            |              |
+ * |     |<---------------- index_length ------->|              |
+ * |     |<--------------------- allocated_size --------------->|
+ * |<--------------------------- block_size ------------------->|
+ *
+ * size(struct index_header) <= ent_offset < ind_length <= alloc_size < bk_size
+ */
+static int ntfs_index_block_inconsistent(struct ntfs_index_context *icx,
+		struct index_block *ib, s64 vcn)
+{
+	u32 ib_size = (unsigned int)le32_to_cpu(ib->index.allocated_size) +
+			offsetof(struct index_block, index);
+	struct super_block *sb = icx->idx_ni->vol->sb;
+	unsigned long long inum = icx->idx_ni->mft_no;
+
+	ntfs_debug("Entering\n");
+
+	if (!ntfs_is_indx_record(ib->magic)) {
+		ntfs_error(sb, "Corrupt index block signature: vcn %lld inode %llu\n",
+				vcn, (unsigned long long)icx->idx_ni->mft_no);
+		return -1;
+	}
+
+	if (le64_to_cpu(ib->index_block_vcn) != vcn) {
+		ntfs_error(sb,
+				"Corrupt index block: VCN (%lld) is different from expected VCN (%lld) in inode %llu\n",
+				(long long)le64_to_cpu(ib->index_block_vcn),
+				vcn, inum);
+		return -1;
+	}
+
+	if (ib_size != icx->block_size) {
+		ntfs_error(sb,
+				"Corrupt index block: VCN (%lld) of inode %llu has a size (%u) differing from the index specified size (%u)\n",
+				vcn, inum, ib_size, icx->block_size);
+		return -1;
+	}
+
+	if (le32_to_cpu(ib->index.entries_offset) < sizeof(struct index_header)) {
+		ntfs_error(sb, "Invalid index entry offset in inode %lld\n", inum);
+		return -1;
+	}
+	if (le32_to_cpu(ib->index.index_length) <=
+	    le32_to_cpu(ib->index.entries_offset)) {
		ntfs_error(sb, "No space for index entries in inode %lld\n", inum);
+		return -1;
+	}
+	if (le32_to_cpu(ib->index.allocated_size) <
+	    le32_to_cpu(ib->index.index_length)) {
+		ntfs_error(sb, "Index entries overflow in inode %lld\n", inum);
+		return -1;
+	}
+
+	return 0;
+}
+
+static struct index_root *ntfs_ir_lookup(struct ntfs_inode *ni, __le16 *name,
+		u32 name_len, struct ntfs_attr_search_ctx **ctx)
+{
+	struct attr_record *a;
+	struct index_root *ir = NULL;
+
+	ntfs_debug("Entering\n");
+	*ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!*ctx) {
+		ntfs_error(ni->vol->sb, "%s, Failed to get search context", __func__);
+		return NULL;
+	}
+
+	if (ntfs_attr_lookup(AT_INDEX_ROOT, name, name_len, CASE_SENSITIVE,
+			0, NULL, 0, *ctx)) {
+		ntfs_error(ni->vol->sb, "Failed to lookup $INDEX_ROOT");
+		goto err_out;
+	}
+
+	a = (*ctx)->attr;
+	if (a->non_resident) {
+		ntfs_error(ni->vol->sb, "Non-resident $INDEX_ROOT detected");
+		goto err_out;
+	}
+
+	ir = (struct index_root *)((char *)a + le16_to_cpu(a->data.resident.value_offset));
+err_out:
+	if (!ir) {
+		ntfs_attr_put_search_ctx(*ctx);
+		*ctx = NULL;
+	}
+	return ir;
+}
+
+static struct index_root *ntfs_ir_lookup2(struct ntfs_inode *ni, __le16 *name, u32 len)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	struct index_root *ir;
+
+	ir = ntfs_ir_lookup(ni, name, len, &ctx);
+	if (ir)
+		ntfs_attr_put_search_ctx(ctx);
+	return ir;
+}
+
+/**
+ * Find a key in the index block.
+ */
+static int ntfs_ie_lookup(const void *key, const int key_len,
+		struct ntfs_index_context *icx, struct index_header *ih,
+		s64 *vcn, struct index_entry **ie_out)
+{
+	struct index_entry *ie;
+	u8 *index_end;
+	int rc, item = 0;
+
+	ntfs_debug("Entering\n");
+
+	index_end = ntfs_ie_get_end(ih);
+
+	/*
+	 * Loop until we exceed valid memory (corruption case) or until we
+	 * reach the last entry.
+	 */
+	for (ie = ntfs_ie_get_first(ih); ; ie = ntfs_ie_get_next(ie)) {
+		/* Bounds checks.
 */
+		if ((u8 *)ie + sizeof(struct index_entry_header) > index_end ||
+		    (u8 *)ie + le16_to_cpu(ie->length) > index_end) {
+			ntfs_error(icx->idx_ni->vol->sb,
+					"Index entry out of bounds in inode %llu.\n",
+					(unsigned long long)icx->idx_ni->mft_no);
+			return -ERANGE;
+		}
+
+		/*
+		 * The last entry cannot contain a key. It can however contain
+		 * a pointer to a child node in the B+tree so we just break out.
+		 */
+		if (ntfs_ie_end(ie))
+			break;
+
+		/*
+		 * Not a perfect match, need to do full blown collation so we
+		 * know which way in the B+tree we have to go.
+		 */
+		rc = ntfs_collate(icx->idx_ni->vol, icx->cr, key, key_len, &ie->key,
+				le16_to_cpu(ie->key_length));
+		if (rc == -2) {
+			ntfs_error(icx->idx_ni->vol->sb,
+					"Collation error. Perhaps a filename contains invalid characters?\n");
+			return -ERANGE;
+		}
+		/*
+		 * If @key collates before the key of the current entry, there
+		 * is definitely no such key in this index but we might need to
+		 * descend into the B+tree so we just break out of the loop.
+		 */
+		if (rc == -1)
+			break;
+
+		if (!rc) {
+			*ie_out = ie;
+			icx->parent_pos[icx->pindex] = item;
+			return 0;
+		}
+
+		item++;
+	}
+	/*
+	 * We have finished with this index block without success. Check for
+	 * the presence of a child node and if not present return -ENOENT,
+	 * otherwise we will keep searching in another index block.
+	 */
+	if (!(ie->flags & INDEX_ENTRY_NODE)) {
+		ntfs_debug("Index entry wasn't found.\n");
+		*ie_out = ie;
+		return -ENOENT;
+	}
+
+	/* Get the starting vcn of the index_block holding the child node.
 */
+	*vcn = ntfs_ie_get_vcn(ie);
+	if (*vcn < 0) {
+		ntfs_error(icx->idx_ni->vol->sb, "Negative vcn in inode %llu\n",
+				(unsigned long long)icx->idx_ni->mft_no);
+		return -EINVAL;
+	}
+
+	ntfs_debug("Parent entry number %d\n", item);
+	icx->parent_pos[icx->pindex] = item;
+
+	return -EAGAIN;
+}
+
+struct ntfs_inode *ntfs_ia_open(struct ntfs_index_context *icx, struct ntfs_inode *ni)
+{
+	struct inode *ia_vi;
+
+	ia_vi = ntfs_index_iget(VFS_I(ni), icx->name, icx->name_len);
+	if (IS_ERR(ia_vi)) {
+		ntfs_error(icx->idx_ni->vol->sb,
+				"Failed to open index allocation of inode %llu",
+				(unsigned long long)ni->mft_no);
+		return NULL;
+	}
+
+	return NTFS_I(ia_vi);
+}
+
+static int ntfs_ib_read(struct ntfs_index_context *icx, s64 vcn, struct index_block *dst)
+{
+	s64 pos, ret;
+
+	ntfs_debug("vcn: %lld\n", vcn);
+
+	pos = ntfs_ib_vcn_to_pos(icx, vcn);
+
+	ret = ntfs_inode_attr_pread(VFS_I(icx->ia_ni), pos, icx->block_size, (u8 *)dst);
+	if (ret != icx->block_size) {
+		if (ret == -1)
+			ntfs_error(icx->idx_ni->vol->sb, "Failed to read index block");
+		else
+			ntfs_error(icx->idx_ni->vol->sb,
+					"Failed to read full index block at %lld\n", pos);
+		return -1;
+	}
+
+	post_read_mst_fixup((struct ntfs_record *)((u8 *)dst), icx->block_size);
+	if (ntfs_index_block_inconsistent(icx, dst, vcn))
+		return -1;
+
+	return 0;
+}
+
+static int ntfs_icx_parent_inc(struct ntfs_index_context *icx)
+{
+	icx->pindex++;
+	if (icx->pindex >= MAX_PARENT_VCN) {
+		ntfs_error(icx->idx_ni->vol->sb, "Index is over %d levels deep", MAX_PARENT_VCN);
+		return -EOPNOTSUPP;
+	}
+	return 0;
+}
+
+static int ntfs_icx_parent_dec(struct ntfs_index_context *icx)
+{
+	icx->pindex--;
+	if (icx->pindex < 0) {
+		ntfs_error(icx->idx_ni->vol->sb, "Corrupt index pointer (%d)", icx->pindex);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * ntfs_index_lookup - find a key in an index and return its index entry
+ * @key:	key for which to search in the index
+ * @key_len:	length of @key in bytes
+ * @icx:	context describing the index and the returned entry
+ *
+ * Before calling ntfs_index_lookup(), @icx must have been obtained from a
+ * call to ntfs_index_ctx_get().
+ *
+ * Look for the @key in the index specified by the index lookup context @icx.
+ * ntfs_index_lookup() walks the contents of the index looking for the @key.
+ *
+ * If the @key is found in the index, 0 is returned and @icx is setup to
+ * describe the index entry containing the matching @key. @icx->entry is the
+ * index entry and @icx->data and @icx->data_len are the index entry data and
+ * its length in bytes, respectively.
+ *
+ * If the @key is not found in the index, -ENOENT is returned and
+ * @icx is setup to describe the index entry whose key collates immediately
+ * after the search @key, i.e. this is the position in the index at which
+ * an index entry with a key of @key would need to be inserted.
+ *
+ * When finished with the entry and its data, call ntfs_index_ctx_put() to free
+ * the context and other associated resources.
+ *
+ * If the index entry was modified, call ntfs_index_entry_mark_dirty() before
+ * the call to ntfs_index_ctx_put() to ensure that the changes are written
+ * to disk.
+ */
+int ntfs_index_lookup(const void *key, const int key_len, struct ntfs_index_context *icx)
+{
+	s64 old_vcn, vcn;
+	struct ntfs_inode *ni = icx->idx_ni;
+	struct super_block *sb = ni->vol->sb;
+	struct index_root *ir;
+	struct index_entry *ie;
+	struct index_block *ib = NULL;
+	int err = 0;
+
+	ntfs_debug("Entering\n");
+
+	if (!key || key_len <= 0) {
+		ntfs_error(sb, "key: %p key_len: %d", key, key_len);
+		return -EINVAL;
+	}
+
+	ir = ntfs_ir_lookup(ni, icx->name, icx->name_len, &icx->actx);
+	if (!ir)
+		return -EIO;
+
+	icx->block_size = le32_to_cpu(ir->index_block_size);
+	if (icx->block_size < NTFS_BLOCK_SIZE) {
+		err = -EINVAL;
+		ntfs_error(sb,
+				"Index block size (%d) is smaller than the sector size (%d)",
+				icx->block_size, NTFS_BLOCK_SIZE);
+		goto err_out;
+	}
+
+	if (ni->vol->cluster_size <= icx->block_size)
+		icx->vcn_size_bits = ni->vol->cluster_size_bits;
+	else
+		icx->vcn_size_bits = ni->vol->sector_size_bits;
+
+	icx->cr = ir->collation_rule;
+	if (!ntfs_is_collation_rule_supported(icx->cr)) {
+		err = -EOPNOTSUPP;
+		ntfs_error(sb, "Unknown collation rule 0x%x",
+				(unsigned int)le32_to_cpu(icx->cr));
+		goto err_out;
+	}
+
+	old_vcn = VCN_INDEX_ROOT_PARENT;
+	err = ntfs_ie_lookup(key, key_len, icx, &ir->index, &vcn, &ie);
+	if (err == -ERANGE || err == -EINVAL)
+		goto err_out;
+
+	icx->ir = ir;
+	if (err != -EAGAIN) {
+		icx->is_in_root = true;
+		icx->parent_vcn[icx->pindex] = old_vcn;
+		goto done;
+	}
+
+	/* Child node present, descend into it.
 */
+	icx->ia_ni = ntfs_ia_open(icx, ni);
+	if (!icx->ia_ni) {
+		err = -ENOENT;
+		goto err_out;
+	}
+
+	ib = ntfs_malloc_nofs(icx->block_size);
+	if (!ib) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+descend_into_child_node:
+	icx->parent_vcn[icx->pindex] = old_vcn;
+	if (ntfs_icx_parent_inc(icx)) {
+		err = -EIO;
+		goto err_out;
+	}
+	old_vcn = vcn;
+
+	ntfs_debug("Descend into node with VCN %lld.\n", vcn);
+
+	if (ntfs_ib_read(icx, vcn, ib)) {
+		err = -EIO;
+		goto err_out;
+	}
+	err = ntfs_ie_lookup(key, key_len, icx, &ib->index, &vcn, &ie);
+	if (err != -EAGAIN) {
+		if (err == -EINVAL || err == -ERANGE)
+			goto err_out;
+
+		icx->is_in_root = false;
+		icx->ib = ib;
+		icx->parent_vcn[icx->pindex] = vcn;
+		goto done;
+	}
+
+	if ((ib->index.flags & NODE_MASK) == LEAF_NODE) {
+		ntfs_error(icx->idx_ni->vol->sb,
+				"Index entry with child node found in a leaf node in inode 0x%llx.\n",
+				(unsigned long long)ni->mft_no);
+		goto err_out;
+	}
+
+	goto descend_into_child_node;
+err_out:
+	if (icx->actx) {
+		ntfs_attr_put_search_ctx(icx->actx);
+		icx->actx = NULL;
+	}
+	ntfs_free(ib);
+	if (!err)
+		err = -EIO;
+	return err;
+done:
+	icx->entry = ie;
+	icx->data = (u8 *)ie + offsetof(struct index_entry, key);
+	icx->data_len = le16_to_cpu(ie->key_length);
+	ntfs_debug("Done.\n");
+	return err;
+}
+
+static struct index_block *ntfs_ib_alloc(s64 ib_vcn, u32 ib_size,
+		u8 node_type)
+{
+	struct index_block *ib;
+	int ih_size = sizeof(struct index_header);
+
+	ntfs_debug("Entering ib_vcn = %lld ib_size = %u\n", ib_vcn, ib_size);
+
+	ib = ntfs_malloc_nofs(ib_size);
+	if (!ib)
+		return NULL;
+
+	ib->magic = magic_INDX;
+	ib->usa_ofs = cpu_to_le16(sizeof(struct index_block));
+	ib->usa_count = cpu_to_le16(ib_size / NTFS_BLOCK_SIZE + 1);
+	/* Set USN to 1 */
+	*(__le16 *)((char *)ib + le16_to_cpu(ib->usa_ofs)) = cpu_to_le16(1);
+	ib->lsn = 0;
+	ib->index_block_vcn = cpu_to_le64(ib_vcn);
+	ib->index.entries_offset =
		cpu_to_le32((ih_size +
			le16_to_cpu(ib->usa_count) * 2 + 7) & ~7);
+	ib->index.index_length = 0;
+	ib->index.allocated_size = cpu_to_le32(ib_size -
+			(sizeof(struct index_block) - ih_size));
+	ib->index.flags = node_type;
+
+	return ib;
+}
+
+/**
+ * Find the median by going through all the entries
+ */
+static struct index_entry *ntfs_ie_get_median(struct index_header *ih)
+{
+	struct index_entry *ie, *ie_start;
+	u8 *ie_end;
+	int i = 0, median;
+
+	ntfs_debug("Entering\n");
+
+	ie = ie_start = ntfs_ie_get_first(ih);
+	ie_end = (u8 *)ntfs_ie_get_end(ih);
+
+	while ((u8 *)ie < ie_end && !ntfs_ie_end(ie)) {
+		ie = ntfs_ie_get_next(ie);
+		i++;
+	}
+	/*
+	 * NOTE: this could be also the entry at the half of the index block.
+	 */
+	median = i / 2 - 1;
+
+	ntfs_debug("Entries: %d median: %d\n", i, median);
+
+	for (i = 0, ie = ie_start; i <= median; i++)
+		ie = ntfs_ie_get_next(ie);
+
+	return ie;
+}
+
+static s64 ntfs_ibm_vcn_to_pos(struct ntfs_index_context *icx, s64 vcn)
+{
+	return ntfs_ib_vcn_to_pos(icx, vcn) / icx->block_size;
+}
+
+static s64 ntfs_ibm_pos_to_vcn(struct ntfs_index_context *icx, s64 pos)
+{
+	return ntfs_ib_pos_to_vcn(icx, pos * icx->block_size);
+}
+
+static int ntfs_ibm_add(struct ntfs_index_context *icx)
+{
+	u8 bmp[8];
+
+	ntfs_debug("Entering\n");
+
+	if (ntfs_attr_exist(icx->idx_ni, AT_BITMAP, icx->name, icx->name_len))
+		return 0;
+	/*
+	 * AT_BITMAP must be at least 8 bytes.
+	 */
+	memset(bmp, 0, sizeof(bmp));
+	if (ntfs_attr_add(icx->idx_ni, AT_BITMAP, icx->name, icx->name_len,
+			bmp, sizeof(bmp))) {
+		ntfs_error(icx->idx_ni->vol->sb, "Failed to add AT_BITMAP");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ntfs_ibm_modify(struct ntfs_index_context *icx, s64 vcn, int set)
+{
+	u8 byte;
+	u64 pos = (u64)ntfs_ibm_vcn_to_pos(icx, vcn);
+	u32 bpos = pos / 8;
+	u32 bit = 1 << (pos % 8);
+	struct ntfs_inode *bmp_ni;
+	struct inode *bmp_vi;
+	int ret = 0;
+
+	ntfs_debug("%s vcn: %lld\n", set ?
			"set" : "clear", vcn);
+
+	bmp_vi = ntfs_attr_iget(VFS_I(icx->idx_ni), AT_BITMAP, icx->name, icx->name_len);
+	if (IS_ERR(bmp_vi)) {
+		ntfs_error(icx->idx_ni->vol->sb, "Failed to open $BITMAP attribute");
+		return PTR_ERR(bmp_vi);
+	}
+
+	bmp_ni = NTFS_I(bmp_vi);
+
+	if (set) {
+		if (bmp_ni->data_size < bpos + 1) {
+			ret = ntfs_attr_truncate(bmp_ni, (bmp_ni->data_size + 8) & ~7);
+			if (ret) {
+				ntfs_error(icx->idx_ni->vol->sb, "Failed to truncate AT_BITMAP");
+				goto err;
+			}
+			i_size_write(bmp_vi, (loff_t)bmp_ni->data_size);
+		}
+	}
+
+	if (ntfs_inode_attr_pread(bmp_vi, bpos, 1, &byte) != 1) {
+		ret = -EIO;
+		ntfs_error(icx->idx_ni->vol->sb, "Failed to read $BITMAP");
+		goto err;
+	}
+
+	if (set)
+		byte |= bit;
+	else
+		byte &= ~bit;
+
+	if (ntfs_inode_attr_pwrite(bmp_vi, bpos, 1, &byte, false) != 1) {
+		ret = -EIO;
+		ntfs_error(icx->idx_ni->vol->sb, "Failed to write $Bitmap");
+		goto err;
+	}
+
+err:
+	iput(bmp_vi);
+	return ret;
+}
+
+static int ntfs_ibm_set(struct ntfs_index_context *icx, s64 vcn)
+{
+	return ntfs_ibm_modify(icx, vcn, 1);
+}
+
+static int ntfs_ibm_clear(struct ntfs_index_context *icx, s64 vcn)
+{
+	return ntfs_ibm_modify(icx, vcn, 0);
+}
+
+static s64 ntfs_ibm_get_free(struct ntfs_index_context *icx)
+{
+	u8 *bm;
+	int bit;
+	s64 vcn, byte, size;
+
+	ntfs_debug("Entering\n");
+
+	bm = ntfs_attr_readall(icx->idx_ni, AT_BITMAP, icx->name, icx->name_len,
+			&size);
+	if (!bm)
+		return (s64)-1;
+
+	for (byte = 0; byte < size; byte++) {
+		if (bm[byte] == 255)
+			continue;
+
+		for (bit = 0; bit < 8; bit++) {
+			if (!(bm[byte] & (1 << bit))) {
+				vcn = ntfs_ibm_pos_to_vcn(icx, byte * 8 + bit);
+				goto out;
+			}
+		}
+	}
+
+	vcn = ntfs_ibm_pos_to_vcn(icx, size * 8);
+out:
+	ntfs_debug("allocated vcn: %lld\n", vcn);
+
+	if (ntfs_ibm_set(icx, vcn))
+		vcn = (s64)-1;
+
+	ntfs_free(bm);
+	return vcn;
+}
+
+static struct index_block *ntfs_ir_to_ib(struct index_root *ir, s64 ib_vcn)
+{
+	struct index_block *ib;
+	struct
		index_entry *ie_last;
+	char *ies_start, *ies_end;
+	int i;
+
+	ntfs_debug("Entering\n");
+
+	ib = ntfs_ib_alloc(ib_vcn, le32_to_cpu(ir->index_block_size), LEAF_NODE);
+	if (!ib)
+		return NULL;
+
+	ies_start = (char *)ntfs_ie_get_first(&ir->index);
+	ies_end = (char *)ntfs_ie_get_end(&ir->index);
+	ie_last = ntfs_ie_get_last((struct index_entry *)ies_start, ies_end);
+	/*
+	 * Copy all entries, including the termination entry
+	 * as well, which can never have any data.
+	 */
+	i = (char *)ie_last - ies_start + le16_to_cpu(ie_last->length);
+	memcpy(ntfs_ie_get_first(&ib->index), ies_start, i);
+
+	ib->index.flags = ir->index.flags;
+	ib->index.index_length = cpu_to_le32(i +
+			le32_to_cpu(ib->index.entries_offset));
+	return ib;
+}
+
+static void ntfs_ir_nill(struct index_root *ir)
+{
+	struct index_entry *ie_last;
+	char *ies_start, *ies_end;
+
+	ntfs_debug("Entering\n");
+
+	ies_start = (char *)ntfs_ie_get_first(&ir->index);
+	ies_end = (char *)ntfs_ie_get_end(&ir->index);
+	ie_last = ntfs_ie_get_last((struct index_entry *)ies_start, ies_end);
+	/*
+	 * Move the index root termination entry forward
+	 */
+	if ((char *)ie_last > ies_start) {
+		memmove((char *)ntfs_ie_get_first(&ir->index),
+				(char *)ie_last, le16_to_cpu(ie_last->length));
+		ie_last = (struct index_entry *)ies_start;
+	}
+}
+
+static int ntfs_ib_copy_tail(struct ntfs_index_context *icx, struct index_block *src,
+		struct index_entry *median, s64 new_vcn)
+{
+	u8 *ies_end;
+	struct index_entry *ie_head;	/* first entry after the median */
+	int tail_size, ret;
+	struct index_block *dst;
+
+	ntfs_debug("Entering\n");
+
+	dst = ntfs_ib_alloc(new_vcn, icx->block_size,
+			src->index.flags & NODE_MASK);
+	if (!dst)
+		return -ENOMEM;
+
+	ie_head = ntfs_ie_get_next(median);
+
+	ies_end = (u8 *)ntfs_ie_get_end(&src->index);
+	tail_size = ies_end - (u8 *)ie_head;
+	memcpy(ntfs_ie_get_first(&dst->index), ie_head, tail_size);
+
+	dst->index.index_length = cpu_to_le32(tail_size +
			le32_to_cpu(dst->index.entries_offset));
+	ret = ntfs_ib_write(icx, dst);
+
+	ntfs_free(dst);
+	return ret;
+}
+
+static int ntfs_ib_cut_tail(struct ntfs_index_context *icx, struct index_block *ib,
+		struct index_entry *ie)
+{
+	char *ies_start, *ies_end;
+	struct index_entry *ie_last;
+	int ret;
+
+	ntfs_debug("Entering\n");
+
+	ies_start = (char *)ntfs_ie_get_first(&ib->index);
+	ies_end = (char *)ntfs_ie_get_end(&ib->index);
+
+	ie_last = ntfs_ie_get_last((struct index_entry *)ies_start, ies_end);
+	if (ie_last->flags & INDEX_ENTRY_NODE)
+		ntfs_ie_set_vcn(ie_last, ntfs_ie_get_vcn(ie));
+
+	unsafe_memcpy(ie, ie_last, le16_to_cpu(ie_last->length),
+		      /* alloc is larger than ie_last->length, see ntfs_ie_get_last() */);
+
+	ib->index.index_length = cpu_to_le32(((char *)ie - ies_start) +
+			le16_to_cpu(ie->length) + le32_to_cpu(ib->index.entries_offset));
+
+	ret = ntfs_ib_write(icx, ib);
+	return ret;
+}
+
+static int ntfs_ia_add(struct ntfs_index_context *icx)
+{
+	int ret;
+
+	ntfs_debug("Entering\n");
+
+	ret = ntfs_ibm_add(icx);
+	if (ret)
+		return ret;
+
+	if (!ntfs_attr_exist(icx->idx_ni, AT_INDEX_ALLOCATION, icx->name, icx->name_len)) {
+		ret = ntfs_attr_add(icx->idx_ni, AT_INDEX_ALLOCATION, icx->name,
+				icx->name_len, NULL, 0);
+		if (ret) {
+			ntfs_error(icx->idx_ni->vol->sb, "Failed to add AT_INDEX_ALLOCATION");
+			return ret;
+		}
+	}
+
+	icx->ia_ni = ntfs_ia_open(icx, icx->idx_ni);
+	if (!icx->ia_ni)
+		return -ENOENT;
+
+	return 0;
+}
+
+static int ntfs_ir_reparent(struct ntfs_index_context *icx)
+{
+	struct ntfs_attr_search_ctx *ctx = NULL;
+	struct index_root *ir;
+	struct index_entry *ie;
+	struct index_block *ib = NULL;
+	s64 new_ib_vcn;
+	int ix_root_size;
+	int ret = 0;
+
+	ntfs_debug("Entering\n");
+
+	ir = ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len);
+	if (!ir) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	if ((ir->index.flags & NODE_MASK) == SMALL_INDEX) {
+		ret = ntfs_ia_add(icx);
+		if (ret)
+			goto out;
+	}
+
+ new_ib_vcn =3D ntfs_ibm_get_free(icx); + if (new_ib_vcn < 0) { + ret =3D -EINVAL; + goto out; + } + + ir =3D ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len); + if (!ir) { + ret =3D -ENOENT; + goto clear_bmp; + } + + ib =3D ntfs_ir_to_ib(ir, new_ib_vcn); + if (ib =3D=3D NULL) { + ret =3D -EIO; + ntfs_error(icx->idx_ni->vol->sb, "Failed to move index root to index blo= ck"); + goto clear_bmp; + } + + ret =3D ntfs_ib_write(icx, ib); + if (ret) + goto clear_bmp; + +retry: + ir =3D ntfs_ir_lookup(icx->idx_ni, icx->name, icx->name_len, &ctx); + if (!ir) { + ret =3D -ENOENT; + goto clear_bmp; + } + + ntfs_ir_nill(ir); + + ie =3D ntfs_ie_get_first(&ir->index); + ie->flags |=3D INDEX_ENTRY_NODE; + ie->length =3D cpu_to_le16(sizeof(struct index_entry_header) + sizeof(s64= )); + + ir->index.flags =3D LARGE_INDEX; + NInoSetIndexAllocPresent(icx->idx_ni); + ir->index.index_length =3D cpu_to_le32(le32_to_cpu(ir->index.entries_offs= et) + + le16_to_cpu(ie->length)); + ir->index.allocated_size =3D ir->index.index_length; + + ix_root_size =3D sizeof(struct index_root) - sizeof(struct index_header) + + le32_to_cpu(ir->index.allocated_size); + ret =3D ntfs_resident_attr_value_resize(ctx->mrec, ctx->attr, ix_root_si= ze); + if (ret) { + /* + * When there is no space to build a non-resident + * index, we may have to move the root to an extent + */ + if ((ret =3D=3D -ENOSPC) && (ctx->al_entry || !ntfs_inode_add_attrlist(i= cx->idx_ni))) { + ntfs_attr_put_search_ctx(ctx); + ctx =3D NULL; + ir =3D ntfs_ir_lookup(icx->idx_ni, icx->name, icx->name_len, &ctx); + if (ir && !ntfs_attr_record_move_away(ctx, ix_root_size - + le32_to_cpu(ctx->attr->data.resident.value_length))) { + if (ntfs_attrlist_update(ctx->base_ntfs_ino ? 
+ ctx->base_ntfs_ino : ctx->ntfs_ino)) + goto clear_bmp; + ntfs_attr_put_search_ctx(ctx); + ctx =3D NULL; + goto retry; + } + } + goto clear_bmp; + } else { + icx->idx_ni->data_size =3D icx->idx_ni->initialized_size =3D ix_root_siz= e; + icx->idx_ni->allocated_size =3D (ix_root_size + 7) & ~7; + } + ntfs_ie_set_vcn(ie, new_ib_vcn); + +err_out: + ntfs_free(ib); + if (ctx) + ntfs_attr_put_search_ctx(ctx); +out: + return ret; +clear_bmp: + ntfs_ibm_clear(icx, new_ib_vcn); + goto err_out; +} + +/** + * ntfs_ir_truncate - Truncate index root attribute + */ +static int ntfs_ir_truncate(struct ntfs_index_context *icx, int data_size) +{ + int ret; + + ntfs_debug("Entering\n"); + + /* + * INDEX_ROOT must be resident and its entries can be moved to + * struct index_block, so ENOSPC isn't a real error. + */ + ret =3D ntfs_attr_truncate(icx->idx_ni, data_size + offsetof(struct index= _root, index)); + if (!ret) { + i_size_write(VFS_I(icx->idx_ni), icx->idx_ni->initialized_size); + icx->ir =3D ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len); + if (!icx->ir) + return -ENOENT; + + icx->ir->index.allocated_size =3D cpu_to_le32(data_size); + } else if (ret !=3D -ENOSPC) + ntfs_error(icx->idx_ni->vol->sb, "Failed to truncate INDEX_ROOT"); + + return ret; +} + +/** + * ntfs_ir_make_space - Make more space for the index root attribute + */ +static int ntfs_ir_make_space(struct ntfs_index_context *icx, int data_siz= e) +{ + int ret; + + ntfs_debug("Entering\n"); + + ret =3D ntfs_ir_truncate(icx, data_size); + if (ret =3D=3D -ENOSPC) { + ret =3D ntfs_ir_reparent(icx); + if (!ret) + ret =3D -EAGAIN; + else + ntfs_error(icx->idx_ni->vol->sb, "Failed to modify INDEX_ROOT"); + } + + return ret; +} + +/* + * NOTE: 'ie' must be a copy of a real index entry. 
+ */ +static int ntfs_ie_add_vcn(struct index_entry **ie) +{ + struct index_entry *p, *old =3D *ie; + + old->length =3D cpu_to_le16(le16_to_cpu(old->length) + sizeof(s64)); + p =3D ntfs_realloc_nofs(old, le16_to_cpu(old->length), + le16_to_cpu(old->length) - sizeof(s64)); + if (!p) + return -ENOMEM; + + p->flags |=3D INDEX_ENTRY_NODE; + *ie =3D p; + return 0; +} + +static int ntfs_ih_insert(struct index_header *ih, struct index_entry *ori= g_ie, s64 new_vcn, + int pos) +{ + struct index_entry *ie_node, *ie; + int ret =3D 0; + s64 old_vcn; + + ntfs_debug("Entering\n"); + ie =3D ntfs_ie_dup(orig_ie); + if (!ie) + return -ENOMEM; + + if (!(ie->flags & INDEX_ENTRY_NODE)) { + ret =3D ntfs_ie_add_vcn(&ie); + if (ret) + goto out; + } + + ie_node =3D ntfs_ie_get_by_pos(ih, pos); + old_vcn =3D ntfs_ie_get_vcn(ie_node); + ntfs_ie_set_vcn(ie_node, new_vcn); + + ntfs_ie_insert(ih, ie, ie_node); + ntfs_ie_set_vcn(ie_node, old_vcn); +out: + ntfs_free(ie); + return ret; +} + +static s64 ntfs_icx_parent_vcn(struct ntfs_index_context *icx) +{ + return icx->parent_vcn[icx->pindex]; +} + +static s64 ntfs_icx_parent_pos(struct ntfs_index_context *icx) +{ + return icx->parent_pos[icx->pindex]; +} + +static int ntfs_ir_insert_median(struct ntfs_index_context *icx, struct in= dex_entry *median, + s64 new_vcn) +{ + u32 new_size; + int ret; + + ntfs_debug("Entering\n"); + + icx->ir =3D ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len); + if (!icx->ir) + return -ENOENT; + + new_size =3D le32_to_cpu(icx->ir->index.index_length) + + le16_to_cpu(median->length); + if (!(median->flags & INDEX_ENTRY_NODE)) + new_size +=3D sizeof(s64); + + ret =3D ntfs_ir_make_space(icx, new_size); + if (ret) + return ret; + + icx->ir =3D ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len); + if (!icx->ir) + return -ENOENT; + + return ntfs_ih_insert(&icx->ir->index, median, new_vcn, + ntfs_icx_parent_pos(icx)); +} + +static int ntfs_ib_split(struct ntfs_index_context *icx, struct index_bloc= k *ib); + 
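The split machinery that follows (struct split_info, ntfs_ib_insert(), ntfs_ib_split()) implements the classic B-tree overflow scheme: promote the median entry into the parent, move the entries after the median into a new block (ntfs_ib_copy_tail()), and truncate the original block to the entries before it (ntfs_ib_cut_tail()). A minimal array-based sketch of that scheme, with hypothetical fixed-size entries (the real entries are variable-length and live in MFT-backed index blocks):

```c
#include <string.h>

/* Toy fixed-capacity node; names are illustrative only. */
struct toy_node {
	int keys[8];
	int nkeys;
};

/*
 * Split src around its median, moving the tail into the (empty) right
 * sibling; returns the promoted median key, which the caller inserts
 * into the parent node.
 */
static int toy_split(struct toy_node *src, struct toy_node *right)
{
	int m = src->nkeys / 2;
	int median = src->keys[m];

	/* copy_tail: entries after the median go to the new right block */
	right->nkeys = src->nkeys - m - 1;
	memcpy(right->keys, &src->keys[m + 1], right->nkeys * sizeof(int));

	/* cut_tail: the original block keeps only the pre-median entries */
	src->nkeys = m;
	return median;
}
```

The kernel code additionally has to handle the parent itself overflowing (the resplit/-EAGAIN loop above), which the toy version omits.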
+struct split_info { + struct list_head entry; + s64 new_vcn; + struct index_block *ib; +}; + +static int ntfs_ib_insert(struct ntfs_index_context *icx, struct index_ent= ry *ie, s64 new_vcn, + struct split_info *si) +{ + struct index_block *ib; + u32 idx_size, allocated_size; + int err; + s64 old_vcn; + + ntfs_debug("Entering\n"); + + ib =3D ntfs_malloc_nofs(icx->block_size); + if (!ib) + return -ENOMEM; + + old_vcn =3D ntfs_icx_parent_vcn(icx); + + err =3D ntfs_ib_read(icx, old_vcn, ib); + if (err) + goto err_out; + + idx_size =3D le32_to_cpu(ib->index.index_length); + allocated_size =3D le32_to_cpu(ib->index.allocated_size); + if (idx_size + le16_to_cpu(ie->length) + sizeof(s64) > allocated_size) { + si->ib =3D ib; + si->new_vcn =3D new_vcn; + return -EAGAIN; + } + + err =3D ntfs_ih_insert(&ib->index, ie, new_vcn, ntfs_icx_parent_pos(icx)); + if (err) + goto err_out; + + err =3D ntfs_ib_write(icx, ib); + +err_out: + ntfs_free(ib); + return err; +} + +/** + * ntfs_ib_split - Split an index block + */ +static int ntfs_ib_split(struct ntfs_index_context *icx, struct index_bloc= k *ib) +{ + struct index_entry *median; + s64 new_vcn; + int ret; + struct split_info *si; + LIST_HEAD(ntfs_cut_tail_list); + + ntfs_debug("Entering\n"); + +resplit: + ret =3D ntfs_icx_parent_dec(icx); + if (ret) + goto out; + + median =3D ntfs_ie_get_median(&ib->index); + new_vcn =3D ntfs_ibm_get_free(icx); + if (new_vcn < 0) { + ret =3D -EINVAL; + goto out; + } + + ret =3D ntfs_ib_copy_tail(icx, ib, median, new_vcn); + if (ret) { + ntfs_ibm_clear(icx, new_vcn); + goto out; + } + + if (ntfs_icx_parent_vcn(icx) =3D=3D VCN_INDEX_ROOT_PARENT) { + ret =3D ntfs_ir_insert_median(icx, median, new_vcn); + if (ret) { + ntfs_ibm_clear(icx, new_vcn); + goto out; + } + } else { + si =3D kzalloc(sizeof(struct split_info), GFP_NOFS); + if (!si) { + ntfs_ibm_clear(icx, new_vcn); + ret =3D -ENOMEM; + goto out; + } + + ret =3D ntfs_ib_insert(icx, median, new_vcn, si); + if (ret =3D=3D -EAGAIN) { + 
list_add_tail(&si->entry, &ntfs_cut_tail_list); + ib =3D si->ib; + goto resplit; + } else if (ret) { + ntfs_free(si->ib); + kfree(si); + ntfs_ibm_clear(icx, new_vcn); + goto out; + } + kfree(si); + } + + ret =3D ntfs_ib_cut_tail(icx, ib, median); + +out: + while (!list_empty(&ntfs_cut_tail_list)) { + si =3D list_last_entry(&ntfs_cut_tail_list, struct split_info, entry); + ntfs_ibm_clear(icx, si->new_vcn); + ntfs_free(si->ib); + list_del(&si->entry); + kfree(si); + if (!ret) + ret =3D -EAGAIN; + } + + return ret; +} + +int ntfs_ie_add(struct ntfs_index_context *icx, struct index_entry *ie) +{ + struct index_header *ih; + int allocated_size, new_size; + int ret; + + while (1) { + ret =3D ntfs_index_lookup(&ie->key, le16_to_cpu(ie->key_length), icx); + if (!ret) { + ret =3D -EEXIST; + ntfs_error(icx->idx_ni->vol->sb, "Index already have such entry"); + goto err_out; + } + if (ret !=3D -ENOENT) { + ntfs_error(icx->idx_ni->vol->sb, "Failed to find place for new entry"); + goto err_out; + } + ret =3D 0; + + if (icx->is_in_root) + ih =3D &icx->ir->index; + else + ih =3D &icx->ib->index; + + allocated_size =3D le32_to_cpu(ih->allocated_size); + new_size =3D le32_to_cpu(ih->index_length) + le16_to_cpu(ie->length); + + if (new_size <=3D allocated_size) + break; + + ntfs_debug("index block sizes: allocated: %d needed: %d\n", + allocated_size, new_size); + + if (icx->is_in_root) + ret =3D ntfs_ir_make_space(icx, new_size); + else + ret =3D ntfs_ib_split(icx, icx->ib); + if (ret && ret !=3D -EAGAIN) + goto err_out; + + mark_mft_record_dirty(icx->actx->ntfs_ino); + ntfs_index_ctx_reinit(icx); + } + + ntfs_ie_insert(ih, ie, icx->entry); + ntfs_index_entry_mark_dirty(icx); + +err_out: + ntfs_debug("%s\n", ret ? 
"Failed" : "Done"); + return ret; +} + +/** + * ntfs_index_add_filename - add filename to directory index + * @ni: ntfs inode describing directory to which index add filename + * @fn: FILE_NAME attribute to add + * @mref: reference of the inode which @fn describes + */ +int ntfs_index_add_filename(struct ntfs_inode *ni, struct file_name_attr *= fn, u64 mref) +{ + struct index_entry *ie; + struct ntfs_index_context *icx; + int fn_size, ie_size, err; + + ntfs_debug("Entering\n"); + + if (!ni || !fn) + return -EINVAL; + + fn_size =3D (fn->file_name_length * sizeof(__le16)) + + sizeof(struct file_name_attr); + ie_size =3D (sizeof(struct index_entry_header) + fn_size + 7) & ~7; + + ie =3D ntfs_malloc_nofs(ie_size); + if (!ie) + return -ENOMEM; + + ie->data.dir.indexed_file =3D cpu_to_le64(mref); + ie->length =3D cpu_to_le16(ie_size); + ie->key_length =3D cpu_to_le16(fn_size); + + unsafe_memcpy(&ie->key, fn, fn_size, + /* "fn_size" was correctly calculated above */); + + icx =3D ntfs_index_ctx_get(ni, I30, 4); + if (!icx) { + err =3D -ENOMEM; + goto out; + } + + err =3D ntfs_ie_add(icx, ie); + ntfs_index_ctx_put(icx); +out: + ntfs_free(ie); + return err; +} + +static int ntfs_ih_takeout(struct ntfs_index_context *icx, struct index_he= ader *ih, + struct index_entry *ie, struct index_block *ib) +{ + struct index_entry *ie_roam; + int freed_space; + bool full; + int ret =3D 0; + + ntfs_debug("Entering\n"); + + full =3D ih->index_length =3D=3D ih->allocated_size; + ie_roam =3D ntfs_ie_dup_novcn(ie); + if (!ie_roam) + return -ENOMEM; + + ntfs_ie_delete(ih, ie); + + if (ntfs_icx_parent_vcn(icx) =3D=3D VCN_INDEX_ROOT_PARENT) { + /* + * Recover the space which may have been freed + * while deleting an entry from root index + */ + freed_space =3D le32_to_cpu(ih->allocated_size) - + le32_to_cpu(ih->index_length); + if (full && (freed_space > 0) && !(freed_space & 7)) { + ntfs_ir_truncate(icx, le32_to_cpu(ih->index_length)); + /* do nothing if truncation fails */ + } + + 
mark_mft_record_dirty(icx->actx->ntfs_ino); + } else { + ret =3D ntfs_ib_write(icx, ib); + if (ret) + goto out; + } + + ntfs_index_ctx_reinit(icx); + + ret =3D ntfs_ie_add(icx, ie_roam); +out: + ntfs_free(ie_roam); + return ret; +} + +/** + * Used if an empty index block to be deleted has END entry as the parent + * in the INDEX_ROOT which is the only one there. + */ +static void ntfs_ir_leafify(struct ntfs_index_context *icx, struct index_h= eader *ih) +{ + struct index_entry *ie; + + ntfs_debug("Entering\n"); + + ie =3D ntfs_ie_get_first(ih); + ie->flags &=3D ~INDEX_ENTRY_NODE; + ie->length =3D cpu_to_le16(le16_to_cpu(ie->length) - sizeof(s64)); + + ih->index_length =3D cpu_to_le32(le32_to_cpu(ih->index_length) - sizeof(s= 64)); + ih->flags &=3D ~LARGE_INDEX; + NInoClearIndexAllocPresent(icx->idx_ni); + + /* Not fatal error */ + ntfs_ir_truncate(icx, le32_to_cpu(ih->index_length)); +} + +/** + * Used if an empty index block to be deleted has END entry as the parent + * in the INDEX_ROOT which is not the only one there. 
+ */ +static int ntfs_ih_reparent_end(struct ntfs_index_context *icx, struct ind= ex_header *ih, + struct index_block *ib) +{ + struct index_entry *ie, *ie_prev; + + ntfs_debug("Entering\n"); + + ie =3D ntfs_ie_get_by_pos(ih, ntfs_icx_parent_pos(icx)); + ie_prev =3D ntfs_ie_prev(ih, ie); + if (!ie_prev) + return -EIO; + ntfs_ie_set_vcn(ie, ntfs_ie_get_vcn(ie_prev)); + + return ntfs_ih_takeout(icx, ih, ie_prev, ib); +} + +static int ntfs_index_rm_leaf(struct ntfs_index_context *icx) +{ + struct index_block *ib =3D NULL; + struct index_header *parent_ih; + struct index_entry *ie; + int ret; + + ntfs_debug("pindex: %d\n", icx->pindex); + + ret =3D ntfs_icx_parent_dec(icx); + if (ret) + return ret; + + ret =3D ntfs_ibm_clear(icx, icx->parent_vcn[icx->pindex + 1]); + if (ret) + return ret; + + if (ntfs_icx_parent_vcn(icx) =3D=3D VCN_INDEX_ROOT_PARENT) + parent_ih =3D &icx->ir->index; + else { + ib =3D ntfs_malloc_nofs(icx->block_size); + if (!ib) + return -ENOMEM; + + ret =3D ntfs_ib_read(icx, ntfs_icx_parent_vcn(icx), ib); + if (ret) + goto out; + + parent_ih =3D &ib->index; + } + + ie =3D ntfs_ie_get_by_pos(parent_ih, ntfs_icx_parent_pos(icx)); + if (!ntfs_ie_end(ie)) { + ret =3D ntfs_ih_takeout(icx, parent_ih, ie, ib); + goto out; + } + + if (ntfs_ih_zero_entry(parent_ih)) { + if (ntfs_icx_parent_vcn(icx) =3D=3D VCN_INDEX_ROOT_PARENT) { + ntfs_ir_leafify(icx, parent_ih); + goto out; + } + + ret =3D ntfs_index_rm_leaf(icx); + goto out; + } + + ret =3D ntfs_ih_reparent_end(icx, parent_ih, ib); +out: + ntfs_free(ib); + return ret; +} + +static int ntfs_index_rm_node(struct ntfs_index_context *icx) +{ + int entry_pos, pindex; + s64 vcn; + struct index_block *ib =3D NULL; + struct index_entry *ie_succ, *ie, *entry =3D icx->entry; + struct index_header *ih; + u32 new_size; + int delta, ret; + + ntfs_debug("Entering\n"); + + if (!icx->ia_ni) { + icx->ia_ni =3D ntfs_ia_open(icx, icx->idx_ni); + if (!icx->ia_ni) + return -EINVAL; + } + + ib =3D 
ntfs_malloc_nofs(icx->block_size); + if (!ib) + return -ENOMEM; + + ie_succ =3D ntfs_ie_get_next(icx->entry); + entry_pos =3D icx->parent_pos[icx->pindex]++; + pindex =3D icx->pindex; +descend: + vcn =3D ntfs_ie_get_vcn(ie_succ); + ret =3D ntfs_ib_read(icx, vcn, ib); + if (ret) + goto out; + + ie_succ =3D ntfs_ie_get_first(&ib->index); + + ret =3D ntfs_icx_parent_inc(icx); + if (ret) + goto out; + + icx->parent_vcn[icx->pindex] =3D vcn; + icx->parent_pos[icx->pindex] =3D 0; + + if ((ib->index.flags & NODE_MASK) =3D=3D INDEX_NODE) + goto descend; + + if (ntfs_ih_zero_entry(&ib->index)) { + ret =3D -EIO; + ntfs_error(icx->idx_ni->vol->sb, "Empty index block"); + goto out; + } + + ie =3D ntfs_ie_dup(ie_succ); + if (!ie) { + ret =3D -ENOMEM; + goto out; + } + + ret =3D ntfs_ie_add_vcn(&ie); + if (ret) + goto out2; + + ntfs_ie_set_vcn(ie, ntfs_ie_get_vcn(icx->entry)); + + if (icx->is_in_root) + ih =3D &icx->ir->index; + else + ih =3D &icx->ib->index; + + delta =3D le16_to_cpu(ie->length) - le16_to_cpu(icx->entry->length); + new_size =3D le32_to_cpu(ih->index_length) + delta; + if (delta > 0) { + if (icx->is_in_root) { + ret =3D ntfs_ir_make_space(icx, new_size); + if (ret !=3D 0) + goto out2; + + ih =3D &icx->ir->index; + entry =3D ntfs_ie_get_by_pos(ih, entry_pos); + + } else if (new_size > le32_to_cpu(ih->allocated_size)) { + icx->pindex =3D pindex; + ret =3D ntfs_ib_split(icx, icx->ib); + if (!ret) + ret =3D -EAGAIN; + goto out2; + } + } + + ntfs_ie_delete(ih, entry); + ntfs_ie_insert(ih, ie, entry); + + if (icx->is_in_root) + ret =3D ntfs_ir_truncate(icx, new_size); + else + ret =3D ntfs_icx_ib_write(icx); + if (ret) + goto out2; + + ntfs_ie_delete(&ib->index, ie_succ); + + if (ntfs_ih_zero_entry(&ib->index)) + ret =3D ntfs_index_rm_leaf(icx); + else + ret =3D ntfs_ib_write(icx, ib); + +out2: + ntfs_free(ie); +out: + ntfs_free(ib); + return ret; +} + +/** + * ntfs_index_rm - remove entry from the index + * @icx: index context describing entry to delete + * + * 
Delete entry described by @icx from the index. Index context is always + * reinitialized after use of this function, so it can be used for index + * lookup once again. + */ +int ntfs_index_rm(struct ntfs_index_context *icx) +{ + struct index_header *ih; + int ret =3D 0; + + ntfs_debug("Entering\n"); + + if (!icx || (!icx->ib && !icx->ir) || ntfs_ie_end(icx->entry)) { + ret =3D -EINVAL; + goto err_out; + } + if (icx->is_in_root) + ih =3D &icx->ir->index; + else + ih =3D &icx->ib->index; + + if (icx->entry->flags & INDEX_ENTRY_NODE) { + ret =3D ntfs_index_rm_node(icx); + if (ret) + goto err_out; + } else if (icx->is_in_root || !ntfs_ih_one_entry(ih)) { + ntfs_ie_delete(ih, icx->entry); + + if (icx->is_in_root) + ret =3D ntfs_ir_truncate(icx, le32_to_cpu(ih->index_length)); + else + ret =3D ntfs_icx_ib_write(icx); + if (ret) + goto err_out; + } else { + ret =3D ntfs_index_rm_leaf(icx); + if (ret) + goto err_out; + } + + return 0; +err_out: + return ret; +} + +int ntfs_index_remove(struct ntfs_inode *dir_ni, const void *key, const in= t keylen) +{ + int ret =3D 0; + struct ntfs_index_context *icx; + + icx =3D ntfs_index_ctx_get(dir_ni, I30, 4); + if (!icx) + return -EINVAL; + + while (1) { + ret =3D ntfs_index_lookup(key, keylen, icx); + if (ret) + goto err_out; + + ret =3D ntfs_index_rm(icx); + if (ret && ret !=3D -EAGAIN) + goto err_out; + else if (!ret) + break; + + mark_mft_record_dirty(icx->actx->ntfs_ino); + ntfs_index_ctx_reinit(icx); + } + + mark_mft_record_dirty(icx->actx->ntfs_ino); + + ntfs_index_ctx_put(icx); + return 0; +err_out: + ntfs_index_ctx_put(icx); + ntfs_error(dir_ni->vol->sb, "Delete failed"); + return ret; +} + +/* + * ntfs_index_walk_down - walk down the index tree (leaf bound) + * until there are no subnode in the first index entry returns + * the entry at the bottom left in subnode + */ +struct index_entry *ntfs_index_walk_down(struct index_entry *ie, struct nt= fs_index_context *ictx) +{ + struct index_entry *entry; + s64 vcn; + + entry =3D 
ie; + do { + vcn =3D ntfs_ie_get_vcn(entry); + if (ictx->is_in_root) { + /* down from level zero */ + ictx->ir =3D NULL; + ictx->ib =3D (struct index_block *)ntfs_malloc_nofs(ictx->block_size); + ictx->pindex =3D 1; + ictx->is_in_root =3D false; + } else { + /* down from non-zero level */ + ictx->pindex++; + } + + ictx->parent_pos[ictx->pindex] =3D 0; + ictx->parent_vcn[ictx->pindex] =3D vcn; + if (!ntfs_ib_read(ictx, vcn, ictx->ib)) { + ictx->entry =3D ntfs_ie_get_first(&ictx->ib->index); + entry =3D ictx->entry; + } else + entry =3D NULL; + } while (entry && (entry->flags & INDEX_ENTRY_NODE)); + + return entry; +} + +/** + * ntfs_index_walk_up - walk up the index tree (root bound) until + * there is a valid data entry in parent returns the parent entry + * or NULL if no more parent. + */ +static struct index_entry *ntfs_index_walk_up(struct index_entry *ie, + struct ntfs_index_context *ictx) +{ + struct index_entry *entry; + s64 vcn; + + entry =3D ie; + if (ictx->pindex > 0) { + do { + ictx->pindex--; + if (!ictx->pindex) { + /* we have reached the root */ + kfree(ictx->ib); + ictx->ib =3D NULL; + ictx->is_in_root =3D true; + /* a new search context is to be allocated */ + if (ictx->actx) + ntfs_attr_put_search_ctx(ictx->actx); + ictx->ir =3D ntfs_ir_lookup(ictx->idx_ni, ictx->name, + ictx->name_len, &ictx->actx); + if (ictx->ir) + entry =3D ntfs_ie_get_by_pos(&ictx->ir->index, + ictx->parent_pos[ictx->pindex]); + else + entry =3D NULL; + } else { + /* up into non-root node */ + vcn =3D ictx->parent_vcn[ictx->pindex]; + if (!ntfs_ib_read(ictx, vcn, ictx->ib)) { + entry =3D ntfs_ie_get_by_pos(&ictx->ib->index, + ictx->parent_pos[ictx->pindex]); + } else + entry =3D NULL; + } + ictx->entry =3D entry; + } while (entry && (ictx->pindex > 0) && + (entry->flags & INDEX_ENTRY_END)); + } else + entry =3D NULL; + + return entry; +} + +/** + * ntfs_index_next - get next entry in an index according to collating seq= uence. + * Returns next entry or NULL if none. 
+ * + * Sample layout : + * + * +---+---+---+---+---+---+---+---+ n ptrs to subnodes + * | | | 10| 25| 33| | | | n-1 keys in between + * +---+---+---+---+---+---+---+---+ no key in last ent= ry + * | A | A + * | | | +-------------------------------+ + * +--------------------------+ | +-----+ | + * | +--+ | | + * V | V | + * +---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+ + * | 11| 12| 13| 14| 15| 16| 17| | | | 26| 27| 28| 29| 30| 31| 32| | + * +---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+ + * | | + * +-----------------------+ | + * | | + * +---+---+---+---+---+---+---+---+ + * | 18| 19| 20| 21| 22| 23| 24| | + * +---+---+---+---+---+---+---+---+ + */ +struct index_entry *ntfs_index_next(struct index_entry *ie, struct ntfs_in= dex_context *ictx) +{ + struct index_entry *next; + __le16 flags; + + /* + * lookup() may have returned an invalid node + * when searching for a partial key + * if this happens, walk up + */ + if (ie->flags & INDEX_ENTRY_END) + next =3D ntfs_index_walk_up(ie, ictx); + else { + /* + * get next entry in same node + * there is always one after any entry with data + */ + next =3D (struct index_entry *)((char *)ie + le16_to_cpu(ie->length)); + ++ictx->parent_pos[ictx->pindex]; + flags =3D next->flags; + + /* walk down if it has a subnode */ + if (flags & INDEX_ENTRY_NODE) { + if (!ictx->ia_ni) + ictx->ia_ni =3D ntfs_ia_open(ictx, ictx->idx_ni); + + next =3D ntfs_index_walk_down(next, ictx); + } else { + + /* walk up it has no subnode, nor data */ + if (flags & INDEX_ENTRY_END) + next =3D ntfs_index_walk_up(next, ictx); + } + } + + /* return NULL if stuck at end of a block */ + if (next && (next->flags & INDEX_ENTRY_END)) + next =3D NULL; + + return next; +} --=20 2.25.1 From nobody Mon Dec 1 22:02:17 2025 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by 
smtp.subspace.kernel.org (Postfix); Thu, 27 Nov 2025 05:00:44 +0000 (UTC)
From: Namjae Jeon
To: viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, hch@lst.de, tytso@mit.edu, willy@infradead.org, jack@suse.cz, djwong@kernel.org, josef@toxicpanda.com, sandeen@sandeen.net, rgoldwyn@suse.com, xiang@kernel.org, dsterba@suse.com, pali@kernel.org, ebiggers@kernel.org, neil@brown.name, amir73il@gmail.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, iamjoonsoo.kim@lge.com, cheol.lee@lge.com, jay.sim@lge.com, gunho.lee@lge.com, Namjae Jeon, Hyunchul Lee
Subject: [PATCH v2 05/11] ntfsplus: add file operations
Date: Thu, 27 Nov 2025
13:59:38 +0900 Message-Id: <20251127045944.26009-6-linkinjeon@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org> References: <20251127045944.26009-1-linkinjeon@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This adds the implementation of file operations for ntfsplus. Signed-off-by: Hyunchul Lee Signed-off-by: Namjae Jeon --- fs/ntfsplus/file.c | 1142 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1142 insertions(+) create mode 100644 fs/ntfsplus/file.c diff --git a/fs/ntfsplus/file.c b/fs/ntfsplus/file.c new file mode 100644 index 000000000000..aebc2b48b0d5 --- /dev/null +++ b/fs/ntfsplus/file.c @@ -0,0 +1,1142 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * NTFS kernel file operations. Part of the Linux-NTFS project. + * + * Copyright (c) 2001-2015 Anton Altaparmakov and Tuxera Inc. + * Copyright (c) 2025 LG Electronics Co., Ltd. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "lcnalloc.h" +#include "ntfs.h" +#include "aops.h" +#include "reparse.h" +#include "ea.h" +#include "ntfs_iomap.h" +#include "misc.h" +#include "bitmap.h" + +/** + * ntfs_file_open - called when an inode is about to be opened + * @vi: inode to be opened + * @filp: file structure describing the inode + * + * Limit file size to the page cache limit on architectures where unsigned= long + * is 32-bits. This is the most we can do for now without overflowing the = page + * cache page index. Doing it this way means we don't run into problems be= cause + * of existing too large files. 
It would be better to allow the user to re= ad + * the beginning of the file but I doubt very much anyone is going to hit = this + * check on a 32-bit architecture, so there is no point in adding the extra + * complexity required to support this. + * + * On 64-bit architectures, the check is hopefully optimized away by the + * compiler. + * + * After the check passes, just call generic_file_open() to do its work. + */ +static int ntfs_file_open(struct inode *vi, struct file *filp) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + + if (NVolShutdown(ni->vol)) + return -EIO; + + if (sizeof(unsigned long) < 8) { + if (i_size_read(vi) > MAX_LFS_FILESIZE) + return -EOVERFLOW; + } + + if (filp->f_flags & O_TRUNC && NInoNonResident(ni)) { + int err; + + mutex_lock(&ni->mrec_lock); + down_read(&ni->runlist.lock); + if (!ni->runlist.rl) { + err =3D ntfs_attr_map_whole_runlist(ni); + if (err) { + up_read(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + return err; + } + } + ni->lcn_seek_trunc =3D ni->runlist.rl->lcn; + up_read(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + } + + filp->f_mode |=3D FMODE_NOWAIT; + + return generic_file_open(vi, filp); +} + +static int ntfs_file_release(struct inode *vi, struct file *filp) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + struct ntfs_volume *vol =3D ni->vol; + s64 aligned_data_size =3D round_up(ni->data_size, vol->cluster_size); + + if (NInoCompressed(ni)) + return 0; + + inode_lock(vi); + mutex_lock(&ni->mrec_lock); + down_write(&ni->runlist.lock); + if (aligned_data_size < ni->allocated_size) { + int err; + s64 vcn_ds =3D aligned_data_size >> vol->cluster_size_bits; + s64 vcn_tr =3D -1; + struct runlist_element *rl =3D ni->runlist.rl; + ssize_t rc =3D ni->runlist.count - 2; + + while (rc >=3D 0 && rl[rc].lcn =3D=3D LCN_HOLE && vcn_ds <=3D rl[rc].vcn= ) { + vcn_tr =3D rl[rc].vcn; + rc--; + } + + if (vcn_tr >=3D 0) { + err =3D ntfs_rl_truncate_nolock(vol, &ni->runlist, vcn_tr); + if (err) { + ntfs_free(ni->runlist.rl); + 
ni->runlist.rl =3D NULL; + ntfs_error(vol->sb, "Preallocated block rollback failed"); + } else { + ni->allocated_size =3D vcn_tr << vol->cluster_size_bits; + err =3D ntfs_attr_update_mapping_pairs(ni, 0); + if (err) + ntfs_error(vol->sb, + "Failed to rollback mapping pairs for prealloc"); + } + } + } + up_write(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + inode_unlock(vi); + + return 0; +} + +/** + * ntfs_file_fsync - sync a file to disk + * @filp: file to be synced + * @start: start offset to be synced + * @end: end offset to be synced + * @datasync: if non-zero only flush user data and not metadata + * + * Data integrity sync of a file to disk. Used for fsync, fdatasync, and = msync + * system calls. This function is inspired by fs/buffer.c::file_fsync(). + * + * If @datasync is false, write the mft record and all associated extent m= ft + * records as well as the $DATA attribute and then sync the block device. + * + * If @datasync is true and the attribute is non-resident, we skip the wri= ting + * of the mft record and all associated extent mft records (this might sti= ll + * happen due to the write_inode_now() call). + * + * Also, if @datasync is true, we do not wait on the inode to be written o= ut + * but we always wait on the page cache pages to be written out. 
+ */ +static int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end, + int datasync) +{ + struct inode *vi =3D filp->f_mapping->host; + struct ntfs_inode *ni =3D NTFS_I(vi); + struct ntfs_volume *vol =3D ni->vol; + int err, ret =3D 0; + struct inode *parent_vi, *ia_vi; + struct ntfs_attr_search_ctx *ctx; + + ntfs_debug("Entering for inode 0x%lx.", vi->i_ino); + + if (NVolShutdown(vol)) + return -EIO; + + err =3D file_write_and_wait_range(filp, start, end); + if (err) + return err; + + if (!datasync || !NInoNonResident(NTFS_I(vi))) + ret =3D __ntfs_write_inode(vi, 1); + write_inode_now(vi, !datasync); + + ctx =3D ntfs_attr_get_search_ctx(ni, NULL); + if (!ctx) + return -ENOMEM; + + mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL_2); + while (!(err =3D ntfs_attr_lookup(AT_UNUSED, NULL, 0, 0, 0, NULL, 0, ctx)= )) { + if (ctx->attr->type =3D=3D AT_FILE_NAME) { + struct file_name_attr *fn =3D (struct file_name_attr *)((u8 *)ctx->attr= + + le16_to_cpu(ctx->attr->data.resident.value_offset)); + + parent_vi =3D ntfs_iget(vi->i_sb, MREF_LE(fn->parent_directory)); + if (IS_ERR(parent_vi)) + continue; + mutex_lock_nested(&NTFS_I(parent_vi)->mrec_lock, NTFS_INODE_MUTEX_PAREN= T_2); + ia_vi =3D ntfs_index_iget(parent_vi, I30, 4); + mutex_unlock(&NTFS_I(parent_vi)->mrec_lock); + if (IS_ERR(ia_vi)) { + iput(parent_vi); + continue; + } + write_inode_now(ia_vi, 1); + iput(ia_vi); + write_inode_now(parent_vi, 1); + iput(parent_vi); + } else if (ctx->attr->non_resident) { + struct inode *attr_vi; + __le16 *name; + + name =3D (__le16 *)((u8 *)ctx->attr + le16_to_cpu(ctx->attr->name_offse= t)); + if (ctx->attr->type =3D=3D AT_DATA && ctx->attr->name_length =3D=3D 0) + continue; + + attr_vi =3D ntfs_attr_iget(vi, ctx->attr->type, + name, ctx->attr->name_length); + if (IS_ERR(attr_vi)) + continue; + spin_lock(&attr_vi->i_lock); + if (attr_vi->i_state & I_DIRTY_PAGES) { + spin_unlock(&attr_vi->i_lock); + filemap_write_and_wait(attr_vi->i_mapping); + } else + 
				spin_unlock(&attr_vi->i_lock);
+			iput(attr_vi);
+		}
+	}
+	mutex_unlock(&ni->mrec_lock);
+	ntfs_attr_put_search_ctx(ctx);
+
+	write_inode_now(vol->mftbmp_ino, 1);
+	down_write(&vol->lcnbmp_lock);
+	write_inode_now(vol->lcnbmp_ino, 1);
+	up_write(&vol->lcnbmp_lock);
+	write_inode_now(vol->mft_ino, 1);
+
+	/*
+	 * NOTE: If we were to use mapping->private_list (see ext2 and
+	 * fs/buffer.c) for dirty blocks then we could optimize the below to be
+	 * sync_mapping_buffers(vi->i_mapping).
+	 */
+	err = sync_blockdev(vi->i_sb->s_bdev);
+	if (unlikely(err && !ret))
+		ret = err;
+	if (likely(!ret))
+		ntfs_debug("Done.");
+	else
+		ntfs_warning(vi->i_sb,
+			"Failed to f%ssync inode 0x%lx.  Error %u.",
+			datasync ? "data" : "", vi->i_ino, -ret);
+	if (!ret)
+		blkdev_issue_flush(vi->i_sb->s_bdev);
+	return ret;
+}
+
+/**
+ * ntfsp_setattr - called from notify_change() when an attribute is being changed
+ * @idmap:	idmap of the mount the inode was found from
+ * @dentry:	dentry whose attributes to change
+ * @attr:	structure describing the attributes and the changes
+ *
+ * We have to trap VFS attempts to truncate the file described by @dentry as
+ * soon as possible, because we do not implement changes in i_size yet.  So we
+ * abort all i_size changes here.
+ *
+ * We also abort all changes of user, group, and mode as we do not implement
+ * the NTFS ACLs yet.
+ */
+int ntfsp_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
+		struct iattr *attr)
+{
+	struct inode *vi = d_inode(dentry);
+	int err;
+	unsigned int ia_valid = attr->ia_valid;
+	struct ntfs_inode *ni = NTFS_I(vi);
+	struct ntfs_volume *vol = ni->vol;
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	err = setattr_prepare(idmap, dentry, attr);
+	if (err)
+		goto out;
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	if (ia_valid & ATTR_SIZE) {
+		if (NInoCompressed(ni) || NInoEncrypted(ni)) {
+			ntfs_warning(vi->i_sb,
+				"Changes in inode size are not supported yet for %s files, ignoring.",
+				NInoCompressed(ni) ? "compressed" : "encrypted");
+			err = -EOPNOTSUPP;
+		} else {
+			loff_t old_size = vi->i_size;
+
+			err = inode_newsize_ok(vi, attr->ia_size);
+			if (err)
+				goto out;
+
+			inode_dio_wait(vi);
+			/* Serialize against page faults */
+			if (NInoNonResident(NTFS_I(vi)) &&
+			    attr->ia_size < old_size) {
+				err = iomap_truncate_page(vi, attr->ia_size, NULL,
+						&ntfs_read_iomap_ops,
+						&ntfs_iomap_folio_ops, NULL);
+				if (err)
+					goto out;
+			}
+
+			truncate_setsize(vi, attr->ia_size);
+			err = ntfs_truncate_vfs(vi, attr->ia_size, old_size);
+			if (err) {
+				i_size_write(vi, old_size);
+				goto out;
+			}
+
+			if (NInoNonResident(ni) && attr->ia_size > old_size &&
+			    old_size % PAGE_SIZE != 0) {
+				loff_t len = min_t(loff_t,
+					round_up(old_size, PAGE_SIZE) - old_size,
+					attr->ia_size - old_size);
+				err = iomap_zero_range(vi, old_size, len,
+						NULL, &ntfs_read_iomap_ops,
+						&ntfs_iomap_folio_ops, NULL);
+			}
+		}
+		if (ia_valid == ATTR_SIZE)
+			goto out;
+		ia_valid |= ATTR_MTIME | ATTR_CTIME;
+	}
+
+	setattr_copy(idmap, vi, attr);
+
+	if (vol->sb->s_flags & SB_POSIXACL && !S_ISLNK(vi->i_mode)) {
+		err = posix_acl_chmod(idmap, dentry, vi->i_mode);
+		if (err)
+			goto out;
+	}
+
+	if (0222 & vi->i_mode)
+		ni->flags &= ~FILE_ATTR_READONLY;
+	else
+		ni->flags |= FILE_ATTR_READONLY;
+
+	if (ia_valid & (ATTR_UID |
	    ATTR_GID | ATTR_MODE)) {
+		unsigned int flags = 0;
+
+		if (ia_valid & ATTR_UID)
+			flags |= NTFS_EA_UID;
+		if (ia_valid & ATTR_GID)
+			flags |= NTFS_EA_GID;
+		if (ia_valid & ATTR_MODE)
+			flags |= NTFS_EA_MODE;
+
+		if (S_ISDIR(vi->i_mode))
+			vi->i_mode &= ~vol->dmask;
+		else
+			vi->i_mode &= ~vol->fmask;
+
+		mutex_lock(&ni->mrec_lock);
+		ntfs_ea_set_wsl_inode(vi, 0, NULL, flags);
+		mutex_unlock(&ni->mrec_lock);
+	}
+
+	mark_inode_dirty(vi);
+out:
+	return err;
+}
+
+int ntfsp_getattr(struct mnt_idmap *idmap, const struct path *path,
+		struct kstat *stat, unsigned int request_mask,
+		unsigned int query_flags)
+{
+	struct inode *inode = d_backing_inode(path->dentry);
+
+	generic_fillattr(idmap, request_mask, inode, stat);
+
+	stat->blksize = NTFS_SB(inode->i_sb)->cluster_size;
+	stat->blocks = (((u64)NTFS_I(inode)->i_dealloc_clusters <<
+			NTFS_SB(inode->i_sb)->cluster_size_bits) >> 9) + inode->i_blocks;
+	stat->result_mask |= STATX_BTIME;
+	stat->btime = NTFS_I(inode)->i_crtime;
+
+	return 0;
+}
+
+static loff_t ntfs_file_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct inode *vi = file->f_mapping->host;
+
+	if (whence == SEEK_DATA || whence == SEEK_HOLE) {
+		struct ntfs_inode *ni = NTFS_I(vi);
+		struct ntfs_volume *vol = ni->vol;
+		struct runlist_element *rl;
+		s64 vcn;
+		unsigned int vcn_off;
+		loff_t end_off;
+		unsigned long flags;
+		int i;
+
+		inode_lock_shared(vi);
+
+		if (NInoCompressed(ni) || NInoEncrypted(ni))
+			goto error;
+
+		read_lock_irqsave(&ni->size_lock, flags);
+		end_off = ni->data_size;
+		read_unlock_irqrestore(&ni->size_lock, flags);
+
+		if (offset < 0 || offset >= end_off)
+			goto error;
+
+		if (!NInoNonResident(ni)) {
+			if (whence == SEEK_HOLE)
+				offset = end_off;
+			goto found_no_runlist_lock;
+		}
+
+		vcn = offset >> vol->cluster_size_bits;
+		vcn_off = offset & vol->cluster_size_mask;
+
+		down_read(&ni->runlist.lock);
+		rl = ni->runlist.rl;
+		i = 0;
+
+#ifdef DEBUG
+		ntfs_debug("init:");
+		ntfs_debug_dump_runlist(rl);
+#endif
+		while (1) {
+			if (!rl || !NInoFullyMapped(ni) || rl[i].lcn == LCN_RL_NOT_MAPPED) {
+				int ret;
+
+				up_read(&ni->runlist.lock);
+				ret = ntfs_map_runlist(ni, rl ? rl[i].vcn : 0);
+				if (ret)
+					goto error;
+				down_read(&ni->runlist.lock);
+				rl = ni->runlist.rl;
+#ifdef DEBUG
+				ntfs_debug("mapped:");
+				ntfs_debug_dump_runlist(ni->runlist.rl);
+#endif
+				continue;
+			} else if (rl[i].lcn == LCN_ENOENT) {
+				if (whence == SEEK_DATA) {
+					up_read(&ni->runlist.lock);
+					goto error;
+				} else {
+					offset = end_off;
+					goto found;
+				}
+			} else if (rl[i + 1].vcn > vcn) {
+				if ((whence == SEEK_DATA && (rl[i].lcn >= 0 ||
+				     rl[i].lcn == LCN_DELALLOC)) ||
+				    (whence == SEEK_HOLE && rl[i].lcn == LCN_HOLE)) {
+					offset = (vcn << vol->cluster_size_bits) + vcn_off;
+					if (offset < ni->data_size)
+						goto found;
+				}
+				vcn = rl[i + 1].vcn;
+				vcn_off = 0;
+			}
+			i++;
+		}
+		up_read(&ni->runlist.lock);
+		inode_unlock_shared(vi);
+		return -EIO;
+found:
+		up_read(&ni->runlist.lock);
+found_no_runlist_lock:
+		inode_unlock_shared(vi);
+		return vfs_setpos(file, offset, vi->i_sb->s_maxbytes);
+error:
+		inode_unlock_shared(vi);
+		return -ENXIO;
+	} else {
+		return generic_file_llseek_size(file, offset, whence,
+				vi->i_sb->s_maxbytes,
+				i_size_read(vi));
+	}
+}
+
+static ssize_t ntfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct inode *vi = file_inode(iocb->ki_filp);
+	struct super_block *sb = vi->i_sb;
+	ssize_t ret;
+
+	if (NVolShutdown(NTFS_SB(sb)))
+		return -EIO;
+
+	if (NInoCompressed(NTFS_I(vi)) && iocb->ki_flags & IOCB_DIRECT)
+		return -EOPNOTSUPP;
+
+	inode_lock_shared(vi);
+
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		size_t count = iov_iter_count(to);
+
+		if ((iocb->ki_pos | count) & (sb->s_blocksize - 1)) {
+			ret = -EINVAL;
+			goto inode_unlock;
+		}
+
+		file_accessed(iocb->ki_filp);
+		ret = iomap_dio_rw(iocb, to, &ntfs_read_iomap_ops, NULL, IOMAP_DIO_PARTIAL,
+				NULL, 0);
+	} else {
+		ret = generic_file_read_iter(iocb, to);
+	}
+
+inode_unlock:
+	inode_unlock_shared(vi);
+
+	return ret;
+}
+
+static int ntfs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
+		int error, unsigned int flags)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	if (error)
+		return error;
+
+	if (size) {
+		if (i_size_read(inode) < iocb->ki_pos + size) {
+			i_size_write(inode, iocb->ki_pos + size);
+			mark_inode_dirty(inode);
+		}
+	}
+
+	return 0;
+}
+
+static const struct iomap_dio_ops ntfs_write_dio_ops = {
+	.end_io		= ntfs_file_write_dio_end_io,
+};
+
+static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *vi = file->f_mapping->host;
+	struct ntfs_inode *ni = NTFS_I(vi);
+	struct ntfs_volume *vol = ni->vol;
+	ssize_t ret;
+	ssize_t count;
+	loff_t pos;
+	int err;
+	loff_t old_data_size, old_init_size;
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	if (NInoEncrypted(ni)) {
+		ntfs_error(vi->i_sb, "Writing for %s files is not supported yet",
+				NInoCompressed(ni) ?
				"Compressed" : "Encrypted");
+		return -EOPNOTSUPP;
+	}
+
+	if (NInoCompressed(ni) && iocb->ki_flags & IOCB_DIRECT)
+		return -EOPNOTSUPP;
+
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!inode_trylock(vi))
+			return -EAGAIN;
+	} else
+		inode_lock(vi);
+
+	ret = generic_write_checks(iocb, from);
+	if (ret <= 0)
+		goto out_lock;
+
+	if (NInoNonResident(ni) && (iocb->ki_flags & IOCB_DIRECT) &&
+	    ((iocb->ki_pos | ret) & (vi->i_sb->s_blocksize - 1))) {
+		ret = -EINVAL;
+		goto out_lock;
+	}
+
+	err = file_modified(iocb->ki_filp);
+	if (err) {
+		ret = err;
+		goto out_lock;
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	pos = iocb->ki_pos;
+	count = ret;
+
+	old_data_size = ni->data_size;
+	old_init_size = ni->initialized_size;
+	if (iocb->ki_pos + ret > old_data_size) {
+		mutex_lock(&ni->mrec_lock);
+		if (!NInoCompressed(ni) && iocb->ki_pos + ret > ni->allocated_size &&
+		    iocb->ki_pos + ret < ni->allocated_size + vol->preallocated_size)
+			ret = ntfs_attr_expand(ni, iocb->ki_pos + ret,
+					ni->allocated_size + vol->preallocated_size);
+		else if (NInoCompressed(ni) && iocb->ki_pos + ret > ni->allocated_size)
+			ret = ntfs_attr_expand(ni, iocb->ki_pos + ret,
+					round_up(iocb->ki_pos + ret, ni->itype.compressed.block_size));
+		else
+			ret = ntfs_attr_expand(ni, iocb->ki_pos + ret, 0);
+		mutex_unlock(&ni->mrec_lock);
+		if (ret < 0)
+			goto out;
+	}
+
+	if (NInoNonResident(ni) && iocb->ki_pos + count > old_init_size) {
+		ret = ntfs_extend_initialized_size(vi, iocb->ki_pos,
+				iocb->ki_pos + count);
+		if (ret < 0)
+			goto out;
+	}
+
+	if (NInoNonResident(ni) && NInoCompressed(ni)) {
+		ret = ntfs_compress_write(ni, pos, count, from);
+		if (ret > 0)
+			iocb->ki_pos += ret;
+		goto out;
+	}
+
+	if (NInoNonResident(ni) && iocb->ki_flags & IOCB_DIRECT) {
+		ret = iomap_dio_rw(iocb, from, &ntfs_dio_iomap_ops,
+				&ntfs_write_dio_ops, 0, NULL, 0);
+		if (ret == -ENOTBLK)
+			ret = 0;
+		else if (ret < 0)
+			goto out;
+
+		if (iov_iter_count(from)) {
+			loff_t offset, end;
+			ssize_t written;
+			int ret2;
+
+			offset = iocb->ki_pos;
+			iocb->ki_flags &= ~IOCB_DIRECT;
+			written = iomap_file_buffered_write(iocb, from,
+					&ntfs_write_iomap_ops, &ntfs_iomap_folio_ops,
+					NULL);
+			if (written < 0) {
+				err = written;
+				goto out;
+			}
+
+			ret += written;
+			end = iocb->ki_pos + written - 1;
+			ret2 = filemap_write_and_wait_range(iocb->ki_filp->f_mapping,
+					offset, end);
+			if (ret2)
+				goto out_err;
+			if (!ret2)
+				invalidate_mapping_pages(iocb->ki_filp->f_mapping,
+						offset >> PAGE_SHIFT,
+						end >> PAGE_SHIFT);
+		}
+	} else {
+		ret = iomap_file_buffered_write(iocb, from, &ntfs_write_iomap_ops,
+				&ntfs_iomap_folio_ops, NULL);
+	}
+out:
+	if (ret < 0 && ret != -EIOCBQUEUED) {
+out_err:
+		if (ni->initialized_size != old_init_size) {
+			mutex_lock(&ni->mrec_lock);
+			ntfs_attr_set_initialized_size(ni, old_init_size);
+			mutex_unlock(&ni->mrec_lock);
+		}
+		if (ni->data_size != old_data_size) {
+			truncate_setsize(vi, old_data_size);
+			ntfs_attr_truncate(ni, old_data_size);
+		}
+	}
+out_lock:
+	inode_unlock(vi);
+	if (ret > 0)
+		ret = generic_write_sync(iocb, ret);
+	return ret;
+}
+
+static vm_fault_t ntfs_filemap_page_mkwrite(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	vm_fault_t ret;
+
+	if (unlikely(IS_IMMUTABLE(inode)))
+		return VM_FAULT_SIGBUS;
+
+	sb_start_pagefault(inode->i_sb);
+	file_update_time(vmf->vma->vm_file);
+
+	ret = iomap_page_mkwrite(vmf, &ntfs_page_mkwrite_iomap_ops, NULL);
+	sb_end_pagefault(inode->i_sb);
+	return ret;
+}
+
+static const struct vm_operations_struct ntfs_file_vm_ops = {
+	.fault		= filemap_fault,
+	.map_pages	= filemap_map_pages,
+	.page_mkwrite	= ntfs_filemap_page_mkwrite,
+};
+
+static int ntfs_file_mmap_prepare(struct vm_area_desc *desc)
+{
+	struct file *file = desc->file;
+	struct inode *inode = file_inode(file);
+
+	if (NVolShutdown(NTFS_SB(file->f_mapping->host->i_sb)))
+		return -EIO;
+
+	if (NInoCompressed(NTFS_I(inode)))
+		return -EOPNOTSUPP;
+
+	if (desc->vm_flags & VM_WRITE) {
+		struct inode *inode = file_inode(file);
+		loff_t from, to;
+		int err;
+
+		from = ((loff_t)desc->pgoff << PAGE_SHIFT);
+		to = min_t(loff_t, i_size_read(inode),
+				from + desc->end - desc->start);
+
+		if (NTFS_I(inode)->initialized_size < to) {
+			err = ntfs_extend_initialized_size(inode, to, to);
+			if (err)
+				return err;
+		}
+	}
+
+	file_accessed(file);
+	desc->vm_ops = &ntfs_file_vm_ops;
+	return 0;
+}
+
+static int ntfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+		u64 start, u64 len)
+{
+	return iomap_fiemap(inode, fieinfo, start, len, &ntfs_read_iomap_ops);
+}
+
+static const char *ntfs_get_link(struct dentry *dentry, struct inode *inode,
+		struct delayed_call *done)
+{
+	if (!NTFS_I(inode)->target)
+		return ERR_PTR(-EINVAL);
+
+	return NTFS_I(inode)->target;
+}
+
+static ssize_t ntfs_file_splice_read(struct file *in, loff_t *ppos,
+		struct pipe_inode_info *pipe, size_t len, unsigned int flags)
+{
+	if (NVolShutdown(NTFS_SB(in->f_mapping->host->i_sb)))
+		return -EIO;
+
+	return filemap_splice_read(in, ppos, pipe, len, flags);
+}
+
+static int ntfs_ioctl_shutdown(struct super_block *sb, unsigned long arg)
+{
+	u32 flags;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (get_user(flags, (__u32 __user *)arg))
+		return -EFAULT;
+
+	return ntfs_force_shutdown(sb, flags);
+}
+
+static int ntfs_ioctl_get_volume_label(struct file *filp, unsigned long arg)
+{
+	struct ntfs_volume *vol = NTFS_SB(file_inode(filp)->i_sb);
+	char __user *buf = (char __user *)arg;
+
+	if (!vol->volume_label) {
+		if (copy_to_user(buf, "", 1))
+			return -EFAULT;
+	} else if (copy_to_user(buf, vol->volume_label,
+			MIN(FSLABEL_MAX, strlen(vol->volume_label) + 1)))
+		return -EFAULT;
+	return 0;
+}
+
+static int ntfs_ioctl_set_volume_label(struct file *filp, unsigned long arg)
+{
+	struct ntfs_volume *vol = NTFS_SB(file_inode(filp)->i_sb);
+	char *label;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	label = strndup_user((const char __user *)arg, FSLABEL_MAX);
+	if (IS_ERR(label))
+		return PTR_ERR(label);
+
+	ret = mnt_want_write_file(filp);
+	if (ret)
+		goto out;
+
+	ret = ntfs_write_volume_label(vol, label);
+	mnt_drop_write_file(filp);
+out:
+	kfree(label);
+	return ret;
+}
+
+static int ntfs_ioctl_fitrim(struct ntfs_volume *vol, unsigned long arg)
+{
+	struct fstrim_range __user *user_range;
+	struct fstrim_range range;
+	struct block_device *dev;
+	int err;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	dev = vol->sb->s_bdev;
+	if (!bdev_max_discard_sectors(dev))
+		return -EOPNOTSUPP;
+
+	user_range = (struct fstrim_range __user *)arg;
+	if (copy_from_user(&range, user_range, sizeof(range)))
+		return -EFAULT;
+
+	if (range.len == 0)
+		return -EINVAL;
+
+	if (range.len < vol->cluster_size)
+		return -EINVAL;
+
+	range.minlen = max_t(u32, range.minlen, bdev_discard_granularity(dev));
+
+	err = ntfsp_trim_fs(vol, &range);
+	if (err < 0)
+		return err;
+
+	if (copy_to_user(user_range, &range, sizeof(range)))
+		return -EFAULT;
+
+	return 0;
+}
+
+long ntfsp_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	switch (cmd) {
+	case NTFS_IOC_SHUTDOWN:
+		return ntfs_ioctl_shutdown(file_inode(filp)->i_sb, arg);
+	case FS_IOC_GETFSLABEL:
+		return ntfs_ioctl_get_volume_label(filp, arg);
+	case FS_IOC_SETFSLABEL:
+		return ntfs_ioctl_set_volume_label(filp, arg);
+	case FITRIM:
+		return ntfs_ioctl_fitrim(NTFS_SB(file_inode(filp)->i_sb), arg);
+	default:
+		return -ENOTTY;
+	}
+}
+
+#ifdef CONFIG_COMPAT
+long ntfsp_compat_ioctl(struct file *filp, unsigned int cmd,
+		unsigned long arg)
+{
+	return ntfsp_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
+}
+#endif
+
+static long ntfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
+{
+	struct inode *vi = file_inode(file);
+	struct ntfs_inode *ni = NTFS_I(vi);
+	struct ntfs_volume *vol = ni->vol;
+	int err = 0;
+	loff_t end_offset = offset + len;
+	loff_t old_size, new_size;
+	s64 start_vcn, end_vcn;
+	bool map_locked = false;
+
+	if (!S_ISREG(vi->i_mode))
+		return -EOPNOTSUPP;
+
+	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_INSERT_RANGE |
+		     FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE))
+		return -EOPNOTSUPP;
+
+	if (!NVolFreeClusterKnown(vol))
+		wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+
+	if ((ni->vol->mft_zone_end - ni->vol->mft_zone_start) == 0)
+		return -ENOSPC;
+
+	if (NInoNonResident(ni) && !NInoFullyMapped(ni)) {
+		down_write(&ni->runlist.lock);
+		err = ntfs_attr_map_whole_runlist(ni);
+		up_write(&ni->runlist.lock);
+		if (err)
+			return err;
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY)) {
+		err = ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+		if (err)
+			return err;
+	}
+
+	old_size = i_size_read(vi);
+	new_size = max_t(loff_t, old_size, end_offset);
+	start_vcn = offset >> vol->cluster_size_bits;
+	end_vcn = ((end_offset - 1) >> vol->cluster_size_bits) + 1;
+
+	inode_lock(vi);
+	if (NInoCompressed(ni) || NInoEncrypted(ni)) {
+		err = -EOPNOTSUPP;
+		goto out;
+	}
+
+	inode_dio_wait(vi);
+	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE |
+		    FALLOC_FL_INSERT_RANGE)) {
+		filemap_invalidate_lock(vi->i_mapping);
+		map_locked = true;
+	}
+
+	if (mode & FALLOC_FL_INSERT_RANGE) {
+		loff_t offset_down = round_down(offset,
+				max_t(unsigned long, vol->cluster_size, PAGE_SIZE));
+		loff_t alloc_size;
+
+		if (NVolDisableSparse(vol)) {
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+
+		if ((offset & vol->cluster_size_mask) ||
+		    (len & vol->cluster_size_mask) ||
+		    offset >= ni->allocated_size) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		new_size = old_size +
+			((end_vcn - start_vcn) << vol->cluster_size_bits);
+		alloc_size = ni->allocated_size +
+			((end_vcn - start_vcn) << vol->cluster_size_bits);
+		if (alloc_size < 0) {
+			err = -EFBIG;
+			goto out;
+		}
+		err = inode_newsize_ok(vi, alloc_size);
+		if (err)
+			goto out;
+
+		err = filemap_write_and_wait_range(vi->i_mapping,
+				offset_down, LLONG_MAX);
+		if (err)
+			goto out;
+
+		truncate_pagecache(vi, offset_down);
+
+		mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+		err = ntfs_non_resident_attr_insert_range(ni, start_vcn,
+				end_vcn - start_vcn);
+		mutex_unlock(&ni->mrec_lock);
+		if (err)
+			goto out;
+	} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
+		loff_t offset_down = round_down(offset,
+				max_t(unsigned long, vol->cluster_size, PAGE_SIZE));
+
+		if ((offset & vol->cluster_size_mask) ||
+		    (len & vol->cluster_size_mask) ||
+		    offset >= ni->allocated_size) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		if ((end_vcn << vol->cluster_size_bits) > ni->allocated_size)
+			end_vcn = DIV_ROUND_UP(ni->allocated_size - 1,
+					vol->cluster_size) + 1;
+		new_size = old_size -
+			((end_vcn - start_vcn) << vol->cluster_size_bits);
+		if (new_size < 0)
+			new_size = 0;
+		err = filemap_write_and_wait_range(vi->i_mapping,
+				offset_down, LLONG_MAX);
+		if (err)
+			goto out;
+
+		truncate_pagecache(vi, offset_down);
+
+		mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+		err = ntfs_non_resident_attr_collapse_range(ni, start_vcn,
+				end_vcn - start_vcn);
+		mutex_unlock(&ni->mrec_lock);
+		if (err)
+			goto out;
+	} else if (mode & FALLOC_FL_PUNCH_HOLE) {
+		loff_t offset_down = round_down(offset, max_t(unsigned int,
+				vol->cluster_size, PAGE_SIZE));
+
+		if (NVolDisableSparse(vol)) {
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+
+		if (!(mode & FALLOC_FL_KEEP_SIZE)) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		if (offset >= ni->data_size)
+			goto out;
+
+		if (offset + len > ni->data_size) {
+			end_offset = ni->data_size;
+			end_vcn = ((end_offset - 1) >> vol->cluster_size_bits) + 1;
+		}
+
+		err = filemap_write_and_wait_range(vi->i_mapping, offset_down, LLONG_MAX);
+		if (err)
+			goto out;
+		truncate_pagecache(vi, offset_down);
+
+		if (offset & vol->cluster_size_mask) {
+			loff_t to;
+
+			to = min_t(loff_t, (start_vcn + 1) <<
					vol->cluster_size_bits,
+					end_offset);
+			err = iomap_zero_range(vi, offset, to - offset, NULL,
+					&ntfs_read_iomap_ops,
+					&ntfs_iomap_folio_ops, NULL);
+			if (err < 0 || (end_vcn - start_vcn) == 1)
+				goto out;
+			start_vcn++;
+		}
+		if (end_offset & vol->cluster_size_mask) {
+			loff_t from;
+
+			from = (end_vcn - 1) << vol->cluster_size_bits;
+			err = iomap_zero_range(vi, from, end_offset - from, NULL,
+					&ntfs_read_iomap_ops,
+					&ntfs_iomap_folio_ops, NULL);
+			if (err < 0 || (end_vcn - start_vcn) == 1)
+				goto out;
+			end_vcn--;
+		}
+
+		mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+		err = ntfs_non_resident_attr_punch_hole(ni, start_vcn,
+				end_vcn - start_vcn);
+		mutex_unlock(&ni->mrec_lock);
+		if (err)
+			goto out;
+	} else if (mode == 0 || mode == FALLOC_FL_KEEP_SIZE) {
+		s64 need_space;
+
+		err = inode_newsize_ok(vi, new_size);
+		if (err)
+			goto out;
+
+		/* Clusters already allocated to the attribute. */
+		need_space = ni->allocated_size >> vol->cluster_size_bits;
+		if (need_space > start_vcn)
+			need_space = end_vcn - need_space;
+		else
+			need_space = end_vcn - start_vcn;
+		if (need_space > 0 &&
+		    need_space > (atomic64_read(&vol->free_clusters) -
+				  atomic64_read(&vol->dirty_clusters))) {
+			err = -ENOSPC;
+			goto out;
+		}
+
+		err = ntfs_attr_fallocate(ni, offset, len,
+				mode & FALLOC_FL_KEEP_SIZE ?
				true : false);
+		if (err)
+			goto out;
+	}
+
+	/* inode->i_blocks is already updated in ntfs_attr_update_mapping_pairs */
+	if (!(mode & FALLOC_FL_KEEP_SIZE) && new_size != old_size)
+		i_size_write(vi, ni->data_size);
+
+out:
+	if (map_locked)
+		filemap_invalidate_unlock(vi->i_mapping);
+	if (!err) {
+		if (mode == 0 && NInoNonResident(ni) &&
+		    offset > old_size && old_size % PAGE_SIZE != 0) {
+			loff_t len = min_t(loff_t,
+				round_up(old_size, PAGE_SIZE) - old_size,
+				offset - old_size);
+			err = iomap_zero_range(vi, old_size, len, NULL,
+					&ntfs_read_iomap_ops,
+					&ntfs_iomap_folio_ops, NULL);
+		}
+		NInoSetFileNameDirty(ni);
+		inode_set_mtime_to_ts(vi, inode_set_ctime_current(vi));
+		mark_inode_dirty(vi);
+	}
+
+	inode_unlock(vi);
+	return err;
+}
+
+const struct file_operations ntfs_file_ops = {
+	.llseek		= ntfs_file_llseek,
+	.read_iter	= ntfs_file_read_iter,
+	.write_iter	= ntfs_file_write_iter,
+	.fsync		= ntfs_file_fsync,
+	.mmap_prepare	= ntfs_file_mmap_prepare,
+	.open		= ntfs_file_open,
+	.release	= ntfs_file_release,
+	.splice_read	= ntfs_file_splice_read,
+	.splice_write	= iter_file_splice_write,
+	.unlocked_ioctl	= ntfsp_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= ntfsp_compat_ioctl,
+#endif
+	.fallocate	= ntfs_fallocate,
+};
+
+const struct inode_operations ntfs_file_inode_ops = {
+	.setattr	= ntfsp_setattr,
+	.getattr	= ntfsp_getattr,
+	.listxattr	= ntfsp_listxattr,
+	.get_acl	= ntfsp_get_acl,
+	.set_acl	= ntfsp_set_acl,
+	.fiemap		= ntfs_fiemap,
+};
+
+const struct inode_operations ntfs_symlink_inode_operations = {
+	.get_link	= ntfs_get_link,
+	.setattr	= ntfsp_setattr,
+	.listxattr	= ntfsp_listxattr,
+};
+
+const struct inode_operations ntfsp_special_inode_operations = {
+	.setattr	= ntfsp_setattr,
+	.getattr	= ntfsp_getattr,
+	.listxattr	= ntfsp_listxattr,
+	.get_acl	= ntfsp_get_acl,
+	.set_acl	= ntfsp_set_acl,
+};
+
+const struct file_operations ntfs_empty_file_ops = {};
+
+const
struct inode_operations ntfs_empty_inode_ops = {};
-- 
2.25.1

From nobody Mon Dec 1 22:02:17 2025
From: Namjae Jeon
To: viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, hch@lst.de,
	tytso@mit.edu, willy@infradead.org, jack@suse.cz, djwong@kernel.org,
	josef@toxicpanda.com, sandeen@sandeen.net, rgoldwyn@suse.com,
	xiang@kernel.org, dsterba@suse.com, pali@kernel.org, ebiggers@kernel.org,
	neil@brown.name,
	amir73il@gmail.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	iamjoonsoo.kim@lge.com, cheol.lee@lge.com, jay.sim@lge.com,
	gunho.lee@lge.com, Namjae Jeon, Hyunchul Lee
Subject: [PATCH v2 06/11] ntfsplus: add iomap and address space operations
Date: Thu, 27 Nov 2025 13:59:39 +0900
Message-Id: <20251127045944.26009-7-linkinjeon@kernel.org>
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>
References: <20251127045944.26009-1-linkinjeon@kernel.org>

This adds the implementation of iomap and address space operations for
ntfsplus.

Signed-off-by: Hyunchul Lee
Signed-off-by: Namjae Jeon
---
 fs/ntfsplus/aops.c       | 617 ++++++++++++++++++++++++++++++++++
 fs/ntfsplus/ntfs_iomap.c | 700 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 1317 insertions(+)
 create mode 100644 fs/ntfsplus/aops.c
 create mode 100644 fs/ntfsplus/ntfs_iomap.c

diff --git a/fs/ntfsplus/aops.c b/fs/ntfsplus/aops.c
new file mode 100644
index 000000000000..9a1b3b80a146
--- /dev/null
+++ b/fs/ntfsplus/aops.c
@@ -0,0 +1,617 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * NTFS kernel address space operations and page cache handling.
+ *
+ * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include
+#include
+#include
+
+#include "aops.h"
+#include "attrib.h"
+#include "mft.h"
+#include "ntfs.h"
+#include "misc.h"
+#include "ntfs_iomap.h"
+
+static s64 ntfs_convert_page_index_into_lcn(struct ntfs_volume *vol, struct ntfs_inode *ni,
+		unsigned long page_index)
+{
+	sector_t iblock;
+	s64 vcn;
+	s64 lcn;
+	unsigned char blocksize_bits = vol->sb->s_blocksize_bits;
+
+	iblock = (s64)page_index << (PAGE_SHIFT - blocksize_bits);
+	vcn = (s64)iblock << blocksize_bits >> vol->cluster_size_bits;
+
+	down_read(&ni->runlist.lock);
+	lcn = ntfs_attr_vcn_to_lcn_nolock(ni, vcn, false);
+	up_read(&ni->runlist.lock);
+
+	return lcn;
+}
+
+struct bio *ntfs_setup_bio(struct ntfs_volume *vol, blk_opf_t opf, s64 lcn,
+		unsigned int pg_ofs)
+{
+	struct bio *bio;
+
+	bio = bio_alloc(vol->sb->s_bdev, 1, opf, GFP_NOIO);
+	if (!bio)
+		return NULL;
+	bio->bi_iter.bi_sector = ((lcn << vol->cluster_size_bits) + pg_ofs) >>
+			vol->sb->s_blocksize_bits;
+
+	return bio;
+}
+
+/**
+ * ntfs_read_folio - fill a @folio of a @file with data from the device
+ * @file:	open file to which the folio @folio belongs or NULL
+ * @folio:	page cache folio to fill with data
+ *
+ * For non-resident attributes, ntfs_read_folio() fills the @folio of the open
+ * file @file by calling the ntfs version of the generic block_read_full_folio()
+ * function, which in turn creates and reads in the buffers associated with
+ * the folio asynchronously.
+ *
+ * For resident attributes, OTOH, ntfs_read_folio() fills @folio by copying the
+ * data from the mft record (which at this stage is most likely in memory) and
+ * fills the remainder with zeroes.  Thus, in this case, I/O is synchronous, as
+ * even if the mft record is not cached at this point in time, we need to wait
+ * for it to be read in before we can do the copy.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_read_folio(struct file *file, struct folio *folio)
+{
+	loff_t i_size;
+	struct inode *vi;
+	struct ntfs_inode *ni;
+
+	vi = folio->mapping->host;
+	i_size = i_size_read(vi);
+	/* Is the page fully outside i_size? (truncate in progress) */
+	if (unlikely(folio->index >= (i_size + PAGE_SIZE - 1) >>
+			PAGE_SHIFT)) {
+		folio_zero_segment(folio, 0, PAGE_SIZE);
+		ntfs_debug("Read outside i_size - truncated?");
+		folio_mark_uptodate(folio);
+		folio_unlock(folio);
+		return 0;
+	}
+	/*
+	 * This can potentially happen because we clear PageUptodate() during
+	 * ntfs_writepage() of MstProtected() attributes.
+	 */
+	if (folio_test_uptodate(folio)) {
+		folio_unlock(folio);
+		return 0;
+	}
+	ni = NTFS_I(vi);
+
+	/*
+	 * Only $DATA attributes can be encrypted and only unnamed $DATA
+	 * attributes can be compressed.  Index root can have the flags set but
+	 * this means to create compressed/encrypted files, not that the
+	 * attribute is compressed/encrypted.  Note we need to check for
+	 * AT_INDEX_ALLOCATION since this is the type of both directory and
+	 * index inodes.
+	 */
+	if (ni->type != AT_INDEX_ALLOCATION) {
+		/* If attribute is encrypted, deny access, just like NT4. */
+		if (NInoEncrypted(ni)) {
+			folio_unlock(folio);
+			return -EACCES;
+		}
+		/* Compressed data streams are handled in compress.c.
*/ + if (NInoNonResident(ni) && NInoCompressed(ni)) + return ntfs_read_compressed_block(folio); + } + + return iomap_read_folio(folio, &ntfs_read_iomap_ops); +} + +static int ntfs_write_mft_block(struct ntfs_inode *ni, struct folio *folio, + struct writeback_control *wbc) +{ + struct inode *vi =3D VFS_I(ni); + struct ntfs_volume *vol =3D ni->vol; + u8 *kaddr; + struct ntfs_inode *locked_nis[PAGE_SIZE / NTFS_BLOCK_SIZE]; + int nr_locked_nis =3D 0, err =3D 0, mft_ofs, prev_mft_ofs; + struct bio *bio =3D NULL; + unsigned long mft_no; + struct ntfs_inode *tni; + s64 lcn; + s64 vcn =3D (s64)folio->index << PAGE_SHIFT >> vol->cluster_size_bits; + s64 end_vcn =3D ni->allocated_size >> vol->cluster_size_bits; + unsigned int folio_sz; + struct runlist_element *rl; + + ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, folio index 0x= %lx.", + vi->i_ino, ni->type, folio->index); + + lcn =3D ntfs_convert_page_index_into_lcn(vol, ni, folio->index); + if (lcn <=3D LCN_HOLE) { + folio_start_writeback(folio); + folio_unlock(folio); + folio_end_writeback(folio); + return -EIO; + } + + /* Map folio so we can access its contents. */ + kaddr =3D kmap_local_folio(folio, 0); + /* Clear the page uptodate flag whilst the mst fixups are applied. */ + folio_clear_uptodate(folio); + + for (mft_ofs =3D 0; mft_ofs < PAGE_SIZE && vcn < end_vcn; + mft_ofs +=3D vol->mft_record_size) { + /* Get the mft record number. */ + mft_no =3D (((s64)folio->index << PAGE_SHIFT) + mft_ofs) >> + vol->mft_record_size_bits; + vcn =3D mft_no << vol->mft_record_size_bits >> vol->cluster_size_bits; + /* Check whether to write this mft record. */ + tni =3D NULL; + if (ntfs_may_write_mft_record(vol, mft_no, + (struct mft_record *)(kaddr + mft_ofs), &tni)) { + unsigned int mft_record_off =3D 0; + s64 vcn_off =3D vcn; + + /* + * The record should be written. If a locked ntfs + * inode was returned, add it to the array of locked + * ntfs inodes. 
+			 */
+			if (tni)
+				locked_nis[nr_locked_nis++] = tni;
+
+			if (bio && (mft_ofs != prev_mft_ofs + vol->mft_record_size)) {
+flush_bio:
+				flush_dcache_folio(folio);
+				submit_bio_wait(bio);
+				bio_put(bio);
+				bio = NULL;
+			}
+
+			if (vol->cluster_size < folio_size(folio)) {
+				down_write(&ni->runlist.lock);
+				rl = ntfs_attr_vcn_to_rl(ni, vcn_off, &lcn);
+				up_write(&ni->runlist.lock);
+				if (IS_ERR(rl) || lcn < 0) {
+					err = -EIO;
+					goto unm_done;
+				}
+
+				if (bio &&
+				    (bio_end_sector(bio) >> (vol->cluster_size_bits - 9)) !=
+				    lcn) {
+					flush_dcache_folio(folio);
+					submit_bio_wait(bio);
+					bio_put(bio);
+					bio = NULL;
+				}
+			}
+
+			if (!bio) {
+				unsigned int off;
+
+				off = ((mft_no << vol->mft_record_size_bits) +
+						mft_record_off) & vol->cluster_size_mask;
+
+				bio = ntfs_setup_bio(vol, REQ_OP_WRITE, lcn, off);
+				if (!bio) {
+					err = -ENOMEM;
+					goto unm_done;
+				}
+			}
+
+			if (vol->cluster_size == NTFS_BLOCK_SIZE &&
+			    (mft_record_off ||
+			     rl->length - (vcn_off - rl->vcn) == 1 ||
+			     mft_ofs + NTFS_BLOCK_SIZE >= PAGE_SIZE))
+				folio_sz = NTFS_BLOCK_SIZE;
+			else
+				folio_sz = vol->mft_record_size;
+			if (!bio_add_folio(bio, folio, folio_sz,
+					mft_ofs + mft_record_off)) {
+				err = -EIO;
+				bio_put(bio);
+				goto unm_done;
+			}
+			mft_record_off += folio_sz;
+
+			if (mft_record_off != vol->mft_record_size) {
+				vcn_off++;
+				goto flush_bio;
+			}
+			prev_mft_ofs = mft_ofs;
+
+			if (mft_no < vol->mftmirr_size)
+				ntfs_sync_mft_mirror(vol, mft_no,
+						(struct mft_record *)(kaddr + mft_ofs));
+		}
+
+	}
+
+	if (bio) {
+		flush_dcache_folio(folio);
+		submit_bio_wait(bio);
+		bio_put(bio);
+	}
+	flush_dcache_folio(folio);
+unm_done:
+	folio_mark_uptodate(folio);
+	kunmap_local(kaddr);
+
+	folio_start_writeback(folio);
+	folio_unlock(folio);
+	folio_end_writeback(folio);
+
+	/* Unlock any locked inodes. */
+	while (nr_locked_nis-- > 0) {
+		struct ntfs_inode *base_tni;
+
+		tni = locked_nis[nr_locked_nis];
+		mutex_unlock(&tni->mrec_lock);
+
+		/* Get the base inode. */
+		mutex_lock(&tni->extent_lock);
+		if (tni->nr_extents >= 0)
+			base_tni = tni;
+		else
+			base_tni = tni->ext.base_ntfs_ino;
+		mutex_unlock(&tni->extent_lock);
+		ntfs_debug("Unlocking %s inode 0x%lx.",
+				tni == base_tni ? "base" : "extent",
+				tni->mft_no);
+		atomic_dec(&tni->count);
+		iput(VFS_I(base_tni));
+	}
+
+	if (unlikely(err && err != -ENOMEM))
+		NVolSetErrors(vol);
+	if (likely(!err))
+		ntfs_debug("Done.");
+	return err;
+}
+
+/**
+ * ntfs_bmap - map logical file block to physical device block
+ * @mapping: address space mapping to which the block to be mapped belongs
+ * @block: logical block to map to its physical device block
+ *
+ * For regular, non-resident files (i.e. not compressed and not encrypted), map
+ * the logical @block belonging to the file described by the address space
+ * mapping @mapping to its physical device block.
+ *
+ * The size of the block is equal to the @s_blocksize field of the super block
+ * of the mounted file system which is guaranteed to be smaller than or equal
+ * to the cluster size thus the block is guaranteed to fit entirely inside the
+ * cluster which means we do not need to care how many contiguous bytes are
+ * available after the beginning of the block.
+ *
+ * Return the physical device block if the mapping succeeded or 0 if the block
+ * is sparse or there was an error.
+ *
+ * Note: This is a problem if someone tries to run bmap() on $Boot system file
+ * as that really is in block zero but there is nothing we can do. bmap() is
+ * just broken in that respect (just like it cannot distinguish sparse from
+ * not available or error).
+ */
+static sector_t ntfs_bmap(struct address_space *mapping, sector_t block)
+{
+	s64 ofs, size;
+	loff_t i_size;
+	s64 lcn;
+	unsigned long blocksize, flags;
+	struct ntfs_inode *ni = NTFS_I(mapping->host);
+	struct ntfs_volume *vol = ni->vol;
+	unsigned int delta;
+	unsigned char blocksize_bits, cluster_size_shift;
+
+	ntfs_debug("Entering for mft_no 0x%lx, logical block 0x%llx.",
+			ni->mft_no, (unsigned long long)block);
+	if (ni->type != AT_DATA || !NInoNonResident(ni) || NInoEncrypted(ni)) {
+		ntfs_error(vol->sb, "BMAP does not make sense for %s attributes, returning 0.",
+				(ni->type != AT_DATA) ? "non-data" :
+				(!NInoNonResident(ni) ? "resident" :
+				"encrypted"));
+		return 0;
+	}
+	/* None of these can happen. */
+	blocksize = vol->sb->s_blocksize;
+	blocksize_bits = vol->sb->s_blocksize_bits;
+	ofs = (s64)block << blocksize_bits;
+	read_lock_irqsave(&ni->size_lock, flags);
+	size = ni->initialized_size;
+	i_size = i_size_read(VFS_I(ni));
+	read_unlock_irqrestore(&ni->size_lock, flags);
+	/*
+	 * If the offset is outside the initialized size or the block straddles
+	 * the initialized size then pretend it is a hole unless the
+	 * initialized size equals the file size.
+	 */
+	if (unlikely(ofs >= size || (ofs + blocksize > size && size < i_size)))
+		goto hole;
+	cluster_size_shift = vol->cluster_size_bits;
+	down_read(&ni->runlist.lock);
+	lcn = ntfs_attr_vcn_to_lcn_nolock(ni, ofs >> cluster_size_shift, false);
+	up_read(&ni->runlist.lock);
+	if (unlikely(lcn < LCN_HOLE)) {
+		/*
+		 * Step down to an integer to avoid gcc doing a long long
+		 * comparison in the switch when we know @lcn is between
+		 * LCN_HOLE and LCN_EIO (i.e. -1 to -5).
+		 *
+		 * Otherwise older gcc (at least on some architectures) will
+		 * try to use __cmpdi2() which is of course not available in
+		 * the kernel.
+		 */
+		switch ((int)lcn) {
+		case LCN_ENOENT:
+			/*
+			 * If the offset is out of bounds then pretend it is a
+			 * hole.
+			 */
+			goto hole;
+		case LCN_ENOMEM:
+			ntfs_error(vol->sb,
+					"Not enough memory to complete mapping for inode 0x%lx. Returning 0.",
+					ni->mft_no);
+			break;
+		default:
+			ntfs_error(vol->sb,
+					"Failed to complete mapping for inode 0x%lx. Run chkdsk. Returning 0.",
+					ni->mft_no);
+			break;
+		}
+		return 0;
+	}
+	if (lcn < 0) {
+		/* It is a hole. */
+hole:
+		ntfs_debug("Done (returning hole).");
+		return 0;
+	}
+	/*
+	 * The block is really allocated and fulfils all our criteria.
+	 * Convert the cluster to units of block size and return the result.
+	 */
+	delta = ofs & vol->cluster_size_mask;
+	if (unlikely(sizeof(block) < sizeof(lcn))) {
+		block = lcn = ((lcn << cluster_size_shift) + delta) >>
+				blocksize_bits;
+		/* If the block number was truncated return 0. */
+		if (unlikely(block != lcn)) {
+			ntfs_error(vol->sb,
+					"Physical block 0x%llx is too large to be returned, returning 0.",
+					(long long)lcn);
+			return 0;
+		}
+	} else
+		block = ((lcn << cluster_size_shift) + delta) >>
+				blocksize_bits;
+	ntfs_debug("Done (returning block 0x%llx).", (unsigned long long)lcn);
+	return block;
+}
+
+static void ntfs_readahead(struct readahead_control *rac)
+{
+	struct address_space *mapping = rac->mapping;
+	struct inode *inode = mapping->host;
+	struct ntfs_inode *ni = NTFS_I(inode);
+
+	if (!NInoNonResident(ni) || NInoCompressed(ni)) {
+		/* No readahead for resident and compressed. */
+		return;
+	}
+
+	if (NInoMstProtected(ni) &&
+	    (ni->mft_no == FILE_MFT || ni->mft_no == FILE_MFTMirr))
+		return;
+
+	iomap_readahead(rac, &ntfs_read_iomap_ops);
+}
+
+static int ntfs_mft_writepage(struct folio *folio, struct writeback_control *wbc)
+{
+	struct address_space *mapping = folio->mapping;
+	struct inode *vi = mapping->host;
+	struct ntfs_inode *ni = NTFS_I(vi);
+	loff_t i_size;
+	int ret;
+
+	i_size = i_size_read(vi);
+
+	/* We have to zero every time due to mmap-at-end-of-file. */
+	if (folio->index >= (i_size >> PAGE_SHIFT)) {
+		/* The page straddles i_size. */
+		unsigned int ofs = i_size & ~PAGE_MASK;
+
+		folio_zero_segment(folio, ofs, PAGE_SIZE);
+	}
+
+	ret = ntfs_write_mft_block(ni, folio, wbc);
+	mapping_set_error(mapping, ret);
+	return ret;
+}
+
+static int ntfs_writepages(struct address_space *mapping,
+		struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	struct ntfs_inode *ni = NTFS_I(inode);
+	struct iomap_writepage_ctx wpc = {
+		.inode = mapping->host,
+		.wbc = wbc,
+		.ops = &ntfs_writeback_ops,
+	};
+
+	if (NVolShutdown(ni->vol))
+		return -EIO;
+
+	if (!NInoNonResident(ni))
+		return 0;
+
+	if (NInoMstProtected(ni) && ni->mft_no == FILE_MFT) {
+		struct folio *folio = NULL;
+		int error;
+
+		while ((folio = writeback_iter(mapping, wbc, folio, &error)))
+			error = ntfs_mft_writepage(folio, wbc);
+		return error;
+	}
+
+	/* If file is encrypted, deny access, just like NT4. */
+	if (NInoEncrypted(ni)) {
+		ntfs_debug("Denying write access to encrypted file.");
+		return -EACCES;
+	}
+
+	return iomap_writepages(&wpc);
+}
+
+static int ntfs_swap_activate(struct swap_info_struct *sis,
+		struct file *swap_file, sector_t *span)
+{
+	return iomap_swapfile_activate(sis, swap_file, span,
+			&ntfs_read_iomap_ops);
+}
+
+/**
+ * ntfs_normal_aops - address space operations for normal inodes and attributes
+ *
+ * Note these are not used for compressed or mst protected inodes and
+ * attributes.
+ */
+const struct address_space_operations ntfs_normal_aops = {
+	.read_folio = ntfs_read_folio,
+	.readahead = ntfs_readahead,
+	.writepages = ntfs_writepages,
+	.direct_IO = noop_direct_IO,
+	.dirty_folio = iomap_dirty_folio,
+	.bmap = ntfs_bmap,
+	.migrate_folio = filemap_migrate_folio,
+	.is_partially_uptodate = iomap_is_partially_uptodate,
+	.error_remove_folio = generic_error_remove_folio,
+	.release_folio = iomap_release_folio,
+	.invalidate_folio = iomap_invalidate_folio,
+	.swap_activate = ntfs_swap_activate,
+};
+
+/**
+ * ntfs_compressed_aops - address space operations for compressed inodes
+ */
+const struct address_space_operations ntfs_compressed_aops = {
+	.read_folio = ntfs_read_folio,
+	.direct_IO = noop_direct_IO,
+	.writepages = ntfs_writepages,
+	.dirty_folio = iomap_dirty_folio,
+	.migrate_folio = filemap_migrate_folio,
+	.is_partially_uptodate = iomap_is_partially_uptodate,
+	.error_remove_folio = generic_error_remove_folio,
+	.release_folio = iomap_release_folio,
+	.invalidate_folio = iomap_invalidate_folio,
+};
+
+/**
+ * ntfs_mst_aops - general address space operations for mst protected inodes
+ * and attributes
+ */
+const struct address_space_operations ntfs_mst_aops = {
+	.read_folio = ntfs_read_folio,	/* Fill page with data. */
+	.readahead = ntfs_readahead,
+	.writepages = ntfs_writepages,	/* Write dirty page to disk. */
+	.dirty_folio = iomap_dirty_folio,
+	.migrate_folio = filemap_migrate_folio,
+	.is_partially_uptodate = iomap_is_partially_uptodate,
+	.error_remove_folio = generic_error_remove_folio,
+	.release_folio = iomap_release_folio,
+	.invalidate_folio = iomap_invalidate_folio,
+};
+
+void mark_ntfs_record_dirty(struct folio *folio)
+{
+	iomap_dirty_folio(folio->mapping, folio);
+}
+
+int ntfs_dev_read(struct super_block *sb, void *buf, loff_t start, loff_t size)
+{
+	pgoff_t idx, idx_end;
+	loff_t offset, end = start + size;
+	u32 from, to, buf_off = 0;
+	struct folio *folio;
+	char *kaddr;
+
+	idx = start >> PAGE_SHIFT;
+	idx_end = end >> PAGE_SHIFT;
+	from = start & ~PAGE_MASK;
+
+	if (idx == idx_end)
+		idx_end++;
+
+	for (; idx < idx_end; idx++, from = 0) {
+		folio = ntfs_read_mapping_folio(sb->s_bdev->bd_mapping, idx);
+		if (IS_ERR(folio)) {
+			ntfs_error(sb, "Unable to read %ld page", idx);
+			return PTR_ERR(folio);
+		}
+
+		kaddr = kmap_local_folio(folio, 0);
+		offset = (loff_t)idx << PAGE_SHIFT;
+		to = min_t(u32, end - offset, PAGE_SIZE);
+
+		memcpy(buf + buf_off, kaddr + from, to);
+		buf_off += to;
+		kunmap_local(kaddr);
+		folio_put(folio);
+	}
+
+	return 0;
+}
+
+int ntfs_dev_write(struct super_block *sb, void *buf, loff_t start,
+		loff_t size, bool wait)
+{
+	pgoff_t idx, idx_end;
+	loff_t offset, end = start + size;
+	u32 from, to, buf_off = 0;
+	struct folio *folio;
+	char *kaddr;
+
+	idx = start >> PAGE_SHIFT;
+	idx_end = end >> PAGE_SHIFT;
+	from = start & ~PAGE_MASK;
+
+	if (idx == idx_end)
+		idx_end++;
+
+	for (; idx < idx_end; idx++, from = 0) {
+		folio = ntfs_read_mapping_folio(sb->s_bdev->bd_mapping, idx);
+		if (IS_ERR(folio)) {
+			ntfs_error(sb, "Unable to read %ld page", idx);
+			return PTR_ERR(folio);
+		}
+
+		kaddr = kmap_local_folio(folio, 0);
+		offset = (loff_t)idx << PAGE_SHIFT;
+		to = min_t(u32, end - offset, PAGE_SIZE);
+
+		memcpy(kaddr + from, buf + buf_off, to);
+		buf_off += to;
+		kunmap_local(kaddr);
+		folio_mark_uptodate(folio);
+		folio_mark_dirty(folio);
+		if (wait)
+			folio_wait_stable(folio);
+		folio_put(folio);
+	}
+
+	return 0;
+}
diff --git a/fs/ntfsplus/ntfs_iomap.c b/fs/ntfsplus/ntfs_iomap.c
new file mode 100644
index 000000000000..c9fd999820f4
--- /dev/null
+++ b/fs/ntfsplus/ntfs_iomap.c
@@ -0,0 +1,700 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * iomap callback functions
+ *
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include
+#include
+#include
+
+#include "aops.h"
+#include "attrib.h"
+#include "mft.h"
+#include "ntfs.h"
+#include "misc.h"
+#include "ntfs_iomap.h"
+
+static void ntfs_iomap_put_folio(struct inode *inode, loff_t pos,
+		unsigned int len, struct folio *folio)
+{
+	struct ntfs_inode *ni = NTFS_I(inode);
+	unsigned long sector_size = 1UL << inode->i_blkbits;
+	loff_t start_down, end_up, init;
+
+	if (!NInoNonResident(ni))
+		goto out;
+
+	start_down = round_down(pos, sector_size);
+	end_up = (pos + len - 1) | (sector_size - 1);
+	init = ni->initialized_size;
+
+	if (init >= start_down && init <= end_up) {
+		if (init < pos) {
+			loff_t offset = offset_in_folio(folio, pos + len);
+
+			if (offset == 0)
+				offset = folio_size(folio);
+			folio_zero_segments(folio,
+					offset_in_folio(folio, init),
+					offset_in_folio(folio, pos),
+					offset,
+					folio_size(folio));
+
+		} else {
+			loff_t offset = max_t(loff_t, pos + len, init);
+
+			offset = offset_in_folio(folio, offset);
+			if (offset == 0)
+				offset = folio_size(folio);
+			folio_zero_segment(folio,
+					offset,
+					folio_size(folio));
+		}
+	} else if (init <= pos) {
+		loff_t offset = 0, offset2 = offset_in_folio(folio, pos + len);
+
+		if ((init >> folio_shift(folio)) == (pos >> folio_shift(folio)))
+			offset = offset_in_folio(folio, init);
+		if (offset2 == 0)
+			offset2 = folio_size(folio);
+		folio_zero_segments(folio,
+				offset,
+				offset_in_folio(folio, pos),
+				offset2,
+				folio_size(folio));
+	}
+
+out:
+	folio_unlock(folio);
+	folio_put(folio);
+}
+
+const struct iomap_write_ops ntfs_iomap_folio_ops = {
+	.put_folio = ntfs_iomap_put_folio,
+};
+
+static int ntfs_read_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
+{
+	struct ntfs_inode *base_ni, *ni = NTFS_I(inode);
+	struct ntfs_attr_search_ctx *ctx;
+	loff_t i_size;
+	u32 attr_len;
+	int err = 0;
+	char *kattr;
+	struct page *ipage;
+
+	if (NInoNonResident(ni)) {
+		s64 vcn;
+		s64 lcn;
+		struct runlist_element *rl;
+		struct ntfs_volume *vol = ni->vol;
+		loff_t vcn_ofs;
+		loff_t rl_length;
+
+		vcn = offset >> vol->cluster_size_bits;
+		vcn_ofs = offset & vol->cluster_size_mask;
+
+		down_write(&ni->runlist.lock);
+		rl = ntfs_attr_vcn_to_rl(ni, vcn, &lcn);
+		if (IS_ERR(rl)) {
+			up_write(&ni->runlist.lock);
+			return PTR_ERR(rl);
+		}
+
+		if (flags & IOMAP_REPORT) {
+			if (lcn < LCN_HOLE) {
+				up_write(&ni->runlist.lock);
+				return -ENOENT;
+			}
+		} else if (lcn < LCN_ENOENT) {
+			up_write(&ni->runlist.lock);
+			return -EINVAL;
+		}
+
+		iomap->bdev = inode->i_sb->s_bdev;
+		iomap->offset = offset;
+
+		if (lcn <= LCN_DELALLOC) {
+			if (lcn == LCN_DELALLOC)
+				iomap->type = IOMAP_DELALLOC;
+			else
+				iomap->type = IOMAP_HOLE;
+			iomap->addr = IOMAP_NULL_ADDR;
+		} else {
+			if (!(flags & IOMAP_ZERO) && offset >= ni->initialized_size)
+				iomap->type = IOMAP_UNWRITTEN;
+			else
+				iomap->type = IOMAP_MAPPED;
+			iomap->addr = (lcn << vol->cluster_size_bits) + vcn_ofs;
+		}
+
+		rl_length = (rl->length - (vcn - rl->vcn)) << ni->vol->cluster_size_bits;
+
+		if (rl_length == 0 && rl->lcn > LCN_DELALLOC) {
+			ntfs_error(inode->i_sb,
+					"runlist(vcn : %lld, length : %lld, lcn : %lld) is corrupted\n",
+					rl->vcn, rl->length, rl->lcn);
+			up_write(&ni->runlist.lock);
+			return -EIO;
+		}
+
+		if (rl_length && length > rl_length - vcn_ofs)
+			iomap->length = rl_length - vcn_ofs;
+		else
+			iomap->length = length;
+
+		up_write(&ni->runlist.lock);
+
+		if (!(flags & IOMAP_ZERO) &&
+		    iomap->type == IOMAP_MAPPED &&
+		    iomap->offset < ni->initialized_size &&
+		    iomap->offset + iomap->length > ni->initialized_size) {
+			iomap->length = round_up(ni->initialized_size, 1 << inode->i_blkbits) -
+					iomap->offset;
+		}
+		iomap->flags |= IOMAP_F_MERGED;
+		return 0;
+	}
+
+	if (NInoAttr(ni))
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+
+	ctx = ntfs_attr_get_search_ctx(base_ni, NULL);
+	if (!ctx) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err))
+		goto out;
+
+	attr_len = le32_to_cpu(ctx->attr->data.resident.value_length);
+	if (unlikely(attr_len > ni->initialized_size))
+		attr_len = ni->initialized_size;
+	i_size = i_size_read(inode);
+
+	if (unlikely(attr_len > i_size)) {
+		/* Race with shrinking truncate. */
+		attr_len = i_size;
+	}
+
+	if (offset >= attr_len) {
+		if (flags & IOMAP_REPORT)
+			err = -ENOENT;
+		else
+			err = -EFAULT;
+		goto out;
+	}
+
+	kattr = (u8 *)ctx->attr + le16_to_cpu(ctx->attr->data.resident.value_offset);
+
+	ipage = alloc_page(__GFP_NOWARN | __GFP_IO | __GFP_ZERO);
+	if (!ipage) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	memcpy(page_address(ipage), kattr, attr_len);
+	iomap->type = IOMAP_INLINE;
+	iomap->inline_data = page_address(ipage);
+	iomap->offset = 0;
+	iomap->length = min_t(loff_t, attr_len, PAGE_SIZE);
+	iomap->private = ipage;
+
+out:
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+static int ntfs_read_iomap_end(struct inode *inode, loff_t pos, loff_t length,
+		ssize_t written, unsigned int flags, struct iomap *iomap)
+{
+	if (iomap->type == IOMAP_INLINE) {
+		struct page *ipage = iomap->private;
+
+		put_page(ipage);
+	}
+	return written;
+}
+
+const struct iomap_ops ntfs_read_iomap_ops = {
+	.iomap_begin = ntfs_read_iomap_begin,
+	.iomap_end = ntfs_read_iomap_end,
+};
+
+static int ntfs_buffered_zeroed_clusters(struct inode *vi, s64 vcn) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + struct ntfs_volume *vol =3D ni->vol; + struct address_space *mapping =3D vi->i_mapping; + struct folio *folio; + pgoff_t idx, idx_end; + u32 from, to; + + idx =3D (vcn << vol->cluster_size_bits) >> PAGE_SHIFT; + idx_end =3D ((vcn + 1) << vol->cluster_size_bits) >> PAGE_SHIFT; + from =3D (vcn << vol->cluster_size_bits) & ~PAGE_MASK; + if (idx =3D=3D idx_end) + idx_end++; + + to =3D min_t(u32, vol->cluster_size, PAGE_SIZE); + for (; idx < idx_end; idx++, from =3D 0) { + if (to !=3D PAGE_SIZE) { + folio =3D ntfs_read_mapping_folio(mapping, idx); + if (IS_ERR(folio)) + return PTR_ERR(folio); + folio_lock(folio); + } else { + folio =3D __filemap_get_folio(mapping, idx, + FGP_WRITEBEGIN | FGP_NOFS, mapping_gfp_mask(mapping)); + if (IS_ERR(folio)) + return PTR_ERR(folio); + } + + if (folio_test_uptodate(folio) || + iomap_is_partially_uptodate(folio, from, to)) + goto next_folio; + + folio_zero_segment(folio, from, from + to); + folio_mark_uptodate(folio); + +next_folio: + iomap_dirty_folio(mapping, folio); + folio_unlock(folio); + folio_put(folio); + balance_dirty_pages_ratelimited(mapping); + cond_resched(); + } + + return 0; +} + +int ntfs_zeroed_clusters(struct inode *vi, s64 lcn, s64 num) +{ + struct ntfs_inode *ni =3D NTFS_I(vi); + struct ntfs_volume *vol =3D ni->vol; + u32 to; + struct bio *bio =3D NULL; + s64 err =3D 0, zero_len =3D num << vol->cluster_size_bits; + s64 loc =3D lcn << vol->cluster_size_bits, curr =3D 0; + + while (zero_len > 0) { +setup_bio: + if (!bio) { + bio =3D bio_alloc(vol->sb->s_bdev, + bio_max_segs(DIV_ROUND_UP(zero_len, PAGE_SIZE)), + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE, GFP_NOIO); + if (!bio) + return -ENOMEM; + bio->bi_iter.bi_sector =3D (loc + curr) >> vol->sb->s_blocksize_bits; + } + + to =3D min_t(u32, zero_len, PAGE_SIZE); + if (!bio_add_page(bio, ZERO_PAGE(0), to, 0)) { + err =3D submit_bio_wait(bio); + bio_put(bio); + bio 
=3D NULL; + if (err) + break; + goto setup_bio; + } + zero_len -=3D to; + curr +=3D to; + } + + if (bio) { + err =3D submit_bio_wait(bio); + bio_put(bio); + } + + return err; +} + +static int __ntfs_write_iomap_begin(struct inode *inode, loff_t offset, + loff_t length, unsigned int flags, + struct iomap *iomap, bool da, bool mapped) +{ + struct ntfs_inode *ni =3D NTFS_I(inode); + struct ntfs_volume *vol =3D ni->vol; + struct attr_record *a; + struct ntfs_attr_search_ctx *ctx; + u32 attr_len; + int err =3D 0; + char *kattr; + struct page *ipage; + + if (NVolShutdown(vol)) + return -EIO; + + mutex_lock(&ni->mrec_lock); + if (NInoNonResident(ni)) { + s64 vcn; + loff_t vcn_ofs; + loff_t rl_length; + s64 max_clu_count =3D + round_up(length, vol->cluster_size) >> vol->cluster_size_bits; + + vcn =3D offset >> vol->cluster_size_bits; + vcn_ofs =3D offset & vol->cluster_size_mask; + + if (da) { + bool balloc =3D false; + s64 start_lcn, lcn_count; + bool update_mp; + + update_mp =3D (flags & IOMAP_DIRECT) || mapped || + NInoAttr(ni) || ni->mft_no < FILE_first_user; + down_write(&ni->runlist.lock); + err =3D ntfs_attr_map_cluster(ni, vcn, &start_lcn, &lcn_count, + max_clu_count, &balloc, update_mp, + !(flags & IOMAP_DIRECT) && !mapped); + up_write(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + if (err) { + ni->i_dealloc_clusters =3D 0; + return err; + } + + iomap->bdev =3D inode->i_sb->s_bdev; + iomap->offset =3D offset; + + rl_length =3D lcn_count << ni->vol->cluster_size_bits; + if (length > rl_length - vcn_ofs) + iomap->length =3D rl_length - vcn_ofs; + else + iomap->length =3D length; + + if (start_lcn =3D=3D LCN_HOLE) + iomap->type =3D IOMAP_HOLE; + else + iomap->type =3D IOMAP_MAPPED; + if (balloc =3D=3D true) + iomap->flags =3D IOMAP_F_NEW; + + iomap->addr =3D (start_lcn << vol->cluster_size_bits) + vcn_ofs; + + if (balloc =3D=3D true) { + if (flags & IOMAP_DIRECT || mapped =3D=3D true) { + loff_t end =3D offset + length; + + if (vcn_ofs || ((vol->cluster_size > 
iomap->length) && + end < ni->initialized_size)) + err =3D ntfs_zeroed_clusters(inode, + start_lcn, 1); + if (!err && lcn_count > 1 && + (iomap->length & vol->cluster_size_mask && + end < ni->initialized_size)) + err =3D ntfs_zeroed_clusters(inode, + start_lcn + (lcn_count - 1), 1); + } else { + if (lcn_count > ni->i_dealloc_clusters) + ni->i_dealloc_clusters =3D 0; + else + ni->i_dealloc_clusters -=3D lcn_count; + } + if (err < 0) + return err; + } + + if (mapped && iomap->offset + iomap->length > + ni->initialized_size) { + err =3D ntfs_attr_set_initialized_size(ni, iomap->offset + + iomap->length); + if (err) + return err; + } + } else { + struct runlist_element *rl, *rlc; + s64 lcn; + bool is_retry =3D false; + + down_read(&ni->runlist.lock); + rl =3D ni->runlist.rl; + if (!rl) { + up_read(&ni->runlist.lock); + err =3D ntfs_map_runlist(ni, vcn); + if (err) { + mutex_unlock(&ni->mrec_lock); + return -ENOENT; + } + down_read(&ni->runlist.lock); + rl =3D ni->runlist.rl; + } + up_read(&ni->runlist.lock); + + down_write(&ni->runlist.lock); +remap_rl: + /* Seek to element containing target vcn. 
*/ + while (rl->length && rl[1].vcn <=3D vcn) + rl++; + lcn =3D ntfs_rl_vcn_to_lcn(rl, vcn); + + if (lcn <=3D LCN_RL_NOT_MAPPED && is_retry =3D=3D false) { + is_retry =3D true; + if (!ntfs_map_runlist_nolock(ni, vcn, NULL)) { + rl =3D ni->runlist.rl; + goto remap_rl; + } + } + + max_clu_count =3D min(max_clu_count, rl->length - (vcn - rl->vcn)); + if (max_clu_count =3D=3D 0) { + ntfs_error(inode->i_sb, + "runlist(vcn : %lld, length : %lld) is corrupted\n", + rl->vcn, rl->length); + up_write(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + return -EIO; + } + + iomap->bdev =3D inode->i_sb->s_bdev; + iomap->offset =3D offset; + + if (lcn <=3D LCN_DELALLOC) { + if (lcn < LCN_DELALLOC) { + max_clu_count =3D + ntfs_available_clusters_count(vol, max_clu_count); + if (max_clu_count < 0) { + err =3D max_clu_count; + up_write(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + return err; + } + } + + iomap->type =3D IOMAP_DELALLOC; + iomap->addr =3D IOMAP_NULL_ADDR; + + if (lcn <=3D LCN_HOLE) { + size_t new_rl_count; + + rlc =3D ntfs_malloc_nofs(sizeof(struct runlist_element) * 2); + if (!rlc) { + up_write(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + return -ENOMEM; + } + + rlc->vcn =3D vcn; + rlc->lcn =3D LCN_DELALLOC; + rlc->length =3D max_clu_count; + + rlc[1].vcn =3D vcn + max_clu_count; + rlc[1].lcn =3D LCN_RL_NOT_MAPPED; + rlc[1].length =3D 0; + + rl =3D ntfs_runlists_merge(&ni->runlist, rlc, 0, + &new_rl_count); + if (IS_ERR(rl)) { + ntfs_error(vol->sb, "Failed to merge runlists"); + up_write(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + ntfs_free(rlc); + return PTR_ERR(rl); + } + + ni->runlist.rl =3D rl; + ni->runlist.count =3D new_rl_count; + ni->i_dealloc_clusters +=3D max_clu_count; + } + up_write(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + + if (lcn < LCN_DELALLOC) + ntfs_hold_dirty_clusters(vol, max_clu_count); + + rl_length =3D max_clu_count << ni->vol->cluster_size_bits; + if (length > rl_length - vcn_ofs) + iomap->length =3D 
rl_length - vcn_ofs; + else + iomap->length =3D length; + + iomap->flags =3D IOMAP_F_NEW; + if (lcn <=3D LCN_HOLE) { + loff_t end =3D offset + length; + + if (vcn_ofs || ((vol->cluster_size > iomap->length) && + end < ni->initialized_size)) + err =3D ntfs_buffered_zeroed_clusters(inode, vcn); + if (!err && max_clu_count > 1 && + (iomap->length & vol->cluster_size_mask && + end < ni->initialized_size)) + err =3D ntfs_buffered_zeroed_clusters(inode, + vcn + (max_clu_count - 1)); + if (err) { + ntfs_release_dirty_clusters(vol, max_clu_count); + return err; + } + } + } else { + up_write(&ni->runlist.lock); + mutex_unlock(&ni->mrec_lock); + + iomap->type =3D IOMAP_MAPPED; + iomap->addr =3D (lcn << vol->cluster_size_bits) + vcn_ofs; + + rl_length =3D max_clu_count << ni->vol->cluster_size_bits; + if (length > rl_length - vcn_ofs) + iomap->length =3D rl_length - vcn_ofs; + else + iomap->length =3D length; + } + } + + return 0; + } + + ctx =3D ntfs_attr_get_search_ctx(ni, NULL); + if (!ctx) { + err =3D -ENOMEM; + goto out; + } + + err =3D ntfs_attr_lookup(ni->type, ni->name, ni->name_len, + CASE_SENSITIVE, 0, NULL, 0, ctx); + if (err) { + if (err =3D=3D -ENOENT) + err =3D -EIO; + goto out; + } + + a =3D ctx->attr; + /* The total length of the attribute value. 
*/ + attr_len =3D le32_to_cpu(a->data.resident.value_length); + kattr =3D (u8 *)a + le16_to_cpu(a->data.resident.value_offset); + + ipage =3D alloc_page(__GFP_NOWARN | __GFP_IO | __GFP_ZERO); + if (!ipage) { + err =3D -ENOMEM; + goto out; + } + memcpy(page_address(ipage), kattr, attr_len); + + iomap->type =3D IOMAP_INLINE; + iomap->inline_data =3D page_address(ipage); + iomap->offset =3D 0; + /* iomap requires there is only one INLINE_DATA extent */ + iomap->length =3D attr_len; + iomap->private =3D ipage; + +out: + if (ctx) + ntfs_attr_put_search_ctx(ctx); + mutex_unlock(&ni->mrec_lock); + + return err; +} + +static int ntfs_write_iomap_begin(struct inode *inode, loff_t offset, + loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return __ntfs_write_iomap_begin(inode, offset, length, flags, iomap, + false, false); +} + +static int ntfs_write_iomap_end(struct inode *inode, loff_t pos, loff_t le= ngth, + ssize_t written, unsigned int flags, struct iomap *iomap) +{ + if (iomap->type =3D=3D IOMAP_INLINE) { + struct page *ipage =3D iomap->private; + struct ntfs_inode *ni =3D NTFS_I(inode); + struct ntfs_attr_search_ctx *ctx; + u32 attr_len; + int err; + char *kattr; + + mutex_lock(&ni->mrec_lock); + ctx =3D ntfs_attr_get_search_ctx(ni, NULL); + if (!ctx) { + written =3D -ENOMEM; + mutex_unlock(&ni->mrec_lock); + goto out; + } + + err =3D ntfs_attr_lookup(ni->type, ni->name, ni->name_len, + CASE_SENSITIVE, 0, NULL, 0, ctx); + if (err) { + if (err =3D=3D -ENOENT) + err =3D -EIO; + written =3D err; + goto err_out; + } + + /* The total length of the attribute value. 
*/ + attr_len =3D le32_to_cpu(ctx->attr->data.resident.value_length); + if (pos >=3D attr_len || pos + written > attr_len) + goto err_out; + + kattr =3D (u8 *)ctx->attr + le16_to_cpu(ctx->attr->data.resident.value_o= ffset); + memcpy(kattr + pos, iomap_inline_data(iomap, pos), written); + mark_mft_record_dirty(ctx->ntfs_ino); +err_out: + ntfs_attr_put_search_ctx(ctx); + put_page(ipage); + mutex_unlock(&ni->mrec_lock); + } + +out: + return written; +} + +const struct iomap_ops ntfs_write_iomap_ops =3D { + .iomap_begin =3D ntfs_write_iomap_begin, + .iomap_end =3D ntfs_write_iomap_end, +}; + +static int ntfs_page_mkwrite_iomap_begin(struct inode *inode, loff_t offse= t, + loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return __ntfs_write_iomap_begin(inode, offset, length, flags, iomap, + true, true); +} + +const struct iomap_ops ntfs_page_mkwrite_iomap_ops =3D { + .iomap_begin =3D ntfs_page_mkwrite_iomap_begin, + .iomap_end =3D ntfs_write_iomap_end, +}; + +static int ntfs_dio_iomap_begin(struct inode *inode, loff_t offset, + loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return __ntfs_write_iomap_begin(inode, offset, length, flags, iomap, + true, false); +} + +const struct iomap_ops ntfs_dio_iomap_ops =3D { + .iomap_begin =3D ntfs_dio_iomap_begin, + .iomap_end =3D ntfs_write_iomap_end, +}; + +static ssize_t ntfs_writeback_range(struct iomap_writepage_ctx *wpc, + struct folio *folio, u64 offset, unsigned int len, u64 end_pos) +{ + if (offset < wpc->iomap.offset || + offset >=3D wpc->iomap.offset + wpc->iomap.length) { + int error; + + error =3D __ntfs_write_iomap_begin(wpc->inode, offset, + NTFS_I(wpc->inode)->allocated_size - offset, + IOMAP_WRITE, &wpc->iomap, true, false); + if (error) + return error; + } + + return iomap_add_to_ioend(wpc, folio, offset, end_pos, len); +} + +const struct iomap_writeback_ops ntfs_writeback_ops =3D { + .writeback_range =3D ntfs_writeback_range, + 
+	.writeback_submit = iomap_ioend_writeback_submit,
+};
-- 
2.25.1

From: Namjae Jeon
To: viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, hch@lst.de, tytso@mit.edu, willy@infradead.org, jack@suse.cz, djwong@kernel.org, josef@toxicpanda.com, sandeen@sandeen.net, rgoldwyn@suse.com, xiang@kernel.org, dsterba@suse.com, pali@kernel.org, ebiggers@kernel.org, neil@brown.name,
 amir73il@gmail.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, iamjoonsoo.kim@lge.com, cheol.lee@lge.com, jay.sim@lge.com, gunho.lee@lge.com, Namjae Jeon, Hyunchul Lee
Subject: [PATCH v2 07/11] ntfsplus: add attrib operations
Date: Thu, 27 Nov 2025 13:59:40 +0900
Message-Id: <20251127045944.26009-8-linkinjeon@kernel.org>
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>
References: <20251127045944.26009-1-linkinjeon@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This adds the implementation of attrib operations for ntfsplus.

Signed-off-by: Hyunchul Lee
Signed-off-by: Namjae Jeon
---
 fs/ntfsplus/attrib.c   | 5377 ++++++++++++++++++++++++++++++++++++++++
 fs/ntfsplus/attrlist.c |  285 +++
 fs/ntfsplus/compress.c | 1564 ++++++++++++
 3 files changed, 7226 insertions(+)
 create mode 100644 fs/ntfsplus/attrib.c
 create mode 100644 fs/ntfsplus/attrlist.c
 create mode 100644 fs/ntfsplus/compress.c

diff --git a/fs/ntfsplus/attrib.c b/fs/ntfsplus/attrib.c
new file mode 100644
index 000000000000..86e74e560e35
--- /dev/null
+++ b/fs/ntfsplus/attrib.c
@@ -0,0 +1,5377 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * NTFS attribute operations. Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ *
+ * Part of this file is based on code from the NTFS-3G project
+ * and is copyrighted by the respective authors below:
+ * Copyright (c) 2000-2010 Anton Altaparmakov
+ * Copyright (c) 2002-2005 Richard Russon
+ * Copyright (c) 2002-2008 Szabolcs Szakacsits
+ * Copyright (c) 2004-2007 Yura Pakhuchiy
+ * Copyright (c) 2007-2021 Jean-Pierre Andre
+ * Copyright (c) 2010 Erik Larsson
+ */
+
+#include
+#include
+
+#include "attrib.h"
+#include "attrlist.h"
+#include "lcnalloc.h"
+#include "misc.h"
+#include "mft.h"
+#include "ntfs.h"
+#include "aops.h"
+#include "ntfs_iomap.h"
+
+__le16 AT_UNNAMED[] = { cpu_to_le16('\0') };
+
+/**
+ * ntfs_map_runlist_nolock - map (a part of) a runlist of an ntfs inode
+ * @ni:		ntfs inode for which to map (part of) a runlist
+ * @vcn:	map runlist part containing this vcn
+ * @ctx:	active attribute search context if present or NULL if not
+ *
+ * Map the part of a runlist containing the @vcn of the ntfs inode @ni.
+ *
+ * If @ctx is specified, it is an active search context of @ni and its base mft
+ * record.  This is needed when ntfs_map_runlist_nolock() encounters unmapped
+ * runlist fragments and allows their mapping.  If you do not have the mft
+ * record mapped, you can specify @ctx as NULL and ntfs_map_runlist_nolock()
+ * will perform the necessary mapping and unmapping.
+ *
+ * Note, ntfs_map_runlist_nolock() saves the state of @ctx on entry and
+ * restores it before returning.  Thus, @ctx will be left pointing to the same
+ * attribute on return as on entry.  However, the actual pointers in @ctx may
+ * point to different memory locations on return, so you must remember to reset
+ * any cached pointers from the @ctx, i.e. after the call to
+ * ntfs_map_runlist_nolock(), you will probably want to do:
+ *	m = ctx->mrec;
+ *	a = ctx->attr;
+ * Assuming you cache ctx->attr in a variable @a of type attr_record * and that
+ * you cache ctx->mrec in a variable @m of type struct mft_record *.
+ */
+int ntfs_map_runlist_nolock(struct ntfs_inode *ni, s64 vcn, struct ntfs_attr_search_ctx *ctx)
+{
+	s64 end_vcn;
+	unsigned long flags;
+	struct ntfs_inode *base_ni;
+	struct mft_record *m;
+	struct attr_record *a;
+	struct runlist_element *rl;
+	struct folio *put_this_folio = NULL;
+	int err = 0;
+	bool ctx_is_temporary, ctx_needs_reset;
+	struct ntfs_attr_search_ctx old_ctx = { NULL, };
+	size_t new_rl_count;
+
+	ntfs_debug("Mapping runlist part containing vcn 0x%llx.",
+			(unsigned long long)vcn);
+	if (!NInoAttr(ni))
+		base_ni = ni;
+	else
+		base_ni = ni->ext.base_ntfs_ino;
+	if (!ctx) {
+		ctx_is_temporary = ctx_needs_reset = true;
+		m = map_mft_record(base_ni);
+		if (IS_ERR(m))
+			return PTR_ERR(m);
+		ctx = ntfs_attr_get_search_ctx(base_ni, m);
+		if (unlikely(!ctx)) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+	} else {
+		s64 allocated_size_vcn;
+
+		WARN_ON(IS_ERR(ctx->mrec));
+		a = ctx->attr;
+		ctx_is_temporary = false;
+		if (!a->non_resident) {
+			err = -EIO;
+			goto err_out;
+		}
+		end_vcn = le64_to_cpu(a->data.non_resident.highest_vcn);
+		read_lock_irqsave(&ni->size_lock, flags);
+		allocated_size_vcn = ni->allocated_size >>
+				ni->vol->cluster_size_bits;
+		read_unlock_irqrestore(&ni->size_lock, flags);
+		if (!a->data.non_resident.lowest_vcn && end_vcn <= 0)
+			end_vcn = allocated_size_vcn - 1;
+		/*
+		 * If we already have the attribute extent containing @vcn in
+		 * @ctx, no need to look it up again.  We slightly cheat in
+		 * that if vcn exceeds the allocated size, we will refuse to
+		 * map the runlist below, so there is definitely no need to get
+		 * the right attribute extent.
+		 */
+		if (vcn >= allocated_size_vcn || (a->type == ni->type &&
+				a->name_length == ni->name_len &&
+				!memcmp((u8 *)a + le16_to_cpu(a->name_offset),
+				ni->name, ni->name_len) &&
+				le64_to_cpu(a->data.non_resident.lowest_vcn)
+				<= vcn && end_vcn >= vcn))
+			ctx_needs_reset = false;
+		else {
+			/* Save the old search context. */
+			old_ctx = *ctx;
+			/*
+			 * If the currently mapped (extent) inode is not the
+			 * base inode we will unmap it when we reinitialize the
+			 * search context which means we need to get a
+			 * reference to the page containing the mapped mft
+			 * record so we do not accidentally drop changes to the
+			 * mft record when it has not been marked dirty yet.
+			 */
+			if (old_ctx.base_ntfs_ino && old_ctx.ntfs_ino !=
+					old_ctx.base_ntfs_ino) {
+				put_this_folio = old_ctx.ntfs_ino->folio;
+				folio_get(put_this_folio);
+			}
+			/*
+			 * Reinitialize the search context so we can lookup the
+			 * needed attribute extent.
+			 */
+			ntfs_attr_reinit_search_ctx(ctx);
+			ctx_needs_reset = true;
+		}
+	}
+	if (ctx_needs_reset) {
+		err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+				CASE_SENSITIVE, vcn, NULL, 0, ctx);
+		if (unlikely(err)) {
+			if (err == -ENOENT)
+				err = -EIO;
+			goto err_out;
+		}
+		WARN_ON(!ctx->attr->non_resident);
+	}
+	a = ctx->attr;
+	/*
+	 * Only decompress the mapping pairs if @vcn is inside it.  Otherwise
+	 * we get into problems when we try to map an out of bounds vcn because
+	 * we then try to map the already mapped runlist fragment and
+	 * ntfs_mapping_pairs_decompress() fails.
+	 */
+	end_vcn = le64_to_cpu(a->data.non_resident.highest_vcn) + 1;
+	if (unlikely(vcn && vcn >= end_vcn)) {
+		err = -ENOENT;
+		goto err_out;
+	}
+	rl = ntfs_mapping_pairs_decompress(ni->vol, a, &ni->runlist, &new_rl_count);
+	if (IS_ERR(rl))
+		err = PTR_ERR(rl);
+	else {
+		ni->runlist.rl = rl;
+		ni->runlist.count = new_rl_count;
+	}
+err_out:
+	if (ctx_is_temporary) {
+		if (likely(ctx))
+			ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(base_ni);
+	} else if (ctx_needs_reset) {
+		/*
+		 * If there is no attribute list, restoring the search context
+		 * is accomplished simply by copying the saved context back over
+		 * the caller supplied context.  If there is an attribute list,
+		 * things are more complicated as we need to deal with mapping
+		 * of mft records and resulting potential changes in pointers.
+		 */
+		if (NInoAttrList(base_ni)) {
+			/*
+			 * If the currently mapped (extent) inode is not the
+			 * one we had before, we need to unmap it and map the
+			 * old one.
+			 */
+			if (ctx->ntfs_ino != old_ctx.ntfs_ino) {
+				/*
+				 * If the currently mapped inode is not the
+				 * base inode, unmap it.
+				 */
+				if (ctx->base_ntfs_ino && ctx->ntfs_ino !=
+						ctx->base_ntfs_ino) {
+					unmap_extent_mft_record(ctx->ntfs_ino);
+					ctx->mrec = ctx->base_mrec;
+					WARN_ON(!ctx->mrec);
+				}
+				/*
+				 * If the old mapped inode is not the base
+				 * inode, map it.
+				 */
+				if (old_ctx.base_ntfs_ino &&
+						old_ctx.ntfs_ino != old_ctx.base_ntfs_ino) {
+retry_map:
+					ctx->mrec = map_mft_record(old_ctx.ntfs_ino);
+					/*
+					 * Something bad has happened.  If out
+					 * of memory retry till it succeeds.
+					 * Any other errors are fatal and we
+					 * return the error code in ctx->mrec.
+					 * Let the caller deal with it...  We
+					 * just need to fudge things so the
+					 * caller can reinit and/or put the
+					 * search context safely.
+					 */
+					if (IS_ERR(ctx->mrec)) {
+						if (PTR_ERR(ctx->mrec) == -ENOMEM) {
+							schedule();
+							goto retry_map;
+						} else
+							old_ctx.ntfs_ino =
+								old_ctx.base_ntfs_ino;
+					}
+				}
+			}
+			/* Update the changed pointers in the saved context. */
+			if (ctx->mrec != old_ctx.mrec) {
+				if (!IS_ERR(ctx->mrec))
+					old_ctx.attr = (struct attr_record *)(
+							(u8 *)ctx->mrec +
+							((u8 *)old_ctx.attr -
+							(u8 *)old_ctx.mrec));
+				old_ctx.mrec = ctx->mrec;
+			}
+		}
+		/* Restore the search context to the saved one. */
+		*ctx = old_ctx;
+		/*
+		 * We drop the reference on the page we took earlier.  In the
+		 * case that IS_ERR(ctx->mrec) is true this means we might lose
+		 * some changes to the mft record that had been made between
+		 * the last time it was marked dirty/written out and now.  This
+		 * at this stage is not a problem as the mapping error is fatal
+		 * enough that the mft record cannot be written out anyway and
+		 * the caller is very likely to shutdown the whole inode
+		 * immediately and mark the volume dirty for chkdsk to pick up
+		 * the pieces anyway.
+		 */
+		if (put_this_folio)
+			folio_put(put_this_folio);
+	}
+	return err;
+}
+
+/**
+ * ntfs_map_runlist - map (a part of) a runlist of an ntfs inode
+ * @ni:		ntfs inode for which to map (part of) a runlist
+ * @vcn:	map runlist part containing this vcn
+ *
+ * Map the part of a runlist containing the @vcn of the ntfs inode @ni.
+ */
+int ntfs_map_runlist(struct ntfs_inode *ni, s64 vcn)
+{
+	int err = 0;
+
+	down_write(&ni->runlist.lock);
+	/* Make sure someone else didn't do the work while we were sleeping. */
+	if (likely(ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn) <=
+			LCN_RL_NOT_MAPPED))
+		err = ntfs_map_runlist_nolock(ni, vcn, NULL);
+	up_write(&ni->runlist.lock);
+	return err;
+}
+
+struct runlist_element *ntfs_attr_vcn_to_rl(struct ntfs_inode *ni, s64 vcn, s64 *lcn)
+{
+	struct runlist_element *rl;
+	int err;
+	bool is_retry = false;
+
+	rl = ni->runlist.rl;
+	if (!rl) {
+		err = ntfs_attr_map_whole_runlist(ni);
+		if (err)
+			return ERR_PTR(-ENOENT);
+		rl = ni->runlist.rl;
+	}
+
+remap_rl:
+	/* Seek to element containing target vcn.
+	 */
+	while (rl->length && rl[1].vcn <= vcn)
+		rl++;
+	*lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
+
+	if (*lcn <= LCN_RL_NOT_MAPPED && is_retry == false) {
+		is_retry = true;
+		if (!ntfs_map_runlist_nolock(ni, vcn, NULL)) {
+			rl = ni->runlist.rl;
+			goto remap_rl;
+		}
+	}
+
+	return rl;
+}
+
+/**
+ * ntfs_attr_vcn_to_lcn_nolock - convert a vcn into a lcn given an ntfs inode
+ * @ni:			ntfs inode of the attribute whose runlist to search
+ * @vcn:		vcn to convert
+ * @write_locked:	true if the runlist is locked for writing
+ *
+ * Find the virtual cluster number @vcn in the runlist of the ntfs attribute
+ * described by the ntfs inode @ni and return the corresponding logical cluster
+ * number (lcn).
+ *
+ * If the @vcn is not mapped yet, the attempt is made to map the attribute
+ * extent containing the @vcn and the vcn to lcn conversion is retried.
+ *
+ * If @write_locked is true the caller has locked the runlist for writing and
+ * if false for reading.
+ *
+ * Since lcns must be >= 0, we use negative return codes with special meaning:
+ *
+ * Return code	Meaning / Description
+ * ==========================================
+ * LCN_HOLE	Hole / not allocated on disk.
+ * LCN_ENOENT	There is no such vcn in the runlist, i.e. @vcn is out of bounds.
+ * LCN_ENOMEM	Not enough memory to map runlist.
+ * LCN_EIO	Critical error (runlist/file is corrupt, i/o error, etc).
+ *
+ * Locking: - The runlist must be locked on entry and is left locked on return.
+ *	    - If @write_locked is 'false', i.e. the runlist is locked for reading,
+ *	      the lock may be dropped inside the function so you cannot rely on
+ *	      the runlist still being the same when this function returns.
+ */
+s64 ntfs_attr_vcn_to_lcn_nolock(struct ntfs_inode *ni, const s64 vcn,
+		const bool write_locked)
+{
+	s64 lcn;
+	unsigned long flags;
+	bool is_retry = false;
+
+	ntfs_debug("Entering for i_ino 0x%lx, vcn 0x%llx, %s_locked.",
+			ni->mft_no, (unsigned long long)vcn,
+			write_locked ? "write" : "read");
+	if (!ni->runlist.rl) {
+		read_lock_irqsave(&ni->size_lock, flags);
+		if (!ni->allocated_size) {
+			read_unlock_irqrestore(&ni->size_lock, flags);
+			return LCN_ENOENT;
+		}
+		read_unlock_irqrestore(&ni->size_lock, flags);
+	}
+retry_remap:
+	/* Convert vcn to lcn.  If that fails map the runlist and retry once. */
+	lcn = ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn);
+	if (likely(lcn >= LCN_HOLE)) {
+		ntfs_debug("Done, lcn 0x%llx.", (long long)lcn);
+		return lcn;
+	}
+	if (lcn != LCN_RL_NOT_MAPPED) {
+		if (lcn != LCN_ENOENT)
+			lcn = LCN_EIO;
+	} else if (!is_retry) {
+		int err;
+
+		if (!write_locked) {
+			up_read(&ni->runlist.lock);
+			down_write(&ni->runlist.lock);
+			if (unlikely(ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn) !=
+					LCN_RL_NOT_MAPPED)) {
+				up_write(&ni->runlist.lock);
+				down_read(&ni->runlist.lock);
+				goto retry_remap;
+			}
+		}
+		err = ntfs_map_runlist_nolock(ni, vcn, NULL);
+		if (!write_locked) {
+			up_write(&ni->runlist.lock);
+			down_read(&ni->runlist.lock);
+		}
+		if (likely(!err)) {
+			is_retry = true;
+			goto retry_remap;
+		}
+		if (err == -ENOENT)
+			lcn = LCN_ENOENT;
+		else if (err == -ENOMEM)
+			lcn = LCN_ENOMEM;
+		else
+			lcn = LCN_EIO;
+	}
+	if (lcn != LCN_ENOENT)
+		ntfs_error(ni->vol->sb, "Failed with error code %lli.",
+				(long long)lcn);
+	return lcn;
+}
+
+struct runlist_element *__ntfs_attr_find_vcn_nolock(struct runlist *runlist, const s64 vcn)
+{
+	size_t lower_idx, upper_idx, idx;
+	struct runlist_element *run;
+
+	if (runlist->count <= 1)
+		return ERR_PTR(-ENOENT);
+
+	run = &runlist->rl[0];
+	if (vcn < run->vcn)
+		return ERR_PTR(-ENOENT);
+	else if (vcn < run->vcn + run->length)
+		return run;
+
+	run = &runlist->rl[runlist->count - 2];
+	if (vcn >= run->vcn && vcn < run->vcn + run->length)
+		return run;
+	if (vcn >= run->vcn + run->length)
+		return ERR_PTR(-ENOENT);
+
+	lower_idx = 1;
+	upper_idx = runlist->count - 2;
+
+	while (lower_idx <= upper_idx) {
+		idx = (lower_idx + upper_idx) >> 1;
+		run = &runlist->rl[idx];
+
+		if (vcn < run->vcn)
+			upper_idx = idx - 1;
+		else if (vcn >= run->vcn + run->length)
+			lower_idx = idx + 1;
+		else
+			return run;
+	}
+
+	return ERR_PTR(-ENOENT);
+}
+
+/**
+ * ntfs_attr_find_vcn_nolock - find a vcn in the runlist of an ntfs inode
+ * @ni:		ntfs inode describing the runlist to search
+ * @vcn:	vcn to find
+ * @ctx:	active attribute search context if present or NULL if not
+ *
+ * Find the virtual cluster number @vcn in the runlist described by the ntfs
+ * inode @ni and return the address of the runlist element containing the @vcn.
+ *
+ * If the @vcn is not mapped yet, the attempt is made to map the attribute
+ * extent containing the @vcn and the vcn to lcn conversion is retried.
+ *
+ * If @ctx is specified, it is an active search context of @ni and its base mft
+ * record.  This is needed when ntfs_attr_find_vcn_nolock() encounters unmapped
+ * runlist fragments and allows their mapping.  If you do not have the mft
+ * record mapped, you can specify @ctx as NULL and ntfs_attr_find_vcn_nolock()
+ * will perform the necessary mapping and unmapping.
+ *
+ * Note, ntfs_attr_find_vcn_nolock() saves the state of @ctx on entry and
+ * restores it before returning.  Thus, @ctx will be left pointing to the same
+ * attribute on return as on entry.  However, the actual pointers in @ctx may
+ * point to different memory locations on return, so you must remember to reset
+ * any cached pointers from the @ctx, i.e. after the call to
+ * ntfs_attr_find_vcn_nolock(), you will probably want to do:
+ *	m = ctx->mrec;
+ *	a = ctx->attr;
+ * Assuming you cache ctx->attr in a variable @a of type attr_record * and that
+ * you cache ctx->mrec in a variable @m of type struct mft_record *.
+ * Note you need to distinguish between the lcn of the returned runlist element
+ * being >= 0 and LCN_HOLE.  In the latter case you have to return zeroes on
+ * read and allocate clusters on write.
+ */
+struct runlist_element *ntfs_attr_find_vcn_nolock(struct ntfs_inode *ni, const s64 vcn,
+		struct ntfs_attr_search_ctx *ctx)
+{
+	unsigned long flags;
+	struct runlist_element *rl;
+	int err = 0;
+	bool is_retry = false;
+
+	ntfs_debug("Entering for i_ino 0x%lx, vcn 0x%llx, with%s ctx.",
+			ni->mft_no, (unsigned long long)vcn, ctx ? "" : "out");
+	if (!ni->runlist.rl) {
+		read_lock_irqsave(&ni->size_lock, flags);
+		if (!ni->allocated_size) {
+			read_unlock_irqrestore(&ni->size_lock, flags);
+			return ERR_PTR(-ENOENT);
+		}
+		read_unlock_irqrestore(&ni->size_lock, flags);
+	}
+
+retry_remap:
+	rl = ni->runlist.rl;
+	if (likely(rl && vcn >= rl[0].vcn)) {
+		rl = __ntfs_attr_find_vcn_nolock(&ni->runlist, vcn);
+		if (IS_ERR(rl))
+			err = PTR_ERR(rl);
+		else if (rl->lcn >= LCN_HOLE)
+			return rl;
+		else if (rl->lcn <= LCN_ENOENT)
+			err = -EIO;
+	}
+	if (!err && !is_retry) {
+		/*
+		 * If the search context is invalid we cannot map the unmapped
+		 * region.
+		 */
+		if (ctx && IS_ERR(ctx->mrec))
+			err = PTR_ERR(ctx->mrec);
+		else {
+			/*
+			 * The @vcn is in an unmapped region, map the runlist
+			 * and retry.
+			 */
+			err = ntfs_map_runlist_nolock(ni, vcn, ctx);
+			if (likely(!err)) {
+				is_retry = true;
+				goto retry_remap;
+			}
+		}
+		if (err == -EINVAL)
+			err = -EIO;
+	} else if (!err)
+		err = -EIO;
+	if (err != -ENOENT)
+		ntfs_error(ni->vol->sb, "Failed with error code %i.", err);
+	return ERR_PTR(err);
+}
+
+/**
+ * ntfs_attr_find - find (next) attribute in mft record
+ * @type:	attribute type to find
+ * @name:	attribute name to find (optional, i.e. NULL means don't care)
+ * @name_len:	attribute name length (only needed if @name present)
+ * @ic:		IGNORE_CASE or CASE_SENSITIVE (ignored if @name not present)
+ * @val:	attribute value to find (optional, resident attributes only)
+ * @val_len:	attribute value length
+ * @ctx:	search context with mft record and attribute to search from
+ *
+ * You should not need to call this function directly.  Use ntfs_attr_lookup()
+ * instead.
+ *
+ * ntfs_attr_find() takes a search context @ctx as parameter and searches the
+ * mft record specified by @ctx->mrec, beginning at @ctx->attr, for an
+ * attribute of @type, optionally @name and @val.
+ *
+ * If the attribute is found, ntfs_attr_find() returns 0 and @ctx->attr will
+ * point to the found attribute.
+ *
+ * If the attribute is not found, ntfs_attr_find() returns -ENOENT and
+ * @ctx->attr will point to the attribute before which the attribute being
+ * searched for would need to be inserted if such an action were to be desired.
+ *
+ * On actual error, ntfs_attr_find() returns -EIO.  In this case @ctx->attr is
+ * undefined and in particular do not rely on it not changing.
+ *
+ * If @ctx->is_first is 'true', the search begins with @ctx->attr itself.  If it
+ * is 'false', the search begins after @ctx->attr.
+ *
+ * If @ic is IGNORE_CASE, the @name comparison is not case sensitive and
+ * @ctx->ntfs_ino must be set to the ntfs inode to which the mft record
+ * @ctx->mrec belongs.
+ * This is so we can get at the ntfs volume and hence at
+ * the upcase table.  If @ic is CASE_SENSITIVE, the comparison is case
+ * sensitive.  When @name is present, @name_len is the @name length in Unicode
+ * characters.
+ *
+ * If @name is not present (NULL), we assume that the unnamed attribute is
+ * being searched for.
+ *
+ * Finally, the resident attribute value @val is looked for, if present.  If
+ * @val is not present (NULL), @val_len is ignored.
+ *
+ * ntfs_attr_find() only searches the specified mft record and it ignores the
+ * presence of an attribute list attribute (unless it is the one being searched
+ * for, obviously).  If you need to take attribute lists into consideration,
+ * use ntfs_attr_lookup() instead (see below).  This also means that you cannot
+ * use ntfs_attr_find() to search for extent records of non-resident
+ * attributes, as extents with lowest_vcn != 0 are usually described by the
+ * attribute list attribute only. - Note that it is possible that the first
+ * extent is only in the attribute list while the last extent is in the base
+ * mft record, so do not rely on being able to find the first extent in the
+ * base mft record.
+ *
+ * Warning: Never use @val when looking for attribute types which can be
+ * non-resident as this most likely will result in a crash!
+ */
+static int ntfs_attr_find(const __le32 type, const __le16 *name,
+		const u32 name_len, const u32 ic,
+		const u8 *val, const u32 val_len, struct ntfs_attr_search_ctx *ctx)
+{
+	struct attr_record *a;
+	struct ntfs_volume *vol = ctx->ntfs_ino->vol;
+	__le16 *upcase = vol->upcase;
+	u32 upcase_len = vol->upcase_len;
+	unsigned int space;
+
+	/*
+	 * Iterate over attributes in mft record starting at @ctx->attr, or the
+	 * attribute following that, if @ctx->is_first is 'true'.
+	 */
+	if (ctx->is_first) {
+		a = ctx->attr;
+		ctx->is_first = false;
+	} else
+		a = (struct attr_record *)((u8 *)ctx->attr +
+				le32_to_cpu(ctx->attr->length));
+	for (;; a = (struct attr_record *)((u8 *)a + le32_to_cpu(a->length))) {
+		if ((u8 *)a < (u8 *)ctx->mrec || (u8 *)a > (u8 *)ctx->mrec +
+				le32_to_cpu(ctx->mrec->bytes_allocated))
+			break;
+
+		space = le32_to_cpu(ctx->mrec->bytes_in_use) - ((u8 *)a - (u8 *)ctx->mrec);
+		if ((space < offsetof(struct attr_record, data.resident.reserved) + 1 ||
+		     space < le32_to_cpu(a->length)) && (space < 4 || a->type != AT_END))
+			break;
+
+		ctx->attr = a;
+		if (((type != AT_UNUSED) && (le32_to_cpu(a->type) > le32_to_cpu(type))) ||
+				a->type == AT_END)
+			return -ENOENT;
+		if (unlikely(!a->length))
+			break;
+		if (type == AT_UNUSED)
+			return 0;
+		if (a->type != type)
+			continue;
+		/*
+		 * If @name is present, compare the two names.  If @name is
+		 * missing, assume we want an unnamed attribute.
+		 */
+		if (!name || name == AT_UNNAMED) {
+			/* The search failed if the found attribute is named. */
+			if (a->name_length)
+				return -ENOENT;
+		} else {
+			if (a->name_length && ((le16_to_cpu(a->name_offset) +
+					a->name_length * sizeof(__le16)) >
+					le32_to_cpu(a->length))) {
+				ntfs_error(vol->sb, "Corrupt attribute name in MFT record %lld\n",
+						(long long)ctx->ntfs_ino->mft_no);
+				break;
+			}
+
+			if (!ntfs_are_names_equal(name, name_len,
+					(__le16 *)((u8 *)a + le16_to_cpu(a->name_offset)),
+					a->name_length, ic, upcase, upcase_len)) {
+				register int rc;
+
+				rc = ntfs_collate_names(name, name_len,
+						(__le16 *)((u8 *)a + le16_to_cpu(a->name_offset)),
+						a->name_length, 1, IGNORE_CASE,
+						upcase, upcase_len);
+				/*
+				 * If @name collates before a->name, there is no
+				 * matching attribute.
+				 */
+				if (rc == -1)
+					return -ENOENT;
+				/* If the strings are not equal, continue search. */
+				if (rc)
+					continue;
+				rc = ntfs_collate_names(name, name_len,
+						(__le16 *)((u8 *)a + le16_to_cpu(a->name_offset)),
+						a->name_length, 1, CASE_SENSITIVE,
+						upcase, upcase_len);
+				if (rc == -1)
+					return -ENOENT;
+				if (rc)
+					continue;
+			}
+		}
+		/*
+		 * The names match or @name not present and attribute is
+		 * unnamed.  If no @val specified, we have found the attribute
+		 * and are done.
+		 */
+		if (!val)
+			return 0;
+		/* @val is present; compare values. */
+		else {
+			register int rc;
+
+			rc = memcmp(val, (u8 *)a + le16_to_cpu(
+					a->data.resident.value_offset),
+					min_t(u32, val_len, le32_to_cpu(
+					a->data.resident.value_length)));
+			/*
+			 * If @val collates before the current attribute's
+			 * value, there is no matching attribute.
+			 */
+			if (!rc) {
+				register u32 avl;
+
+				avl = le32_to_cpu(a->data.resident.value_length);
+				if (val_len == avl)
+					return 0;
+				if (val_len < avl)
+					return -ENOENT;
+			} else if (rc < 0)
+				return -ENOENT;
+		}
+	}
+	ntfs_error(vol->sb, "Inode is corrupt.  Run chkdsk.");
+	NVolSetErrors(vol);
+	return -EIO;
+}
+
+void ntfs_attr_name_free(unsigned char **name)
+{
+	if (*name) {
+		ntfs_free(*name);
+		*name = NULL;
+	}
+}
+
+char *ntfs_attr_name_get(const struct ntfs_volume *vol, const __le16 *uname,
+		const int uname_len)
+{
+	unsigned char *name = NULL;
+	int name_len;
+
+	name_len = ntfs_ucstonls(vol, uname, uname_len, &name, 0);
+	if (name_len < 0) {
+		ntfs_error(vol->sb, "ntfs_ucstonls error");
+		/* When this function returns -1, memory for name might
+		 * have been allocated, so let's free this memory.
+		 */
+		ntfs_attr_name_free(&name);
+		return NULL;
+
+	} else if (name_len > 0)
+		return name;
+
+	ntfs_attr_name_free(&name);
+	return NULL;
+}
+
+int load_attribute_list(struct ntfs_inode *base_ni, u8 *al_start, const s64 size)
+{
+	struct inode *attr_vi = NULL;
+	u8 *al;
+	struct attr_list_entry *ale;
+
+	if (!al_start || size <= 0)
+		return -EINVAL;
+
+	attr_vi = ntfs_attr_iget(VFS_I(base_ni), AT_ATTRIBUTE_LIST, AT_UNNAMED, 0);
+	if (IS_ERR(attr_vi)) {
+		ntfs_error(base_ni->vol->sb,
+				"Failed to open an inode for Attribute list, mft = %ld",
+				base_ni->mft_no);
+		return PTR_ERR(attr_vi);
+	}
+
+	if (ntfs_inode_attr_pread(attr_vi, 0, size, al_start) != size) {
+		iput(attr_vi);
+		ntfs_error(base_ni->vol->sb,
+				"Failed to read attribute list, mft = %ld",
+				base_ni->mft_no);
+		return -EIO;
+	}
+	iput(attr_vi);
+
+	for (al = al_start; al < al_start + size; al += le16_to_cpu(ale->length)) {
+		ale = (struct attr_list_entry *)al;
+		if (ale->name_offset != sizeof(struct attr_list_entry))
+			break;
+		if (le16_to_cpu(ale->length) <= ale->name_offset + ale->name_length ||
+		    al + le16_to_cpu(ale->length) > al_start + size)
+			break;
+		if (ale->type == AT_UNUSED)
+			break;
+		if (MSEQNO_LE(ale->mft_reference) == 0)
+			break;
+	}
+	if (al != al_start + size) {
+		ntfs_error(base_ni->vol->sb, "Corrupt attribute list, mft = %ld",
+				base_ni->mft_no);
+		return -EIO;
+	}
+	return 0;
+}
+
+/**
+ * ntfs_external_attr_find - find an attribute in the attribute list of an inode
+ * @type:	attribute type to find
+ * @name:	attribute name to find (optional, i.e.
NULL means don't care) + * @name_len: attribute name length (only needed if @name present) + * @ic: IGNORE_CASE or CASE_SENSITIVE (ignored if @name not present) + * @lowest_vcn: lowest vcn to find (optional, non-resident attributes only) + * @val: attribute value to find (optional, resident attributes only) + * @val_len: attribute value length + * @ctx: search context with mft record and attribute to search from + * + * You should not need to call this function directly. Use ntfs_attr_look= up() + * instead. + * + * Find an attribute by searching the attribute list for the corresponding + * attribute list entry. Having found the entry, map the mft record if the + * attribute is in a different mft record/inode, ntfs_attr_find() the attr= ibute + * in there and return it. + * + * On first search @ctx->ntfs_ino must be the base mft record and @ctx must + * have been obtained from a call to ntfs_attr_get_search_ctx(). On subse= quent + * calls @ctx->ntfs_ino can be any extent inode, too (@ctx->base_ntfs_ino = is + * then the base inode). + * + * After finishing with the attribute/mft record you need to call + * ntfs_attr_put_search_ctx() to cleanup the search context (unmapping any + * mapped inodes, etc). + * + * If the attribute is found, ntfs_external_attr_find() returns 0 and + * @ctx->attr will point to the found attribute. @ctx->mrec will point to= the + * mft record in which @ctx->attr is located and @ctx->al_entry will point= to + * the attribute list entry for the attribute. + * + * If the attribute is not found, ntfs_external_attr_find() returns -ENOEN= T and + * @ctx->attr will point to the attribute in the base mft record before wh= ich + * the attribute being searched for would need to be inserted if such an a= ction + * were to be desired. 
+ * @ctx->mrec will point to the mft record in which
+ * @ctx->attr is located and @ctx->al_entry will point to the attribute list
+ * entry of the attribute before which the attribute being searched for would
+ * need to be inserted if such an action were to be desired.
+ *
+ * Thus to insert the not found attribute, one wants to add the attribute to
+ * @ctx->mrec (the base mft record) and if there is not enough space, the
+ * attribute should be placed in a newly allocated extent mft record.  The
+ * attribute list entry for the inserted attribute should be inserted in the
+ * attribute list attribute at @ctx->al_entry.
+ *
+ * On actual error, ntfs_external_attr_find() returns -EIO.  In this case
+ * @ctx->attr is undefined and in particular do not rely on it not changing.
+ */
+static int ntfs_external_attr_find(const __le32 type,
+		const __le16 *name, const u32 name_len,
+		const u32 ic, const s64 lowest_vcn,
+		const u8 *val, const u32 val_len, struct ntfs_attr_search_ctx *ctx)
+{
+	struct ntfs_inode *base_ni, *ni;
+	struct ntfs_volume *vol;
+	struct attr_list_entry *al_entry, *next_al_entry;
+	u8 *al_start, *al_end;
+	struct attr_record *a;
+	__le16 *al_name;
+	u32 al_name_len;
+	bool is_first_search = false;
+	int err = 0;
+	static const char *es = " Unmount and run chkdsk.";
+
+	ni = ctx->ntfs_ino;
+	base_ni = ctx->base_ntfs_ino;
+	ntfs_debug("Entering for inode 0x%lx, type 0x%x.", ni->mft_no, type);
+	if (!base_ni) {
+		/* First call happens with the base mft record.
+		 */
+		base_ni = ctx->base_ntfs_ino = ctx->ntfs_ino;
+		ctx->base_mrec = ctx->mrec;
+		ctx->mapped_base_mrec = ctx->mapped_mrec;
+	}
+	if (ni == base_ni)
+		ctx->base_attr = ctx->attr;
+	if (type == AT_END)
+		goto not_found;
+	vol = base_ni->vol;
+	al_start = base_ni->attr_list;
+	al_end = al_start + base_ni->attr_list_size;
+	if (!ctx->al_entry) {
+		ctx->al_entry = (struct attr_list_entry *)al_start;
+		is_first_search = true;
+	}
+	/*
+	 * Iterate over entries in attribute list starting at @ctx->al_entry,
+	 * or the entry following that, if @ctx->is_first is 'true'.
+	 */
+	if (ctx->is_first) {
+		al_entry = ctx->al_entry;
+		ctx->is_first = false;
+		/*
+		 * If an enumeration and the first attribute is higher than
+		 * the attribute list itself, need to return the attribute list
+		 * attribute.
+		 */
+		if ((type == AT_UNUSED) && is_first_search &&
+		    le32_to_cpu(al_entry->type) >
+		    le32_to_cpu(AT_ATTRIBUTE_LIST))
+			goto find_attr_list_attr;
+	} else {
+		/* Check for small entry */
+		if (((al_end - (u8 *)ctx->al_entry) <
+		     (long)offsetof(struct attr_list_entry, name)) ||
+		    (le16_to_cpu(ctx->al_entry->length) & 7) ||
+		    (le16_to_cpu(ctx->al_entry->length) <
+		     offsetof(struct attr_list_entry, name)))
+			goto corrupt;
+
+		al_entry = (struct attr_list_entry *)((u8 *)ctx->al_entry +
+				le16_to_cpu(ctx->al_entry->length));
+
+		if ((u8 *)al_entry == al_end)
+			goto not_found;
+
+		/* Preliminary check for small entry */
+		if ((al_end - (u8 *)al_entry) <
+		    (long)offsetof(struct attr_list_entry, name))
+			goto corrupt;
+
+		/*
+		 * If this is an enumeration and the attribute list attribute
+		 * is the next one in the enumeration sequence, just return the
+		 * attribute list attribute from the base mft record as it is
+		 * not listed in the attribute list itself.
+		 */
+		if ((type == AT_UNUSED) && le32_to_cpu(ctx->al_entry->type) <
+		    le32_to_cpu(AT_ATTRIBUTE_LIST) &&
+		    le32_to_cpu(al_entry->type) >
+		    le32_to_cpu(AT_ATTRIBUTE_LIST)) {
+find_attr_list_attr:
+
+			/* Check for bogus calls. */
+			if (name || name_len || val || val_len || lowest_vcn)
+				return -EINVAL;
+
+			/* We want the base record. */
+			if (ctx->ntfs_ino != base_ni)
+				unmap_mft_record(ctx->ntfs_ino);
+			ctx->ntfs_ino = base_ni;
+			ctx->mapped_mrec = ctx->mapped_base_mrec;
+			ctx->mrec = ctx->base_mrec;
+			ctx->is_first = true;
+
+			/* Sanity checks are performed elsewhere. */
+			ctx->attr = (struct attr_record *)((u8 *)ctx->mrec +
+					le16_to_cpu(ctx->mrec->attrs_offset));
+
+			/* Find the attribute list attribute. */
+			err = ntfs_attr_find(AT_ATTRIBUTE_LIST, NULL, 0,
+					IGNORE_CASE, NULL, 0, ctx);
+
+			/*
+			 * Setup the search context so the correct
+			 * attribute is returned next time round.
+			 */
+			ctx->al_entry = al_entry;
+			ctx->is_first = true;
+
+			/* Got it.  Done. */
+			if (!err)
+				return 0;
+
+			/* Error!  If other than not found return it. */
+			if (err != -ENOENT)
+				return err;
+
+			/* Not found?!?  Absurd! */
+			ntfs_error(ctx->ntfs_ino->vol->sb, "Attribute list wasn't found");
+			return -EIO;
+		}
+	}
+	for (;; al_entry = next_al_entry) {
+		/* Out of bounds check. */
+		if ((u8 *)al_entry < base_ni->attr_list ||
+		    (u8 *)al_entry > al_end)
+			break;	/* Inode is corrupt. */
+		ctx->al_entry = al_entry;
+		/* Catch the end of the attribute list.
+		 */
+		if ((u8 *)al_entry == al_end)
+			goto not_found;
+
+		if ((((u8 *)al_entry + offsetof(struct attr_list_entry, name)) > al_end) ||
+		    ((u8 *)al_entry + le16_to_cpu(al_entry->length) > al_end) ||
+		    (le16_to_cpu(al_entry->length) & 7) ||
+		    (le16_to_cpu(al_entry->length) <
+		     offsetof(struct attr_list_entry, name_length)) ||
+		    (al_entry->name_length && ((u8 *)al_entry + al_entry->name_offset +
+		     al_entry->name_length * sizeof(__le16)) > al_end))
+			break;	/* corrupt */
+
+		next_al_entry = (struct attr_list_entry *)((u8 *)al_entry +
+				le16_to_cpu(al_entry->length));
+		if (type != AT_UNUSED) {
+			if (le32_to_cpu(al_entry->type) > le32_to_cpu(type))
+				goto not_found;
+			if (type != al_entry->type)
+				continue;
+		}
+		/*
+		 * If @name is present, compare the two names.  If @name is
+		 * missing, assume we want an unnamed attribute.
+		 */
+		al_name_len = al_entry->name_length;
+		al_name = (__le16 *)((u8 *)al_entry + al_entry->name_offset);
+
+		/*
+		 * If !@type we want the attribute represented by this
+		 * attribute list entry.
+		 */
+		if (type == AT_UNUSED)
+			goto is_enumeration;
+
+		if (!name || name == AT_UNNAMED) {
+			if (al_name_len)
+				goto not_found;
+		} else if (!ntfs_are_names_equal(al_name, al_name_len, name,
+				name_len, ic, vol->upcase, vol->upcase_len)) {
+			register int rc;
+
+			rc = ntfs_collate_names(name, name_len, al_name,
+					al_name_len, 1, IGNORE_CASE,
+					vol->upcase, vol->upcase_len);
+			/*
+			 * If @name collates before al_name, there is no
+			 * matching attribute.
+			 */
+			if (rc == -1)
+				goto not_found;
+			/* If the strings are not equal, continue search. */
+			if (rc)
+				continue;
+
+			rc = ntfs_collate_names(name, name_len, al_name,
+					al_name_len, 1, CASE_SENSITIVE,
+					vol->upcase, vol->upcase_len);
+			if (rc == -1)
+				goto not_found;
+			if (rc)
+				continue;
+		}
+		/*
+		 * The names match or @name not present and attribute is
+		 * unnamed.  Now check @lowest_vcn.  Continue search if the
+		 * next attribute list entry still fits @lowest_vcn.
+		 * Otherwise we have reached the right one or the search has failed.
+		 */
+		if (lowest_vcn && (u8 *)next_al_entry >= al_start &&
+		    (u8 *)next_al_entry + 6 < al_end &&
+		    (u8 *)next_al_entry + le16_to_cpu(
+				next_al_entry->length) <= al_end &&
+		    le64_to_cpu(next_al_entry->lowest_vcn) <=
+				lowest_vcn &&
+		    next_al_entry->type == al_entry->type &&
+		    next_al_entry->name_length == al_name_len &&
+		    ntfs_are_names_equal((__le16 *)((u8 *)
+				next_al_entry +
+				next_al_entry->name_offset),
+				next_al_entry->name_length,
+				al_name, al_name_len, CASE_SENSITIVE,
+				vol->upcase, vol->upcase_len))
+			continue;
+
+is_enumeration:
+		if (MREF_LE(al_entry->mft_reference) == ni->mft_no) {
+			if (MSEQNO_LE(al_entry->mft_reference) != ni->seq_no) {
+				ntfs_error(vol->sb,
+					"Found stale mft reference in attribute list of base inode 0x%lx.%s",
+					base_ni->mft_no, es);
+				err = -EIO;
+				break;
+			}
+		} else { /* Mft references do not match. */
+			/* If there is a mapped record unmap it first. */
+			if (ni != base_ni)
+				unmap_extent_mft_record(ni);
+			/* Do we want the base record back? */
+			if (MREF_LE(al_entry->mft_reference) ==
+					base_ni->mft_no) {
+				ni = ctx->ntfs_ino = base_ni;
+				ctx->mrec = ctx->base_mrec;
+				ctx->mapped_mrec = ctx->mapped_base_mrec;
+			} else {
+				/* We want an extent record. */
+				ctx->mrec = map_extent_mft_record(base_ni,
+						le64_to_cpu(
+						al_entry->mft_reference), &ni);
+				if (IS_ERR(ctx->mrec)) {
+					ntfs_error(vol->sb,
+						"Failed to map extent mft record 0x%lx of base inode 0x%lx.%s",
+						MREF_LE(al_entry->mft_reference),
+						base_ni->mft_no, es);
+					err = PTR_ERR(ctx->mrec);
+					if (err == -ENOENT)
+						err = -EIO;
+					/* Cause @ctx to be sanitized below.
+					 */
+					ni = NULL;
+					break;
+				}
+				ctx->ntfs_ino = ni;
+				ctx->mapped_mrec = true;
+			}
+		}
+		a = ctx->attr = (struct attr_record *)((u8 *)ctx->mrec +
+				le16_to_cpu(ctx->mrec->attrs_offset));
+		/*
+		 * ctx->vfs_ino, ctx->mrec, and ctx->attr now point to the
+		 * mft record containing the attribute represented by the
+		 * current al_entry.
+		 */
+		/*
+		 * We could call into ntfs_attr_find() to find the right
+		 * attribute in this mft record but this would be less
+		 * efficient and not quite accurate as ntfs_attr_find() ignores
+		 * the attribute instance numbers for example which become
+		 * important when one plays with attribute lists.  Also,
+		 * because a proper match has been found in the attribute list
+		 * entry above, the comparison can now be optimized.  So it is
+		 * worth re-implementing a simplified ntfs_attr_find() here.
+		 */
+		/*
+		 * Use a manual loop so we can still use break and continue
+		 * with the same meanings as above.
+		 */
+do_next_attr_loop:
+		if ((u8 *)a < (u8 *)ctx->mrec || (u8 *)a > (u8 *)ctx->mrec +
+				le32_to_cpu(ctx->mrec->bytes_allocated))
+			break;
+		if (a->type == AT_END)
+			continue;
+		if (!a->length)
+			break;
+		if (al_entry->instance != a->instance)
+			goto do_next_attr;
+		/*
+		 * If the type and/or the name are mismatched between the
+		 * attribute list entry and the attribute record, there is
+		 * corruption so we break and return error EIO.
+		 */
+		if (al_entry->type != a->type)
+			break;
+		if (!ntfs_are_names_equal((__le16 *)((u8 *)a +
+				le16_to_cpu(a->name_offset)), a->name_length,
+				al_name, al_name_len, CASE_SENSITIVE,
+				vol->upcase, vol->upcase_len))
+			break;
+		ctx->attr = a;
+		/*
+		 * If no @val specified or @val specified and it matches, we
+		 * have found it!
+		 */
+		if ((type == AT_UNUSED) || !val || (!a->non_resident && le32_to_cpu(
+				a->data.resident.value_length) == val_len &&
+				!memcmp((u8 *)a +
+				le16_to_cpu(a->data.resident.value_offset),
+				val, val_len))) {
+			ntfs_debug("Done, found.");
+			return 0;
+		}
+do_next_attr:
+		/* Proceed to the next attribute in the current mft record. */
+		a = (struct attr_record *)((u8 *)a + le32_to_cpu(a->length));
+		goto do_next_attr_loop;
+	}
+
+corrupt:
+	if (ni != base_ni) {
+		if (ni)
+			unmap_extent_mft_record(ni);
+		ctx->ntfs_ino = base_ni;
+		ctx->mrec = ctx->base_mrec;
+		ctx->attr = ctx->base_attr;
+		ctx->mapped_mrec = ctx->mapped_base_mrec;
+	}
+
+	if (!err) {
+		ntfs_error(vol->sb,
+			"Base inode 0x%lx contains corrupt attribute list attribute.%s",
+			base_ni->mft_no, es);
+		err = -EIO;
+	}
+
+	if (err != -ENOMEM)
+		NVolSetErrors(vol);
+	return err;
+not_found:
+	/*
+	 * If we were looking for AT_END, we reset the search context @ctx and
+	 * use ntfs_attr_find() to seek to the end of the base mft record.
+	 */
+	if (type == AT_UNUSED || type == AT_END) {
+		ntfs_attr_reinit_search_ctx(ctx);
+		return ntfs_attr_find(AT_END, name, name_len, ic, val, val_len,
+				ctx);
+	}
+	/*
+	 * The attribute was not found.  Before we return, we want to ensure
+	 * @ctx->mrec and @ctx->attr indicate the position at which the
+	 * attribute should be inserted in the base mft record.  Since we also
+	 * want to preserve @ctx->al_entry we cannot reinitialize the search
+	 * context using ntfs_attr_reinit_search_ctx() as this would set
+	 * @ctx->al_entry to NULL.  Thus we do the necessary bits manually (see
+	 * ntfs_attr_init_search_ctx() below).  Note, we _only_ preserve
+	 * @ctx->al_entry as the remaining fields (base_*) are identical to
+	 * their non base_ counterparts and we cannot set @ctx->base_attr
+	 * correctly yet as we do not know what @ctx->attr will be set to by
+	 * the call to ntfs_attr_find() below.
+	 */
+	if (ni != base_ni)
+		unmap_extent_mft_record(ni);
+	ctx->mrec = ctx->base_mrec;
+	ctx->attr = (struct attr_record *)((u8 *)ctx->mrec +
+			le16_to_cpu(ctx->mrec->attrs_offset));
+	ctx->is_first = true;
+	ctx->ntfs_ino = base_ni;
+	ctx->base_ntfs_ino = NULL;
+	ctx->base_mrec = NULL;
+	ctx->base_attr = NULL;
+	ctx->mapped_mrec = ctx->mapped_base_mrec;
+	/*
+	 * In case there are multiple matches in the base mft record, need to
+	 * keep enumerating until we get an attribute not found response (or
+	 * another error), otherwise we would keep returning the same attribute
+	 * over and over again and all programs using us for enumeration would
+	 * lock up in a tight loop.
+	 */
+	do {
+		err = ntfs_attr_find(type, name, name_len, ic, val, val_len,
+				ctx);
+	} while (!err);
+	ntfs_debug("Done, not found.");
+	return err;
+}
+
+/**
+ * ntfs_attr_lookup - find an attribute in an ntfs inode
+ * @type:	attribute type to find
+ * @name:	attribute name to find (optional, i.e. NULL means don't care)
+ * @name_len:	attribute name length (only needed if @name present)
+ * @ic:		IGNORE_CASE or CASE_SENSITIVE (ignored if @name not present)
+ * @lowest_vcn:	lowest vcn to find (optional, non-resident attributes only)
+ * @val:	attribute value to find (optional, resident attributes only)
+ * @val_len:	attribute value length
+ * @ctx:	search context with mft record and attribute to search from
+ *
+ * Find an attribute in an ntfs inode.  On first search @ctx->ntfs_ino must
+ * be the base mft record and @ctx must have been obtained from a call to
+ * ntfs_attr_get_search_ctx().
+ *
+ * This function transparently handles attribute lists and @ctx is used to
+ * continue searches where they were left off at.
+ *
+ * After finishing with the attribute/mft record you need to call
+ * ntfs_attr_put_search_ctx() to cleanup the search context (unmapping any
+ * mapped inodes, etc).
+ *
+ * Return 0 if the search was successful and -errno if not.
+ *
+ * When 0, @ctx->attr is the found attribute and it is in mft record
+ * @ctx->mrec.  If an attribute list attribute is present, @ctx->al_entry is
+ * the attribute list entry of the found attribute.
+ *
+ * When -ENOENT, @ctx->attr is the attribute which collates just after the
+ * attribute being searched for, i.e. if one wants to add the attribute to the
+ * mft record this is the correct place to insert it into.  If an attribute
+ * list attribute is present, @ctx->al_entry is the attribute list entry which
+ * collates just after the attribute list entry of the attribute being searched
+ * for, i.e. if one wants to add the attribute to the mft record this is the
+ * correct place to insert its attribute list entry into.
+ */
+int ntfs_attr_lookup(const __le32 type, const __le16 *name,
+		const u32 name_len, const u32 ic,
+		const s64 lowest_vcn, const u8 *val, const u32 val_len,
+		struct ntfs_attr_search_ctx *ctx)
+{
+	struct ntfs_inode *base_ni;
+
+	ntfs_debug("Entering.");
+	if (ctx->base_ntfs_ino)
+		base_ni = ctx->base_ntfs_ino;
+	else
+		base_ni = ctx->ntfs_ino;
+	/* Sanity check, just for debugging really. */
+	if (!base_ni || !NInoAttrList(base_ni) || type == AT_ATTRIBUTE_LIST)
+		return ntfs_attr_find(type, name, name_len, ic, val, val_len,
+				ctx);
+	return ntfs_external_attr_find(type, name, name_len, ic, lowest_vcn,
+			val, val_len, ctx);
+}
+
+/**
+ * ntfs_attr_init_search_ctx - initialize an attribute search context
+ * @ctx:	attribute search context to initialize
+ * @ni:		ntfs inode with which to initialize the search context
+ * @mrec:	mft record with which to initialize the search context
+ *
+ * Initialize the attribute search context @ctx with @ni and @mrec.
+ */
+static bool ntfs_attr_init_search_ctx(struct ntfs_attr_search_ctx *ctx,
+		struct ntfs_inode *ni, struct mft_record *mrec)
+{
+	if (!mrec) {
+		mrec = map_mft_record(ni);
+		if (IS_ERR(mrec))
+			return false;
+		ctx->mapped_mrec = true;
+	} else {
+		ctx->mapped_mrec = false;
+	}
+
+	ctx->mrec = mrec;
+	/* Sanity checks are performed elsewhere. */
+	ctx->attr = (struct attr_record *)((u8 *)mrec + le16_to_cpu(mrec->attrs_offset));
+	ctx->is_first = true;
+	ctx->ntfs_ino = ni;
+	ctx->al_entry = NULL;
+	ctx->base_ntfs_ino = NULL;
+	ctx->base_mrec = NULL;
+	ctx->base_attr = NULL;
+	ctx->mapped_base_mrec = false;
+	return true;
+}
+
+/**
+ * ntfs_attr_reinit_search_ctx - reinitialize an attribute search context
+ * @ctx:	attribute search context to reinitialize
+ *
+ * Reinitialize the attribute search context @ctx, unmapping an associated
+ * extent mft record if present, and initialize the search context again.
+ *
+ * This is used when a search for a new attribute is being started to reset
+ * the search context to the beginning.
+ */
+void ntfs_attr_reinit_search_ctx(struct ntfs_attr_search_ctx *ctx)
+{
+	bool mapped_mrec;
+
+	if (likely(!ctx->base_ntfs_ino)) {
+		/* No attribute list. */
+		ctx->is_first = true;
+		/* Sanity checks are performed elsewhere. */
+		ctx->attr = (struct attr_record *)((u8 *)ctx->mrec +
+				le16_to_cpu(ctx->mrec->attrs_offset));
+		/*
+		 * This needs resetting due to ntfs_external_attr_find() which
+		 * can leave it set despite having zeroed ctx->base_ntfs_ino.
+		 */
+		ctx->al_entry = NULL;
+		return;
+	} /* Attribute list.
+	 */
+	if (ctx->ntfs_ino != ctx->base_ntfs_ino && ctx->ntfs_ino)
+		unmap_extent_mft_record(ctx->ntfs_ino);
+
+	mapped_mrec = ctx->mapped_base_mrec;
+	ntfs_attr_init_search_ctx(ctx, ctx->base_ntfs_ino, ctx->base_mrec);
+	ctx->mapped_mrec = mapped_mrec;
+}
+
+/**
+ * ntfs_attr_get_search_ctx - allocate/initialize a new attribute search context
+ * @ni:		ntfs inode with which to initialize the search context
+ * @mrec:	mft record with which to initialize the search context
+ *
+ * Allocate a new attribute search context, initialize it with @ni and @mrec,
+ * and return it.  Return NULL if allocation failed.
+ */
+struct ntfs_attr_search_ctx *ntfs_attr_get_search_ctx(struct ntfs_inode *ni,
+		struct mft_record *mrec)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	bool init;
+
+	ctx = kmem_cache_alloc(ntfs_attr_ctx_cache, GFP_NOFS);
+	if (ctx) {
+		init = ntfs_attr_init_search_ctx(ctx, ni, mrec);
+		if (!init) {
+			kmem_cache_free(ntfs_attr_ctx_cache, ctx);
+			ctx = NULL;
+		}
+	}
+
+	return ctx;
+}
+
+/**
+ * ntfs_attr_put_search_ctx - release an attribute search context
+ * @ctx:	attribute search context to free
+ *
+ * Release the attribute search context @ctx, unmapping an associated extent
+ * mft record if present.
+ */
+void ntfs_attr_put_search_ctx(struct ntfs_attr_search_ctx *ctx)
+{
+	if (ctx->mapped_mrec)
+		unmap_mft_record(ctx->ntfs_ino);
+
+	if (ctx->mapped_base_mrec && ctx->base_ntfs_ino &&
+	    ctx->ntfs_ino != ctx->base_ntfs_ino)
+		unmap_extent_mft_record(ctx->base_ntfs_ino);
+	kmem_cache_free(ntfs_attr_ctx_cache, ctx);
+}
+
+/**
+ * ntfs_attr_find_in_attrdef - find an attribute in the $AttrDef system file
+ * @vol:	ntfs volume to which the attribute belongs
+ * @type:	attribute type which to find
+ *
+ * Search for the attribute definition record corresponding to the attribute
+ * @type in the $AttrDef system file.
+ *
+ * Return the attribute type definition record if found and NULL if not found.
+ */
+static struct attr_def *ntfs_attr_find_in_attrdef(const struct ntfs_volume *vol,
+		const __le32 type)
+{
+	struct attr_def *ad;
+
+	WARN_ON(!type);
+	for (ad = vol->attrdef; (u8 *)ad - (u8 *)vol->attrdef <
+			vol->attrdef_size && ad->type; ++ad) {
+		/* We have not found it yet, carry on searching. */
+		if (likely(le32_to_cpu(ad->type) < le32_to_cpu(type)))
+			continue;
+		/* We found the attribute; return it. */
+		if (likely(ad->type == type))
+			return ad;
+		/* We have gone too far already.  No point in continuing. */
+		break;
+	}
+	/* Attribute not found. */
+	ntfs_debug("Attribute type 0x%x not found in $AttrDef.",
+			le32_to_cpu(type));
+	return NULL;
+}
+
+/**
+ * ntfs_attr_size_bounds_check - check a size of an attribute type for validity
+ * @vol:	ntfs volume to which the attribute belongs
+ * @type:	attribute type which to check
+ * @size:	size which to check
+ *
+ * Check whether the @size in bytes is valid for an attribute of @type on the
+ * ntfs volume @vol.  This information is obtained from $AttrDef system file.
+ */
+int ntfs_attr_size_bounds_check(const struct ntfs_volume *vol, const __le32 type,
+		const s64 size)
+{
+	struct attr_def *ad;
+
+	if (size < 0)
+		return -EINVAL;
+
+	/*
+	 * $ATTRIBUTE_LIST has a maximum size of 256kiB, but this is not
+	 * listed in $AttrDef.
+	 */
+	if (unlikely(type == AT_ATTRIBUTE_LIST && size > 256 * 1024))
+		return -ERANGE;
+	/* Get the $AttrDef entry for the attribute @type. */
+	ad = ntfs_attr_find_in_attrdef(vol, type);
+	if (unlikely(!ad))
+		return -ENOENT;
+	/* Do the bounds check.
+	 */
+	if (((le64_to_cpu(ad->min_size) > 0) &&
+	     size < le64_to_cpu(ad->min_size)) ||
+	    ((le64_to_cpu(ad->max_size) > 0) && size >
+	     le64_to_cpu(ad->max_size)))
+		return -ERANGE;
+	return 0;
+}
+
+/**
+ * ntfs_attr_can_be_non_resident - check if an attribute can be non-resident
+ * @vol:	ntfs volume to which the attribute belongs
+ * @type:	attribute type which to check
+ *
+ * Check whether the attribute of @type on the ntfs volume @vol is allowed to
+ * be non-resident.  This information is obtained from $AttrDef system file.
+ */
+static int ntfs_attr_can_be_non_resident(const struct ntfs_volume *vol,
+		const __le32 type)
+{
+	struct attr_def *ad;
+
+	/* Find the attribute definition record in $AttrDef. */
+	ad = ntfs_attr_find_in_attrdef(vol, type);
+	if (unlikely(!ad))
+		return -ENOENT;
+	/* Check the flags and return the result. */
+	if (ad->flags & ATTR_DEF_RESIDENT)
+		return -EPERM;
+	return 0;
+}
+
+/**
+ * ntfs_attr_can_be_resident - check if an attribute can be resident
+ * @vol:	ntfs volume to which the attribute belongs
+ * @type:	attribute type which to check
+ *
+ * Check whether the attribute of @type on the ntfs volume @vol is allowed to
+ * be resident.  This information is derived from our ntfs knowledge and may
+ * not be completely accurate, especially when user defined attributes are
+ * present.  Basically we allow everything to be resident except for index
+ * allocation and $EA attributes.
+ *
+ * Return 0 if the attribute is allowed to be resident and -EPERM if not.
+ *
+ * Warning: In the system file $MFT the attribute $Bitmap must be non-resident
+ *	    otherwise windows will not boot (blue screen of death)!  We cannot
+ *	    check for this here as we do not know which inode's $Bitmap is
+ *	    being asked about so the caller needs to special case this.
+ */
+int ntfs_attr_can_be_resident(const struct ntfs_volume *vol, const __le32 type)
+{
+	if (type == AT_INDEX_ALLOCATION)
+		return -EPERM;
+	return 0;
+}
+
+/**
+ * ntfs_attr_record_resize - resize an attribute record
+ * @m:		mft record containing attribute record
+ * @a:		attribute record to resize
+ * @new_size:	new size in bytes to which to resize the attribute record @a
+ *
+ * Resize the attribute record @a, i.e. the resident part of the attribute, in
+ * the mft record @m to @new_size bytes.
+ */
+int ntfs_attr_record_resize(struct mft_record *m, struct attr_record *a, u32 new_size)
+{
+	u32 old_size, alloc_size, attr_size;
+
+	old_size = le32_to_cpu(m->bytes_in_use);
+	alloc_size = le32_to_cpu(m->bytes_allocated);
+	attr_size = le32_to_cpu(a->length);
+
+	ntfs_debug("Sizes: old=%u alloc=%u attr=%u new=%u\n",
+			(unsigned int)old_size, (unsigned int)alloc_size,
+			(unsigned int)attr_size, (unsigned int)new_size);
+
+	/* Align to 8 bytes if it is not already done. */
+	if (new_size & 7)
+		new_size = (new_size + 7) & ~7;
+	/* If the actual attribute length has changed, move things around. */
+	if (new_size != attr_size) {
+		u32 new_muse = le32_to_cpu(m->bytes_in_use) -
+				attr_size + new_size;
+		/* Not enough space in this mft record. */
+		if (new_muse > le32_to_cpu(m->bytes_allocated))
+			return -ENOSPC;
+
+		if (a->type == AT_INDEX_ROOT && new_size > attr_size &&
+		    new_muse + 120 > alloc_size && old_size + 120 <= alloc_size) {
+			ntfs_debug("Too big struct index_root (%u > %u)\n",
+					new_muse, alloc_size);
+			return -ENOSPC;
+		}
+
+		/* Move attributes following @a to their new location. */
+		memmove((u8 *)a + new_size, (u8 *)a + le32_to_cpu(a->length),
+				le32_to_cpu(m->bytes_in_use) - ((u8 *)a -
+				(u8 *)m) - attr_size);
+		/* Adjust @m to reflect the change in used space. */
+		m->bytes_in_use = cpu_to_le32(new_muse);
+		/* Adjust @a to reflect the new size.
+		 */
+		if (new_size >= offsetof(struct attr_record, length) + sizeof(a->length))
+			a->length = cpu_to_le32(new_size);
+	}
+	return 0;
+}
+
+/**
+ * ntfs_resident_attr_value_resize - resize the value of a resident attribute
+ * @m:		mft record containing attribute record
+ * @a:		attribute record whose value to resize
+ * @new_size:	new size in bytes to which to resize the attribute value of @a
+ *
+ * Resize the value of the attribute @a in the mft record @m to @new_size bytes.
+ * If the value is made bigger, the newly allocated space is cleared.
+ */
+int ntfs_resident_attr_value_resize(struct mft_record *m, struct attr_record *a,
+		const u32 new_size)
+{
+	u32 old_size;
+
+	/* Resize the resident part of the attribute record. */
+	if (ntfs_attr_record_resize(m, a,
+			le16_to_cpu(a->data.resident.value_offset) + new_size))
+		return -ENOSPC;
+	/*
+	 * The resize succeeded!  If we made the attribute value bigger, clear
+	 * the area between the old size and @new_size.
+	 */
+	old_size = le32_to_cpu(a->data.resident.value_length);
+	if (new_size > old_size)
+		memset((u8 *)a + le16_to_cpu(a->data.resident.value_offset) +
+				old_size, 0, new_size - old_size);
+	/* Finally update the length of the attribute value. */
+	a->data.resident.value_length = cpu_to_le32(new_size);
+	return 0;
+}
+
+/**
+ * ntfs_attr_make_non_resident - convert a resident to a non-resident attribute
+ * @ni:		ntfs inode describing the attribute to convert
+ * @data_size:	size of the resident data to copy to the non-resident attribute
+ *
+ * Convert the resident ntfs attribute described by the ntfs inode @ni to a
+ * non-resident one.
+ *
+ * @data_size must be equal to the attribute value size.  This is needed since
+ * we need to know the size before we can map the mft record and our callers
+ * always know it.  The reason we cannot simply read the size from the vfs
+ * inode i_size is that this is not necessarily uptodate.
+ * This happens when
+ * ntfs_attr_make_non_resident() is called in the ->truncate call path(s).
+ */
+int ntfs_attr_make_non_resident(struct ntfs_inode *ni, const u32 data_size)
+{
+	s64 new_size;
+	struct inode *vi = VFS_I(ni);
+	struct ntfs_volume *vol = ni->vol;
+	struct ntfs_inode *base_ni;
+	struct mft_record *m;
+	struct attr_record *a;
+	struct ntfs_attr_search_ctx *ctx;
+	struct folio *folio;
+	struct runlist_element *rl;
+	u8 *kaddr;
+	unsigned long flags;
+	int mp_size, mp_ofs, name_ofs, arec_size, err, err2;
+	u32 attr_size;
+	u8 old_res_attr_flags;
+
+	if (NInoNonResident(ni)) {
+		ntfs_warning(vol->sb,
+			"Trying to make non-resident attribute non-resident.  Aborting...\n");
+		return -EINVAL;
+	}
+
+	/* Check that the attribute is allowed to be non-resident. */
+	err = ntfs_attr_can_be_non_resident(vol, ni->type);
+	if (unlikely(err)) {
+		if (err == -EPERM)
+			ntfs_debug("Attribute is not allowed to be non-resident.");
+		else
+			ntfs_debug("Attribute not defined on the NTFS volume!");
+		return err;
+	}
+
+	if (NInoEncrypted(ni))
+		return -EIO;
+
+	if (!NInoAttr(ni))
+		base_ni = ni;
+	else
+		base_ni = ni->ext.base_ntfs_ino;
+	m = map_mft_record(base_ni);
+	if (IS_ERR(m)) {
+		err = PTR_ERR(m);
+		m = NULL;
+		ctx = NULL;
+		goto err_out;
+	}
+	ctx = ntfs_attr_get_search_ctx(base_ni, m);
+	if (unlikely(!ctx)) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err)) {
+		if (err == -ENOENT)
+			err = -EIO;
+		goto err_out;
+	}
+	m = ctx->mrec;
+	a = ctx->attr;
+
+	/*
+	 * The size needs to be aligned to a cluster boundary for allocation
+	 * purposes.
+	 */
+	new_size = (data_size + vol->cluster_size - 1) &
+			~(vol->cluster_size - 1);
+	if (new_size > 0) {
+		if ((a->flags & ATTR_COMPRESSION_MASK) == ATTR_IS_COMPRESSED) {
+			/* must allocate full compression blocks */
+			new_size = ((new_size - 1) |
+					((1L << (STANDARD_COMPRESSION_UNIT +
+					vol->cluster_size_bits)) - 1)) + 1;
+		}
+
+		/*
+		 * Will need folio later and since folio lock nests
+		 * outside all ntfs locks, we need to get the folio now.
+		 */
+		folio = __filemap_get_folio(vi->i_mapping, 0,
+				FGP_CREAT | FGP_LOCK,
+				mapping_gfp_mask(vi->i_mapping));
+		if (IS_ERR(folio)) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+
+		/* Start by allocating clusters to hold the attribute value. */
+		rl = ntfs_cluster_alloc(vol, 0, new_size >>
+				vol->cluster_size_bits, -1, DATA_ZONE, true,
+				false, false);
+		if (IS_ERR(rl)) {
+			err = PTR_ERR(rl);
+			ntfs_debug("Failed to allocate cluster%s, error code %i.",
+					(new_size >> vol->cluster_size_bits) > 1 ? "s" : "",
+					err);
+			goto folio_err_out;
+		}
+	} else {
+		rl = NULL;
+		folio = NULL;
+	}
+
+	down_write(&ni->runlist.lock);
+	/* Determine the size of the mapping pairs array. */
+	mp_size = ntfs_get_size_for_mapping_pairs(vol, rl, 0, -1, -1);
+	if (unlikely(mp_size < 0)) {
+		err = mp_size;
+		ntfs_debug("Failed to get size for mapping pairs array, error code %i.\n", err);
+		goto rl_err_out;
+	}
+
+	if (NInoNonResident(ni) || a->non_resident) {
+		err = -EIO;
+		goto rl_err_out;
+	}
+
+	/*
+	 * Calculate new offsets for the name and the mapping pairs array.
+	 */
+	if (NInoSparse(ni) || NInoCompressed(ni))
+		name_ofs = (offsetof(struct attr_record,
+				data.non_resident.compressed_size) +
+				sizeof(a->data.non_resident.compressed_size) +
+				7) & ~7;
+	else
+		name_ofs = (offsetof(struct attr_record,
+				data.non_resident.compressed_size) + 7) & ~7;
+	mp_ofs = (name_ofs + a->name_length * sizeof(__le16) + 7) & ~7;
+	/*
+	 * Determine the size of the resident part of the now non-resident
+	 * attribute record.
+	 */
+	arec_size = (mp_ofs + mp_size + 7) & ~7;
+	/*
+	 * If the folio is not uptodate bring it uptodate by copying from the
+	 * attribute value.
+	 */
+	attr_size = le32_to_cpu(a->data.resident.value_length);
+	WARN_ON(attr_size != data_size);
+	if (folio && !folio_test_uptodate(folio)) {
+		kaddr = kmap_local_folio(folio, 0);
+		memcpy(kaddr, (u8 *)a +
+				le16_to_cpu(a->data.resident.value_offset),
+				attr_size);
+		memset(kaddr + attr_size, 0, PAGE_SIZE - attr_size);
+		kunmap_local(kaddr);
+		flush_dcache_folio(folio);
+		folio_mark_uptodate(folio);
+	}
+
+	/* Backup the attribute flag. */
+	old_res_attr_flags = a->data.resident.flags;
+	/* Resize the resident part of the attribute record. */
+	err = ntfs_attr_record_resize(m, a, arec_size);
+	if (unlikely(err))
+		goto rl_err_out;
+
+	/*
+	 * Convert the resident part of the attribute record to describe a
+	 * non-resident attribute.
+	 */
+	a->non_resident = 1;
+	/* Move the attribute name if it exists and update the offset. */
+	if (a->name_length)
+		memmove((u8 *)a + name_ofs, (u8 *)a + le16_to_cpu(a->name_offset),
+				a->name_length * sizeof(__le16));
+	a->name_offset = cpu_to_le16(name_ofs);
+	/* Setup the fields specific to non-resident attributes.
+	 */
+	a->data.non_resident.lowest_vcn = 0;
+	a->data.non_resident.highest_vcn = cpu_to_le64((new_size - 1) >>
+			vol->cluster_size_bits);
+	a->data.non_resident.mapping_pairs_offset = cpu_to_le16(mp_ofs);
+	memset(&a->data.non_resident.reserved, 0,
+			sizeof(a->data.non_resident.reserved));
+	a->data.non_resident.allocated_size = cpu_to_le64(new_size);
+	a->data.non_resident.data_size =
+			a->data.non_resident.initialized_size =
+			cpu_to_le64(attr_size);
+	if (NInoSparse(ni) || NInoCompressed(ni)) {
+		a->data.non_resident.compression_unit = 0;
+		if (NInoCompressed(ni) || vol->major_ver < 3)
+			a->data.non_resident.compression_unit = 4;
+		a->data.non_resident.compressed_size =
+				a->data.non_resident.allocated_size;
+	} else
+		a->data.non_resident.compression_unit = 0;
+	/* Generate the mapping pairs array into the attribute record. */
+	err = ntfs_mapping_pairs_build(vol, (u8 *)a + mp_ofs,
+			arec_size - mp_ofs, rl, 0, -1, NULL, NULL, NULL);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to build mapping pairs, error code %i.",
+				err);
+		goto undo_err_out;
+	}
+
+	/* Setup the in-memory attribute structure to be non-resident.
+	ni->runlist.rl = rl;
+	if (rl) {
+		for (ni->runlist.count = 1; rl->length != 0; rl++)
+			ni->runlist.count++;
+	} else
+		ni->runlist.count = 0;
+	write_lock_irqsave(&ni->size_lock, flags);
+	ni->allocated_size = new_size;
+	if (NInoSparse(ni) || NInoCompressed(ni)) {
+		ni->itype.compressed.size = ni->allocated_size;
+		if (a->data.non_resident.compression_unit) {
+			ni->itype.compressed.block_size = 1U <<
+					(a->data.non_resident.compression_unit +
+					 vol->cluster_size_bits);
+			ni->itype.compressed.block_size_bits =
+					ffs(ni->itype.compressed.block_size) - 1;
+			ni->itype.compressed.block_clusters = 1U <<
+					a->data.non_resident.compression_unit;
+		} else {
+			ni->itype.compressed.block_size = 0;
+			ni->itype.compressed.block_size_bits = 0;
+			ni->itype.compressed.block_clusters = 0;
+		}
+		vi->i_blocks = ni->itype.compressed.size >> 9;
+	} else
+		vi->i_blocks = ni->allocated_size >> 9;
+	write_unlock_irqrestore(&ni->size_lock, flags);
+	/*
+	 * This needs to be last since the address space operations ->read_folio
+	 * and ->writepage can run concurrently with us as they are not
+	 * serialized on i_mutex. Note, we are not allowed to fail once we flip
+	 * this switch, which is another reason to do this last.
+	 */
+	NInoSetNonResident(ni);
+	NInoSetFullyMapped(ni);
+	/* Mark the mft record dirty, so it gets written back. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(base_ni);
+	up_write(&ni->runlist.lock);
+	if (folio) {
+		iomap_dirty_folio(vi->i_mapping, folio);
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+	ntfs_debug("Done.");
+	return 0;
+undo_err_out:
+	/* Convert the attribute back into a resident attribute. */
+	a->non_resident = 0;
+	/* Move the attribute name if it exists and update the offset. */
+	name_ofs = (offsetof(struct attr_record, data.resident.reserved) +
+			sizeof(a->data.resident.reserved) + 7) & ~7;
+	if (a->name_length)
+		memmove((u8 *)a + name_ofs, (u8 *)a + le16_to_cpu(a->name_offset),
+			a->name_length * sizeof(__le16));
+	mp_ofs = (name_ofs + a->name_length * sizeof(__le16) + 7) & ~7;
+	a->name_offset = cpu_to_le16(name_ofs);
+	arec_size = (mp_ofs + attr_size + 7) & ~7;
+	/* Resize the resident part of the attribute record. */
+	err2 = ntfs_attr_record_resize(m, a, arec_size);
+	if (unlikely(err2)) {
+		/*
+		 * This cannot happen (well if memory corruption is at work it
+		 * could happen in theory), but deal with it as well as we can.
+		 * If the old size is too small, truncate the attribute,
+		 * otherwise simply give it a larger allocated size.
+		 */
+		arec_size = le32_to_cpu(a->length);
+		if ((mp_ofs + attr_size) > arec_size) {
+			err2 = attr_size;
+			attr_size = arec_size - mp_ofs;
+			ntfs_error(vol->sb,
+				"Failed to undo partial resident to non-resident attribute conversion. Truncating inode 0x%lx, attribute type 0x%x from %i bytes to %i bytes to maintain metadata consistency. THIS MEANS YOU ARE LOSING %i BYTES DATA FROM THIS %s.",
+				vi->i_ino,
+				(unsigned int)le32_to_cpu(ni->type),
+				err2, attr_size, err2 - attr_size,
+				((ni->type == AT_DATA) &&
+				!ni->name_len) ? "FILE" : "ATTRIBUTE");
+			write_lock_irqsave(&ni->size_lock, flags);
+			ni->initialized_size = attr_size;
+			i_size_write(vi, attr_size);
+			write_unlock_irqrestore(&ni->size_lock, flags);
+		}
+	}
+	/* Setup the fields specific to resident attributes. */
+	a->data.resident.value_length = cpu_to_le32(attr_size);
+	a->data.resident.value_offset = cpu_to_le16(mp_ofs);
+	a->data.resident.flags = old_res_attr_flags;
+	memset(&a->data.resident.reserved, 0,
+	       sizeof(a->data.resident.reserved));
+	/* Copy the data from the folio back to the attribute value. */
+	if (folio)
+		memcpy_from_folio((u8 *)a + mp_ofs, folio, 0, attr_size);
+	/* Setup the allocated size in the ntfs inode in case it changed. */
+	write_lock_irqsave(&ni->size_lock, flags);
+	ni->allocated_size = arec_size - mp_ofs;
+	write_unlock_irqrestore(&ni->size_lock, flags);
+	/* Mark the mft record dirty, so it gets written back. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+rl_err_out:
+	up_write(&ni->runlist.lock);
+	if (rl) {
+		if (ntfs_cluster_free_from_rl(vol, rl) < 0) {
+			ntfs_error(vol->sb,
+				"Failed to release allocated cluster(s) in error code path. Run chkdsk to recover the lost cluster(s).");
+			NVolSetErrors(vol);
+		}
+		ntfs_free(rl);
+folio_err_out:
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+err_out:
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (m)
+		unmap_mft_record(base_ni);
+	ni->runlist.rl = NULL;
+
+	if (err == -EINVAL)
+		err = -EIO;
+	return err;
+}
+
+/**
+ * ntfs_attr_set - fill (a part of) an attribute with a byte
+ * @ni:		ntfs inode describing the attribute to fill
+ * @ofs:	offset inside the attribute at which to start to fill
+ * @cnt:	number of bytes to fill
+ * @val:	the unsigned 8-bit value with which to fill the attribute
+ *
+ * Fill @cnt bytes of the attribute described by the ntfs inode @ni starting at
+ * byte offset @ofs inside the attribute with the constant byte @val.
+ *
+ * This function is effectively like memset() applied to an ntfs attribute.
+ * Note this function actually only operates on the page cache pages belonging
+ * to the ntfs attribute and it marks them dirty after doing the memset().
+ * Thus it relies on the vm dirty page write code paths to cause the modified
+ * pages to be written to the mft record/disk.
+ */
+int ntfs_attr_set(struct ntfs_inode *ni, s64 ofs, s64 cnt, const u8 val)
+{
+	struct address_space *mapping = VFS_I(ni)->i_mapping;
+	struct folio *folio;
+	pgoff_t index;
+	u8 *addr;
+	unsigned long offset;
+	size_t attr_len;
+	int ret = 0;
+
+	index = ofs >> PAGE_SHIFT;
+	while (cnt) {
+		folio = ntfs_read_mapping_folio(mapping, index);
+		if (IS_ERR(folio)) {
+			ret = PTR_ERR(folio);
+			ntfs_error(VFS_I(ni)->i_sb, "Failed to read a page %lu for attr %#x: %ld",
+				   index, ni->type, PTR_ERR(folio));
+			break;
+		}
+
+		offset = offset_in_folio(folio, ofs);
+		attr_len = min_t(size_t, (size_t)cnt, folio_size(folio) - offset);
+
+		folio_lock(folio);
+		addr = kmap_local_folio(folio, offset);
+		memset(addr, val, attr_len);
+		kunmap_local(addr);
+
+		flush_dcache_folio(folio);
+		folio_mark_dirty(folio);
+		folio_unlock(folio);
+		folio_put(folio);
+
+		ofs += attr_len;
+		cnt -= attr_len;
+		index++;
+		cond_resched();
+	}
+
+	return ret;
+}
+
+int ntfs_attr_set_initialized_size(struct ntfs_inode *ni, loff_t new_size)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	int err = 0;
+
+	if (!NInoNonResident(ni))
+		return -EINVAL;
+
+	ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+			       CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (err)
+		goto out_ctx;
+
+	ctx->attr->data.non_resident.initialized_size = cpu_to_le64(new_size);
+	ni->initialized_size = new_size;
+	mark_mft_record_dirty(ctx->ntfs_ino);
+out_ctx:
+	ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+/**
+ * ntfs_make_room_for_attr - make room for an attribute inside an mft record
+ * @m:		mft record
+ * @pos:	position at which to make space
+ * @size:	byte size to make available at this position
+ *
+ * @pos points to the attribute in front of which we want to make space.
+ */
+static int ntfs_make_room_for_attr(struct mft_record *m, u8 *pos, u32 size)
+{
+	u32 biu;
+
+	ntfs_debug("Entering for pos 0x%x, size %u.\n",
+		   (int)(pos - (u8 *)m), (unsigned int)size);
+
+	/* Round the size up to a multiple of 8 bytes. */
+	size = (size + 7) & ~7;
+
+	/* Rigorous consistency checks. */
+	if (!m || !pos || pos < (u8 *)m) {
+		pr_err("%s: pos=%p m=%p", __func__, pos, m);
+		return -EINVAL;
+	}
+
+	/* The -8 is for the attribute terminator. */
+	if (pos - (u8 *)m > (int)le32_to_cpu(m->bytes_in_use) - 8)
+		return -EINVAL;
+	/* Nothing to do. */
+	if (!size)
+		return 0;
+
+	biu = le32_to_cpu(m->bytes_in_use);
+	/* Do we have enough space? */
+	if (biu + size > le32_to_cpu(m->bytes_allocated) ||
+	    pos + size > (u8 *)m + le32_to_cpu(m->bytes_allocated)) {
+		ntfs_debug("Not enough space in the MFT record\n");
+		return -ENOSPC;
+	}
+	/* Move everything after pos to pos + size. */
+	memmove(pos + size, pos, biu - (pos - (u8 *)m));
+	/* Update the mft record. */
+	m->bytes_in_use = cpu_to_le32(biu + size);
+	return 0;
+}
+
+/**
+ * ntfs_resident_attr_record_add - add resident attribute to inode
+ * @ni:		opened ntfs inode to whose MFT record the attribute is added
+ * @type:	type of the new attribute
+ * @name:	name of the new attribute
+ * @name_len:	name length of the new attribute
+ * @val:	value of the new attribute
+ * @size:	size of new attribute (length of @val, if @val != NULL)
+ * @flags:	flags of the new attribute
+ */
+int ntfs_resident_attr_record_add(struct ntfs_inode *ni, __le32 type,
+		__le16 *name, u8 name_len, u8 *val, u32 size,
+		__le16 flags)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	u32 length;
+	struct attr_record *a;
+	struct mft_record *m;
+	int err, offset;
+	struct ntfs_inode *base_ni;
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x, flags 0x%x.\n",
+		   (long long)ni->mft_no, (unsigned int)le32_to_cpu(type),
+		   (unsigned int)le16_to_cpu(flags));
+
+	if (!ni || (!name && name_len))
+		return -EINVAL;
+
+	err = ntfs_attr_can_be_resident(ni->vol, type);
+	if (err) {
+		if (err == -EPERM)
+			ntfs_debug("Attribute can't be resident.\n");
+		else
+			ntfs_debug("ntfs_attr_can_be_resident failed.\n");
+		return err;
+	}
+
+	/* Locate the place where the record should be. */
+	ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx) {
+		ntfs_error(ni->vol->sb, "%s: Failed to get search context",
+			   __func__);
+		return -ENOMEM;
+	}
+	/*
+	 * Use ntfs_attr_find instead of ntfs_attr_lookup to find the place for
+	 * the attribute in @ni->mrec, not in any extent inode, in case @ni is
+	 * a base file record.
+	 */
+	err = ntfs_attr_find(type, name, name_len, CASE_SENSITIVE, val, size, ctx);
+	if (!err) {
+		err = -EEXIST;
+		ntfs_debug("Attribute already present.\n");
+		goto put_err_out;
+	}
+	if (err != -ENOENT) {
+		err = -EIO;
+		goto put_err_out;
+	}
+	a = ctx->attr;
+	m = ctx->mrec;
+
+	/* Make room for the attribute. */
+	length = offsetof(struct attr_record, data.resident.reserved) +
+			sizeof(a->data.resident.reserved) +
+			((name_len * sizeof(__le16) + 7) & ~7) +
+			((size + 7) & ~7);
+	err = ntfs_make_room_for_attr(ctx->mrec, (u8 *)ctx->attr, length);
+	if (err) {
+		ntfs_debug("Failed to make room for attribute.\n");
+		goto put_err_out;
+	}
+
+	/* Setup the record fields. */
+	offset = ((u8 *)a - (u8 *)m);
+	a->type = type;
+	a->length = cpu_to_le32(length);
+	a->non_resident = 0;
+	a->name_length = name_len;
+	a->name_offset = name_len ?
+		cpu_to_le16(offsetof(struct attr_record, data.resident.reserved) +
+			    sizeof(a->data.resident.reserved)) : cpu_to_le16(0);
+	a->flags = flags;
+	a->instance = m->next_attr_instance;
+	a->data.resident.value_length = cpu_to_le32(size);
+	a->data.resident.value_offset = cpu_to_le16(length - ((size + 7) & ~7));
+	if (val)
+		memcpy((u8 *)a + le16_to_cpu(a->data.resident.value_offset), val, size);
+	else
+		memset((u8 *)a + le16_to_cpu(a->data.resident.value_offset), 0, size);
+	if (type == AT_FILE_NAME)
+		a->data.resident.flags = RESIDENT_ATTR_IS_INDEXED;
+	else
+		a->data.resident.flags = 0;
+	if (name_len)
+		memcpy((u8 *)a + le16_to_cpu(a->name_offset),
+		       name, sizeof(__le16) * name_len);
+	m->next_attr_instance =
+		cpu_to_le16((le16_to_cpu(m->next_attr_instance) + 1) & 0xffff);
+	if (ni->nr_extents == -1)
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+	if (type != AT_ATTRIBUTE_LIST && NInoAttrList(base_ni)) {
+		err = ntfs_attrlist_entry_add(ni, a);
+		if (err) {
+			ntfs_attr_record_resize(m, a, 0);
+			mark_mft_record_dirty(ctx->ntfs_ino);
+			ntfs_debug("Failed to add attribute entry to ATTRIBUTE_LIST.\n");
+			goto put_err_out;
+		}
+	}
+	mark_mft_record_dirty(ni);
+	ntfs_attr_put_search_ctx(ctx);
+	return offset;
+put_err_out:
+	ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+/**
+ * ntfs_non_resident_attr_record_add - add extent of non-resident attribute
+ * @ni:		opened ntfs inode to whose MFT record the attribute is added
+ * @type:	type of the new attribute extent
+ * @name:	name of the new attribute extent
+ * @name_len:	name length of the new attribute extent
+ * @lowest_vcn:	lowest vcn of the new attribute extent
+ * @dataruns_size: dataruns size of the new attribute extent
+ * @flags:	flags of the new attribute extent
+ */
+static int ntfs_non_resident_attr_record_add(struct ntfs_inode *ni, __le32 type,
+		__le16 *name, u8 name_len, s64 lowest_vcn, int dataruns_size,
+		__le16 flags)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	u32 length;
+	struct attr_record *a;
+	struct mft_record *m;
+	struct ntfs_inode *base_ni;
+	int err, offset;
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x, lowest_vcn %lld, dataruns_size %d, flags 0x%x.\n",
+		   (long long)ni->mft_no, (unsigned int)le32_to_cpu(type),
+		   (long long)lowest_vcn, dataruns_size,
+		   (unsigned int)le16_to_cpu(flags));
+
+	if (!ni || dataruns_size <= 0 || (!name && name_len))
+		return -EINVAL;
+
+	err = ntfs_attr_can_be_non_resident(ni->vol, type);
+	if (err) {
+		if (err == -EPERM)
+			pr_err("Attribute can't be non resident");
+		else
+			pr_err("ntfs_attr_can_be_non_resident failed");
+		return err;
+	}
+
+	/* Locate the place where the record should be. */
+	ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx) {
+		pr_err("%s: Failed to get search context", __func__);
+		return -ENOMEM;
+	}
+	/*
+	 * Use ntfs_attr_find instead of ntfs_attr_lookup to find the place for
+	 * the attribute in @ni->mrec, not in any extent inode, in case @ni is
+	 * a base file record.
+	 */
+	err = ntfs_attr_find(type, name, name_len, CASE_SENSITIVE, NULL, 0, ctx);
+	if (!err) {
+		err = -EEXIST;
+		pr_err("Attribute 0x%x already present", type);
+		goto put_err_out;
+	}
+	if (err != -ENOENT) {
+		pr_err("ntfs_attr_find failed");
+		err = -EIO;
+		goto put_err_out;
+	}
+	a = ctx->attr;
+	m = ctx->mrec;
+
+	/* Make room for the attribute. */
+	dataruns_size = (dataruns_size + 7) & ~7;
+	length = offsetof(struct attr_record, data.non_resident.compressed_size) +
+			((sizeof(__le16) * name_len + 7) & ~7) + dataruns_size +
+			((flags & (ATTR_IS_COMPRESSED | ATTR_IS_SPARSE)) ?
+			 sizeof(a->data.non_resident.compressed_size) : 0);
+	err = ntfs_make_room_for_attr(ctx->mrec, (u8 *)ctx->attr, length);
+	if (err) {
+		pr_err("Failed to make room for attribute");
+		goto put_err_out;
+	}
+
+	/* Setup the record fields. */
+	a->type = type;
+	a->length = cpu_to_le32(length);
+	a->non_resident = 1;
+	a->name_length = name_len;
+	a->name_offset = cpu_to_le16(offsetof(struct attr_record,
+				data.non_resident.compressed_size) +
+			((flags & (ATTR_IS_COMPRESSED | ATTR_IS_SPARSE)) ?
+			 sizeof(a->data.non_resident.compressed_size) : 0));
+	a->flags = flags;
+	a->instance = m->next_attr_instance;
+	a->data.non_resident.lowest_vcn = cpu_to_le64(lowest_vcn);
+	a->data.non_resident.mapping_pairs_offset = cpu_to_le16(length - dataruns_size);
+	a->data.non_resident.compression_unit =
+		(flags & ATTR_IS_COMPRESSED) ? STANDARD_COMPRESSION_UNIT : 0;
+	/* If @lowest_vcn == 0, then set up an empty attribute. */
+	if (!lowest_vcn) {
+		a->data.non_resident.highest_vcn = cpu_to_le64(-1);
+		a->data.non_resident.allocated_size = 0;
+		a->data.non_resident.data_size = 0;
+		a->data.non_resident.initialized_size = 0;
+		/* Set empty mapping pairs. */
+		*((u8 *)a + le16_to_cpu(a->data.non_resident.mapping_pairs_offset)) = 0;
+	}
+	if (name_len)
+		memcpy((u8 *)a + le16_to_cpu(a->name_offset),
+		       name, sizeof(__le16) * name_len);
+	m->next_attr_instance =
+		cpu_to_le16((le16_to_cpu(m->next_attr_instance) + 1) & 0xffff);
+	if (ni->nr_extents == -1)
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+	if (type != AT_ATTRIBUTE_LIST && NInoAttrList(base_ni)) {
+		err = ntfs_attrlist_entry_add(ni, a);
+		if (err) {
+			pr_err("Failed to add attr entry to attrlist");
+			ntfs_attr_record_resize(m, a, 0);
+			goto put_err_out;
+		}
+	}
+	mark_mft_record_dirty(ni);
+	/*
+	 * Locate the offset from the start of the MFT record where the new
+	 * attribute was placed. We need to look it up again because the
+	 * record may have moved during the attribute list update.
+	 */
+	ntfs_attr_reinit_search_ctx(ctx);
+	err = ntfs_attr_lookup(type, name, name_len, CASE_SENSITIVE,
+			       lowest_vcn, NULL, 0, ctx);
+	if (err) {
+		pr_err("%s: attribute lookup failed", __func__);
+		ntfs_attr_put_search_ctx(ctx);
+		return err;
+	}
+	offset = (u8 *)ctx->attr - (u8 *)ctx->mrec;
+	ntfs_attr_put_search_ctx(ctx);
+	return offset;
+put_err_out:
+	ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+/**
+ * ntfs_attr_record_rm - remove attribute extent
+ * @ctx:	search context describing the attribute which should be removed
+ *
+ * If this function succeeds, the caller must reinitialize the search context
+ * before using it again.
+ */
+int ntfs_attr_record_rm(struct ntfs_attr_search_ctx *ctx)
+{
+	struct ntfs_inode *base_ni, *ni;
+	__le32 type;
+	int err;
+
+	if (!ctx || !ctx->ntfs_ino || !ctx->mrec || !ctx->attr)
+		return -EINVAL;
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x.\n",
+		   (long long)ctx->ntfs_ino->mft_no,
+		   (unsigned int)le32_to_cpu(ctx->attr->type));
+	type = ctx->attr->type;
+	ni = ctx->ntfs_ino;
+	if (ctx->base_ntfs_ino)
+		base_ni = ctx->base_ntfs_ino;
+	else
+		base_ni = ctx->ntfs_ino;
+
+	/* Remove the attribute itself. */
+	if (ntfs_attr_record_resize(ctx->mrec, ctx->attr, 0)) {
+		ntfs_debug("Couldn't remove attribute record. Bug or damaged MFT record.\n");
+		return -EIO;
+	}
+	mark_mft_record_dirty(ni);
+
+	/*
+	 * Remove the record from $ATTRIBUTE_LIST if present and we are not
+	 * deleting $ATTRIBUTE_LIST itself.
+	 */
+	if (NInoAttrList(base_ni) && type != AT_ATTRIBUTE_LIST) {
+		err = ntfs_attrlist_entry_rm(ctx);
+		if (err) {
+			ntfs_debug("Couldn't delete record from $ATTRIBUTE_LIST.\n");
+			return err;
+		}
+	}
+
+	/* Post $ATTRIBUTE_LIST delete setup. */
+	if (type == AT_ATTRIBUTE_LIST) {
+		if (NInoAttrList(base_ni) && base_ni->attr_list)
+			ntfs_free(base_ni->attr_list);
+		base_ni->attr_list = NULL;
+		NInoClearAttrList(base_ni);
+	}
+
+	/* Free the MFT record if it doesn't contain any attributes. */
+	if (le32_to_cpu(ctx->mrec->bytes_in_use) -
+	    le16_to_cpu(ctx->mrec->attrs_offset) == 8) {
+		if (ntfs_mft_record_free(ni->vol, ni)) {
+			ntfs_debug("Couldn't free MFT record.\n");
+			return -EIO;
+		}
+		/* Removal is done if we freed the base inode. */
+		if (ni == base_ni)
+			return 0;
+		ntfs_inode_close(ni);
+		ctx->ntfs_ino = ni = NULL;
+	}
+
+	if (type == AT_ATTRIBUTE_LIST || !NInoAttrList(base_ni))
+		return 0;
+
+	/* Remove the attribute list if we don't need it any more. */
+	if (!ntfs_attrlist_need(base_ni)) {
+		struct ntfs_attr na;
+		struct inode *attr_vi;
+
+		ntfs_attr_reinit_search_ctx(ctx);
+		if (ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, CASE_SENSITIVE,
+				     0, NULL, 0, ctx)) {
+			ntfs_debug("Couldn't find attribute list. Succeeding anyway.\n");
+			return 0;
+		}
+		/* Deallocate clusters. */
+		if (ctx->attr->non_resident) {
+			struct runlist_element *al_rl;
+			size_t new_rl_count;
+
+			al_rl = ntfs_mapping_pairs_decompress(base_ni->vol,
+					ctx->attr, NULL, &new_rl_count);
+			if (IS_ERR(al_rl)) {
+				ntfs_debug("Couldn't decompress attribute list runlist. Succeeding anyway.\n");
+				return 0;
+			}
+			if (ntfs_cluster_free_from_rl(base_ni->vol, al_rl))
+				ntfs_debug("Leaking clusters! Run chkdsk. Couldn't free clusters from attribute list runlist.\n");
+			ntfs_free(al_rl);
+		}
+		/* Remove the attribute record itself. */
+		if (ntfs_attr_record_rm(ctx)) {
+			ntfs_debug("Couldn't remove attribute list. Succeeding anyway.\n");
+			return 0;
+		}
+
+		na.mft_no = VFS_I(base_ni)->i_ino;
+		na.type = AT_ATTRIBUTE_LIST;
+		na.name = NULL;
+		na.name_len = 0;
+
+		attr_vi = ilookup5(VFS_I(base_ni)->i_sb, VFS_I(base_ni)->i_ino,
+				   ntfs_test_inode, &na);
+		if (attr_vi) {
+			clear_nlink(attr_vi);
+			iput(attr_vi);
+		}
+	}
+	return 0;
+}
+
+/**
+ * ntfs_attr_add - add attribute to inode
+ * @ni:		opened ntfs inode to which to add the attribute
+ * @type:	type of the new attribute
+ * @name:	name in unicode of the new attribute
+ * @name_len:	name length in unicode characters of the new attribute
+ * @val:	value of the new attribute
+ * @size:	size of the new attribute / length of @val (if specified)
+ *
+ * @val should always be specified for always resident attributes (eg. FILE_NAME
+ * attribute), for attributes that can become non-resident @val can be NULL
+ * (eg. DATA attribute). @size can be specified even if @val is NULL, in this
+ * case the data size will be equal to @size and the initialized size will be
+ * equal to 0.
+ *
+ * If the inode does not have enough space to add the attribute, add it to one
+ * of its extents. If no extents are present, or none of them has enough
+ * space, then allocate a new extent and add the attribute to it.
+ *
+ * If at any of these steps an attribute list is needed but not present, it is
+ * added transparently to the caller. Hence, this function must not be called
+ * with @type == AT_ATTRIBUTE_LIST; if you really need to add an attribute
+ * list, call ntfs_inode_add_attrlist instead.
+ *
+ * On success return 0. On error return the negative error code.
+ */
+int ntfs_attr_add(struct ntfs_inode *ni, __le32 type,
+		__le16 *name, u8 name_len, u8 *val, s64 size)
+{
+	struct super_block *sb;
+	u32 attr_rec_size;
+	int err, i, offset;
+	bool is_resident;
+	bool can_be_non_resident = false;
+	struct ntfs_inode *attr_ni;
+	struct inode *attr_vi;
+	struct mft_record *ni_mrec;
+
+	if (!ni || size < 0 || type == AT_ATTRIBUTE_LIST)
+		return -EINVAL;
+
+	ntfs_debug("Entering for inode 0x%llx, attr %x, size %lld.\n",
+		   (long long)ni->mft_no, type, size);
+
+	if (ni->nr_extents == -1)
+		ni = ni->ext.base_ntfs_ino;
+
+	/* Check the attribute type and the size. */
+	err = ntfs_attr_size_bounds_check(ni->vol, type, size);
+	if (err) {
+		if (err == -ENOENT)
+			err = -EIO;
+		return err;
+	}
+
+	sb = ni->vol->sb;
+	/* Sanity checks for always resident attributes. */
+	err = ntfs_attr_can_be_non_resident(ni->vol, type);
+	if (err) {
+		if (err != -EPERM) {
+			ntfs_error(sb, "ntfs_attr_can_be_non_resident failed");
+			goto err_out;
+		}
+		/* @val is mandatory. */
+		if (!val) {
+			ntfs_error(sb,
+				"val is mandatory for always resident attributes");
+			return -EINVAL;
+		}
+		if (size > ni->vol->mft_record_size) {
+			ntfs_error(sb, "Attribute is too big");
+			return -ERANGE;
+		}
+	} else
+		can_be_non_resident = true;
+
+	/*
+	 * Determine whether the new attribute will be resident. We add 8 to
+	 * the size in the non-resident case for the mapping pairs.
+	 */
+	err = ntfs_attr_can_be_resident(ni->vol, type);
+	if (!err) {
+		is_resident = true;
+	} else {
+		if (err != -EPERM) {
+			ntfs_error(sb, "ntfs_attr_can_be_resident failed");
+			goto err_out;
+		}
+		is_resident = false;
+	}
+
+	/* Calculate the attribute record size. */
+	if (is_resident)
+		attr_rec_size = offsetof(struct attr_record, data.resident.reserved) +
+				1 +
+				((name_len * sizeof(__le16) + 7) & ~7) +
+				((size + 7) & ~7);
+	else
+		attr_rec_size = offsetof(struct attr_record, data.non_resident.compressed_size) +
+				((name_len * sizeof(__le16) + 7) & ~7) + 8;
+
+	/*
+	 * If we have enough free space for the new attribute in the base MFT
+	 * record, then add the attribute to it.
+	 */
+retry:
+	ni_mrec = map_mft_record(ni);
+	if (IS_ERR(ni_mrec)) {
+		err = -EIO;
+		goto err_out;
+	}
+
+	if (le32_to_cpu(ni_mrec->bytes_allocated) -
+	    le32_to_cpu(ni_mrec->bytes_in_use) >= attr_rec_size) {
+		attr_ni = ni;
+		unmap_mft_record(ni);
+		goto add_attr_record;
+	}
+	unmap_mft_record(ni);
+
+	/* Try to add to the extent inodes. */
+	err = ntfs_inode_attach_all_extents(ni);
+	if (err) {
+		ntfs_error(sb, "Failed to attach all extents to inode");
+		goto err_out;
+	}
+
+	for (i = 0; i < ni->nr_extents; i++) {
+		attr_ni = ni->ext.extent_ntfs_inos[i];
+		ni_mrec = map_mft_record(attr_ni);
+		if (IS_ERR(ni_mrec)) {
+			err = -EIO;
+			goto err_out;
+		}
+
+		if (le32_to_cpu(ni_mrec->bytes_allocated) -
+		    le32_to_cpu(ni_mrec->bytes_in_use) >=
+		    attr_rec_size) {
+			unmap_mft_record(attr_ni);
+			goto add_attr_record;
+		}
+		unmap_mft_record(attr_ni);
+	}
+
+	/* There is no extent that contains enough space for the new attribute. */
+	if (!NInoAttrList(ni)) {
+		/* The attribute list is not present, add it and retry. */
+		err = ntfs_inode_add_attrlist(ni);
+		if (err) {
+			ntfs_error(sb, "Failed to add attribute list");
+			goto err_out;
+		}
+		goto retry;
+	}
+
+	attr_ni = NULL;
+	/* Allocate a new extent. */
+	err = ntfs_mft_record_alloc(ni->vol, 0, &attr_ni, ni, NULL);
+	if (err) {
+		ntfs_error(sb, "Failed to allocate extent record");
+		goto err_out;
+	}
+	unmap_mft_record(attr_ni);
+
+add_attr_record:
+	if (is_resident) {
+		/* Add a resident attribute. */
+		offset = ntfs_resident_attr_record_add(attr_ni, type, name,
+						       name_len, val, size, 0);
+		if (offset < 0) {
+			if (offset == -ENOSPC && can_be_non_resident)
+				goto add_non_resident;
+			err = offset;
+			ntfs_error(sb, "Failed to add resident attribute");
+			goto free_err_out;
+		}
+		return 0;
+	}
+
+add_non_resident:
+	/* Add a non-resident attribute. */
+	offset = ntfs_non_resident_attr_record_add(attr_ni, type, name,
+						   name_len, 0, 8, 0);
+	if (offset < 0) {
+		err = offset;
+		ntfs_error(sb, "Failed to add non resident attribute");
+		goto free_err_out;
+	}
+
+	/* If @size == 0, we are done. */
+	if (!size)
+		return 0;
+
+	/* Open the new attribute and resize it. */
+	attr_vi = ntfs_attr_iget(VFS_I(ni), type, name, name_len);
+	if (IS_ERR(attr_vi)) {
+		err = PTR_ERR(attr_vi);
+		ntfs_error(sb, "Failed to open just added attribute");
+		goto rm_attr_err_out;
+	}
+	attr_ni = NTFS_I(attr_vi);
+
+	/* Resize and set the attribute value. */
+	if (ntfs_attr_truncate(attr_ni, size) ||
+	    (val && (ntfs_inode_attr_pwrite(attr_vi, 0, size, val, false) != size))) {
+		err = -EIO;
+		ntfs_error(sb, "Failed to initialize just added attribute");
+		if (ntfs_attr_rm(attr_ni))
+			ntfs_error(sb, "Failed to remove just added attribute");
+		iput(attr_vi);
+		goto err_out;
+	}
+	iput(attr_vi);
+	return 0;
+
+rm_attr_err_out:
+	/* Remove the just added attribute. */
+	ni_mrec = map_mft_record(attr_ni);
+	if (!IS_ERR(ni_mrec)) {
+		if (ntfs_attr_record_resize(ni_mrec,
+				(struct attr_record *)((u8 *)ni_mrec + offset), 0))
+			ntfs_error(sb, "Failed to remove just added attribute #2");
+		unmap_mft_record(attr_ni);
+	} else
+		pr_err("EIO when trying to remove newly added attribute\n");
+
+free_err_out:
+	/* Free the MFT record if it doesn't contain any attributes. */
+	ni_mrec = map_mft_record(attr_ni);
+	if (!IS_ERR(ni_mrec)) {
+		int attr_size;
+
+		attr_size = le32_to_cpu(ni_mrec->bytes_in_use) -
+			    le16_to_cpu(ni_mrec->attrs_offset);
+		unmap_mft_record(attr_ni);
+		if (attr_size == 8) {
+			if (ntfs_mft_record_free(attr_ni->vol, attr_ni))
+				ntfs_error(sb, "Failed to free MFT record");
+			if (attr_ni->nr_extents < 0)
+				ntfs_inode_close(attr_ni);
+		}
+	} else
+		pr_err("EIO when testing whether the mft record is freeable\n");
+
+err_out:
+	return err;
+}
+
+/**
+ * __ntfs_attr_init - primary initialization of an ntfs attribute structure
+ * @ni:		ntfs attribute inode to initialize
+ * @type:	attribute type
+ * @name:	attribute name in little endian Unicode or NULL
+ * @name_len:	length of attribute @name in Unicode characters (if @name given)
+ *
+ * Initialize the ntfs attribute inode @ni with @type, @name, and @name_len.
+ */
+static void __ntfs_attr_init(struct ntfs_inode *ni,
+		const __le32 type, __le16 *name, const u32 name_len)
+{
+	ni->runlist.rl = NULL;
+	ni->type = type;
+	ni->name = name;
+	if (name)
+		ni->name_len = name_len;
+	else
+		ni->name_len = 0;
+}
+
+/**
+ * ntfs_attr_init - initialize an ntfs_attr with data sizes and status
+ *
+ * Final initialization for an ntfs attribute.
+ */
+static void ntfs_attr_init(struct ntfs_inode *ni, const bool non_resident,
+		const bool compressed, const bool encrypted, const bool sparse,
+		const s64 allocated_size, const s64 data_size,
+		const s64 initialized_size, const s64 compressed_size,
+		const u8 compression_unit)
+{
+	if (non_resident)
+		NInoSetNonResident(ni);
+	if (compressed) {
+		NInoSetCompressed(ni);
+		ni->flags |= FILE_ATTR_COMPRESSED;
+	}
+	if (encrypted) {
+		NInoSetEncrypted(ni);
+		ni->flags |= FILE_ATTR_ENCRYPTED;
+	}
+	if (sparse) {
+		NInoSetSparse(ni);
+		ni->flags |= FILE_ATTR_SPARSE_FILE;
+	}
+	ni->allocated_size = allocated_size;
+	ni->data_size = data_size;
+	ni->initialized_size = initialized_size;
+	if (compressed || sparse) {
+		struct ntfs_volume *vol = ni->vol;
+
+		ni->itype.compressed.size = compressed_size;
+		ni->itype.compressed.block_clusters = 1 << compression_unit;
+		ni->itype.compressed.block_size = 1 << (compression_unit +
+							vol->cluster_size_bits);
+		ni->itype.compressed.block_size_bits = ffs(
+				ni->itype.compressed.block_size) - 1;
+	}
+}
+
+/**
+ * ntfs_attr_open - open an ntfs attribute for access
+ * @ni:		open ntfs inode in which the ntfs attribute resides
+ * @type:	attribute type
+ * @name:	attribute name in little endian Unicode or AT_UNNAMED or NULL
+ * @name_len:	length of attribute @name in Unicode characters (if @name given)
+ */
+int ntfs_attr_open(struct ntfs_inode *ni, const __le32 type,
+		__le16 *name, u32 name_len)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	__le16 *newname = NULL;
+	struct attr_record *a;
+	bool cs;
+	struct ntfs_inode *base_ni;
+	int err;
+
+	ntfs_debug("Entering for inode %lld, attr 0x%x.\n",
+		   (unsigned long long)ni->mft_no, type);
+
+	if (!ni || !ni->vol)
+		return -EINVAL;
+
+	if (NInoAttr(ni))
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+
+	if (name && name != AT_UNNAMED && name != I30) {
+		name = ntfs_ucsndup(name, name_len);
+		if (!name) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+		newname = name;
+	}
+
+	ctx = ntfs_attr_get_search_ctx(base_ni, NULL);
+	if (!ctx) {
+		err = -ENOMEM;
+		pr_err("%s: Failed to get search context", __func__);
+		goto err_out;
+	}
+
+	err = ntfs_attr_lookup(type, name, name_len, 0, 0, NULL, 0, ctx);
+	if (err)
+		goto put_err_out;
+
+	a = ctx->attr;
+
+	if (!name) {
+		if (a->name_length) {
+			name = ntfs_ucsndup((__le16 *)((u8 *)a + le16_to_cpu(a->name_offset)),
+					    a->name_length);
+			if (!name) {
+				err = -ENOMEM;
+				goto put_err_out;
+			}
+			newname = name;
+			name_len = a->name_length;
+		} else {
+			name = AT_UNNAMED;
+			name_len = 0;
+		}
+	}
+
+	__ntfs_attr_init(ni, type, name, name_len);
+
+	/*
+	 * Wipe the flags in case they are not zero for an attribute list
+	 * attribute. Windows does not complain about invalid flags and chkdsk
+	 * does not detect or fix them so we need to cope with it, too.
+	 */
+	if (type == AT_ATTRIBUTE_LIST)
+		a->flags = 0;
+
+	if ((type == AT_DATA) &&
+	    (a->non_resident ? !a->data.non_resident.initialized_size :
+	     !a->data.resident.value_length)) {
+		/*
+		 * Define/redefine the compression state if the stream is
+		 * empty, based on the compression mark on the parent
+		 * directory (for unnamed data streams) or on the current
+		 * inode (for named data streams). The compression mark
+		 * may change any time, the compression state can only
+		 * change when the stream is wiped out.
+		 *
+		 * Also prevent compression on NTFS version < 3.0, on
+		 * cluster size > 4K, or when compression is disabled.
+		 */
+		a->flags &= ~ATTR_COMPRESSION_MASK;
+		if (NInoCompressed(ni)
+		    && (ni->vol->major_ver >= 3)
+		    && NVolCompression(ni->vol)
+		    && (ni->vol->cluster_size <= MAX_COMPRESSION_CLUSTER_SIZE))
+			a->flags |= ATTR_IS_COMPRESSED;
+	}
+
+	cs = a->flags & (ATTR_IS_COMPRESSED | ATTR_IS_SPARSE);
+
+	if (ni->type == AT_DATA && ni->name == AT_UNNAMED &&
+	    ((!(a->flags & ATTR_IS_COMPRESSED) != !NInoCompressed(ni)) ||
+	     (!(a->flags & ATTR_IS_SPARSE) != !NInoSparse(ni)) ||
+	     (!(a->flags & ATTR_IS_ENCRYPTED) != !NInoEncrypted(ni)))) {
+		err = -EIO;
+		pr_err("Inode %lld has corrupt attribute flags (0x%x <> 0x%x)\n",
+		       (unsigned long long)ni->mft_no,
+		       a->flags, ni->flags);
+		goto put_err_out;
+	}
+
+	if (a->non_resident) {
+		if (((a->flags & ATTR_COMPRESSION_MASK) || a->data.non_resident.compression_unit) &&
+		    (ni->vol->major_ver < 3)) {
+			err = -EIO;
+			pr_err("Compressed inode %lld not allowed on NTFS %d.%d\n",
+			       (unsigned long long)ni->mft_no,
+			       ni->vol->major_ver,
+			       ni->vol->minor_ver);
+			goto put_err_out;
+		}
+
+		if ((a->flags & ATTR_IS_COMPRESSED) && !a->data.non_resident.compression_unit) {
+			err = -EIO;
+			pr_err("Compressed inode %lld attr 0x%x has no compression unit\n",
+			       (unsigned long long)ni->mft_no, type);
+			goto put_err_out;
+		}
+		if ((a->flags & ATTR_COMPRESSION_MASK) &&
+		    (a->data.non_resident.compression_unit != STANDARD_COMPRESSION_UNIT)) {
+			err = -EIO;
+			pr_err("Compressed inode %lld attr 0x%lx has an unsupported compression unit %d\n",
+			       (unsigned long long)ni->mft_no,
+			       (long)le32_to_cpu(type),
+			       (int)a->data.non_resident.compression_unit);
+			goto put_err_out;
+		}
+		ntfs_attr_init(ni, true, a->flags & ATTR_IS_COMPRESSED,
+			       a->flags & ATTR_IS_ENCRYPTED,
+			       a->flags & ATTR_IS_SPARSE,
+			       le64_to_cpu(a->data.non_resident.allocated_size),
+			       le64_to_cpu(a->data.non_resident.data_size),
+			       le64_to_cpu(a->data.non_resident.initialized_size),
+			       cs ? le64_to_cpu(a->data.non_resident.compressed_size) : 0,
+			       cs ? a->data.non_resident.compression_unit : 0);
+	} else {
+		s64 l = le32_to_cpu(a->data.resident.value_length);
+
+		ntfs_attr_init(ni, false, a->flags & ATTR_IS_COMPRESSED,
+			       a->flags & ATTR_IS_ENCRYPTED,
+			       a->flags & ATTR_IS_SPARSE, (l + 7) & ~7, l, l,
+			       cs ? (l + 7) & ~7 : 0, 0);
+	}
+	ntfs_attr_put_search_ctx(ctx);
+out:
+	ntfs_debug("\n");
+	return err;
+
+put_err_out:
+	ntfs_attr_put_search_ctx(ctx);
+err_out:
+	ntfs_free(newname);
+	goto out;
+}
+
+/**
+ * ntfs_attr_close - free an ntfs attribute structure
+ * @ni:		ntfs inode whose attribute state to free
+ *
+ * Release all memory associated with the ntfs attribute @ni.
+ */
+void ntfs_attr_close(struct ntfs_inode *ni)
+{
+	if (NInoNonResident(ni) && ni->runlist.rl)
+		ntfs_free(ni->runlist.rl);
+	/* Don't release if using an internal constant. */
+	if (ni->name != AT_UNNAMED && ni->name != I30)
+		ntfs_free(ni->name);
+}
+
+/**
+ * ntfs_attr_map_whole_runlist - map the whole runlist of an ntfs attribute
+ * @ni:		ntfs inode for which to map the runlist
+ *
+ * Map the whole runlist of the ntfs attribute @ni. For an attribute made up
+ * of only one attribute extent this is the same as calling
+ * ntfs_map_runlist(ni, 0) but for an attribute with multiple extents this
+ * will map the runlist fragments from each of the extents thus giving access
+ * to the entirety of the disk allocation of an attribute.
+ */
+int ntfs_attr_map_whole_runlist(struct ntfs_inode *ni)
+{
+	s64 next_vcn, last_vcn, highest_vcn;
+	struct ntfs_attr_search_ctx *ctx;
+	struct ntfs_volume *vol = ni->vol;
+	struct super_block *sb = vol->sb;
+	struct attr_record *a;
+	int err;
+	struct ntfs_inode *base_ni;
+	int not_mapped;
+	size_t new_rl_count;
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x.\n",
+		   (unsigned long long)ni->mft_no, ni->type);
+
+	if (NInoFullyMapped(ni) && ni->runlist.rl)
+		return 0;
+
+	if (NInoAttr(ni))
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+
+	ctx = ntfs_attr_get_search_ctx(base_ni, NULL);
+	if (!ctx) {
+		ntfs_error(sb, "%s: Failed to get search context", __func__);
+		return -ENOMEM;
+	}
+
+	/* Map all attribute extents one by one. */
+	next_vcn = last_vcn = highest_vcn = 0;
+	a = NULL;
+	while (1) {
+		struct runlist_element *rl;
+
+		not_mapped = 0;
+		if (ntfs_rl_vcn_to_lcn(ni->runlist.rl, next_vcn) == LCN_RL_NOT_MAPPED)
+			not_mapped = 1;
+
+		err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+				       CASE_SENSITIVE, next_vcn, NULL, 0, ctx);
+		if (err)
+			break;
+
+		a = ctx->attr;
+
+		if (not_mapped) {
+			/* Decode the runlist. */
+			rl = ntfs_mapping_pairs_decompress(ni->vol, a, &ni->runlist,
+							   &new_rl_count);
+			if (IS_ERR(rl)) {
+				err = PTR_ERR(rl);
+				goto err_out;
+			}
+			ni->runlist.rl = rl;
+			ni->runlist.count = new_rl_count;
+		}
+
+		/* Are we in the first extent? */
+		if (!next_vcn) {
+			if (a->data.non_resident.lowest_vcn) {
+				err = -EIO;
+				ntfs_error(sb,
+					   "First extent of inode %llu attribute has non-zero lowest_vcn",
+					   (unsigned long long)ni->mft_no);
+				goto err_out;
+			}
+			/* Get the last vcn in the attribute. */
+			last_vcn = le64_to_cpu(a->data.non_resident.allocated_size) >>
+				   vol->cluster_size_bits;
+		}
+
+		/* Get the lowest vcn for the next extent. */
+		highest_vcn = le64_to_cpu(a->data.non_resident.highest_vcn);
+		next_vcn = highest_vcn + 1;
+
+		/* Only one extent or error, which we catch below. */
+		if (next_vcn <= 0) {
+			err = -ENOENT;
+			break;
+		}
+
+		/* Avoid endless loops due to corruption. */
+		if (next_vcn < le64_to_cpu(a->data.non_resident.lowest_vcn)) {
+			err = -EIO;
+			ntfs_error(sb, "Inode %llu has corrupt attribute list",
+				   (unsigned long long)ni->mft_no);
+			goto err_out;
+		}
+	}
+	if (!a) {
+		ntfs_error(sb, "Couldn't find attribute for runlist mapping");
+		goto err_out;
+	}
+	if (not_mapped && highest_vcn && highest_vcn != last_vcn - 1) {
+		err = -EIO;
+		ntfs_error(sb,
+			   "Failed to load full runlist: inode: %llu highest_vcn: 0x%llx last_vcn: 0x%llx",
+			   (unsigned long long)ni->mft_no,
+			   (long long)highest_vcn, (long long)last_vcn);
+		goto err_out;
+	}
+	ntfs_attr_put_search_ctx(ctx);
+	if (err == -ENOENT) {
+		NInoSetFullyMapped(ni);
+		return 0;
+	}
+
+	return err;
+
+err_out:
+	ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+/**
+ * ntfs_attr_record_move_to - move attribute record to target inode
+ * @ctx:	attribute search context describing the attribute record
+ * @ni:		opened ntfs inode to which to move the attribute record
+ */
+int ntfs_attr_record_move_to(struct ntfs_attr_search_ctx *ctx, struct ntfs_inode *ni)
+{
+	struct ntfs_attr_search_ctx *nctx;
+	struct attr_record *a;
+	int err;
+	struct mft_record *ni_mrec;
+	struct super_block *sb;
+
+	if (!ctx || !ctx->attr || !ctx->ntfs_ino || !ni) {
+		ntfs_debug("Invalid arguments passed.\n");
+		return -EINVAL;
+	}
+
+	sb = ni->vol->sb;
+	ntfs_debug("Entering for ctx->attr->type 0x%x, ctx->ntfs_ino->mft_no 0x%llx, ni->mft_no 0x%llx.\n",
+		   (unsigned int)le32_to_cpu(ctx->attr->type),
+		   (long long)ctx->ntfs_ino->mft_no,
+		   (long long)ni->mft_no);
+
+	if (ctx->ntfs_ino == ni)
+		return 0;
+
+	if (!ctx->al_entry) {
+		ntfs_debug("Inode should contain attribute list to use this function.\n");
+		return -EINVAL;
+	}
+
+	/* Find the place in the MFT record where the attribute will be moved.
+	 */
+	a = ctx->attr;
+	nctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!nctx) {
+		ntfs_error(sb, "%s: Failed to get search context", __func__);
+		return -ENOMEM;
+	}
+
+	/*
+	 * Use ntfs_attr_find instead of ntfs_attr_lookup to find the place
+	 * for the attribute in @ni->mrec, not in any extent inode, in case
+	 * @ni is a base file record.
+	 */
+	err = ntfs_attr_find(a->type, (__le16 *)((u8 *)a + le16_to_cpu(a->name_offset)),
+			     a->name_length, CASE_SENSITIVE, NULL,
+			     0, nctx);
+	if (!err) {
+		ntfs_debug("An attribute of this type with the same name is already present in this MFT record.\n");
+		err = -EEXIST;
+		goto put_err_out;
+	}
+	if (err != -ENOENT) {
+		ntfs_debug("Attribute lookup failed.\n");
+		goto put_err_out;
+	}
+
+	/* Make space and move the attribute. */
+	ni_mrec = map_mft_record(ni);
+	if (IS_ERR(ni_mrec)) {
+		err = -EIO;
+		goto put_err_out;
+	}
+
+	err = ntfs_make_room_for_attr(ni_mrec, (u8 *)nctx->attr,
+				      le32_to_cpu(a->length));
+	if (err) {
+		ntfs_debug("Couldn't make space for attribute.\n");
+		unmap_mft_record(ni);
+		goto put_err_out;
+	}
+	memcpy(nctx->attr, a, le32_to_cpu(a->length));
+	nctx->attr->instance = nctx->mrec->next_attr_instance;
+	nctx->mrec->next_attr_instance =
+		cpu_to_le16((le16_to_cpu(nctx->mrec->next_attr_instance) + 1) & 0xffff);
+	ntfs_attr_record_resize(ctx->mrec, a, 0);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	mark_mft_record_dirty(ni);
+
+	/* Update the attribute list. */
+	ctx->al_entry->mft_reference =
+		MK_LE_MREF(ni->mft_no, le16_to_cpu(ni_mrec->sequence_number));
+	ctx->al_entry->instance = nctx->attr->instance;
+	unmap_mft_record(ni);
+put_err_out:
+	ntfs_attr_put_search_ctx(nctx);
+	return err;
+}
+
+/**
+ * ntfs_attr_record_move_away - move an attribute record away from its MFT record
+ * @ctx:	attribute search context describing the attribute record
+ * @extra:	minimum amount of free space in the new holder of the record
+ */
+int ntfs_attr_record_move_away(struct ntfs_attr_search_ctx *ctx, int extra)
+{
+	struct ntfs_inode *base_ni, *ni = NULL;
+	struct mft_record *m;
+	int i, err;
+	struct super_block *sb;
+
+	if (!ctx || !ctx->attr || !ctx->ntfs_ino || extra < 0)
+		return -EINVAL;
+
+	ntfs_debug("Entering for attr 0x%x, inode %llu\n",
+		   (unsigned int)le32_to_cpu(ctx->attr->type),
+		   (unsigned long long)ctx->ntfs_ino->mft_no);
+
+	if (ctx->ntfs_ino->nr_extents == -1)
+		base_ni = ctx->base_ntfs_ino;
+	else
+		base_ni = ctx->ntfs_ino;
+
+	sb = ctx->ntfs_ino->vol->sb;
+	if (!NInoAttrList(base_ni)) {
+		ntfs_error(sb, "Inode %llu has no attrlist",
+			   (unsigned long long)base_ni->mft_no);
+		return -EINVAL;
+	}
+
+	err = ntfs_inode_attach_all_extents(ctx->ntfs_ino);
+	if (err) {
+		ntfs_error(sb, "Couldn't attach extents, inode=%llu",
+			   (unsigned long long)base_ni->mft_no);
+		return err;
+	}
+
+	mutex_lock(&base_ni->extent_lock);
+	/* Walk through all extents and try to move the attribute to them. */
+	for (i = 0; i < base_ni->nr_extents; i++) {
+		ni = base_ni->ext.extent_ntfs_inos[i];
+
+		if (ctx->ntfs_ino->mft_no == ni->mft_no)
+			continue;
+		m = map_mft_record(ni);
+		if (IS_ERR(m)) {
+			ntfs_error(sb, "Cannot map mft record for mft_no %lld",
+				   (unsigned long long)ni->mft_no);
+			mutex_unlock(&base_ni->extent_lock);
+			return -EIO;
+		}
+		if (le32_to_cpu(m->bytes_allocated) -
+		    le32_to_cpu(m->bytes_in_use) < le32_to_cpu(ctx->attr->length) + extra) {
+			unmap_mft_record(ni);
+			continue;
+		}
+		unmap_mft_record(ni);
+
+		/*
+		 * ntfs_attr_record_move_to can fail if an extent with another
+		 * lowest VCN is already present in the inode we are trying to
+		 * move the record to. So, do not return an error.
+		 */
+		if (!ntfs_attr_record_move_to(ctx, ni)) {
+			mutex_unlock(&base_ni->extent_lock);
+			return 0;
+		}
+	}
+	mutex_unlock(&base_ni->extent_lock);
+
+	/*
+	 * Failed to move the attribute to one of the current extents, so
+	 * allocate a new extent and move the attribute to it.
+	 */
+	ni = NULL;
+	err = ntfs_mft_record_alloc(base_ni->vol, 0, &ni, base_ni, NULL);
+	if (err) {
+		ntfs_error(sb, "Couldn't allocate MFT record, err : %d", err);
+		return err;
+	}
+	unmap_mft_record(ni);
+
+	err = ntfs_attr_record_move_to(ctx, ni);
+	if (err)
+		ntfs_error(sb, "Couldn't move attribute to MFT record");
+
+	return err;
+}
+
+/*
+ * If we are in the first extent, then set/clean the sparse bit and
+ * update the allocated and compressed sizes.
+ */
+static int ntfs_attr_update_meta(struct attr_record *a, struct ntfs_inode *ni,
+				 struct mft_record *m, struct ntfs_attr_search_ctx *ctx)
+{
+	int sparse, err = 0;
+	struct ntfs_inode *base_ni;
+	struct super_block *sb = ni->vol->sb;
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x\n",
+		   (unsigned long long)ni->mft_no, ni->type);
+
+	if (NInoAttr(ni))
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+
+	if (a->data.non_resident.lowest_vcn)
+		goto out;
+
+	a->data.non_resident.allocated_size = cpu_to_le64(ni->allocated_size);
+
+	sparse = ntfs_rl_sparse(ni->runlist.rl);
+	if (sparse < 0) {
+		err = -EIO;
+		goto out;
+	}
+
+	/* The attribute becomes sparse. */
+	if (sparse && !(a->flags & (ATTR_IS_SPARSE | ATTR_IS_COMPRESSED))) {
+		/*
+		 * Move the attribute to another mft record, if the attribute
+		 * is too small to add the compressed_size field to it and we
+		 * have no free space in the current mft record.
+		 */
+		if ((le32_to_cpu(a->length) -
+		     le16_to_cpu(a->data.non_resident.mapping_pairs_offset) == 8) &&
+		    !(le32_to_cpu(m->bytes_allocated) - le32_to_cpu(m->bytes_in_use))) {
+
+			if (!NInoAttrList(base_ni)) {
+				err = ntfs_inode_add_attrlist(base_ni);
+				if (err)
+					goto out;
+				err = -EAGAIN;
+				goto out;
+			}
+			err = ntfs_attr_record_move_away(ctx, 8);
+			if (err) {
+				ntfs_error(sb, "Failed to move attribute");
+				goto out;
+			}
+
+			err = ntfs_attrlist_update(base_ni);
+			if (err)
+				goto out;
+			err = -EAGAIN;
+			goto out;
+		}
+		if (!(le32_to_cpu(a->length) -
+		      le16_to_cpu(a->data.non_resident.mapping_pairs_offset))) {
+			err = -EIO;
+			ntfs_error(sb, "Mapping pairs space is 0");
+			goto out;
+		}
+
+		NInoSetSparse(ni);
+		ni->flags |= FILE_ATTR_SPARSE_FILE;
+		a->flags |= ATTR_IS_SPARSE;
+		a->data.non_resident.compression_unit = 0;
+
+		memmove((u8 *)a + le16_to_cpu(a->name_offset) + 8,
+			(u8 *)a + le16_to_cpu(a->name_offset),
+			a->name_length * sizeof(__le16));
+
+		a->name_offset = cpu_to_le16(le16_to_cpu(a->name_offset) + 8);
+
+		a->data.non_resident.mapping_pairs_offset =
+			cpu_to_le16(le16_to_cpu(a->data.non_resident.mapping_pairs_offset) + 8);
+	}
+
+	/* The attribute is no longer sparse. */
+	if (!sparse && (a->flags & ATTR_IS_SPARSE) &&
+	    !(a->flags & ATTR_IS_COMPRESSED)) {
+		NInoClearSparse(ni);
+		ni->flags &= ~FILE_ATTR_SPARSE_FILE;
+		a->flags &= ~ATTR_IS_SPARSE;
+		a->data.non_resident.compression_unit = 0;
+
+		memmove((u8 *)a + le16_to_cpu(a->name_offset) - 8,
+			(u8 *)a + le16_to_cpu(a->name_offset),
+			a->name_length * sizeof(__le16));
+
+		if (le16_to_cpu(a->name_offset) >= 8)
+			a->name_offset = cpu_to_le16(le16_to_cpu(a->name_offset) - 8);
+
+		a->data.non_resident.mapping_pairs_offset =
+			cpu_to_le16(le16_to_cpu(a->data.non_resident.mapping_pairs_offset) - 8);
+	}
+
+	/* Update the compressed size if required. */
+	if (NInoFullyMapped(ni) && (sparse || NInoCompressed(ni))) {
+		s64 new_compr_size;
+
+		new_compr_size = ntfs_rl_get_compressed_size(ni->vol, ni->runlist.rl);
+		if (new_compr_size < 0) {
+			err = new_compr_size;
+			goto out;
+		}
+
+		ni->itype.compressed.size = new_compr_size;
+		a->data.non_resident.compressed_size = cpu_to_le64(new_compr_size);
+	}
+
+	if (NInoSparse(ni) || NInoCompressed(ni))
+		VFS_I(base_ni)->i_blocks = ni->itype.compressed.size >> 9;
+	else
+		VFS_I(base_ni)->i_blocks = ni->allocated_size >> 9;
+	/*
+	 * Set the FILE_NAME dirty flag, to update the sparse bit and
+	 * allocated size in the index.
+	 */
+	if (ni->type == AT_DATA && ni->name == AT_UNNAMED)
+		NInoSetFileNameDirty(ni);
+out:
+	return err;
+}
+
+#define NTFS_VCN_DELETE_MARK	-2
+/**
+ * ntfs_attr_update_mapping_pairs - update mapping pairs for an ntfs attribute
+ * @ni:		non-resident ntfs inode for which we need the update
+ * @from_vcn:	update the runlist starting at this VCN
+ *
+ * Build mapping pairs from @na->rl and write them to the disk. Also, this
+ * function updates the sparse bit, allocated and compressed size (allocates/
+ * frees space for this field if required).
+ *
+ * @na->allocated_size should be set to the correct value for the new runlist
+ * before calling this function. Vice versa, @na->compressed_size will be
+ * calculated and set to the correct value during this function.
+ */
+int ntfs_attr_update_mapping_pairs(struct ntfs_inode *ni, s64 from_vcn)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	struct ntfs_inode *base_ni;
+	struct mft_record *m;
+	struct attr_record *a;
+	s64 stop_vcn;
+	int err = 0, mp_size, cur_max_mp_size, exp_max_mp_size;
+	bool finished_build;
+	bool first_updated = false;
+	struct super_block *sb;
+	struct runlist_element *start_rl;
+	unsigned int de_cluster_count = 0;
+
+retry:
+	if (!ni || !ni->runlist.rl)
+		return -EINVAL;
+
+	ntfs_debug("Entering for inode %llu, attr 0x%x\n",
+		   (unsigned long long)ni->mft_no, ni->type);
+
+	sb = ni->vol->sb;
+	if (!NInoNonResident(ni)) {
+		ntfs_error(sb, "%s: resident attribute", __func__);
+		return -EINVAL;
+	}
+
+	if (ni->nr_extents == -1)
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+
+	ctx = ntfs_attr_get_search_ctx(base_ni, NULL);
+	if (!ctx) {
+		ntfs_error(sb, "%s: Failed to get search context", __func__);
+		return -ENOMEM;
+	}
+
+	/* Fill attribute records with the new mapping pairs. */
+	stop_vcn = 0;
+	finished_build = false;
+	start_rl = ni->runlist.rl;
+	while (!(err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+					CASE_SENSITIVE, from_vcn, NULL, 0, ctx))) {
+		unsigned int de_cnt = 0;
+
+		a = ctx->attr;
+		m = ctx->mrec;
+		if (!a->data.non_resident.lowest_vcn)
+			first_updated = true;
+
+		/*
+		 * If the runlist is not being updated from the beginning, then
+		 * set @stop_vcn properly, i.e. to the lowest vcn of the record
+		 * that contains @from_vcn. Also, we do not need @from_vcn
+		 * anymore; set it to 0 to make ntfs_attr_lookup enumerate
+		 * attributes.
+		 */
+		if (from_vcn) {
+			s64 first_lcn;
+
+			stop_vcn = le64_to_cpu(a->data.non_resident.lowest_vcn);
+			from_vcn = 0;
+			/*
+			 * Check whether the first run we need to update is
+			 * the last run in the runlist; if so, then deallocate
+			 * all attribute extents starting at this one.
+			 */
+			first_lcn = ntfs_rl_vcn_to_lcn(ni->runlist.rl, stop_vcn);
+			if (first_lcn == LCN_EINVAL) {
+				err = -EIO;
+				ntfs_error(sb, "Bad runlist");
+				goto put_err_out;
+			}
+			if (first_lcn == LCN_ENOENT ||
+			    first_lcn == LCN_RL_NOT_MAPPED)
+				finished_build = true;
+		}
+
+		/*
+		 * Check whether we finished the mapping pairs build; if so,
+		 * mark the extent as needing deletion (by setting its highest
+		 * vcn to NTFS_VCN_DELETE_MARK (-2); we shall check it later
+		 * and delete the extent) and continue the search.
+		 */
+		if (finished_build) {
+			ntfs_debug("Mark attr 0x%x for delete in inode 0x%lx.\n",
+				   (unsigned int)le32_to_cpu(a->type), ctx->ntfs_ino->mft_no);
+			a->data.non_resident.highest_vcn = cpu_to_le64(NTFS_VCN_DELETE_MARK);
+			mark_mft_record_dirty(ctx->ntfs_ino);
+			continue;
+		}
+
+		err = ntfs_attr_update_meta(a, ni, m, ctx);
+		if (err < 0) {
+			if (err == -EAGAIN) {
+				ntfs_attr_put_search_ctx(ctx);
+				goto retry;
+			}
+			goto put_err_out;
+		}
+
+		/*
+		 * Determine the maximum possible length of the mapping pairs
+		 * if we shall *not* expand the space for the mapping pairs.
+		 */
+		cur_max_mp_size = le32_to_cpu(a->length) -
+				  le16_to_cpu(a->data.non_resident.mapping_pairs_offset);
+		/*
+		 * Determine the maximum possible length of the mapping pairs
+		 * in the current mft record if we shall expand the space for
+		 * the mapping pairs.
+		 */
+		exp_max_mp_size = le32_to_cpu(m->bytes_allocated) -
+				  le32_to_cpu(m->bytes_in_use) + cur_max_mp_size;
+
+		/* Get the size for the rest of the mapping pairs array. */
+		mp_size = ntfs_get_size_for_mapping_pairs(ni->vol, start_rl,
+							  stop_vcn, -1, exp_max_mp_size);
+		if (mp_size <= 0) {
+			err = mp_size;
+			ntfs_error(sb, "%s: get MP size failed", __func__);
+			goto put_err_out;
+		}
+		/* Test whether the mapping pairs fit in the current mft record. */
+		if (mp_size > exp_max_mp_size) {
+			/*
+			 * Mapping pairs of the $ATTRIBUTE_LIST attribute must
+			 * fit in the base mft record. Try to move out other
+			 * attributes and try again.
+			 */
+			if (ni->type == AT_ATTRIBUTE_LIST) {
+				ntfs_attr_put_search_ctx(ctx);
+				if (ntfs_inode_free_space(base_ni, mp_size -
+							  cur_max_mp_size)) {
+					ntfs_debug("Attribute list is too big. Defragment the volume\n");
+					return -ENOSPC;
+				}
+				if (ntfs_attrlist_update(base_ni))
+					return -EIO;
+				goto retry;
+			}
+
+			/* Add an attribute list if it isn't present, and retry. */
+			if (!NInoAttrList(base_ni)) {
+				ntfs_attr_put_search_ctx(ctx);
+				if (ntfs_inode_add_attrlist(base_ni)) {
+					ntfs_error(sb, "Can not add attrlist");
+					return -EIO;
+				}
+				goto retry;
+			}
+
+			/*
+			 * Set the mapping pairs size to the maximum possible
+			 * for this mft record. We shall write the rest of the
+			 * mapping pairs to other MFT records.
+			 */
+			mp_size = exp_max_mp_size;
+		}
+
+		/* Change the space for the mapping pairs if we need to. */
+		if (((mp_size + 7) & ~7) != cur_max_mp_size) {
+			if (ntfs_attr_record_resize(m, a,
+					le16_to_cpu(a->data.non_resident.mapping_pairs_offset) +
+					mp_size)) {
+				err = -EIO;
+				ntfs_error(sb, "Failed to resize attribute");
+				goto put_err_out;
+			}
+		}
+
+		/* Update the lowest vcn. */
+		a->data.non_resident.lowest_vcn = cpu_to_le64(stop_vcn);
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		if ((ctx->ntfs_ino->nr_extents == -1 || NInoAttrList(ctx->ntfs_ino)) &&
+		    ctx->attr->type != AT_ATTRIBUTE_LIST) {
+			ctx->al_entry->lowest_vcn = cpu_to_le64(stop_vcn);
+			err = ntfs_attrlist_update(base_ni);
+			if (err)
+				goto put_err_out;
+		}
+
+		/*
+		 * Generate the new mapping pairs array directly into the
+		 * correct destination, i.e. the attribute record itself.
+		 */
+		err = ntfs_mapping_pairs_build(ni->vol,
+				(u8 *)a + le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+				mp_size, start_rl, stop_vcn, -1, &stop_vcn, &start_rl, &de_cnt);
+		if (!err)
+			finished_build = true;
+		if (!finished_build && err != -ENOSPC) {
+			ntfs_error(sb, "Failed to build mapping pairs");
+			goto put_err_out;
+		}
+		a->data.non_resident.highest_vcn = cpu_to_le64(stop_vcn - 1);
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		de_cluster_count += de_cnt;
+	}
+
+	/* Check whether an error occurred. */
+	if (err && err != -ENOENT) {
+		ntfs_error(sb, "%s: Attribute lookup failed", __func__);
+		goto put_err_out;
+	}
+
+	/*
+	 * If the base extent was skipped in the above process,
+	 * we still may have to update the sizes.
+	 */
+	if (!first_updated) {
+		ntfs_attr_reinit_search_ctx(ctx);
+		err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+				       CASE_SENSITIVE, 0, NULL, 0, ctx);
+		if (!err) {
+			a = ctx->attr;
+			a->data.non_resident.allocated_size = cpu_to_le64(ni->allocated_size);
+			if (NInoCompressed(ni) || NInoSparse(ni))
+				a->data.non_resident.compressed_size =
+					cpu_to_le64(ni->itype.compressed.size);
+			/* Updating sizes taints the extent holding the attr. */
+			if (ni->type == AT_DATA && ni->name == AT_UNNAMED)
+				NInoSetFileNameDirty(ni);
+			mark_mft_record_dirty(ctx->ntfs_ino);
+		} else {
+			ntfs_error(sb, "Failed to update sizes in base extent\n");
+			goto put_err_out;
+		}
+	}
+
+	/* Deallocate unused attribute extents and return with success. */
+	if (finished_build) {
+		ntfs_attr_reinit_search_ctx(ctx);
+		ntfs_debug("Deallocate marked extents.\n");
+		while (!(err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+						CASE_SENSITIVE, 0, NULL, 0, ctx))) {
+			if (le64_to_cpu(ctx->attr->data.non_resident.highest_vcn) !=
+			    NTFS_VCN_DELETE_MARK)
+				continue;
+			/* Remove the unused attribute record.
+			 */
+			err = ntfs_attr_record_rm(ctx);
+			if (err) {
+				ntfs_error(sb, "Could not remove unused attr");
+				goto put_err_out;
+			}
+			ntfs_attr_reinit_search_ctx(ctx);
+		}
+		if (err && err != -ENOENT) {
+			ntfs_error(sb, "%s: Attr lookup failed", __func__);
+			goto put_err_out;
+		}
+		ntfs_debug("Deallocate done.\n");
+		ntfs_attr_put_search_ctx(ctx);
+		goto out;
+	}
+	ntfs_attr_put_search_ctx(ctx);
+	ctx = NULL;
+
+	/* Allocate new MFT records for the rest of the mapping pairs. */
+	while (1) {
+		struct ntfs_inode *ext_ni = NULL;
+		unsigned int de_cnt = 0;
+
+		/* Allocate a new mft record. */
+		err = ntfs_mft_record_alloc(ni->vol, 0, &ext_ni, base_ni, NULL);
+		if (err) {
+			ntfs_error(sb, "Failed to allocate extent record");
+			goto put_err_out;
+		}
+		unmap_mft_record(ext_ni);
+
+		m = map_mft_record(ext_ni);
+		if (IS_ERR(m)) {
+			ntfs_error(sb, "Could not map new MFT record");
+			if (ntfs_mft_record_free(ni->vol, ext_ni))
+				ntfs_error(sb, "Could not free MFT record");
+			ntfs_inode_close(ext_ni);
+			err = -ENOMEM;
+			ext_ni = NULL;
+			goto put_err_out;
+		}
+		/*
+		 * If the mapping pairs size exceeds the available space,
+		 * cap it at the possible maximum.
+		 */
+		cur_max_mp_size = le32_to_cpu(m->bytes_allocated) -
+				  le32_to_cpu(m->bytes_in_use) -
+				  (sizeof(struct attr_record) +
+				   ((NInoCompressed(ni) || NInoSparse(ni)) ?
+				    sizeof(a->data.non_resident.compressed_size) : 0)) -
+				  ((sizeof(__le16) * ni->name_len + 7) & ~7);
+
+		/* Calculate the size of the remaining mapping pairs. */
+		mp_size = ntfs_get_size_for_mapping_pairs(ni->vol,
+				start_rl, stop_vcn, -1, cur_max_mp_size);
+		if (mp_size <= 0) {
+			unmap_mft_record(ext_ni);
+			ntfs_inode_close(ext_ni);
+			err = mp_size;
+			ntfs_error(sb, "%s: get mp size failed", __func__);
+			goto put_err_out;
+		}
+
+		if (mp_size > cur_max_mp_size)
+			mp_size = cur_max_mp_size;
+		/* Add the attribute extent to the new record. */
+		err = ntfs_non_resident_attr_record_add(ext_ni, ni->type,
+				ni->name, ni->name_len, stop_vcn, mp_size, 0);
+		if (err < 0) {
+			ntfs_error(sb, "Could not add attribute extent");
+			unmap_mft_record(ext_ni);
+			if (ntfs_mft_record_free(ni->vol, ext_ni))
+				ntfs_error(sb, "Could not free MFT record");
+			ntfs_inode_close(ext_ni);
+			goto put_err_out;
+		}
+		a = (struct attr_record *)((u8 *)m + err);
+
+		err = ntfs_mapping_pairs_build(ni->vol, (u8 *)a +
+				le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+				mp_size, start_rl, stop_vcn, -1, &stop_vcn, &start_rl,
+				&de_cnt);
+		if (err < 0 && err != -ENOSPC) {
+			ntfs_error(sb, "Failed to build MP");
+			unmap_mft_record(ext_ni);
+			if (ntfs_mft_record_free(ni->vol, ext_ni))
+				ntfs_error(sb, "Couldn't free MFT record");
+			goto put_err_out;
+		}
+		a->data.non_resident.highest_vcn = cpu_to_le64(stop_vcn - 1);
+		mark_mft_record_dirty(ext_ni);
+		unmap_mft_record(ext_ni);
+
+		de_cluster_count += de_cnt;
+		/* All mapping pairs have been written. */
+		if (!err)
+			break;
+	}
+out:
+	if (from_vcn == 0)
+		ni->i_dealloc_clusters = de_cluster_count;
+	return 0;
+
+put_err_out:
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+/**
+ * ntfs_attr_make_resident - convert a non-resident to a resident attribute
+ * @ni:		open ntfs attribute to make resident
+ * @ctx:	ntfs search context describing the attribute
+ *
+ * Convert a non-resident ntfs attribute to a resident one.
+ */
+static int ntfs_attr_make_resident(struct ntfs_inode *ni, struct ntfs_attr_search_ctx *ctx)
+{
+	struct ntfs_volume *vol = ni->vol;
+	struct super_block *sb = vol->sb;
+	struct attr_record *a = ctx->attr;
+	int name_ofs, val_ofs, err;
+	s64 arec_size;
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x.\n",
+		   (unsigned long long)ni->mft_no, ni->type);
+
+	/* Should be called for the first extent of the attribute. */
+	if (le64_to_cpu(a->data.non_resident.lowest_vcn)) {
+		ntfs_debug("Eeek! Should be called for the first extent of the attribute. Aborting...\n");
+		return -EINVAL;
+	}
+
+	/* Some preliminary sanity checking. */
+	if (!NInoNonResident(ni)) {
+		ntfs_debug("Eeek! Trying to make a resident attribute resident. Aborting...\n");
+		return -EINVAL;
+	}
+
+	/* Make sure this is not $MFT/$BITMAP or Windows will not boot! */
+	if (ni->type == AT_BITMAP && ni->mft_no == FILE_MFT)
+		return -EPERM;
+
+	/* Check that the attribute is allowed to be resident. */
+	err = ntfs_attr_can_be_resident(vol, ni->type);
+	if (err)
+		return err;
+
+	if (NInoCompressed(ni) || NInoEncrypted(ni)) {
+		ntfs_debug("Making compressed or encrypted files resident is not implemented yet.\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* Work out the offsets into and the size of the resident attribute. */
+	name_ofs = 24; /* = sizeof(resident struct attr_record); */
+	val_ofs = (name_ofs + a->name_length * sizeof(__le16) + 7) & ~7;
+	arec_size = (val_ofs + ni->data_size + 7) & ~7;
+
+	/* Sanity check the size before we start modifying the attribute. */
+	if (le32_to_cpu(ctx->mrec->bytes_in_use) - le32_to_cpu(a->length) +
+	    arec_size > le32_to_cpu(ctx->mrec->bytes_allocated)) {
+		ntfs_debug("Not enough space to make attribute resident\n");
+		return -ENOSPC;
+	}
+
+	/* Read and cache the whole runlist if not already done. */
+	err = ntfs_attr_map_whole_runlist(ni);
+	if (err)
+		return err;
+
+	/* Move the attribute name if it exists and update the offset. */
+	if (a->name_length) {
+		memmove((u8 *)a + name_ofs, (u8 *)a + le16_to_cpu(a->name_offset),
+			a->name_length * sizeof(__le16));
+	}
+	a->name_offset = cpu_to_le16(name_ofs);
+
+	/* Resize the resident part of the attribute record. */
+	if (ntfs_attr_record_resize(ctx->mrec, a, arec_size) < 0) {
+		/*
+		 * Bug, because ntfs_attr_record_resize should not fail (we
+		 * already checked that the attribute fits the MFT record).
+		 */
+		ntfs_error(ctx->ntfs_ino->vol->sb, "BUG! Failed to resize attribute record.");
+		return -EIO;
+	}
+
+	/* Convert the attribute record to describe a resident attribute. */
+	a->non_resident = 0;
+	a->flags = 0;
+	a->data.resident.value_length = cpu_to_le32(ni->data_size);
+	a->data.resident.value_offset = cpu_to_le16(val_ofs);
+	/*
+	 * File names cannot be non-resident so we would never see this here,
+	 * but at least it serves as a reminder that there may be attributes
+	 * for which we do need to set this flag. (AIA)
+	 */
+	if (a->type == AT_FILE_NAME)
+		a->data.resident.flags = RESIDENT_ATTR_IS_INDEXED;
+	else
+		a->data.resident.flags = 0;
+	a->data.resident.reserved = 0;
+
+	/*
+	 * Deallocate the clusters from the runlist.
+	 *
+	 * NOTE: We can use ntfs_cluster_free() because we have already mapped
+	 * the whole run list and thus it doesn't matter that the attribute
+	 * record is in a transiently corrupted state at this moment in time.
+	 */
+	err = ntfs_cluster_free(ni, 0, -1, ctx);
+	if (err) {
+		ntfs_error(sb, "Eeek! Failed to release allocated clusters");
+		ntfs_debug("Ignoring error and leaving behind wasted clusters.\n");
+	}
+
+	/* Throw away the now unused runlist. */
+	ntfs_free(ni->runlist.rl);
+	ni->runlist.rl = NULL;
+	ni->runlist.count = 0;
+	/* Update the in-memory struct ntfs_attr.
*/ + NInoClearNonResident(ni); + NInoClearCompressed(ni); + ni->flags &=3D ~FILE_ATTR_COMPRESSED; + NInoClearSparse(ni); + ni->flags &=3D ~FILE_ATTR_SPARSE_FILE; + NInoClearEncrypted(ni); + ni->flags &=3D ~FILE_ATTR_ENCRYPTED; + ni->initialized_size =3D ni->data_size; + ni->allocated_size =3D ni->itype.compressed.size =3D (ni->data_size + 7) = & ~7; + ni->itype.compressed.block_size =3D 0; + ni->itype.compressed.block_size_bits =3D ni->itype.compressed.block_clust= ers =3D 0; + return 0; +} + +/** + * ntfs_non_resident_attr_shrink - shrink a non-resident, open ntfs attrib= ute + * @ni: non-resident ntfs attribute to shrink + * @newsize: new size (in bytes) to which to shrink the attribute + * + * Reduce the size of a non-resident, open ntfs attribute @na to @newsize = bytes. + */ +static int ntfs_non_resident_attr_shrink(struct ntfs_inode *ni, const s64 = newsize) +{ + struct ntfs_volume *vol; + struct ntfs_attr_search_ctx *ctx; + s64 first_free_vcn; + s64 nr_freed_clusters; + int err; + struct ntfs_inode *base_ni; + + ntfs_debug("Inode 0x%llx attr 0x%x new size %lld\n", + (unsigned long long)ni->mft_no, ni->type, (long long)newsize); + + vol =3D ni->vol; + + if (NInoAttr(ni)) + base_ni =3D ni->ext.base_ntfs_ino; + else + base_ni =3D ni; + + /* + * Check the attribute type and the corresponding minimum size + * against @newsize and fail if @newsize is too small. + */ + err =3D ntfs_attr_size_bounds_check(vol, ni->type, newsize); + if (err) { + if (err =3D=3D -ERANGE) + ntfs_debug("Eeek! Size bounds check failed. Aborting...\n"); + else if (err =3D=3D -ENOENT) + err =3D -EIO; + return err; + } + + /* The first cluster outside the new allocation. */ + if (NInoCompressed(ni)) + /* + * For compressed files we must keep full compressions blocks, + * but currently we do not decompress/recompress the last + * block to truncate the data, so we may leave more allocated + * clusters than really needed. 
+ */ + first_free_vcn =3D (((newsize - 1) | (ni->itype.compressed.block_size - = 1)) + 1) >> + vol->cluster_size_bits; + else + first_free_vcn =3D (newsize + vol->cluster_size - 1) >> + vol->cluster_size_bits; + + if (first_free_vcn < 0) + return -EINVAL; + /* + * Compare the new allocation with the old one and only deallocate + * clusters if there is a change. + */ + if ((ni->allocated_size >> vol->cluster_size_bits) !=3D first_free_vcn) { + struct ntfs_attr_search_ctx *ctx; + + err =3D ntfs_attr_map_whole_runlist(ni); + if (err) { + ntfs_debug("Eeek! ntfs_attr_map_whole_runlist failed.\n"); + return err; + } + + ctx =3D ntfs_attr_get_search_ctx(ni, NULL); + if (!ctx) { + ntfs_error(vol->sb, "%s: Failed to get search context", __func__); + return -ENOMEM; + } + + /* Deallocate all clusters starting with the first free one. */ + nr_freed_clusters =3D ntfs_cluster_free(ni, first_free_vcn, -1, ctx); + if (nr_freed_clusters < 0) { + ntfs_debug("Eeek! Freeing of clusters failed. Aborting...\n"); + ntfs_attr_put_search_ctx(ctx); + return (int)nr_freed_clusters; + } + ntfs_attr_put_search_ctx(ctx); + + /* Truncate the runlist itself. */ + if (ntfs_rl_truncate_nolock(vol, &ni->runlist, first_free_vcn)) { + /* + * Failed to truncate the runlist, so just throw it + * away, it will be mapped afresh on next use. + */ + ntfs_free(ni->runlist.rl); + ni->runlist.rl =3D NULL; + ntfs_error(vol->sb, "Eeek! Run list truncation failed.\n"); + return -EIO; + } + + /* Prepare to mapping pairs update. */ + ni->allocated_size =3D first_free_vcn << vol->cluster_size_bits; + + if (NInoSparse(ni) || NInoCompressed(ni)) { + if (nr_freed_clusters) { + ni->itype.compressed.size -=3D nr_freed_clusters << + vol->cluster_size_bits; + VFS_I(base_ni)->i_blocks =3D ni->itype.compressed.size >> 9; + } + } else + VFS_I(base_ni)->i_blocks =3D ni->allocated_size >> 9; + + /* Write mapping pairs for new runlist. 
+		err = ntfs_attr_update_mapping_pairs(ni, 0 /*first_free_vcn*/);
+		if (err) {
+			ntfs_debug("Eeek! Mapping pairs update failed. Leaving inconsistent metadata. Run chkdsk.\n");
+			return err;
+		}
+	}
+
+	/* Get the first attribute record. */
+	ctx = ntfs_attr_get_search_ctx(base_ni, NULL);
+	if (!ctx) {
+		ntfs_error(vol->sb, "%s: Failed to get search context", __func__);
+		return -ENOMEM;
+	}
+
+	err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
+			       0, NULL, 0, ctx);
+	if (err) {
+		if (err == -ENOENT)
+			err = -EIO;
+		ntfs_debug("Eeek! Lookup of first attribute extent failed. Leaving inconsistent metadata.\n");
+		goto put_err_out;
+	}
+
+	/* Update data and initialized size. */
+	ni->data_size = newsize;
+	ctx->attr->data.non_resident.data_size = cpu_to_le64(newsize);
+	if (newsize < ni->initialized_size) {
+		ni->initialized_size = newsize;
+		ctx->attr->data.non_resident.initialized_size = cpu_to_le64(newsize);
+	}
+	/* Update data size in the index. */
+	if (ni->type == AT_DATA && ni->name == AT_UNNAMED)
+		NInoSetFileNameDirty(ni);
+
+	/* If the attribute now has zero size, make it resident. */
+	if (!newsize && !NInoEncrypted(ni) && !NInoCompressed(ni)) {
+		err = ntfs_attr_make_resident(ni, ctx);
+		if (err) {
+			/* If we couldn't make it resident, just continue. */
+			if (err != -EPERM)
+				ntfs_error(ni->vol->sb,
+					   "Failed to make attribute resident. Leaving as is...\n");
+		}
+	}
+
+	/* Set the inode dirty so it is written out later. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	/* Done! */
+	ntfs_attr_put_search_ctx(ctx);
+	return 0;
+put_err_out:
+	ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+/**
+ * ntfs_non_resident_attr_expand - expand a non-resident, open ntfs attribute
+ * @ni:			non-resident ntfs attribute to expand
+ * @newsize:		new size (in bytes) to which to expand the attribute
+ * @prealloc_size:	preallocation size (in bytes) to which to expand the attribute
+ * @holes:		whether sparse runs (holes) may be used for the new space
+ *
+ * Expand the size of a non-resident, open ntfs attribute @ni to @newsize bytes,
+ * by allocating new clusters.
+ */
+static int ntfs_non_resident_attr_expand(struct ntfs_inode *ni, const s64 newsize,
+					 const s64 prealloc_size, unsigned int holes)
+{
+	s64 lcn_seek_from;
+	s64 first_free_vcn;
+	struct ntfs_volume *vol;
+	struct ntfs_attr_search_ctx *ctx = NULL;
+	struct runlist_element *rl, *rln;
+	s64 org_alloc_size, org_compressed_size;
+	int err, err2;
+	struct ntfs_inode *base_ni;
+	struct super_block *sb = ni->vol->sb;
+	size_t new_rl_count;
+
+	ntfs_debug("Inode 0x%llx, attr 0x%x, new size %lld old size %lld\n",
+		   (unsigned long long)ni->mft_no, ni->type,
+		   (long long)newsize, (long long)ni->data_size);
+
+	vol = ni->vol;
+
+	if (NInoAttr(ni))
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+
+	/*
+	 * Check the attribute type and the corresponding maximum size
+	 * against @newsize and fail if @newsize is too big.
+	 */
+	err = ntfs_attr_size_bounds_check(vol, ni->type, newsize);
+	if (err < 0) {
+		ntfs_error(sb, "%s: bounds check failed", __func__);
+		return err;
+	}
+
+	/* Save for future use. */
+	org_alloc_size = ni->allocated_size;
+	org_compressed_size = ni->itype.compressed.size;
+
+	/* The first cluster outside the new allocation. */
+	if (prealloc_size)
+		first_free_vcn = (prealloc_size + vol->cluster_size - 1) >>
+				 vol->cluster_size_bits;
+	else
+		first_free_vcn = (newsize + vol->cluster_size - 1) >>
+				 vol->cluster_size_bits;
+	if (first_free_vcn < 0)
+		return -EFBIG;
+
+	/*
+	 * Compare the new allocation with the old one and only allocate
+	 * clusters if there is a change.
+	 */
+	if ((ni->allocated_size >> vol->cluster_size_bits) < first_free_vcn) {
+		err = ntfs_attr_map_whole_runlist(ni);
+		if (err) {
+			ntfs_error(sb, "ntfs_attr_map_whole_runlist failed");
+			return err;
+		}
+
+		/*
+		 * If we extend the $DATA attribute on an NTFS 3+ volume, we
+		 * can add sparse runs instead of really allocating clusters.
+		 */
+		if ((ni->type == AT_DATA && (vol->major_ver >= 3 || !NInoSparseDisabled(ni))) &&
+		    (holes != HOLES_NO)) {
+			if (NInoCompressed(ni)) {
+				int last = 0, i = 0;
+				s64 alloc_size;
+				int more_entries =
+					round_up(first_free_vcn -
+						 (ni->allocated_size >>
+						  vol->cluster_size_bits),
+						 ni->itype.compressed.block_clusters) /
+					ni->itype.compressed.block_clusters;
+
+				while (ni->runlist.rl[last].length)
+					last++;
+
+				rl = ntfs_rl_realloc(ni->runlist.rl, last + 1,
+						     last + more_entries + 1);
+				if (IS_ERR(rl)) {
+					err = -ENOMEM;
+					goto put_err_out;
+				}
+
+				alloc_size = ni->allocated_size;
+				while (i++ < more_entries) {
+					rl[last].vcn = round_up(alloc_size, vol->cluster_size) >>
+						       vol->cluster_size_bits;
+					rl[last].length = ni->itype.compressed.block_clusters -
+							  (rl[last].vcn &
+							   (ni->itype.compressed.block_clusters - 1));
+					rl[last].lcn = LCN_HOLE;
+					last++;
+					alloc_size += ni->itype.compressed.block_size;
+				}
+
+				rl[last].vcn = first_free_vcn;
+				rl[last].lcn = LCN_ENOENT;
+				rl[last].length = 0;
+
+				ni->runlist.rl = rl;
+				ni->runlist.count += more_entries;
+			} else {
+				rl = ntfs_malloc_nofs(sizeof(struct runlist_element) * 2);
+				if (!rl) {
+					err = -ENOMEM;
+					goto put_err_out;
+				}
+
+				rl[0].vcn = (ni->allocated_size >>
+					     vol->cluster_size_bits);
+				rl[0].lcn = LCN_HOLE;
+				rl[0].length = first_free_vcn -
+					       (ni->allocated_size >> vol->cluster_size_bits);
+				rl[1].vcn = first_free_vcn;
+				rl[1].lcn = LCN_ENOENT;
+				rl[1].length = 0;
+			}
+		} else {
+			/*
+			 * Determine the first LCN after the last one of the
+			 * attribute. We will start seeking clusters from this
+			 * LCN to avoid fragmentation. If there are no valid
+			 * LCNs in the attribute let the cluster allocator
+			 * choose the starting LCN.
+			 */
+			lcn_seek_from = -1;
+			if (ni->runlist.rl->length) {
+				/* Seek to the last run list element. */
+				for (rl = ni->runlist.rl; (rl + 1)->length; rl++)
+					;
+				/*
+				 * If the last LCN is a hole or similar, seek
+				 * back to the last valid LCN.
+				 */
+				while (rl->lcn < 0 && rl != ni->runlist.rl)
+					rl--;
+				/*
+				 * Only set lcn_seek_from if the LCN is valid.
+				 */
+				if (rl->lcn >= 0)
+					lcn_seek_from = rl->lcn + rl->length;
+			}
+
+			rl = ntfs_cluster_alloc(vol, ni->allocated_size >>
+						vol->cluster_size_bits, first_free_vcn -
+						(ni->allocated_size >>
+						 vol->cluster_size_bits), lcn_seek_from,
+						DATA_ZONE, false, false, false);
+			if (IS_ERR(rl)) {
+				ntfs_debug("Cluster allocation failed (%lld)",
+					   (long long)first_free_vcn -
+					   ((long long)ni->allocated_size >>
+					    vol->cluster_size_bits));
+				return PTR_ERR(rl);
+			}
+		}
+
+		if (!NInoCompressed(ni)) {
+			/* Append the new clusters to the attribute runlist. */
+			rln = ntfs_runlists_merge(&ni->runlist, rl, 0, &new_rl_count);
+			if (IS_ERR(rln)) {
+				/* Failed, free the just allocated clusters. */
+				ntfs_error(sb, "Run list merge failed");
+				ntfs_cluster_free_from_rl(vol, rl);
+				ntfs_free(rl);
+				return -EIO;
+			}
+			ni->runlist.rl = rln;
+			ni->runlist.count = new_rl_count;
+		}
+
+		/* Prepare for the mapping pairs update. */
+		ni->allocated_size = first_free_vcn << vol->cluster_size_bits;
+		err = ntfs_attr_update_mapping_pairs(ni, 0);
+		if (err) {
+			ntfs_debug("Mapping pairs update failed");
+			goto rollback;
+		}
+	}
+
+	ctx = ntfs_attr_get_search_ctx(base_ni, NULL);
+	if (!ctx) {
+		err = -ENOMEM;
+		if (ni->allocated_size == org_alloc_size)
+			return err;
+		goto rollback;
+	}
+
+	err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
+			       0, NULL, 0, ctx);
+	if (err) {
+		if (err == -ENOENT)
+			err = -EIO;
+		if (ni->allocated_size != org_alloc_size)
+			goto rollback;
+		goto put_err_out;
+	}
+
+	/* Update the data size. */
+	ni->data_size = newsize;
+	ctx->attr->data.non_resident.data_size = cpu_to_le64(newsize);
+	/* Update the data size in the index. */
+	if (ni->type == AT_DATA && ni->name == AT_UNNAMED)
+		NInoSetFileNameDirty(ni);
+	/* Set the inode dirty so it is written out later. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	/* Done! */
+	ntfs_attr_put_search_ctx(ctx);
+	return 0;
+rollback:
+	/* Free the allocated clusters. */
+	err2 = ntfs_cluster_free(ni, org_alloc_size >>
+				 vol->cluster_size_bits, -1, ctx);
+	if (err2)
+		ntfs_debug("Leaking clusters");
+
+	/* Now, truncate the runlist itself. */
+	down_write(&ni->runlist.lock);
+	err2 = ntfs_rl_truncate_nolock(vol, &ni->runlist, org_alloc_size >>
+				       vol->cluster_size_bits);
+	up_write(&ni->runlist.lock);
+	if (err2) {
+		/*
+		 * Failed to truncate the runlist, so just throw it away, it
+		 * will be mapped afresh on next use.
+		 */
+		ntfs_free(ni->runlist.rl);
+		ni->runlist.rl = NULL;
+		ntfs_error(sb, "Couldn't truncate runlist. Rollback failed");
+	} else {
+		/* Prepare for the mapping pairs update. */
+		ni->allocated_size = org_alloc_size;
+		/* Restore the mapping pairs. */
+		down_read(&ni->runlist.lock);
+		if (ntfs_attr_update_mapping_pairs(ni, 0))
+			ntfs_error(sb, "Failed to restore old mapping pairs");
+		up_read(&ni->runlist.lock);
+
+		if (NInoSparse(ni) || NInoCompressed(ni)) {
+			ni->itype.compressed.size = org_compressed_size;
+			VFS_I(base_ni)->i_blocks = ni->itype.compressed.size >> 9;
+		} else
+			VFS_I(base_ni)->i_blocks = ni->allocated_size >> 9;
+	}
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	return err;
+put_err_out:
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+/**
+ * ntfs_resident_attr_resize - resize a resident, open ntfs attribute
+ * @attr_ni:		resident ntfs inode to resize
+ * @newsize:		new size (in bytes) to which to resize the attribute
+ * @prealloc_size:	preallocation size (in bytes) to which to resize the attribute
+ * @holes:		whether sparse runs (holes) may be used when expanding
+ *
+ * Change the size of a resident, open ntfs attribute @attr_ni to @newsize bytes.
+ */
+static int ntfs_resident_attr_resize(struct ntfs_inode *attr_ni, const s64 newsize,
+				     const s64 prealloc_size, unsigned int holes)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	struct ntfs_volume *vol = attr_ni->vol;
+	struct super_block *sb = vol->sb;
+	int err = -EIO;
+	struct ntfs_inode *base_ni, *ext_ni = NULL;
+
+attr_resize_again:
+	ntfs_debug("Inode 0x%llx attr 0x%x new size %lld\n",
+		   (unsigned long long)attr_ni->mft_no, attr_ni->type,
+		   (long long)newsize);
+
+	if (NInoAttr(attr_ni))
+		base_ni = attr_ni->ext.base_ntfs_ino;
+	else
+		base_ni = attr_ni;
+
+	/* Get the attribute record that needs modification. */
+	ctx = ntfs_attr_get_search_ctx(base_ni, NULL);
+	if (!ctx) {
+		ntfs_error(sb, "%s: Failed to get search context", __func__);
+		return -ENOMEM;
+	}
+	err = ntfs_attr_lookup(attr_ni->type, attr_ni->name, attr_ni->name_len,
+			       0, 0, NULL, 0, ctx);
+	if (err) {
+		ntfs_error(sb, "ntfs_attr_lookup failed");
+		goto put_err_out;
+	}
+
+	/*
+	 * Check the attribute type and the corresponding minimum and maximum
+	 * sizes against @newsize and fail if @newsize is out of bounds.
+	 */
+	err = ntfs_attr_size_bounds_check(vol, attr_ni->type, newsize);
+	if (err) {
+		if (err == -ENOENT)
+			err = -EIO;
+		ntfs_debug("%s: bounds check failed", __func__);
+		goto put_err_out;
+	}
+	/*
+	 * If @newsize is bigger than the mft record we need to make the
+	 * attribute non-resident if the attribute type supports it. If it is
+	 * smaller we can go ahead and attempt the resize.
+	 */
+	if (newsize < vol->mft_record_size) {
+		/* Perform the resize of the attribute record. */
+		err = ntfs_resident_attr_value_resize(ctx->mrec, ctx->attr,
+						      newsize);
+		if (!err) {
+			/* Update the attribute size everywhere. */
+			attr_ni->data_size = attr_ni->initialized_size = newsize;
+			attr_ni->allocated_size = (newsize + 7) & ~7;
+			if (NInoCompressed(attr_ni) || NInoSparse(attr_ni))
+				attr_ni->itype.compressed.size = attr_ni->allocated_size;
+			if (attr_ni->type == AT_DATA && attr_ni->name == AT_UNNAMED)
+				NInoSetFileNameDirty(attr_ni);
+			goto resize_done;
+		}
+
+		/* Prefer AT_INDEX_ALLOCATION instead of AT_ATTRIBUTE_LIST */
+		if (err == -ENOSPC && ctx->attr->type == AT_INDEX_ROOT)
+			goto put_err_out;
+
+	}
+	/* There is not enough space in the mft record to perform the resize. */
+
+	/* Make the attribute non-resident if possible. */
+	err = ntfs_attr_make_non_resident(attr_ni,
+					  le32_to_cpu(ctx->attr->data.resident.value_length));
+	if (!err) {
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		ntfs_attr_put_search_ctx(ctx);
+		/* Resize the now non-resident attribute. */
+		return ntfs_non_resident_attr_expand(attr_ni, newsize, prealloc_size, holes);
+	} else if (err != -ENOSPC && err != -EPERM) {
+		ntfs_error(sb, "Failed to make attribute non-resident");
+		goto put_err_out;
+	}
+
+	/* Try to make other attributes non-resident and retry each time. */
+	ntfs_attr_reinit_search_ctx(ctx);
+	while (!(err = ntfs_attr_lookup(AT_UNUSED, NULL, 0, 0, 0, NULL, 0, ctx))) {
+		struct inode *tvi;
+		struct attr_record *a;
+
+		a = ctx->attr;
+		if (a->non_resident || a->type == AT_ATTRIBUTE_LIST)
+			continue;
+
+		if (ntfs_attr_can_be_non_resident(vol, a->type))
+			continue;
+
+		/*
+		 * Check whether the conversion is reasonable. Assume that the
+		 * mapping pairs will take 8 bytes.
+		 */
+		if (le32_to_cpu(a->length) <= (sizeof(struct attr_record) - sizeof(s64)) +
+		    ((a->name_length * sizeof(__le16) + 7) & ~7) + 8)
+			continue;
+
+		if (a->type == AT_DATA)
+			tvi = ntfs_iget(sb, base_ni->mft_no);
+		else
+			tvi = ntfs_attr_iget(VFS_I(base_ni), a->type,
+					     (__le16 *)((u8 *)a + le16_to_cpu(a->name_offset)),
+					     a->name_length);
+		if (IS_ERR(tvi)) {
+			ntfs_error(sb, "Couldn't open attribute");
+			continue;
+		}
+
+		if (ntfs_attr_make_non_resident(NTFS_I(tvi),
+				le32_to_cpu(ctx->attr->data.resident.value_length))) {
+			iput(tvi);
+			continue;
+		}
+
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		iput(tvi);
+		ntfs_attr_put_search_ctx(ctx);
+		goto attr_resize_again;
+	}
+
+	/* Check whether an error occurred. */
+	if (err != -ENOENT) {
+		ntfs_error(sb, "%s: Attribute lookup failed 1", __func__);
+		goto put_err_out;
+	}
+
+	/*
+	 * The standard information and attribute list attributes can't be
+	 * moved out from the base MFT record, so try to move out others.
+	 */
+	if (attr_ni->type == AT_STANDARD_INFORMATION ||
+	    attr_ni->type == AT_ATTRIBUTE_LIST) {
+		ntfs_attr_put_search_ctx(ctx);
+
+		if (!NInoAttrList(base_ni)) {
+			err = ntfs_inode_add_attrlist(base_ni);
+			if (err)
+				return err;
+		}
+
+		err = ntfs_inode_free_space(base_ni, sizeof(struct attr_record));
+		if (err) {
+			err = -ENOSPC;
+			ntfs_error(sb,
+				   "Couldn't free space in the MFT record to make the attribute list non-resident");
+			return err;
+		}
+		err = ntfs_attrlist_update(base_ni);
+		if (err)
+			return err;
+		goto attr_resize_again;
+	}
+
+	/*
+	 * Move the attribute to a new mft record, creating an attribute list
+	 * attribute or modifying it if it is already present.
+	 */
+
+	/* Point the search context back to the attribute we need to resize. */
+	ntfs_attr_reinit_search_ctx(ctx);
+	err = ntfs_attr_lookup(attr_ni->type, attr_ni->name, attr_ni->name_len,
+			       CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (err) {
+		ntfs_error(sb, "%s: Attribute lookup failed 2", __func__);
+		goto put_err_out;
+	}
+
+	/*
+	 * Check whether the attribute is already the only one in this MFT
+	 * record. 8 is added for the attribute terminator.
+	 */
+	if (le32_to_cpu(ctx->mrec->bytes_in_use) ==
+	    le16_to_cpu(ctx->mrec->attrs_offset) + le32_to_cpu(ctx->attr->length) + 8) {
+		err = -ENOSPC;
+		ntfs_debug("MFT record is filled with one attribute\n");
+		goto put_err_out;
+	}
+
+	/* Add an attribute list if not present. */
+	if (!NInoAttrList(base_ni)) {
+		ntfs_attr_put_search_ctx(ctx);
+		err = ntfs_inode_add_attrlist(base_ni);
+		if (err)
+			return err;
+		goto attr_resize_again;
+	}
+
+	/* Allocate a new mft record. */
+	err = ntfs_mft_record_alloc(base_ni->vol, 0, &ext_ni, base_ni, NULL);
+	if (err) {
+		ntfs_error(sb, "Couldn't allocate MFT record");
+		goto put_err_out;
+	}
+	unmap_mft_record(ext_ni);
+
+	/* Move the attribute to it. */
+	err = ntfs_attr_record_move_to(ctx, ext_ni);
+	if (err) {
+		ntfs_error(sb, "Couldn't move attribute to new MFT record");
+		err = -ENOMEM;
+		goto put_err_out;
+	}
+
+	err = ntfs_attrlist_update(base_ni);
+	if (err < 0)
+		goto put_err_out;
+
+	ntfs_attr_put_search_ctx(ctx);
+	/* Try to perform the resize once again. */
+	goto attr_resize_again;
+
+resize_done:
+	/*
+	 * Set the inode (and its base inode if it exists) dirty so it is
+	 * written out later.
+	 */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	return 0;
+
+put_err_out:
+	ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+int __ntfs_attr_truncate_vfs(struct ntfs_inode *ni, const s64 newsize,
+			     const s64 i_size)
+{
+	int err = 0;
+
+	if (newsize < 0 ||
+	    (ni->mft_no == FILE_MFT && ni->type == AT_DATA)) {
+		ntfs_debug("Invalid arguments passed.\n");
+		return -EINVAL;
+	}
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x, size %lld\n",
+		   (unsigned long long)ni->mft_no, ni->type, newsize);
+
+	if (NInoNonResident(ni)) {
+		if (newsize > i_size) {
+			down_write(&ni->runlist.lock);
+			err = ntfs_non_resident_attr_expand(ni, newsize, 0,
+							    NVolDisableSparse(ni->vol) ?
+							    HOLES_NO : HOLES_OK);
+			up_write(&ni->runlist.lock);
+		} else
+			err = ntfs_non_resident_attr_shrink(ni, newsize);
+	} else
+		err = ntfs_resident_attr_resize(ni, newsize, 0,
+						NVolDisableSparse(ni->vol) ?
+						HOLES_NO : HOLES_OK);
+	ntfs_debug("Return status %d\n", err);
+	return err;
+}
+
+int ntfs_attr_expand(struct ntfs_inode *ni, const s64 newsize, const s64 prealloc_size)
+{
+	int err = 0;
+
+	if (newsize < 0 ||
+	    (ni->mft_no == FILE_MFT && ni->type == AT_DATA)) {
+		ntfs_debug("Invalid arguments passed.\n");
+		return -EINVAL;
+	}
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x, size %lld\n",
+		   (unsigned long long)ni->mft_no, ni->type, newsize);
+
+	if (ni->data_size == newsize) {
+		ntfs_debug("Size is already ok\n");
+		return 0;
+	}
+
+	/*
+	 * Encrypted attributes are not supported. We return access denied,
+	 * which is what Windows NT4 does, too.
+	 */
+	if (NInoEncrypted(ni)) {
+		pr_err("Failed to truncate encrypted attribute");
+		return -EACCES;
+	}
+
+	if (NInoNonResident(ni)) {
+		if (newsize > ni->data_size)
+			err = ntfs_non_resident_attr_expand(ni, newsize, prealloc_size,
+							    NVolDisableSparse(ni->vol) ?
+							    HOLES_NO : HOLES_OK);
+	} else
+		err = ntfs_resident_attr_resize(ni, newsize, prealloc_size,
+						NVolDisableSparse(ni->vol) ?
+						HOLES_NO : HOLES_OK);
+	if (!err)
+		i_size_write(VFS_I(ni), newsize);
+	ntfs_debug("Return status %d\n", err);
+	return err;
+}
+
+/**
+ * ntfs_attr_truncate_i - resize an ntfs attribute
+ * @ni:		open ntfs inode to resize
+ * @newsize:	new size (in bytes) to which to resize the attribute
+ * @holes:	whether sparse runs (holes) may be used when expanding
+ *
+ * Change the size of an open ntfs attribute @ni to @newsize bytes. If the
+ * attribute is made bigger and the attribute is resident the newly
+ * "allocated" space is cleared and if the attribute is non-resident the
+ * newly allocated space is marked as not initialised and no real allocation
+ * on disk is performed.
+ */
+int ntfs_attr_truncate_i(struct ntfs_inode *ni, const s64 newsize, unsigned int holes)
+{
+	int err;
+
+	if (newsize < 0 ||
+	    (ni->mft_no == FILE_MFT && ni->type == AT_DATA)) {
+		ntfs_debug("Invalid arguments passed.\n");
+		return -EINVAL;
+	}
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x, size %lld\n",
+		   (unsigned long long)ni->mft_no, ni->type, newsize);
+
+	if (ni->data_size == newsize) {
+		ntfs_debug("Size is already ok\n");
+		return 0;
+	}
+
+	/*
+	 * Encrypted attributes are not supported. We return access denied,
+	 * which is what Windows NT4 does, too.
+	 */
+	if (NInoEncrypted(ni)) {
+		pr_err("Failed to truncate encrypted attribute");
+		return -EACCES;
+	}
+
+	if (NInoCompressed(ni)) {
+		pr_err("Failed to truncate compressed attribute");
+		return -EOPNOTSUPP;
+	}
+
+	if (NInoNonResident(ni)) {
+		if (newsize > ni->data_size)
+			err = ntfs_non_resident_attr_expand(ni, newsize, 0, holes);
+		else
+			err = ntfs_non_resident_attr_shrink(ni, newsize);
+	} else
+		err = ntfs_resident_attr_resize(ni, newsize, 0, holes);
+	ntfs_debug("Return status %d\n", err);
+	return err;
+}
+
+/*
+ * Resize an attribute, creating a hole if relevant
+ */
+int ntfs_attr_truncate(struct ntfs_inode *ni, const s64 newsize)
+{
+	return ntfs_attr_truncate_i(ni, newsize,
+				    NVolDisableSparse(ni->vol) ?
+				    HOLES_NO : HOLES_OK);
+}
+
+int ntfs_attr_map_cluster(struct ntfs_inode *ni, s64 vcn_start, s64 *lcn_start,
+			  s64 *lcn_count, s64 max_clu_count, bool *balloc, bool update_mp,
+			  bool skip_holes)
+{
+	struct ntfs_volume *vol = ni->vol;
+	struct ntfs_attr_search_ctx *ctx;
+	struct runlist_element *rl, *rlc;
+	s64 vcn = vcn_start, lcn, clu_count;
+	s64 lcn_seek_from = -1;
+	int err = 0;
+	size_t new_rl_count;
+
+	err = ntfs_attr_map_whole_runlist(ni);
+	if (err)
+		return err;
+
+	if (NInoAttr(ni))
+		ctx = ntfs_attr_get_search_ctx(ni->ext.base_ntfs_ino, NULL);
+	else
+		ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx) {
+		ntfs_error(vol->sb, "%s: Failed to get search context", __func__);
+		return -ENOMEM;
+	}
+
+	err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+			       CASE_SENSITIVE, vcn, NULL, 0, ctx);
+	if (err) {
+		ntfs_error(vol->sb,
+			   "ntfs_attr_lookup failed, ntfs inode(mft_no : %ld) type : 0x%x, err : %d",
+			   ni->mft_no, ni->type, err);
+		goto out;
+	}
+
+	rl = ntfs_attr_find_vcn_nolock(ni, vcn, ctx);
+	if (IS_ERR(rl)) {
+		ntfs_error(vol->sb, "Failed to find run after mapping runlist.");
+		err = PTR_ERR(rl);
+		goto out;
+	}
+
+	lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
+	clu_count = min(max_clu_count, rl->length - (vcn - rl->vcn));
+	if (lcn >= LCN_HOLE) {
+		if (lcn > LCN_DELALLOC ||
+		    (lcn == LCN_HOLE && skip_holes)) {
+			*lcn_start = lcn;
+			*lcn_count = clu_count;
+			*balloc = false;
+			goto out;
+		}
+	} else {
+		WARN_ON(lcn == LCN_RL_NOT_MAPPED);
+		if (lcn == LCN_ENOENT)
+			err = -ENOENT;
+		else
+			err = -EIO;
+		goto out;
+	}
+
+	/* Search backwards to find the best LCN to start the seek from. */
+	rlc = rl;
+	while (rlc->vcn) {
+		rlc--;
+		if (rlc->lcn >= 0) {
+			/*
+			 * Avoid fragmenting a compressed file.
+			 * Windows does not do that, and it may
+			 * not be desirable for files which can
+			 * be updated.
+			 */
+			if (NInoCompressed(ni))
+				lcn_seek_from = rlc->lcn + rlc->length;
+			else
+				lcn_seek_from = rlc->lcn + (vcn - rlc->vcn);
+			break;
+		}
+	}
+
+	if (lcn_seek_from == -1) {
+		/* Backwards search failed, search forwards. */
+		rlc = rl;
+		while (rlc->length) {
+			rlc++;
+			if (rlc->lcn >= 0) {
+				lcn_seek_from = rlc->lcn - (rlc->vcn - vcn);
+				if (lcn_seek_from < -1)
+					lcn_seek_from = -1;
+				break;
+			}
+		}
+	}
+
+	if (lcn_seek_from == -1 && ni->lcn_seek_trunc != LCN_RL_NOT_MAPPED) {
+		lcn_seek_from = ni->lcn_seek_trunc;
+		ni->lcn_seek_trunc = LCN_RL_NOT_MAPPED;
+	}
+
+	rlc = ntfs_cluster_alloc(vol, vcn, clu_count, lcn_seek_from, DATA_ZONE,
+				 false, true, true);
+	if (IS_ERR(rlc)) {
+		err = PTR_ERR(rlc);
+		goto out;
+	}
+
+	WARN_ON(rlc->vcn != vcn);
+	lcn = rlc->lcn;
+	clu_count = rlc->length;
+
+	rl = ntfs_runlists_merge(&ni->runlist, rlc, 0, &new_rl_count);
+	if (IS_ERR(rl)) {
+		ntfs_error(vol->sb, "Failed to merge runlists");
+		err = PTR_ERR(rl);
+		if (ntfs_cluster_free_from_rl(vol, rlc))
+			ntfs_error(vol->sb, "Failed to free hot clusters.");
+		ntfs_free(rlc);
+		goto out;
+	}
+	ni->runlist.rl = rl;
+	ni->runlist.count = new_rl_count;
+
+	if (!update_mp) {
+		u64 free = atomic64_read(&vol->free_clusters) * 100;
+
+		do_div(free, vol->nr_clusters);
+		if (free <= 5)
+			update_mp = true;
+	}
+
+	if (update_mp) {
+		ntfs_attr_reinit_search_ctx(ctx);
+		err = ntfs_attr_update_mapping_pairs(ni, 0);
+		if (err) {
+			int err2;
+
+			err2 = ntfs_cluster_free(ni, vcn, clu_count, ctx);
+			if (err2 < 0)
+				ntfs_error(vol->sb,
+					   "Failed to free cluster allocation. Leaving inconsistent metadata.\n");
+			goto out;
+		}
+	} else {
+		VFS_I(ni)->i_blocks += clu_count << (vol->cluster_size_bits - 9);
+		NInoSetRunlistDirty(ni);
+		mark_mft_record_dirty(ni);
+	}
+
+	*lcn_start = lcn;
+	*lcn_count = clu_count;
+	*balloc = true;
+out:
+	ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+/**
+ * ntfs_attr_rm - remove attribute from ntfs inode
+ * @ni:		opened ntfs attribute to delete
+ *
+ * Remove the attribute and all its extents from the ntfs inode. If the
+ * attribute was non-resident, also free all clusters allocated to it.
+ */
+int ntfs_attr_rm(struct ntfs_inode *ni)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	int err = 0, ret = 0;
+	struct ntfs_inode *base_ni;
+	struct super_block *sb = ni->vol->sb;
+
+	if (NInoAttr(ni))
+		base_ni = ni->ext.base_ntfs_ino;
+	else
+		base_ni = ni;
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x.\n",
+		   (long long)ni->mft_no, ni->type);
+
+	/* Free the cluster allocation. */
+	if (NInoNonResident(ni)) {
+		struct ntfs_attr_search_ctx *ctx;
+
+		err = ntfs_attr_map_whole_runlist(ni);
+		if (err)
+			return err;
+		ctx = ntfs_attr_get_search_ctx(ni, NULL);
+		if (!ctx) {
+			ntfs_error(sb, "%s: Failed to get search context", __func__);
+			return -ENOMEM;
+		}
+
+		ret = ntfs_cluster_free(ni, 0, -1, ctx);
+		if (ret < 0)
+			ntfs_error(sb,
+				   "Failed to free cluster allocation. Leaving inconsistent metadata.\n");
+		ntfs_attr_put_search_ctx(ctx);
+	}
+
+	/* Search for attribute extents and remove them all. */
+	ctx = ntfs_attr_get_search_ctx(base_ni, NULL);
+	if (!ctx) {
+		ntfs_error(sb, "%s: Failed to get search context", __func__);
+		return -ENOMEM;
+	}
+	while (!(err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+					CASE_SENSITIVE, 0, NULL, 0, ctx))) {
+		err = ntfs_attr_record_rm(ctx);
+		if (err) {
+			ntfs_error(sb,
+				   "Failed to remove attribute extent. Leaving inconsistent metadata.\n");
+			ret = err;
+		}
+		ntfs_attr_reinit_search_ctx(ctx);
+	}
+	ntfs_attr_put_search_ctx(ctx);
+	if (err != -ENOENT) {
+		ntfs_error(sb, "Attribute lookup failed. Probably leaving inconsistent metadata.\n");
+		ret = err;
+	}
+
+	return ret;
+}
+
+int ntfs_attr_exist(struct ntfs_inode *ni, const __le32 type, __le16 *name,
+		    u32 name_len)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	int ret;
+
+	ntfs_debug("Entering\n");
+
+	ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx) {
+		ntfs_error(ni->vol->sb, "%s: Failed to get search context",
+			   __func__);
+		return 0;
+	}
+
+	ret = ntfs_attr_lookup(type, name, name_len, CASE_SENSITIVE,
+			       0, NULL, 0, ctx);
+	ntfs_attr_put_search_ctx(ctx);
+
+	return !ret;
+}
+
+int ntfs_attr_remove(struct ntfs_inode *ni, const __le32 type, __le16 *name,
+		     u32 name_len)
+{
+	struct super_block *sb;
+	int err;
+	struct inode *attr_vi;
+	struct ntfs_inode *attr_ni;
+
+	ntfs_debug("Entering\n");
+
+	/* Check @ni before dereferencing it to obtain the super block. */
+	if (!ni) {
+		pr_err("NULL inode pointer\n");
+		return -EINVAL;
+	}
+	sb = ni->vol->sb;
+
+	attr_vi = ntfs_attr_iget(VFS_I(ni), type, name, name_len);
+	if (IS_ERR(attr_vi)) {
+		err = PTR_ERR(attr_vi);
+		ntfs_error(sb, "Failed to open attribute 0x%02x of inode 0x%llx",
+			   type, (unsigned long long)ni->mft_no);
+		return err;
+	}
+	attr_ni = NTFS_I(attr_vi);
+
+	err = ntfs_attr_rm(attr_ni);
+	if (err)
+		ntfs_error(sb, "Failed to remove attribute 0x%02x of inode 0x%llx",
+			   type, (unsigned long long)ni->mft_no);
+	iput(attr_vi);
+	return err;
+}
+
+/**
+ * ntfs_attr_readall - read the entire data from an ntfs attribute
+ * @ni:		open ntfs inode in which the ntfs attribute resides
+ * @type:	attribute type
+ * @name:	attribute name in little endian Unicode or AT_UNNAMED or NULL
+ * @name_len:	length of attribute @name in Unicode characters (if @name given)
+ * @data_size:	if non-NULL then store here the data size
+ *
+ * This function will read the entire content of an ntfs attribute.
+ * If @name is AT_UNNAMED then look specifically for an unnamed attribute.
+ * If @name is NULL then the attribute could be either named or not.
+ * In both those cases @name_len is not used at all.
+ *
+ * On success a buffer is allocated with the content of the attribute,
+ * which needs to be freed when it is no longer needed. If the
+ * @data_size parameter is non-NULL then the data size is set there.
+ */
+void *ntfs_attr_readall(struct ntfs_inode *ni, const __le32 type,
+			__le16 *name, u32 name_len, s64 *data_size)
+{
+	struct ntfs_inode *bmp_ni;
+	struct inode *bmp_vi;
+	void *data, *ret = NULL;
+	s64 size;
+	struct super_block *sb = ni->vol->sb;
+
+	ntfs_debug("Entering\n");
+
+	bmp_vi = ntfs_attr_iget(VFS_I(ni), type, name, name_len);
+	if (IS_ERR(bmp_vi)) {
+		ntfs_debug("ntfs_attr_iget failed");
+		goto err_exit;
+	}
+	bmp_ni = NTFS_I(bmp_vi);
+
+	data = ntfs_malloc_nofs(bmp_ni->data_size);
+	if (!data) {
+		ntfs_error(sb, "ntfs_malloc_nofs failed");
+		goto out;
+	}
+
+	size = ntfs_inode_attr_pread(VFS_I(bmp_ni), 0, bmp_ni->data_size,
+				     (u8 *)data);
+	if (size != bmp_ni->data_size) {
+		ntfs_error(sb, "ntfs_attr_pread failed");
+		ntfs_free(data);
+		goto out;
+	}
+	ret = data;
+	if (data_size)
+		*data_size = size;
+out:
+	iput(bmp_vi);
+err_exit:
+	ntfs_debug("\n");
+	return ret;
+}
+
+int ntfs_non_resident_attr_insert_range(struct ntfs_inode *ni, s64 start_vcn, s64 len)
+{
+	struct ntfs_volume *vol = ni->vol;
+	struct runlist_element *hole_rl, *rl;
+	struct ntfs_attr_search_ctx *ctx;
+	int ret;
+	size_t new_rl_count;
+
+	if (NInoAttr(ni) || ni->type != AT_DATA)
+		return -EOPNOTSUPP;
+	if (start_vcn > (ni->allocated_size >> vol->cluster_size_bits))
+		return -EINVAL;
+
+	hole_rl = ntfs_malloc_nofs(sizeof(*hole_rl) * 2);
+	if (!hole_rl)
+		return -ENOMEM;
+	hole_rl[0].vcn = start_vcn;
+	hole_rl[0].lcn = LCN_HOLE;
+	hole_rl[0].length = len;
+	hole_rl[1].vcn = start_vcn + len;
+	hole_rl[1].lcn = LCN_ENOENT;
+	hole_rl[1].length = 0;
+
+	down_write(&ni->runlist.lock);
+	ret = ntfs_attr_map_whole_runlist(ni);
+	if (ret) {
+		up_write(&ni->runlist.lock);
+		ntfs_free(hole_rl);
+		return ret;
+	}
+
+	rl = ntfs_rl_find_vcn_nolock(ni->runlist.rl, start_vcn);
+	if (!rl) {
+		up_write(&ni->runlist.lock);
+		ntfs_free(hole_rl);
+		return -EIO;
+	}
+
+	rl = ntfs_rl_insert_range(ni->runlist.rl, (int)ni->runlist.count,
+				  hole_rl, 1, &new_rl_count);
+	if (IS_ERR(rl)) {
+		up_write(&ni->runlist.lock);
+		ntfs_free(hole_rl);
+		return PTR_ERR(rl);
+	}
+	ni->runlist.rl = rl;
+	ni->runlist.count = new_rl_count;
+
+	ni->allocated_size += len << vol->cluster_size_bits;
+	ni->data_size += len << vol->cluster_size_bits;
+	if ((start_vcn << vol->cluster_size_bits) < ni->initialized_size)
+		ni->initialized_size += len << vol->cluster_size_bits;
+	ret = ntfs_attr_update_mapping_pairs(ni, 0);
+	up_write(&ni->runlist.lock);
+	if (ret)
+		return ret;
+
+	ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ret = ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
+			       0, NULL, 0, ctx);
+	if (ret) {
+		ntfs_attr_put_search_ctx(ctx);
+		return ret;
+	}
+
+	ctx->attr->data.non_resident.data_size = cpu_to_le64(ni->data_size);
+	ctx->attr->data.non_resident.initialized_size = cpu_to_le64(ni->initialized_size);
+	if (ni->type == AT_DATA && ni->name == AT_UNNAMED)
+		NInoSetFileNameDirty(ni);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	return ret;
+}
+
+int ntfs_non_resident_attr_collapse_range(struct ntfs_inode *ni, s64 start_vcn, s64 len)
+{
+	struct ntfs_volume *vol = ni->vol;
+	struct runlist_element *punch_rl, *rl;
+	struct ntfs_attr_search_ctx *ctx = NULL;
+	s64 end_vcn;
+	int dst_cnt;
+	int ret;
+	size_t new_rl_cnt;
+
+	if (NInoAttr(ni) || ni->type != AT_DATA)
+		return -EOPNOTSUPP;
+
+	end_vcn = ni->allocated_size >> vol->cluster_size_bits;
+	if (start_vcn >= end_vcn)
+		return -EINVAL;
+
+	down_write(&ni->runlist.lock);
+	ret = ntfs_attr_map_whole_runlist(ni);
+	if (ret) {
+		up_write(&ni->runlist.lock);
+		return ret;
+	}
+
+	len = min(len, end_vcn - start_vcn);
+	for (rl = ni->runlist.rl, dst_cnt = 0; rl && rl->length; rl++)
+		dst_cnt++;
+	rl = ntfs_rl_find_vcn_nolock(ni->runlist.rl, start_vcn);
+	if (!rl) {
+		up_write(&ni->runlist.lock);
+		return -EIO;
+	}
+
+	rl = ntfs_rl_collapse_range(ni->runlist.rl, dst_cnt + 1,
+				    start_vcn, len, &punch_rl, &new_rl_cnt);
+	if (IS_ERR(rl)) {
+		up_write(&ni->runlist.lock);
+		return PTR_ERR(rl);
+	}
+	ni->runlist.rl = rl;
+	ni->runlist.count = new_rl_cnt;
+
+	ni->allocated_size -= len << vol->cluster_size_bits;
+	if (ni->data_size > (start_vcn << vol->cluster_size_bits)) {
+		if (ni->data_size > (start_vcn + len) << vol->cluster_size_bits)
+			ni->data_size -= len << vol->cluster_size_bits;
+		else
+			ni->data_size = start_vcn << vol->cluster_size_bits;
+	}
+	if (ni->initialized_size > (start_vcn << vol->cluster_size_bits)) {
+		if (ni->initialized_size >
+		    (start_vcn + len) << vol->cluster_size_bits)
+			ni->initialized_size -= len << vol->cluster_size_bits;
+		else
+			ni->initialized_size = start_vcn << vol->cluster_size_bits;
+	}
+
+	if (ni->allocated_size > 0) {
+		ret = ntfs_attr_update_mapping_pairs(ni, 0);
+		if (ret) {
+			up_write(&ni->runlist.lock);
+			goto out_rl;
+		}
+	}
+	up_write(&ni->runlist.lock);
+
+	ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx) {
+		ret = -ENOMEM;
+		goto out_rl;
+	}
+
+	ret = ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
+			       0, NULL, 0, ctx);
+	if (ret)
+		goto out_ctx;
+
+	ctx->attr->data.non_resident.data_size = cpu_to_le64(ni->data_size);
+	ctx->attr->data.non_resident.initialized_size = cpu_to_le64(ni->initialized_size);
+	if (ni->allocated_size == 0)
+		ntfs_attr_make_resident(ni, ctx);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+
+	ret = ntfs_cluster_free_from_rl(vol, punch_rl);
+	if (ret)
+		ntfs_error(vol->sb, "Freeing of clusters failed");
+out_ctx:
+	if (ctx)
(ctx) + ntfs_attr_put_search_ctx(ctx); +out_rl: + ntfs_free(punch_rl); + mark_mft_record_dirty(ni); + return ret; +} + +int ntfs_non_resident_attr_punch_hole(struct ntfs_inode *ni, s64 start_vcn= , s64 len) +{ + struct ntfs_volume *vol =3D ni->vol; + struct runlist_element *punch_rl, *rl; + s64 end_vcn; + int dst_cnt; + int ret; + size_t new_rl_count; + + if (NInoAttr(ni) || ni->type !=3D AT_DATA) + return -EOPNOTSUPP; + + end_vcn =3D ni->allocated_size >> vol->cluster_size_bits; + if (start_vcn >=3D end_vcn) + return -EINVAL; + + down_write(&ni->runlist.lock); + ret =3D ntfs_attr_map_whole_runlist(ni); + if (ret) { + up_write(&ni->runlist.lock); + return ret; + } + + len =3D min(len, end_vcn - start_vcn + 1); + for (rl =3D ni->runlist.rl, dst_cnt =3D 0; rl && rl->length; rl++) + dst_cnt++; + rl =3D ntfs_rl_find_vcn_nolock(ni->runlist.rl, start_vcn); + if (!rl) { + up_write(&ni->runlist.lock); + return -EIO; + } + + rl =3D ntfs_rl_punch_hole(ni->runlist.rl, dst_cnt + 1, + start_vcn, len, &punch_rl, &new_rl_count); + if (IS_ERR(rl)) { + up_write(&ni->runlist.lock); + return PTR_ERR(rl); + } + ni->runlist.rl =3D rl; + ni->runlist.count =3D new_rl_count; + + ret =3D ntfs_attr_update_mapping_pairs(ni, 0); + up_write(&ni->runlist.lock); + if (ret) { + ntfs_free(punch_rl); + return ret; + } + + ret =3D ntfs_cluster_free_from_rl(vol, punch_rl); + if (ret) + ntfs_error(vol->sb, "Freeing of clusters failed"); + + ntfs_free(punch_rl); + mark_mft_record_dirty(ni); + return ret; +} + +int ntfs_attr_fallocate(struct ntfs_inode *ni, loff_t start, loff_t byte_l= en, bool keep_size) +{ + struct ntfs_volume *vol =3D ni->vol; + struct mft_record *mrec; + struct ntfs_attr_search_ctx *ctx; + s64 old_data_size; + s64 vcn_start, vcn_end, vcn_uninit, vcn, try_alloc_cnt; + s64 lcn, alloc_cnt; + int err =3D 0; + struct runlist_element *rl; + bool balloc; + + if (NInoAttr(ni) || ni->type !=3D AT_DATA) + return -EINVAL; + + if (NInoNonResident(ni) && !NInoFullyMapped(ni)) { + 
+		down_write(&ni->runlist.lock);
+		err = ntfs_attr_map_whole_runlist(ni);
+		up_write(&ni->runlist.lock);
+		if (err)
+			return err;
+	}
+
+	mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+	mrec = map_mft_record(ni);
+	if (IS_ERR(mrec)) {
+		mutex_unlock(&ni->mrec_lock);
+		return PTR_ERR(mrec);
+	}
+
+	ctx = ntfs_attr_get_search_ctx(ni, mrec);
+	if (!ctx) {
+		err = -ENOMEM;
+		goto out_unmap;
+	}
+
+	err = ntfs_attr_lookup(AT_DATA, AT_UNNAMED, 0, 0, 0, NULL, 0, ctx);
+	if (err) {
+		err = -EIO;
+		goto out_unmap;
+	}
+
+	old_data_size = ni->data_size;
+	if (start + byte_len > ni->data_size) {
+		err = ntfs_attr_truncate(ni, start + byte_len);
+		if (err)
+			goto out_unmap;
+		if (keep_size) {
+			ntfs_attr_reinit_search_ctx(ctx);
+			err = ntfs_attr_lookup(AT_DATA, AT_UNNAMED, 0, 0, 0, NULL, 0, ctx);
+			if (err) {
+				err = -EIO;
+				goto out_unmap;
+			}
+			ni->data_size = old_data_size;
+			if (NInoNonResident(ni))
+				ctx->attr->data.non_resident.data_size =
+					cpu_to_le64(old_data_size);
+			else
+				ctx->attr->data.resident.value_length =
+					cpu_to_le32(old_data_size);
+			mark_mft_record_dirty(ni);
+		}
+	}
+
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(ni);
+	mutex_unlock(&ni->mrec_lock);
+
+	if (!NInoNonResident(ni))
+		goto out;
+
+	vcn_start = (s64)(start >> vol->cluster_size_bits);
+	vcn_end = (s64)(round_up(start + byte_len, vol->cluster_size) >>
+			vol->cluster_size_bits);
+	vcn_uninit = (s64)(round_up(ni->initialized_size, vol->cluster_size) >>
+			vol->cluster_size_bits);
+	vcn_uninit = min_t(s64, vcn_uninit, vcn_end);
+
+	/*
+	 * We have to allocate clusters for holes and delayed allocations
+	 * within initialized_size, and zero out the clusters only for the
+	 * holes.
+	 */
+	vcn = vcn_start;
+	while (vcn < vcn_uninit) {
+		down_read(&ni->runlist.lock);
+		rl = ntfs_attr_find_vcn_nolock(ni, vcn, NULL);
+		up_read(&ni->runlist.lock);
+		if (IS_ERR(rl)) {
+			err = PTR_ERR(rl);
+			goto out;
+		}
+
+		if (rl->lcn > 0) {
+			vcn += rl->length - (vcn - rl->vcn);
+		} else if (rl->lcn == LCN_DELALLOC || rl->lcn == LCN_HOLE) {
+			try_alloc_cnt = min(rl->length - (vcn - rl->vcn),
+					vcn_uninit - vcn);
+
+			if (rl->lcn == LCN_DELALLOC) {
+				vcn += try_alloc_cnt;
+				continue;
+			}
+
+			while (try_alloc_cnt > 0) {
+				mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+				down_write(&ni->runlist.lock);
+				err = ntfs_attr_map_cluster(ni, vcn, &lcn, &alloc_cnt,
+						try_alloc_cnt, &balloc, false, false);
+				up_write(&ni->runlist.lock);
+				mutex_unlock(&ni->mrec_lock);
+				if (err)
+					goto out;
+
+				err = ntfs_zeroed_clusters(VFS_I(ni), lcn, alloc_cnt);
+				if (err > 0)
+					goto out;
+
+				if (signal_pending(current))
+					goto out;
+
+				vcn += alloc_cnt;
+				try_alloc_cnt -= alloc_cnt;
+			}
+		} else {
+			err = -EIO;
+			goto out;
+		}
+	}
+
+	/* Allocate clusters outside of initialized_size. */
+	try_alloc_cnt = vcn_end - vcn;
+	while (try_alloc_cnt > 0) {
+		mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+		down_write(&ni->runlist.lock);
+		err = ntfs_attr_map_cluster(ni, vcn, &lcn, &alloc_cnt,
+				try_alloc_cnt, &balloc, false, false);
+		up_write(&ni->runlist.lock);
+		mutex_unlock(&ni->mrec_lock);
+		if (err || signal_pending(current))
+			goto out;
+
+		vcn += alloc_cnt;
+		try_alloc_cnt -= alloc_cnt;
+		cond_resched();
+	}
+
+	if (NInoRunlistDirty(ni)) {
+		mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+		down_write(&ni->runlist.lock);
+		err = ntfs_attr_update_mapping_pairs(ni, 0);
+		if (err)
+			ntfs_error(ni->vol->sb, "Updating mapping pairs failed");
+		else
+			NInoClearRunlistDirty(ni);
+		up_write(&ni->runlist.lock);
+		mutex_unlock(&ni->mrec_lock);
+	}
+	return err;
+out_unmap:
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(ni);
+	mutex_unlock(&ni->mrec_lock);
+out:
+	return err >= 0 ? 0 : err;
+}
diff --git a/fs/ntfsplus/attrlist.c b/fs/ntfsplus/attrlist.c
new file mode 100644
index 000000000000..7c2fb3f77e91
--- /dev/null
+++ b/fs/ntfsplus/attrlist.c
@@ -0,0 +1,285 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Attribute list attribute handling code. Originated from the Linux-NTFS
+ * project.
+ * Part of this file is based on code from the NTFS-3G project.
+ *
+ * Copyright (c) 2004-2005 Anton Altaparmakov
+ * Copyright (c) 2004-2005 Yura Pakhuchiy
+ * Copyright (c) 2006 Szabolcs Szakacsits
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include "mft.h"
+#include "attrib.h"
+#include "misc.h"
+#include "attrlist.h"
+
+/**
+ * ntfs_attrlist_need - check whether the inode needs an attribute list
+ * @ni: opened ntfs inode for which to perform the check
+ *
+ * Check whether all attributes belong to one MFT record, in which case
+ * an attribute list is not needed.
+ */
+int ntfs_attrlist_need(struct ntfs_inode *ni)
+{
+	struct attr_list_entry *ale;
+
+	if (!ni) {
+		ntfs_debug("Invalid arguments.\n");
+		return -EINVAL;
+	}
+	ntfs_debug("Entering for inode 0x%llx.\n", (long long)ni->mft_no);
+
+	if (!NInoAttrList(ni)) {
+		ntfs_debug("Inode does not have an attribute list.\n");
+		return -EINVAL;
+	}
+
+	if (!ni->attr_list) {
+		ntfs_debug("Corrupt in-memory struct.\n");
+		return -EINVAL;
+	}
+
+	ale = (struct attr_list_entry *)ni->attr_list;
+	while ((u8 *)ale < ni->attr_list + ni->attr_list_size) {
+		if (MREF_LE(ale->mft_reference) != ni->mft_no)
+			return 1;
+		ale = (struct attr_list_entry *)((u8 *)ale + le16_to_cpu(ale->length));
+	}
+	return 0;
+}
+
+int ntfs_attrlist_update(struct ntfs_inode *base_ni)
+{
+	struct inode *attr_vi;
+	struct ntfs_inode *attr_ni;
+	int err;
+
+	attr_vi = ntfs_attr_iget(VFS_I(base_ni), AT_ATTRIBUTE_LIST, AT_UNNAMED, 0);
+	if (IS_ERR(attr_vi)) {
+		err = PTR_ERR(attr_vi);
+		return err;
+	}
+	attr_ni = NTFS_I(attr_vi);
+
+	err = ntfs_attr_truncate_i(attr_ni, base_ni->attr_list_size, HOLES_NO);
+	if (err == -ENOSPC && attr_ni->mft_no == FILE_MFT) {
+		err = ntfs_attr_truncate(attr_ni, 0);
+		if (err || ntfs_attr_truncate_i(attr_ni, base_ni->attr_list_size, HOLES_NO) != 0) {
+			iput(attr_vi);
+			ntfs_error(base_ni->vol->sb,
+				   "Failed to truncate attribute list of inode %#llx",
+				   (long long)base_ni->mft_no);
+			return -EIO;
+		}
+	} else if (err) {
+		iput(attr_vi);
+		ntfs_error(base_ni->vol->sb,
+			   "Failed to truncate attribute list of inode %#llx",
+			   (long long)base_ni->mft_no);
+		return -EIO;
+	}
+
+	i_size_write(attr_vi, base_ni->attr_list_size);
+
+	if (NInoNonResident(attr_ni) && !NInoAttrListNonResident(base_ni))
+		NInoSetAttrListNonResident(base_ni);
+
+	if (ntfs_inode_attr_pwrite(attr_vi, 0, base_ni->attr_list_size,
+				   base_ni->attr_list, false) !=
+			base_ni->attr_list_size) {
+		iput(attr_vi);
+		ntfs_error(base_ni->vol->sb,
+			   "Failed to write attribute list of inode %#llx",
+			   (long long)base_ni->mft_no);
+		return -EIO;
+	}
+
+	NInoSetAttrListDirty(base_ni);
+	iput(attr_vi);
+	return 0;
+}
+
+/**
+ * ntfs_attrlist_entry_add - add an attribute list attribute entry
+ * @ni: opened ntfs inode which contains that attribute
+ * @attr: attribute record to add to the attribute list
+ */
+int ntfs_attrlist_entry_add(struct ntfs_inode *ni, struct attr_record *attr)
+{
+	struct attr_list_entry *ale;
+	__le64 mref;
+	struct ntfs_attr_search_ctx *ctx;
+	u8 *new_al;
+	int entry_len, entry_offset, err;
+	struct mft_record *ni_mrec;
+	u8 *old_al;
+
+	if (!ni || !attr) {
+		ntfs_debug("Invalid arguments.\n");
+		return -EINVAL;
+	}
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x.\n",
+		   (long long)ni->mft_no,
+		   (unsigned int)le32_to_cpu(attr->type));
+
+	ni_mrec = map_mft_record(ni);
+	if (IS_ERR(ni_mrec)) {
+		ntfs_debug("Failed to map MFT record.\n");
+		return -EIO;
+	}
+
+	mref = MK_LE_MREF(ni->mft_no, le16_to_cpu(ni_mrec->sequence_number));
+	unmap_mft_record(ni);
+
+	if (ni->nr_extents == -1)
+		ni = ni->ext.base_ntfs_ino;
+
+	if (!NInoAttrList(ni)) {
+		ntfs_debug("Attribute list isn't present.\n");
+		return -ENOENT;
+	}
+
+	/* Determine size and allocate memory for new attribute list. */
+	entry_len = (sizeof(struct attr_list_entry) + sizeof(__le16) *
+			attr->name_length + 7) & ~7;
+	new_al = ntfs_malloc_nofs(ni->attr_list_size + entry_len);
+	if (!new_al)
+		return -ENOMEM;
+
+	/* Find place for the new entry. */
+	ctx = ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx) {
+		err = -ENOMEM;
+		ntfs_error(ni->vol->sb, "Failed to get search context");
+		goto err_out;
+	}
+
+	err = ntfs_attr_lookup(attr->type, (attr->name_length) ? (__le16 *)
+			((u8 *)attr + le16_to_cpu(attr->name_offset)) :
+			AT_UNNAMED, attr->name_length, CASE_SENSITIVE,
+			(attr->non_resident) ? le64_to_cpu(attr->data.non_resident.lowest_vcn) :
+			0, (attr->non_resident) ? NULL : ((u8 *)attr +
			le16_to_cpu(attr->data.resident.value_offset)), (attr->non_resident) ?
+			0 : le32_to_cpu(attr->data.resident.value_length), ctx);
+	if (!err) {
+		/* Found some extent, check it to be before new extent. */
+		if (ctx->al_entry->lowest_vcn == attr->data.non_resident.lowest_vcn) {
+			err = -EEXIST;
+			ntfs_debug("Such attribute already present in the attribute list.\n");
+			ntfs_attr_put_search_ctx(ctx);
+			goto err_out;
+		}
+		/* Add new entry after this extent. */
+		ale = (struct attr_list_entry *)((u8 *)ctx->al_entry +
+				le16_to_cpu(ctx->al_entry->length));
+	} else {
+		/* Check for real errors. */
+		if (err != -ENOENT) {
+			ntfs_debug("Attribute lookup failed.\n");
+			ntfs_attr_put_search_ctx(ctx);
+			goto err_out;
+		}
+		/* No previous extents found. */
+		ale = ctx->al_entry;
+	}
+	/* Don't need it anymore, @ctx->al_entry points to @ni->attr_list. */
+	ntfs_attr_put_search_ctx(ctx);
+
+	/* Determine new entry offset. */
+	entry_offset = ((u8 *)ale - ni->attr_list);
+	/* Set pointer to new entry. */
+	ale = (struct attr_list_entry *)(new_al + entry_offset);
+	memset(ale, 0, entry_len);
+	/* Form new entry. */
+	ale->type = attr->type;
+	ale->length = cpu_to_le16(entry_len);
+	ale->name_length = attr->name_length;
+	ale->name_offset = offsetof(struct attr_list_entry, name);
+	if (attr->non_resident)
+		ale->lowest_vcn = attr->data.non_resident.lowest_vcn;
+	else
+		ale->lowest_vcn = 0;
+	ale->mft_reference = mref;
+	ale->instance = attr->instance;
+	memcpy(ale->name, (u8 *)attr + le16_to_cpu(attr->name_offset),
+	       attr->name_length * sizeof(__le16));
+
+	/* Copy entries from old attribute list to new. */
+	memcpy(new_al, ni->attr_list, entry_offset);
+	memcpy(new_al + entry_offset + entry_len, ni->attr_list +
+			entry_offset, ni->attr_list_size - entry_offset);
+
+	/* Install the new attribute list. */
+	old_al = ni->attr_list;
+	ni->attr_list = new_al;
+	ni->attr_list_size = ni->attr_list_size + entry_len;
+
+	err = ntfs_attrlist_update(ni);
+	if (err) {
+		ni->attr_list = old_al;
+		ni->attr_list_size -= entry_len;
+		goto err_out;
+	}
+	ntfs_free(old_al);
+	return 0;
+err_out:
+	ntfs_free(new_al);
+	return err;
+}
+
+/**
+ * ntfs_attrlist_entry_rm - remove an attribute list attribute entry
+ * @ctx: attribute search context describing the attribute list entry
+ *
+ * Remove the attribute list entry @ctx->al_entry from the attribute list.
+ */
+int ntfs_attrlist_entry_rm(struct ntfs_attr_search_ctx *ctx)
+{
+	u8 *new_al;
+	int new_al_len;
+	struct ntfs_inode *base_ni;
+	struct attr_list_entry *ale;
+
+	if (!ctx || !ctx->ntfs_ino || !ctx->al_entry) {
+		ntfs_debug("Invalid arguments.\n");
+		return -EINVAL;
+	}
+
+	if (ctx->base_ntfs_ino)
+		base_ni = ctx->base_ntfs_ino;
+	else
+		base_ni = ctx->ntfs_ino;
+	ale = ctx->al_entry;
+
+	ntfs_debug("Entering for inode 0x%llx, attr 0x%x, lowest_vcn %lld.\n",
+		   (long long)ctx->ntfs_ino->mft_no,
+		   (unsigned int)le32_to_cpu(ctx->al_entry->type),
+		   (long long)le64_to_cpu(ctx->al_entry->lowest_vcn));
+
+	if (!NInoAttrList(base_ni)) {
+		ntfs_debug("Attribute list isn't present.\n");
+		return -ENOENT;
+	}
+
+	/* Allocate memory for new attribute list. */
+	new_al_len = base_ni->attr_list_size - le16_to_cpu(ale->length);
+	new_al = ntfs_malloc_nofs(new_al_len);
+	if (!new_al)
+		return -ENOMEM;
+
+	/* Copy entries from old attribute list to new. */
+	memcpy(new_al, base_ni->attr_list, (u8 *)ale - base_ni->attr_list);
+	memcpy(new_al + ((u8 *)ale - base_ni->attr_list), (u8 *)ale + le16_to_cpu(
+			ale->length), new_al_len - ((u8 *)ale - base_ni->attr_list));
+
+	/* Install the new attribute list. */
+	ntfs_free(base_ni->attr_list);
+	base_ni->attr_list = new_al;
+	base_ni->attr_list_size = new_al_len;
+
+	return ntfs_attrlist_update(base_ni);
+}
diff --git a/fs/ntfsplus/compress.c b/fs/ntfsplus/compress.c
new file mode 100644
index 000000000000..a801ad6eb8fe
--- /dev/null
+++ b/fs/ntfsplus/compress.c
@@ -0,0 +1,1564 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * NTFS kernel compressed attributes handling.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ *
+ * Part of this file is based on code from the NTFS-3G project
+ * and is copyrighted by the respective authors below: + * Copyright (c) 2004-2005 Anton Altaparmakov + * Copyright (c) 2004-2006 Szabolcs Szakacsits + * Copyright (c) 2005 Yura Pakhuchiy + * Copyright (c) 2009-2014 Jean-Pierre Andre + * Copyright (c) 2014 Eric Biggers + */ + +#include +#include +#include +#include + +#include "attrib.h" +#include "inode.h" +#include "misc.h" +#include "ntfs.h" +#include "misc.h" +#include "aops.h" +#include "lcnalloc.h" +#include "mft.h" + +/** + * enum of constants used in the compression code + */ +enum { + /* Token types and access mask. */ + NTFS_SYMBOL_TOKEN =3D 0, + NTFS_PHRASE_TOKEN =3D 1, + NTFS_TOKEN_MASK =3D 1, + + /* Compression sub-block constants. */ + NTFS_SB_SIZE_MASK =3D 0x0fff, + NTFS_SB_SIZE =3D 0x1000, + NTFS_SB_IS_COMPRESSED =3D 0x8000, + + /* + * The maximum compression block size is by definition 16 * the cluster + * size, with the maximum supported cluster size being 4kiB. Thus the + * maximum compression buffer size is 64kiB, so we use this when + * initializing the compression buffer. + */ + NTFS_MAX_CB_SIZE =3D 64 * 1024, +}; + +/** + * ntfs_compression_buffer - one buffer for the decompression engine + */ +static u8 *ntfs_compression_buffer; + +/** + * ntfs_cb_lock - mutex lock which protects ntfs_compression_buffer + */ +static DEFINE_MUTEX(ntfs_cb_lock); + +/** + * allocate_compression_buffers - allocate the decompression buffers + * + * Caller has to hold the ntfs_lock mutex. + * + * Return 0 on success or -ENOMEM if the allocations failed. + */ +int allocate_compression_buffers(void) +{ + if (ntfs_compression_buffer) + return 0; + + ntfs_compression_buffer =3D vmalloc(NTFS_MAX_CB_SIZE); + if (!ntfs_compression_buffer) + return -ENOMEM; + return 0; +} + +/** + * free_compression_buffers - free the decompression buffers + * + * Caller has to hold the ntfs_lock mutex. 
+ */ +void free_compression_buffers(void) +{ + mutex_lock(&ntfs_cb_lock); + if (!ntfs_compression_buffer) { + mutex_unlock(&ntfs_cb_lock); + return; + } + + vfree(ntfs_compression_buffer); + ntfs_compression_buffer =3D NULL; + mutex_unlock(&ntfs_cb_lock); +} + +/** + * zero_partial_compressed_page - zero out of bounds compressed page region + */ +static void zero_partial_compressed_page(struct page *page, + const s64 initialized_size) +{ + u8 *kp =3D page_address(page); + unsigned int kp_ofs; + + ntfs_debug("Zeroing page region outside initialized size."); + if (((s64)page->__folio_index << PAGE_SHIFT) >=3D initialized_size) { + clear_page(kp); + return; + } + kp_ofs =3D initialized_size & ~PAGE_MASK; + memset(kp + kp_ofs, 0, PAGE_SIZE - kp_ofs); +} + +/** + * handle_bounds_compressed_page - test for&handle out of bounds compresse= d page + */ +static inline void handle_bounds_compressed_page(struct page *page, + const loff_t i_size, const s64 initialized_size) +{ + if ((page->__folio_index >=3D (initialized_size >> PAGE_SHIFT)) && + (initialized_size < i_size)) + zero_partial_compressed_page(page, initialized_size); +} + +/** + * ntfs_decompress - decompress a compression block into an array of pages + * @dest_pages: destination array of pages + * @completed_pages: scratch space to track completed pages + * @dest_index: current index into @dest_pages (IN/OUT) + * @dest_ofs: current offset within @dest_pages[@dest_index] (IN/OUT) + * @dest_max_index: maximum index into @dest_pages (IN) + * @dest_max_ofs: maximum offset within @dest_pages[@dest_max_index] (IN) + * @xpage: the target page (-1 if none) (IN) + * @xpage_done: set to 1 if xpage was completed successfully (IN/OUT) + * @cb_start: compression block to decompress (IN) + * @cb_size: size of compression block @cb_start in bytes (IN) + * @i_size: file size when we started the read (IN) + * @initialized_size: initialized file size when we started the read (IN) + * + * The caller must have disabled preemption. 
ntfs_decompress() reenables i= t when + * the critical section is finished. + * + * This decompresses the compression block @cb_start into the array of + * destination pages @dest_pages starting at index @dest_index into @dest_= pages + * and at offset @dest_pos into the page @dest_pages[@dest_index]. + * + * When the page @dest_pages[@xpage] is completed, @xpage_done is set to 1. + * If xpage is -1 or @xpage has not been completed, @xpage_done is not mod= ified. + * + * @cb_start is a pointer to the compression block which needs decompressi= ng + * and @cb_size is the size of @cb_start in bytes (8-64kiB). + * + * Return 0 if success or -EOVERFLOW on error in the compressed stream. + * @xpage_done indicates whether the target page (@dest_pages[@xpage]) was + * completed during the decompression of the compression block (@cb_start). + * + * Warning: This function *REQUIRES* PAGE_SIZE >=3D 4096 or it will blow up + * unpredicatbly! You have been warned! + * + * Note to hackers: This function may not sleep until it has finished acce= ssing + * the compression block @cb_start as it is a per-CPU buffer. + */ +static int ntfs_decompress(struct page *dest_pages[], int completed_pages[= ], + int *dest_index, int *dest_ofs, const int dest_max_index, + const int dest_max_ofs, const int xpage, char *xpage_done, + u8 *const cb_start, const u32 cb_size, const loff_t i_size, + const s64 initialized_size) +{ + /* + * Pointers into the compressed data, i.e. the compression block (cb), + * and the therein contained sub-blocks (sb). + */ + u8 *cb_end =3D cb_start + cb_size; /* End of cb. */ + u8 *cb =3D cb_start; /* Current position in cb. */ + u8 *cb_sb_start =3D cb; /* Beginning of the current sb in the cb. */ + u8 *cb_sb_end; /* End of current sb / beginning of next sb. */ + + /* Variables for uncompressed data / destination. */ + struct page *dp; /* Current destination page being worked on. */ + u8 *dp_addr; /* Current pointer into dp. 
*/ + u8 *dp_sb_start; /* Start of current sub-block in dp. */ + u8 *dp_sb_end; /* End of current sb in dp (dp_sb_start + NTFS_SB_SIZE). = */ + u16 do_sb_start; /* @dest_ofs when starting this sub-block. */ + u16 do_sb_end; /* @dest_ofs of end of this sb (do_sb_start + NTFS_SB_SIZ= E). */ + + /* Variables for tag and token parsing. */ + u8 tag; /* Current tag. */ + int token; /* Loop counter for the eight tokens in tag. */ + int nr_completed_pages =3D 0; + + /* Default error code. */ + int err =3D -EOVERFLOW; + + ntfs_debug("Entering, cb_size =3D 0x%x.", cb_size); +do_next_sb: + ntfs_debug("Beginning sub-block at offset =3D 0x%zx in the cb.", + cb - cb_start); + /* + * Have we reached the end of the compression block or the end of the + * decompressed data? The latter can happen for example if the current + * position in the compression block is one byte before its end so the + * first two checks do not detect it. + */ + if (cb =3D=3D cb_end || !le16_to_cpup((__le16 *)cb) || + (*dest_index =3D=3D dest_max_index && + *dest_ofs =3D=3D dest_max_ofs)) { + int i; + + ntfs_debug("Completed. Returning success (0)."); + err =3D 0; +return_error: + /* We can sleep from now on, so we drop lock. */ + mutex_unlock(&ntfs_cb_lock); + /* Second stage: finalize completed pages. */ + if (nr_completed_pages > 0) { + for (i =3D 0; i < nr_completed_pages; i++) { + int di =3D completed_pages[i]; + + dp =3D dest_pages[di]; + /* + * If we are outside the initialized size, zero + * the out of bounds page range. + */ + handle_bounds_compressed_page(dp, i_size, + initialized_size); + flush_dcache_page(dp); + kunmap_local(page_address(dp)); + SetPageUptodate(dp); + unlock_page(dp); + if (di =3D=3D xpage) + *xpage_done =3D 1; + else + put_page(dp); + dest_pages[di] =3D NULL; + } + } + return err; + } + + /* Setup offsets for the current sub-block destination. */ + do_sb_start =3D *dest_ofs; + do_sb_end =3D do_sb_start + NTFS_SB_SIZE; + + /* Check that we are still within allowed boundaries. 
*/ + if (*dest_index =3D=3D dest_max_index && do_sb_end > dest_max_ofs) + goto return_overflow; + + /* Does the minimum size of a compressed sb overflow valid range? */ + if (cb + 6 > cb_end) + goto return_overflow; + + /* Setup the current sub-block source pointers and validate range. */ + cb_sb_start =3D cb; + cb_sb_end =3D cb_sb_start + (le16_to_cpup((__le16 *)cb) & NTFS_SB_SIZE_MA= SK) + + 3; + if (cb_sb_end > cb_end) + goto return_overflow; + + /* Get the current destination page. */ + dp =3D dest_pages[*dest_index]; + if (!dp) { + /* No page present. Skip decompression of this sub-block. */ + cb =3D cb_sb_end; + + /* Advance destination position to next sub-block. */ + *dest_ofs =3D (*dest_ofs + NTFS_SB_SIZE) & ~PAGE_MASK; + if (!*dest_ofs && (++*dest_index > dest_max_index)) + goto return_overflow; + goto do_next_sb; + } + + /* We have a valid destination page. Setup the destination pointers. */ + dp_addr =3D (u8 *)page_address(dp) + do_sb_start; + + /* Now, we are ready to process the current sub-block (sb). */ + if (!(le16_to_cpup((__le16 *)cb) & NTFS_SB_IS_COMPRESSED)) { + ntfs_debug("Found uncompressed sub-block."); + /* This sb is not compressed, just copy it into destination. */ + + /* Advance source position to first data byte. */ + cb +=3D 2; + + /* An uncompressed sb must be full size. */ + if (cb_sb_end - cb !=3D NTFS_SB_SIZE) + goto return_overflow; + + /* Copy the block and advance the source position. */ + memcpy(dp_addr, cb, NTFS_SB_SIZE); + cb +=3D NTFS_SB_SIZE; + + /* Advance destination position to next sub-block. */ + *dest_ofs +=3D NTFS_SB_SIZE; + *dest_ofs &=3D ~PAGE_MASK; + if (!(*dest_ofs)) { +finalize_page: + /* + * First stage: add current page index to array of + * completed pages. + */ + completed_pages[nr_completed_pages++] =3D *dest_index; + if (++*dest_index > dest_max_index) + goto return_overflow; + } + goto do_next_sb; + } + ntfs_debug("Found compressed sub-block."); + /* This sb is compressed, decompress it into destination. 
*/ + + /* Setup destination pointers. */ + dp_sb_start =3D dp_addr; + dp_sb_end =3D dp_sb_start + NTFS_SB_SIZE; + + /* Forward to the first tag in the sub-block. */ + cb +=3D 2; +do_next_tag: + if (cb =3D=3D cb_sb_end) { + /* Check if the decompressed sub-block was not full-length. */ + if (dp_addr < dp_sb_end) { + int nr_bytes =3D do_sb_end - *dest_ofs; + + ntfs_debug("Filling incomplete sub-block with zeroes."); + /* Zero remainder and update destination position. */ + memset(dp_addr, 0, nr_bytes); + *dest_ofs +=3D nr_bytes; + } + /* We have finished the current sub-block. */ + *dest_ofs &=3D ~PAGE_MASK; + if (!(*dest_ofs)) + goto finalize_page; + goto do_next_sb; + } + + /* Check we are still in range. */ + if (cb > cb_sb_end || dp_addr > dp_sb_end) + goto return_overflow; + + /* Get the next tag and advance to first token. */ + tag =3D *cb++; + + /* Parse the eight tokens described by the tag. */ + for (token =3D 0; token < 8; token++, tag >>=3D 1) { + register u16 i; + u16 lg, pt, length, max_non_overlap; + u8 *dp_back_addr; + + /* Check if we are done / still in range. */ + if (cb >=3D cb_sb_end || dp_addr > dp_sb_end) + break; + + /* Determine token type and parse appropriately.*/ + if ((tag & NTFS_TOKEN_MASK) =3D=3D NTFS_SYMBOL_TOKEN) { + /* + * We have a symbol token, copy the symbol across, and + * advance the source and destination positions. + */ + *dp_addr++ =3D *cb++; + ++*dest_ofs; + + /* Continue with the next token. */ + continue; + } + + /* + * We have a phrase token. Make sure it is not the first tag in + * the sb as this is illegal and would confuse the code below. + */ + if (dp_addr =3D=3D dp_sb_start) + goto return_overflow; + + /* + * Determine the number of bytes to go back (p) and the number + * of bytes to copy (l). We use an optimized algorithm in which + * we first calculate log2(current destination position in sb), + * which allows determination of l and p in O(1) rather than + * O(n). We just need an arch-optimized log2() function now. 
+ */ + lg =3D 0; + for (i =3D *dest_ofs - do_sb_start - 1; i >=3D 0x10; i >>=3D 1) + lg++; + + /* Get the phrase token into i. */ + pt =3D le16_to_cpup((__le16 *)cb); + + /* + * Calculate starting position of the byte sequence in + * the destination using the fact that p =3D (pt >> (12 - lg)) + 1 + * and make sure we don't go too far back. + */ + dp_back_addr =3D dp_addr - (pt >> (12 - lg)) - 1; + if (dp_back_addr < dp_sb_start) + goto return_overflow; + + /* Now calculate the length of the byte sequence. */ + length =3D (pt & (0xfff >> lg)) + 3; + + /* Advance destination position and verify it is in range. */ + *dest_ofs +=3D length; + if (*dest_ofs > do_sb_end) + goto return_overflow; + + /* The number of non-overlapping bytes. */ + max_non_overlap =3D dp_addr - dp_back_addr; + + if (length <=3D max_non_overlap) { + /* The byte sequence doesn't overlap, just copy it. */ + memcpy(dp_addr, dp_back_addr, length); + + /* Advance destination pointer. */ + dp_addr +=3D length; + } else { + /* + * The byte sequence does overlap, copy non-overlapping + * part and then do a slow byte by byte copy for the + * overlapping part. Also, advance the destination + * pointer. + */ + memcpy(dp_addr, dp_back_addr, max_non_overlap); + dp_addr +=3D max_non_overlap; + dp_back_addr +=3D max_non_overlap; + length -=3D max_non_overlap; + while (length--) + *dp_addr++ =3D *dp_back_addr++; + } + + /* Advance source position and continue with the next token. */ + cb +=3D 2; + } + + /* No tokens left in the current tag. Continue with the next tag. */ + goto do_next_tag; + +return_overflow: + ntfs_error(NULL, "Failed. Returning -EOVERFLOW."); + goto return_error; +} + +/** + * ntfs_read_compressed_block - read a compressed block into the page cache + * @folio: locked folio in the compression block(s) we need to read + * + * When we are called the page has already been verified to be locked and = the + * attribute is known to be non-resident, not encrypted, but compressed. + * + * 1. 
Determine which compression block(s) @page is in. + * 2. Get hold of all pages corresponding to this/these compression block(= s). + * 3. Read the (first) compression block. + * 4. Decompress it into the corresponding pages. + * 5. Throw the compressed data away and proceed to 3. for the next compre= ssion + * block or return success if no more compression blocks left. + * + * Warning: We have to be careful what we do about existing pages. They mi= ght + * have been written to so that we would lose data if we were to just over= write + * them with the out-of-date uncompressed data. + */ +int ntfs_read_compressed_block(struct folio *folio) +{ + struct page *page =3D &folio->page; + loff_t i_size; + s64 initialized_size; + struct address_space *mapping =3D page->mapping; + struct ntfs_inode *ni =3D NTFS_I(mapping->host); + struct ntfs_volume *vol =3D ni->vol; + struct super_block *sb =3D vol->sb; + struct runlist_element *rl; + unsigned long flags; + u8 *cb, *cb_pos, *cb_end; + unsigned long offset, index =3D page->__folio_index; + u32 cb_size =3D ni->itype.compressed.block_size; + u64 cb_size_mask =3D cb_size - 1UL; + s64 vcn; + s64 lcn; + /* The first wanted vcn (minimum alignment is PAGE_SIZE). */ + s64 start_vcn =3D (((s64)index << PAGE_SHIFT) & ~cb_size_mask) >> + vol->cluster_size_bits; + /* + * The first vcn after the last wanted vcn (minimum alignment is again + * PAGE_SIZE. + */ + s64 end_vcn =3D ((((s64)(index + 1UL) << PAGE_SHIFT) + cb_size - 1) + & ~cb_size_mask) >> vol->cluster_size_bits; + /* Number of compression blocks (cbs) in the wanted vcn range. */ + unsigned int nr_cbs =3D (end_vcn - start_vcn) << vol->cluster_size_bits + >> ni->itype.compressed.block_size_bits; + /* + * Number of pages required to store the uncompressed data from all + * compression blocks (cbs) overlapping @page. Due to alignment + * guarantees of start_vcn and end_vcn, no need to round up here. 
+ */ + unsigned int nr_pages =3D (end_vcn - start_vcn) << + vol->cluster_size_bits >> PAGE_SHIFT; + unsigned int xpage, max_page, cur_page, cur_ofs, i, page_ofs, page_index; + unsigned int cb_clusters, cb_max_ofs; + int cb_max_page, err =3D 0; + struct page **pages; + int *completed_pages; + unsigned char xpage_done =3D 0; + struct page *lpage; + + ntfs_debug("Entering, page->index =3D 0x%lx, cb_size =3D 0x%x, nr_pages = =3D %i.", + index, cb_size, nr_pages); + /* + * Bad things happen if we get here for anything that is not an + * unnamed $DATA attribute. + */ + if (ni->type !=3D AT_DATA || ni->name_len) { + unlock_page(page); + return -EIO; + } + + pages =3D kmalloc_array(nr_pages, sizeof(struct page *), GFP_NOFS); + completed_pages =3D kmalloc_array(nr_pages + 1, sizeof(int), GFP_NOFS); + + if (unlikely(!pages || !completed_pages)) { + kfree(pages); + kfree(completed_pages); + unlock_page(page); + ntfs_error(vol->sb, "Failed to allocate internal buffers."); + return -ENOMEM; + } + + /* + * We have already been given one page, this is the one we must do. + * Once again, the alignment guarantees keep it simple. + */ + offset =3D start_vcn << vol->cluster_size_bits >> PAGE_SHIFT; + xpage =3D index - offset; + pages[xpage] =3D page; + /* + * The remaining pages need to be allocated and inserted into the page + * cache, alignment guarantees keep all the below much simpler. (-8 + */ + read_lock_irqsave(&ni->size_lock, flags); + i_size =3D i_size_read(VFS_I(ni)); + initialized_size =3D ni->initialized_size; + read_unlock_irqrestore(&ni->size_lock, flags); + max_page =3D ((i_size + PAGE_SIZE - 1) >> PAGE_SHIFT) - + offset; + /* Is the page fully outside i_size? 
(truncate in progress) */
+	if (xpage >= max_page) {
+		kfree(pages);
+		kfree(completed_pages);
+		zero_user_segments(page, 0, PAGE_SIZE, 0, 0);
+		ntfs_debug("Compressed read outside i_size - truncated?");
+		SetPageUptodate(page);
+		unlock_page(page);
+		return 0;
+	}
+	if (nr_pages < max_page)
+		max_page = nr_pages;
+
+	for (i = 0; i < max_page; i++, offset++) {
+		if (i != xpage)
+			pages[i] = grab_cache_page_nowait(mapping, offset);
+		page = pages[i];
+		if (page) {
+			/*
+			 * We only (re)read the page if it isn't already read
+			 * in and/or dirty or we would be losing data or at
+			 * least wasting our time.
+			 */
+			if (!PageDirty(page) && (!PageUptodate(page))) {
+				kmap_local_page(page);
+				continue;
+			}
+			unlock_page(page);
+			put_page(page);
+			pages[i] = NULL;
+		}
+	}
+
+	/*
+	 * We have the runlist, and all the destination pages we need to fill.
+	 * Now read the first compression block.
+	 */
+	cur_page = 0;
+	cur_ofs = 0;
+	cb_clusters = ni->itype.compressed.block_clusters;
+do_next_cb:
+	nr_cbs--;
+
+	mutex_lock(&ntfs_cb_lock);
+	if (!ntfs_compression_buffer)
+		if (allocate_compression_buffers()) {
+			mutex_unlock(&ntfs_cb_lock);
+			goto err_out;
+		}
+
+	cb = ntfs_compression_buffer;
+	cb_pos = cb;
+	cb_end = cb + cb_size;
+
+	rl = NULL;
+	for (vcn = start_vcn, start_vcn += cb_clusters; vcn < start_vcn;
+			vcn++) {
+		bool is_retry = false;
+
+		if (!rl) {
+lock_retry_remap:
+			down_read(&ni->runlist.lock);
+			rl = ni->runlist.rl;
+		}
+		if (likely(rl != NULL)) {
+			/* Seek to element containing target vcn. */
+			while (rl->length && rl[1].vcn <= vcn)
+				rl++;
+			lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
+		} else
+			lcn = LCN_RL_NOT_MAPPED;
+		ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
+				(unsigned long long)vcn,
+				(unsigned long long)lcn);
+		if (lcn < 0) {
+			/*
+			 * When we reach the first sparse cluster we have
+			 * finished with the cb.
+			 */
+			if (lcn == LCN_HOLE)
+				break;
+			if (is_retry || lcn != LCN_RL_NOT_MAPPED) {
+				mutex_unlock(&ntfs_cb_lock);
+				goto rl_err;
+			}
+			is_retry = true;
+			/*
+			 * Attempt to map runlist, dropping lock for the
+			 * duration.
+			 */
+			up_read(&ni->runlist.lock);
+			if (!ntfs_map_runlist(ni, vcn))
+				goto lock_retry_remap;
+			mutex_unlock(&ntfs_cb_lock);
+			goto map_rl_err;
+		}
+
+		page_ofs = (lcn << vol->cluster_size_bits) & ~PAGE_MASK;
+		page_index = (lcn << vol->cluster_size_bits) >> PAGE_SHIFT;
+
+retry:
+		lpage = read_mapping_page(sb->s_bdev->bd_mapping,
+				page_index, NULL);
+		if (PTR_ERR(lpage) == -EINTR)
+			goto retry;
+		else if (IS_ERR(lpage)) {
+			err = PTR_ERR(lpage);
+			mutex_unlock(&ntfs_cb_lock);
+			goto read_err;
+		}
+
+		lock_page(lpage);
+		memcpy(cb_pos, page_address(lpage) + page_ofs,
+				vol->cluster_size);
+		unlock_page(lpage);
+		put_page(lpage);
+		cb_pos += vol->cluster_size;
+	}
+
+	/* Release the lock if we took it. */
+	if (rl)
+		up_read(&ni->runlist.lock);
+
+	/* Just a precaution. */
+	if (cb_pos + 2 <= cb + cb_size)
+		*(u16 *)cb_pos = 0;
+
+	/* Reset cb_pos back to the beginning. */
+	cb_pos = cb;
+
+	/* We now have both source (if present) and destination. */
+	ntfs_debug("Successfully read the compression block.");
+
+	/* The last page and maximum offset within it for the current cb. */
+	cb_max_page = (cur_page << PAGE_SHIFT) + cur_ofs + cb_size;
+	cb_max_ofs = cb_max_page & ~PAGE_MASK;
+	cb_max_page >>= PAGE_SHIFT;
+
+	/* Catch end of file inside a compression block. */
+	if (cb_max_page > max_page)
+		cb_max_page = max_page;
+
+	if (vcn == start_vcn - cb_clusters) {
+		/* Sparse cb, zero out page range overlapping the cb. */
+		ntfs_debug("Found sparse compression block.");
+		/* We can sleep from now on, so we drop lock.
		 */
+		mutex_unlock(&ntfs_cb_lock);
+		if (cb_max_ofs)
+			cb_max_page--;
+		for (; cur_page < cb_max_page; cur_page++) {
+			page = pages[cur_page];
+			if (page) {
+				if (likely(!cur_ofs))
+					clear_page(page_address(page));
+				else
+					memset(page_address(page) + cur_ofs, 0,
+							PAGE_SIZE - cur_ofs);
+				flush_dcache_page(page);
+				kunmap_local(page_address(page));
+				SetPageUptodate(page);
+				unlock_page(page);
+				if (cur_page == xpage)
+					xpage_done = 1;
+				else
+					put_page(page);
+				pages[cur_page] = NULL;
+			}
+			cb_pos += PAGE_SIZE - cur_ofs;
+			cur_ofs = 0;
+			if (cb_pos >= cb_end)
+				break;
+		}
+		/* If we have a partial final page, deal with it now. */
+		if (cb_max_ofs && cb_pos < cb_end) {
+			page = pages[cur_page];
+			if (page)
+				memset(page_address(page) + cur_ofs, 0,
+						cb_max_ofs - cur_ofs);
+			/*
+			 * No need to update cb_pos at this stage:
+			 * cb_pos += cb_max_ofs - cur_ofs;
+			 */
+			cur_ofs = cb_max_ofs;
+		}
+	} else if (vcn == start_vcn) {
+		/* We can't sleep so we need two stages. */
+		unsigned int cur2_page = cur_page;
+		unsigned int cur_ofs2 = cur_ofs;
+		u8 *cb_pos2 = cb_pos;
+
+		ntfs_debug("Found uncompressed compression block.");
+		/* Uncompressed cb, copy it to the destination pages. */
+		if (cb_max_ofs)
+			cb_max_page--;
+		/* First stage: copy data into destination pages. */
+		for (; cur_page < cb_max_page; cur_page++) {
+			page = pages[cur_page];
+			if (page)
+				memcpy(page_address(page) + cur_ofs, cb_pos,
+						PAGE_SIZE - cur_ofs);
+			cb_pos += PAGE_SIZE - cur_ofs;
+			cur_ofs = 0;
+			if (cb_pos >= cb_end)
+				break;
+		}
+		/* If we have a partial final page, deal with it now. */
+		if (cb_max_ofs && cb_pos < cb_end) {
+			page = pages[cur_page];
+			if (page)
+				memcpy(page_address(page) + cur_ofs, cb_pos,
+						cb_max_ofs - cur_ofs);
+			cb_pos += cb_max_ofs - cur_ofs;
+			cur_ofs = cb_max_ofs;
+		}
+		/* We can sleep from now on, so drop lock. */
+		mutex_unlock(&ntfs_cb_lock);
+		/* Second stage: finalize pages.
		 */
+		for (; cur2_page < cb_max_page; cur2_page++) {
+			page = pages[cur2_page];
+			if (page) {
+				/*
+				 * If we are outside the initialized size, zero
+				 * the out of bounds page range.
+				 */
+				handle_bounds_compressed_page(page, i_size,
+						initialized_size);
+				flush_dcache_page(page);
+				kunmap_local(page_address(page));
+				SetPageUptodate(page);
+				unlock_page(page);
+				if (cur2_page == xpage)
+					xpage_done = 1;
+				else
+					put_page(page);
+				pages[cur2_page] = NULL;
+			}
+			cb_pos2 += PAGE_SIZE - cur_ofs2;
+			cur_ofs2 = 0;
+			if (cb_pos2 >= cb_end)
+				break;
+		}
+	} else {
+		/* Compressed cb, decompress it into the destination page(s). */
+		unsigned int prev_cur_page = cur_page;
+
+		ntfs_debug("Found compressed compression block.");
+		err = ntfs_decompress(pages, completed_pages, &cur_page,
+				&cur_ofs, cb_max_page, cb_max_ofs, xpage,
+				&xpage_done, cb_pos, cb_size - (cb_pos - cb),
+				i_size, initialized_size);
+		/*
+		 * We can sleep from now on, lock already dropped by
+		 * ntfs_decompress().
+		 */
+		if (err) {
+			ntfs_error(vol->sb,
+					"ntfs_decompress() failed in inode 0x%lx with error code %i. Skipping this compression block.",
+					ni->mft_no, -err);
+			/* Release the unfinished pages. */
+			for (; prev_cur_page < cur_page; prev_cur_page++) {
+				page = pages[prev_cur_page];
+				if (page) {
+					flush_dcache_page(page);
+					kunmap_local(page_address(page));
+					unlock_page(page);
+					if (prev_cur_page != xpage)
+						put_page(page);
+					pages[prev_cur_page] = NULL;
+				}
+			}
+		}
+	}
+
+	/* Do we have more work to do? */
+	if (nr_cbs)
+		goto do_next_cb;
+
+	/* Clean up if we have any pages left. Should never happen. */
+	for (cur_page = 0; cur_page < max_page; cur_page++) {
+		page = pages[cur_page];
+		if (page) {
+			ntfs_error(vol->sb,
+					"Still have pages left! Terminating them with extreme prejudice.
Inode 0x%lx, page index 0x%lx.",
+					ni->mft_no, page->__folio_index);
+			flush_dcache_page(page);
+			kunmap_local(page_address(page));
+			unlock_page(page);
+			if (cur_page != xpage)
+				put_page(page);
+			pages[cur_page] = NULL;
+		}
+	}
+
+	/* We no longer need the list of pages. */
+	kfree(pages);
+	kfree(completed_pages);
+
+	/* If we have completed the requested page, we return success. */
+	if (likely(xpage_done))
+		return 0;
+
+	ntfs_debug("Failed. Returning error code %s.", err == -EOVERFLOW ?
+			"EOVERFLOW" : (!err ? "EIO" : "unknown error"));
+	return err < 0 ? err : -EIO;
+
+map_rl_err:
+	ntfs_error(vol->sb, "ntfs_map_runlist() failed. Cannot read compression block.");
+	goto err_out;
+
+rl_err:
+	up_read(&ni->runlist.lock);
+	ntfs_error(vol->sb, "ntfs_rl_vcn_to_lcn() failed. Cannot read compression block.");
+	goto err_out;
+
+read_err:
+	up_read(&ni->runlist.lock);
+	ntfs_error(vol->sb, "IO error while reading compressed data.");
+
+err_out:
+	for (i = cur_page; i < max_page; i++) {
+		page = pages[i];
+		if (page) {
+			flush_dcache_page(page);
+			kunmap_local(page_address(page));
+			unlock_page(page);
+			if (i != xpage)
+				put_page(page);
+		}
+	}
+	kfree(pages);
+	kfree(completed_pages);
+	return -EIO;
+}
+
+/*
+ * Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches.
+ */
+#define NICE_MATCH_LEN 18
+
+/*
+ * Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position.
+ */
+#define MAX_SEARCH_DEPTH 24
+
+/* log base 2 of the number of entries in the hash table for match-finding. */
+#define HASH_SHIFT 14
+
+/* Constant for the multiplicative hash function.
 */
+#define HASH_MULTIPLIER 0x1E35A7BD
+
+struct COMPRESS_CONTEXT {
+	const unsigned char *inbuf;
+	int bufsize;
+	int size;
+	int rel;
+	int mxsz;
+	s16 head[1 << HASH_SHIFT];
+	s16 prev[NTFS_SB_SIZE];
+};
+
+/*
+ * Hash the next 3-byte sequence in the input buffer
+ */
+static inline unsigned int ntfs_hash(const u8 *p)
+{
+	u32 str;
+	u32 hash;
+
+	/*
+	 * Unaligned access allowed, and little endian CPU.
+	 * Callers ensure that at least 4 (not 3) bytes are remaining.
+	 */
+	str = *(const u32 *)p & 0xFFFFFF;
+	hash = str * HASH_MULTIPLIER;
+
+	/* High bits are more random than the low bits. */
+	return hash >> (32 - HASH_SHIFT);
+}
+
+/*
+ * Search for the longest sequence matching current position
+ *
+ * A hash table, each entry of which points to a chain of sequence
+ * positions sharing the corresponding hash code, is maintained to speed up
+ * searching for matches. To maintain the hash table, either
+ * ntfs_best_match() or ntfs_skip_position() has to be called for each
+ * consecutive position.
+ *
+ * This function is heavily used; it has to be optimized carefully.
+ *
+ * This function sets pctx->size and pctx->rel to the length and offset,
+ * respectively, of the longest match found.
+ *
+ * The minimum match length is assumed to be 3, and the maximum match
+ * length is assumed to be pctx->mxsz. If this function produces
+ * pctx->size < 3, then no match was found.
+ *
+ * Note: for the following reasons, this function is not guaranteed to find
+ * *the* longest match up to pctx->mxsz:
+ *
+ *	(1) If this function finds a match of NICE_MATCH_LEN bytes or greater,
+ *	    it ends early because a match this long is good enough and it's not
+ *	    worth spending more time searching.
+ *
+ *	(2) If this function considers MAX_SEARCH_DEPTH matches with a single
+ *	    position, it ends early and returns the longest match found so far.
+ *	    This saves a lot of time on degenerate inputs.
+ */
+static void ntfs_best_match(struct COMPRESS_CONTEXT *pctx, const int i,
+		int best_len)
+{
+	const u8 * const inbuf = pctx->inbuf;
+	const u8 * const strptr = &inbuf[i]; /* String we're matching against */
+	s16 * const prev = pctx->prev;
+	const int max_len = min(pctx->bufsize - i, pctx->mxsz);
+	const int nice_len = min(NICE_MATCH_LEN, max_len);
+	int depth_remaining = MAX_SEARCH_DEPTH;
+	const u8 *best_matchptr = strptr;
+	unsigned int hash;
+	s16 cur_match;
+	const u8 *matchptr;
+	int len;
+
+	if (max_len < 4)
+		goto out;
+
+	/* Insert the current sequence into the appropriate hash chain. */
+	hash = ntfs_hash(strptr);
+	cur_match = pctx->head[hash];
+	prev[i] = cur_match;
+	pctx->head[hash] = i;
+
+	if (best_len >= max_len) {
+		/*
+		 * Lazy match is being attempted, but there aren't enough length
+		 * bits remaining to code a longer match.
+		 */
+		goto out;
+	}
+
+	/* Search the appropriate hash chain for matches. */
+	for (; cur_match >= 0 && depth_remaining--; cur_match = prev[cur_match]) {
+		matchptr = &inbuf[cur_match];
+
+		/*
+		 * Considering the potential match at 'matchptr': is it longer
+		 * than 'best_len'?
+		 *
+		 * The bytes at index 'best_len' are the most likely to differ,
+		 * so check them first.
+		 *
+		 * The bytes at indices 'best_len - 1' and '0' are less
+		 * important to check separately. But doing so still gives a
+		 * slight performance improvement, at least on x86_64, probably
+		 * because they create separate branches for the CPU to predict
+		 * independently of the branches in the main comparison loops.
+		 */
+		if (matchptr[best_len] != strptr[best_len] ||
+				matchptr[best_len - 1] != strptr[best_len - 1] ||
+				matchptr[0] != strptr[0])
+			goto next_match;
+
+		for (len = 1; len < best_len - 1; len++)
+			if (matchptr[len] != strptr[len])
+				goto next_match;
+
+		/*
+		 * The match is the longest found so far ---
+		 * at least 'best_len' + 1 bytes. Continue extending it.
+		 */
+		best_matchptr = matchptr;
+
+		do {
+			if (++best_len >= nice_len) {
+				/*
+				 * 'nice_len' reached; don't waste time
+				 * searching for longer matches. Extend the
+				 * match as far as possible and terminate the
+				 * search.
+				 */
+				while (best_len < max_len &&
+						(best_matchptr[best_len] ==
+						 strptr[best_len]))
+					best_len++;
+				goto out;
+			}
+		} while (best_matchptr[best_len] == strptr[best_len]);
+
+		/* Found a longer match, but 'nice_len' not yet reached. */
+
+next_match:
+		/* Continue to next match in the chain. */
+		;
+	}
+
+	/*
+	 * Reached end of chain, or ended early due to reaching the maximum
+	 * search depth.
+	 */
+
+out:
+	/* Return the longest match we were able to find. */
+	pctx->size = best_len;
+	pctx->rel = best_matchptr - strptr; /* given as a negative number! */
+}
+
+/*
+ * Advance the match-finder, but don't search for matches.
+ */
+static void ntfs_skip_position(struct COMPRESS_CONTEXT *pctx, const int i)
+{
+	unsigned int hash;
+
+	if (pctx->bufsize - i < 4)
+		return;
+
+	/* Insert the current sequence into the appropriate hash chain. */
+	hash = ntfs_hash(pctx->inbuf + i);
+	pctx->prev[i] = pctx->head[hash];
+	pctx->head[hash] = i;
+}
+
+/*
+ * Compress a 4096-byte block
+ *
+ * Returns a header of two bytes followed by the compressed data.
+ * If compression is not effective, the header and an uncompressed
+ * block is returned.
+ *
+ * Note : two bytes may be output before output buffer overflow
+ * is detected, so a 4100-byte output buffer must be reserved.
+ *
+ * Returns the size of the compressed block, including the
+ * header (minimal size is 2, maximum size is 4098),
+ * or 0 if an error has been met.
+ */
+static unsigned int ntfs_compress_block(const char *inbuf, const int bufsize,
+		char *outbuf)
+{
+	struct COMPRESS_CONTEXT *pctx;
+	int i; /* current position */
+	int j; /* end of best match from current position */
+	int k; /* end of best match from next position */
+	int offs; /* offset to best match */
+	int bp; /* bits to store offset */
+	int bp_cur; /* saved bits to store offset at current position */
+	int mxoff; /* max match offset : 1 << bp */
+	unsigned int xout;
+	unsigned int q; /* aggregated offset and size */
+	int have_match; /* do we have a match at the current position? */
+	char *ptag; /* location reserved for a tag */
+	int tag; /* current value of tag */
+	int ntag; /* count of bits still undefined in tag */
+
+	pctx = ntfs_malloc_nofs(sizeof(struct COMPRESS_CONTEXT));
+	if (!pctx)
+		return 0; /* by contract above, 0 signals an error */
+
+	/*
+	 * All hash chains start as empty. The special value '-1' indicates the
+	 * end of each hash chain.
+	 */
+	memset(pctx->head, 0xFF, sizeof(pctx->head));
+
+	pctx->inbuf = (const unsigned char *)inbuf;
+	pctx->bufsize = bufsize;
+	xout = 2;
+	i = 0;
+	bp = 4;
+	mxoff = 1 << bp;
+	pctx->mxsz = (1 << (16 - bp)) + 2;
+	have_match = 0;
+	tag = 0;
+	ntag = 8;
+	ptag = &outbuf[xout++];
+
+	while ((i < bufsize) && (xout < (NTFS_SB_SIZE + 2))) {
+		/*
+		 * This implementation uses "lazy" parsing: it always chooses
+		 * the longest match, unless the match at the next position is
+		 * longer. This is the same strategy used by the high
+		 * compression modes of zlib.
+		 */
+		if (!have_match) {
+			/*
+			 * Find the longest match at the current position. But
+			 * first adjust the maximum match length if needed.
+			 * (This loop might need to run more than one time in
+			 * the case that we just output a long match.)
+			 */
+			while (mxoff < i) {
+				bp++;
+				mxoff <<= 1;
+				pctx->mxsz = (pctx->mxsz + 2) >> 1;
+			}
+			ntfs_best_match(pctx, i, 2);
+		}
+
+		if (pctx->size >= 3) {
+			/* Found a match at the current position.
			 */
+			j = i + pctx->size;
+			bp_cur = bp;
+			offs = pctx->rel;
+
+			if (pctx->size >= NICE_MATCH_LEN) {
+				/* Choose long matches immediately. */
+				q = (~offs << (16 - bp_cur)) + (j - i - 3);
+				outbuf[xout++] = q & 255;
+				outbuf[xout++] = (q >> 8) & 255;
+				tag |= (1 << (8 - ntag));
+
+				if (j == bufsize) {
+					/*
+					 * Shortcut if the match extends to the
+					 * end of the buffer.
+					 */
+					i = j;
+					--ntag;
+					break;
+				}
+				i += 1;
+				do {
+					ntfs_skip_position(pctx, i);
+				} while (++i != j);
+				have_match = 0;
+			} else {
+				/*
+				 * Check for a longer match at the next
+				 * position.
+				 */
+
+				/*
+				 * Doesn't need to be while() since we just
+				 * adjusted the maximum match length at the
+				 * previous position.
+				 */
+				if (mxoff < i + 1) {
+					bp++;
+					mxoff <<= 1;
+					pctx->mxsz = (pctx->mxsz + 2) >> 1;
+				}
+				ntfs_best_match(pctx, i + 1, pctx->size);
+				k = i + 1 + pctx->size;
+
+				if (k > (j + 1)) {
+					/*
+					 * Next match is longer.
+					 * Output a literal.
+					 */
+					outbuf[xout++] = inbuf[i++];
+					have_match = 1;
+				} else {
+					/*
+					 * Next match isn't longer.
+					 * Output the current match.
+					 */
+					q = (~offs << (16 - bp_cur)) +
+						(j - i - 3);
+					outbuf[xout++] = q & 255;
+					outbuf[xout++] = (q >> 8) & 255;
+					tag |= (1 << (8 - ntag));
+
+					/*
+					 * The minimum match length is 3, and
+					 * we've run two bytes through the
+					 * matchfinder already. So the minimum
+					 * number of positions we need to skip
+					 * is 1.
+					 */
+					i += 2;
+					do {
+						ntfs_skip_position(pctx, i);
+					} while (++i != j);
+					have_match = 0;
+				}
+			}
+		} else {
+			/* No match at current position. Output a literal. */
+			outbuf[xout++] = inbuf[i++];
+			have_match = 0;
+		}
+
+		/* Store the tag if fully used. */
+		if (!--ntag) {
+			*ptag = tag;
+			ntag = 8;
+			ptag = &outbuf[xout++];
+			tag = 0;
+		}
+	}
+
+	/* Store the last tag if partially used. */
+	if (ntag == 8)
+		xout--;
+	else
+		*ptag = tag;
+
+	/* Determine whether to store the data compressed or uncompressed.
	 */
+	if ((i >= bufsize) && (xout < (NTFS_SB_SIZE + 2))) {
+		/* Compressed. */
+		outbuf[0] = (xout - 3) & 255;
+		outbuf[1] = 0xb0 + (((xout - 3) >> 8) & 15);
+	} else {
+		/* Uncompressed. */
+		memcpy(&outbuf[2], inbuf, bufsize);
+		if (bufsize < NTFS_SB_SIZE)
+			memset(&outbuf[bufsize + 2], 0, NTFS_SB_SIZE - bufsize);
+		outbuf[0] = 0xff;
+		outbuf[1] = 0x3f;
+		xout = NTFS_SB_SIZE + 2;
+	}
+
+	/*
+	 * Free the compression context and return the total number of bytes
+	 * written to 'outbuf'.
+	 */
+	ntfs_free(pctx);
+	return xout;
+}
+
+static int ntfs_write_cb(struct ntfs_inode *ni, loff_t pos, struct page **pages,
+		int pages_per_cb)
+{
+	struct ntfs_volume *vol = ni->vol;
+	char *outbuf = NULL, *pbuf, *inbuf;
+	u32 compsz, p, insz = pages_per_cb << PAGE_SHIFT;
+	s32 rounded, bio_size;
+	unsigned int sz, bsz;
+	bool fail = false, allzeroes;
+	/* a single compressed zero */
+	static char onezero[] = {0x01, 0xb0, 0x00, 0x00};
+	/* a couple of compressed zeroes */
+	static char twozeroes[] = {0x02, 0xb0, 0x00, 0x00, 0x00};
+	/* more compressed zeroes, to be followed by some count */
+	static char morezeroes[] = {0x03, 0xb0, 0x02, 0x00};
+	struct page **pages_disk = NULL, *pg;
+	s64 bio_lcn;
+	struct runlist_element *rlc, *rl;
+	int i, err;
+	int pages_count = (round_up(ni->itype.compressed.block_size + 2 *
+			(ni->itype.compressed.block_size / NTFS_SB_SIZE) + 2, PAGE_SIZE)) / PAGE_SIZE;
+	size_t new_rl_count;
+	struct bio *bio = NULL;
+	loff_t new_length;
+	s64 new_vcn;
+
+	inbuf = vmap(pages, pages_per_cb, VM_MAP, PAGE_KERNEL_RO);
+	if (!inbuf)
+		return -ENOMEM;
+
+	/* may need 2 extra bytes per block and 2 more bytes */
+	pages_disk = kcalloc(pages_count, sizeof(struct page *), GFP_NOFS);
+	if (!pages_disk) {
+		vunmap(inbuf);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < pages_count; i++) {
+		pg = alloc_page(GFP_KERNEL);
+		if (!pg) {
+			err = -ENOMEM;
+			goto out;
+		}
+		pages_disk[i] = pg;
+		lock_page(pg);
+		kmap_local_page(pg);
+	}
+
+	outbuf = vmap(pages_disk, pages_count, VM_MAP, PAGE_KERNEL);
+	if (!outbuf) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	compsz = 0;
+	allzeroes = true;
+	for (p = 0; (p < insz) && !fail; p += NTFS_SB_SIZE) {
+		if ((p + NTFS_SB_SIZE) < insz)
+			bsz = NTFS_SB_SIZE;
+		else
+			bsz = insz - p;
+		pbuf = &outbuf[compsz];
+		sz = ntfs_compress_block(&inbuf[p], bsz, pbuf);
+		/* fail if all the clusters (or more) are needed */
+		if (!sz || ((compsz + sz + vol->cluster_size + 2) >
+				ni->itype.compressed.block_size))
+			fail = true;
+		else {
+			if (allzeroes) {
+				/* check whether this is all zeroes */
+				switch (sz) {
+				case 4:
+					allzeroes = !memcmp(pbuf, onezero, 4);
+					break;
+				case 5:
+					allzeroes = !memcmp(pbuf, twozeroes, 5);
+					break;
+				case 6:
+					allzeroes = !memcmp(pbuf, morezeroes, 4);
+					break;
+				default:
+					allzeroes = false;
+					break;
+				}
+			}
+			compsz += sz;
+		}
+	}
+
+	if (!fail && !allzeroes) {
+		outbuf[compsz++] = 0;
+		outbuf[compsz++] = 0;
+		rounded = ((compsz - 1) | (vol->cluster_size - 1)) + 1;
+		memset(&outbuf[compsz], 0, rounded - compsz);
+		bio_size = rounded;
+		pages = pages_disk;
+	} else if (allzeroes) {
+		err = 0;
+		goto out;
+	} else {
+		bio_size = insz;
+	}
+
+	new_vcn = (pos & ~(ni->itype.compressed.block_size - 1)) >> vol->cluster_size_bits;
+	new_length = round_up(bio_size, vol->cluster_size) >> vol->cluster_size_bits;
+
+	err = ntfs_non_resident_attr_punch_hole(ni, new_vcn, ni->itype.compressed.block_clusters);
+	if (err < 0)
+		goto out;
+
+	rlc = ntfs_cluster_alloc(vol, new_vcn, new_length, -1, DATA_ZONE,
+			false, true, true);
+	if (IS_ERR(rlc)) {
+		err = PTR_ERR(rlc);
+		goto out;
+	}
+
+	bio_lcn = rlc->lcn;
+	down_write(&ni->runlist.lock);
+	rl = ntfs_runlists_merge(&ni->runlist, rlc, 0, &new_rl_count);
+	if (IS_ERR(rl)) {
+		up_write(&ni->runlist.lock);
+		ntfs_error(vol->sb, "Failed to merge runlists");
+		err = PTR_ERR(rl);
+		if (ntfs_cluster_free_from_rl(vol, rlc))
+			ntfs_error(vol->sb,
"Failed to free hot clusters.");
+		ntfs_free(rlc);
+		goto out;
+	}
+
+	ni->runlist.count = new_rl_count;
+	ni->runlist.rl = rl;
+
+	err = ntfs_attr_update_mapping_pairs(ni, 0);
+	up_write(&ni->runlist.lock);
+	if (err) {
+		err = -EIO;
+		goto out;
+	}
+
+	i = 0;
+	while (bio_size > 0) {
+		int page_size;
+
+		if (bio_size >= PAGE_SIZE) {
+			page_size = PAGE_SIZE;
+			bio_size -= PAGE_SIZE;
+		} else {
+			page_size = bio_size;
+			bio_size = 0;
+		}
+
+setup_bio:
+		if (!bio) {
+			bio = ntfs_setup_bio(vol, REQ_OP_WRITE, bio_lcn + i, 0);
+			if (!bio) {
+				err = -ENOMEM;
+				goto out;
+			}
+		}
+
+		if (!bio_add_page(bio, pages[i], page_size, 0)) {
+			err = submit_bio_wait(bio);
+			bio_put(bio);
+			if (err)
+				goto out;
+			bio = NULL;
+			goto setup_bio;
+		}
+		i++;
+	}
+
+	err = submit_bio_wait(bio);
+	bio_put(bio);
+out:
+	vunmap(outbuf);
+	for (i = 0; i < pages_count; i++) {
+		pg = pages_disk[i];
+		if (pg) {
+			kunmap_local(page_address(pg));
+			unlock_page(pg);
+			put_page(pg);
+		}
+	}
+	kfree(pages_disk);
+	vunmap(inbuf);
+	NInoSetFileNameDirty(ni);
+	mark_mft_record_dirty(ni);
+
+	return err;
+}
+
+int ntfs_compress_write(struct ntfs_inode *ni, loff_t pos, size_t count,
+		struct iov_iter *from)
+{
+	struct folio *folio;
+	struct page **pages = NULL, *page;
+	int pages_per_cb = ni->itype.compressed.block_size >> PAGE_SHIFT;
+	int cb_size = ni->itype.compressed.block_size, cb_off, err = 0;
+	int i, ip;
+	size_t written = 0;
+	struct address_space *mapping = VFS_I(ni)->i_mapping;
+
+	pages = kmalloc_array(pages_per_cb, sizeof(struct page *), GFP_NOFS);
+	if (!pages)
+		return -ENOMEM;
+
+	while (count) {
+		pgoff_t index;
+		size_t copied, bytes;
+		int off;
+
+		off = pos & (cb_size - 1);
+		bytes = cb_size - off;
+		if (bytes > count)
+			bytes = count;
+
+		cb_off = pos & ~(cb_size - 1);
+		index = cb_off >> PAGE_SHIFT;
+
+		if (unlikely(fault_in_iov_iter_readable(from, bytes))) {
+			err = -EFAULT;
+			goto out;
+		}
+
+		for (i = 0; i <
pages_per_cb; i++) {
+			folio = ntfs_read_mapping_folio(mapping, index + i);
+			if (IS_ERR(folio)) {
+				for (ip = 0; ip < i; ip++) {
+					folio_unlock(page_folio(pages[ip]));
+					folio_put(page_folio(pages[ip]));
+				}
+				err = PTR_ERR(folio);
+				goto out;
+			}
+
+			folio_lock(folio);
+			pages[i] = folio_page(folio, 0);
+		}
+
+		WARN_ON(!bytes);
+		copied = 0;
+		ip = off >> PAGE_SHIFT;
+		off = offset_in_page(pos);
+
+		for (;;) {
+			size_t cp, tail = PAGE_SIZE - off;
+
+			page = pages[ip];
+			cp = copy_folio_from_iter_atomic(page_folio(page), off,
+					min(tail, bytes), from);
+			flush_dcache_page(page);
+
+			copied += cp;
+			bytes -= cp;
+			if (!bytes || !cp)
+				break;
+
+			if (cp < tail) {
+				off += cp;
+			} else {
+				ip++;
+				off = 0;
+			}
+		}
+
+		err = ntfs_write_cb(ni, pos, pages, pages_per_cb);
+
+		for (i = 0; i < pages_per_cb; i++) {
+			folio = page_folio(pages[i]);
+			if (i < ip) {
+				folio_clear_dirty(folio);
+				folio_mark_uptodate(folio);
+			}
+			folio_unlock(folio);
+			folio_put(folio);
+		}
+
+		if (err)
+			goto out;
+
+		cond_resched();
+		pos += copied;
+		written += copied;
+		count = iov_iter_count(from);
+	}
+
+out:
+	kfree(pages);
+	if (err < 0)
+		written = err;
+
+	return written;
+}
-- 
2.25.1

From nobody Mon Dec 1 22:02:17 2025
From: Namjae Jeon
To: viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, hch@lst.de,
	tytso@mit.edu, willy@infradead.org, jack@suse.cz, djwong@kernel.org,
	josef@toxicpanda.com, sandeen@sandeen.net, rgoldwyn@suse.com,
	xiang@kernel.org, dsterba@suse.com, pali@kernel.org, ebiggers@kernel.org,
	neil@brown.name, amir73il@gmail.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	iamjoonsoo.kim@lge.com, cheol.lee@lge.com, jay.sim@lge.com,
	gunho.lee@lge.com, Namjae Jeon, Hyunchul Lee
Subject: [PATCH v2 08/11] ntfsplus: add runlist handling and cluster allocator
Date: Thu, 27 Nov 2025 13:59:41 +0900
Message-Id: <20251127045944.26009-9-linkinjeon@kernel.org>
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>
References: <20251127045944.26009-1-linkinjeon@kernel.org>

This adds the implementation of runlist handling and cluster allocator
for ntfsplus.
Signed-off-by: Hyunchul Lee
Signed-off-by: Namjae Jeon
---
 fs/ntfsplus/bitmap.c   |  290 ++++++
 fs/ntfsplus/lcnalloc.c | 1012 ++++++++++++++++++++
 fs/ntfsplus/runlist.c  | 1983 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 3285 insertions(+)
 create mode 100644 fs/ntfsplus/bitmap.c
 create mode 100644 fs/ntfsplus/lcnalloc.c
 create mode 100644 fs/ntfsplus/runlist.c

diff --git a/fs/ntfsplus/bitmap.c b/fs/ntfsplus/bitmap.c
new file mode 100644
index 000000000000..a806f01db839
--- /dev/null
+++ b/fs/ntfsplus/bitmap.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel bitmap handling. Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2004-2005 Anton Altaparmakov
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include
+
+#include "bitmap.h"
+#include "aops.h"
+#include "ntfs.h"
+
+int ntfsp_trim_fs(struct ntfs_volume *vol, struct fstrim_range *range)
+{
+	size_t buf_clusters;
+	pgoff_t index, start_index, end_index;
+	struct file_ra_state *ra;
+	struct folio *folio;
+	unsigned long *bitmap;
+	char *kaddr;
+	u64 end, trimmed = 0, start_buf, end_buf, end_cluster;
+	u64 start_cluster = range->start >> vol->cluster_size_bits;
+	u32 dq = bdev_discard_granularity(vol->sb->s_bdev);
+	int ret = 0;
+
+	if (!dq)
+		dq = vol->cluster_size;
+
+	if (start_cluster >= vol->nr_clusters)
+		return -EINVAL;
+
+	if (range->len == (u64)-1)
+		end_cluster = vol->nr_clusters;
+	else {
+		end_cluster = (range->start + range->len + vol->cluster_size - 1) >>
+				vol->cluster_size_bits;
+		if (end_cluster > vol->nr_clusters)
+			end_cluster = vol->nr_clusters;
+	}
+
+	ra = kzalloc(sizeof(*ra), GFP_NOFS);
+	if (!ra)
+		return -ENOMEM;
+
+	buf_clusters = PAGE_SIZE * 8;
+	start_index = start_cluster >> 15;
+	end_index = (end_cluster + buf_clusters - 1) >> 15;
+
+	for (index = start_index; index < end_index; index++) {
+		folio = filemap_lock_folio(vol->lcnbmp_ino->i_mapping, index);
+		if (IS_ERR(folio)) {
+			page_cache_sync_readahead(vol->lcnbmp_ino->i_mapping, ra, NULL,
+						  index, end_index - index);
+			folio = ntfs_read_mapping_folio(vol->lcnbmp_ino->i_mapping, index);
+			if (!IS_ERR(folio))
+				folio_lock(folio);
+		}
+		if (IS_ERR(folio)) {
+			ret = PTR_ERR(folio);
+			goto out_free;
+		}
+
+		kaddr = kmap_local_folio(folio, 0);
+		bitmap = (unsigned long *)kaddr;
+
+		start_buf = max_t(u64, index * buf_clusters, start_cluster);
+		end_buf = min_t(u64, (index + 1) * buf_clusters, end_cluster);
+
+		end = start_buf;
+		while (end < end_buf) {
+			u64 aligned_start, aligned_count;
+			u64 start = find_next_zero_bit(bitmap, end_buf - start_buf,
+						       end - start_buf) + start_buf;
+			if (start >= end_buf)
+				break;
+
+			end = find_next_bit(bitmap, end_buf - start_buf,
+					    start - start_buf) + start_buf;
+
+			aligned_start = ALIGN(start << vol->cluster_size_bits, dq);
+			aligned_count = ALIGN_DOWN((end - start) << vol->cluster_size_bits, dq);
+			if (aligned_count >= range->minlen) {
+				ret = blkdev_issue_discard(vol->sb->s_bdev, aligned_start >> 9,
+							   aligned_count >> 9, GFP_NOFS);
+				if (ret)
+					goto out_unmap;
+				trimmed += aligned_count;
+			}
+		}
+
+out_unmap:
+		kunmap_local(kaddr);
+		folio_unlock(folio);
+		folio_put(folio);
+
+		if (ret)
+			goto out_free;
+	}
+
+	range->len = trimmed;
+
+out_free:
+	kfree(ra);
+	return ret;
+}
+
+/**
+ * __ntfs_bitmap_set_bits_in_run - set a run of bits in a bitmap to a value
+ * @vi:			vfs inode describing the bitmap
+ * @start_bit:		first bit to set
+ * @count:		number of bits to set
+ * @value:		value to set the bits to (i.e. 0 or 1)
+ * @is_rollback:	if 'true' this is a rollback operation
+ *
+ * Set @count bits starting at bit @start_bit in the bitmap described by the
+ * vfs inode @vi to @value, where @value is either 0 or 1.
+ *
+ * @is_rollback should always be 'false', it is for internal use to rollback
+ * errors.  You probably want to use ntfs_bitmap_set_bits_in_run() instead.
+ */
+int __ntfs_bitmap_set_bits_in_run(struct inode *vi, const s64 start_bit,
+		const s64 count, const u8 value, const bool is_rollback)
+{
+	s64 cnt = count;
+	pgoff_t index, end_index;
+	struct address_space *mapping;
+	struct folio *folio;
+	u8 *kaddr;
+	int pos, len;
+	u8 bit;
+	struct ntfs_inode *ni = NTFS_I(vi);
+	struct ntfs_volume *vol = ni->vol;
+
+	ntfs_debug("Entering for i_ino 0x%lx, start_bit 0x%llx, count 0x%llx, value %u.%s",
+			vi->i_ino, (unsigned long long)start_bit,
+			(unsigned long long)cnt, (unsigned int)value,
+			is_rollback ? " (rollback)" : "");
+
+	if (start_bit < 0 || cnt < 0 || value > 1)
+		return -EINVAL;
+
+	/*
+	 * Calculate the indices for the pages containing the first and last
+	 * bits, i.e. @start_bit and @start_bit + @cnt - 1, respectively.
+	 */
+	index = start_bit >> (3 + PAGE_SHIFT);
+	end_index = (start_bit + cnt - 1) >> (3 + PAGE_SHIFT);
+
+	/* Get the page containing the first bit (@start_bit). */
+	mapping = vi->i_mapping;
+	folio = ntfs_read_mapping_folio(mapping, index);
+	if (IS_ERR(folio)) {
+		if (!is_rollback)
+			ntfs_error(vi->i_sb,
+				"Failed to map first page (error %li), aborting.",
+				PTR_ERR(folio));
+		return PTR_ERR(folio);
+	}
+
+	folio_lock(folio);
+	kaddr = kmap_local_folio(folio, 0);
+
+	/* Set @pos to the position of the byte containing @start_bit. */
+	pos = (start_bit >> 3) & ~PAGE_MASK;
+
+	/* Calculate the position of @start_bit in the first byte. */
+	bit = start_bit & 7;
+
+	/* If the first byte is partial, modify the appropriate bits in it. */
+	if (bit) {
+		u8 *byte = kaddr + pos;
+
+		if (ni->mft_no == FILE_Bitmap)
+			ntfs_set_lcn_empty_bits(vol, index, value, min_t(s64, 8 - bit, cnt));
+		while ((bit & 7) && cnt) {
+			cnt--;
+			if (value)
+				*byte |= 1 << bit++;
+			else
+				*byte &= ~(1 << bit++);
+		}
+		/* If we are done, unmap the page and return success. */
+		if (!cnt)
+			goto done;
+
+		/* Update @pos to the new position.
 */
+		pos++;
+	}
+	/*
+	 * Depending on @value, modify all remaining whole bytes in the page up
+	 * to @cnt.
+	 */
+	len = min_t(s64, cnt >> 3, PAGE_SIZE - pos);
+	memset(kaddr + pos, value ? 0xff : 0, len);
+	cnt -= len << 3;
+	if (ni->mft_no == FILE_Bitmap)
+		ntfs_set_lcn_empty_bits(vol, index, value, len << 3);
+
+	/* Update @len to point to the first not-done byte in the page. */
+	if (cnt < 8)
+		len += pos;
+
+	/* If we are not in the last page, deal with all subsequent pages. */
+	while (index < end_index) {
+		if (cnt <= 0)
+			goto rollback;
+
+		/* Update @index and get the next folio. */
+		flush_dcache_folio(folio);
+		folio_mark_dirty(folio);
+		folio_unlock(folio);
+		ntfs_unmap_folio(folio, kaddr);
+		folio = ntfs_read_mapping_folio(mapping, ++index);
+		if (IS_ERR(folio)) {
+			ntfs_error(vi->i_sb,
+				"Failed to map subsequent page (error %li), aborting.",
+				PTR_ERR(folio));
+			goto rollback;
+		}
+
+		folio_lock(folio);
+		kaddr = kmap_local_folio(folio, 0);
+		/*
+		 * Depending on @value, modify all remaining whole bytes in the
+		 * page up to @cnt.
+		 */
+		len = min_t(s64, cnt >> 3, PAGE_SIZE);
+		memset(kaddr, value ? 0xff : 0, len);
+		cnt -= len << 3;
+		if (ni->mft_no == FILE_Bitmap)
+			ntfs_set_lcn_empty_bits(vol, index, value, len << 3);
+	}
+	/*
+	 * The currently mapped page is the last one.  If the last byte is
+	 * partial, modify the appropriate bits in it.  Note, @len is the
+	 * position of the last byte inside the page.
+	 */
+	if (cnt) {
+		u8 *byte;
+
+		WARN_ON(cnt > 7);
+
+		bit = cnt;
+		byte = kaddr + len;
+		if (ni->mft_no == FILE_Bitmap)
+			ntfs_set_lcn_empty_bits(vol, index, value, bit);
+		while (bit--) {
+			if (value)
+				*byte |= 1 << bit;
+			else
+				*byte &= ~(1 << bit);
+		}
+	}
+done:
+	/* We are done.  Unmap the folio and return success.
 */
+	flush_dcache_folio(folio);
+	folio_mark_dirty(folio);
+	folio_unlock(folio);
+	ntfs_unmap_folio(folio, kaddr);
+	ntfs_debug("Done.");
+	return 0;
+rollback:
+	/*
+	 * Current state:
+	 *	- no pages are mapped
+	 *	- @count - @cnt is the number of bits that have been modified
+	 */
+	if (is_rollback)
+		return PTR_ERR(folio);
+	if (count != cnt)
+		pos = __ntfs_bitmap_set_bits_in_run(vi, start_bit, count - cnt,
+				value ? 0 : 1, true);
+	else
+		pos = 0;
+	if (!pos) {
+		/* Rollback was successful. */
+		ntfs_error(vi->i_sb,
+			"Failed to map subsequent page (error %li), aborting.",
+			PTR_ERR(folio));
+	} else {
+		/* Rollback failed. */
+		ntfs_error(vi->i_sb,
+			"Failed to map subsequent page (error %li) and rollback failed (error %i).  Aborting and leaving inconsistent metadata.  Unmount and run chkdsk.",
+			PTR_ERR(folio), pos);
+		NVolSetErrors(NTFS_SB(vi->i_sb));
+	}
+	return PTR_ERR(folio);
+}
diff --git a/fs/ntfsplus/lcnalloc.c b/fs/ntfsplus/lcnalloc.c
new file mode 100644
index 000000000000..92db7c7780e9
--- /dev/null
+++ b/fs/ntfsplus/lcnalloc.c
@@ -0,0 +1,1012 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Cluster (de)allocation code.  Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2004-2005 Anton Altaparmakov
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ *
+ * Part of this file is based on code from the NTFS-3G project
+ * and is copyrighted by the respective authors below:
+ * Copyright (c) 2002-2004 Anton Altaparmakov
+ * Copyright (c) 2004 Yura Pakhuchiy
+ * Copyright (c) 2004-2008 Szabolcs Szakacsits
+ * Copyright (c) 2008-2009 Jean-Pierre Andre
+ */
+
+#include "lcnalloc.h"
+#include "bitmap.h"
+#include "misc.h"
+#include "aops.h"
+#include "ntfs.h"
+
+/**
+ * ntfs_cluster_free_from_rl_nolock - free clusters from runlist
+ * @vol:	mounted ntfs volume on which to free the clusters
+ * @rl:		runlist describing the clusters to free
+ *
+ * Free all the clusters described by the runlist @rl on the volume @vol.
= In + * the case of an error being returned, at least some of the clusters were= not + * freed. + * + * Return 0 on success and -errno on error. + * + * Locking: - The volume lcn bitmap must be locked for writing on entry an= d is + * left locked on return. + */ +int ntfs_cluster_free_from_rl_nolock(struct ntfs_volume *vol, + const struct runlist_element *rl) +{ + struct inode *lcnbmp_vi =3D vol->lcnbmp_ino; + int ret =3D 0; + s64 nr_freed =3D 0; + + ntfs_debug("Entering."); + if (!rl) + return 0; + + if (!NVolFreeClusterKnown(vol)) + wait_event(vol->free_waitq, NVolFreeClusterKnown(vol)); + + for (; rl->length; rl++) { + int err; + + if (rl->lcn < 0) + continue; + err =3D ntfs_bitmap_clear_run(lcnbmp_vi, rl->lcn, rl->length); + if (unlikely(err && (!ret || ret =3D=3D -ENOMEM) && ret !=3D err)) + ret =3D err; + else + nr_freed +=3D rl->length; + } + ntfs_inc_free_clusters(vol, nr_freed); + ntfs_debug("Done."); + return ret; +} + +static s64 max_empty_bit_range(unsigned char *buf, int size) +{ + int i, j, run =3D 0; + int max_range =3D 0; + s64 start_pos =3D -1; + + ntfs_debug("Entering\n"); + + i =3D 0; + while (i < size) { + switch (*buf) { + case 0: + do { + buf++; + run +=3D 8; + i++; + } while ((i < size) && !*buf); + break; + case 255: + if (run > max_range) { + max_range =3D run; + start_pos =3D (s64)i * 8 - run; + } + run =3D 0; + do { + buf++; + i++; + } while ((i < size) && (*buf =3D=3D 255)); + break; + default: + for (j =3D 0; j < 8; j++) { + int bit =3D *buf & (1 << j); + + if (bit) { + if (run > max_range) { + max_range =3D run; + start_pos =3D (s64)i * 8 + (j - run); + } + run =3D 0; + } else + run++; + } + i++; + buf++; + } + } + + if (run > max_range) + start_pos =3D (s64)i * 8 - run; + + return start_pos; +} + +/** + * ntfs_cluster_alloc - allocate clusters on an ntfs volume + * + * Allocate @count clusters preferably starting at cluster @start_lcn or a= t the + * current allocator position if @start_lcn is -1, on the mounted ntfs vol= ume + * 
+ * @vol.  @zone is either DATA_ZONE for allocation of normal clusters or
+ * MFT_ZONE for allocation of clusters for the master file table, i.e. the
+ * $MFT/$DATA attribute.
+ *
+ * @start_vcn specifies the vcn of the first allocated cluster.  This makes
+ * merging the resulting runlist with the old runlist easier.
+ *
+ * If @is_extension is 'true', the caller is allocating clusters to extend an
+ * attribute and if it is 'false', the caller is allocating clusters to fill a
+ * hole in an attribute.  Practically the difference is that if @is_extension
+ * is 'true' the returned runlist will be terminated with LCN_ENOENT and if
+ * @is_extension is 'false' the runlist will be terminated with
+ * LCN_RL_NOT_MAPPED.
+ *
+ * You need to check the return value with IS_ERR().  If this is false, the
+ * function was successful and the return value is a runlist describing the
+ * allocated cluster(s).  If IS_ERR() is true, the function failed and
+ * PTR_ERR() gives you the error code.
+ *
+ * Notes on the allocation algorithm
+ * =================================
+ *
+ * There are two data zones.  First is the area between the end of the mft zone
+ * and the end of the volume, and second is the area between the start of the
+ * volume and the start of the mft zone.  On unmodified/standard NTFS 1.x
+ * volumes, the second data zone does not exist due to the mft zone being
+ * expanded to cover the start of the volume in order to reserve space for the
+ * mft bitmap attribute.
+ *
+ * This is not the prettiest function but the complexity stems from the need of
+ * implementing the mft vs data zoned approach and from the fact that we have
+ * access to the lcn bitmap in portions of up to 8192 bytes at a time, so we
+ * need to cope with crossing over boundaries of two buffers.  Further, the
+ * fact that the allocator allows for caller supplied hints as to the location
+ * of where allocation should begin and the fact that the allocator keeps track
+ * of where in the data zones the next natural allocation should occur,
+ * contribute to the complexity of the function.  But it should all be
+ * worthwhile, because this allocator should: 1) be a full implementation of
+ * the MFT zone approach used by Windows NT, 2) cause reduction in
+ * fragmentation, and 3) be speedy in allocations (the code is not optimized
+ * for speed, but the algorithm is, so further speed improvements are probably
+ * possible).
+ *
+ * Locking: - The volume lcn bitmap must be unlocked on entry and is unlocked
+ *	      on return.
+ *	    - This function takes the volume lcn bitmap lock for writing and
+ *	      modifies the bitmap contents.
+ */
+struct runlist_element *ntfs_cluster_alloc(struct ntfs_volume *vol, const s64 start_vcn,
+		const s64 count, const s64 start_lcn,
+		const int zone,
+		const bool is_extension,
+		const bool is_contig,
+		const bool is_dealloc)
+{
+	s64 zone_start, zone_end, bmp_pos, bmp_initial_pos, last_read_pos, lcn;
+	s64 prev_lcn = 0, prev_run_len = 0, mft_zone_size;
+	s64 clusters, free_clusters;
+	loff_t i_size;
+	struct inode *lcnbmp_vi;
+	struct runlist_element *rl = NULL;
+	struct address_space *mapping;
+	struct folio *folio = NULL;
+	u8 *buf = NULL, *byte;
+	int err = 0, rlpos, rlsize, buf_size, pg_off;
+	u8 pass, done_zones, search_zone, need_writeback = 0, bit;
+	unsigned int memalloc_flags;
+	u8 has_guess;
+	pgoff_t index;
+
+	ntfs_debug("Entering for start_vcn 0x%llx, count 0x%llx, start_lcn 0x%llx, zone %s_ZONE.",
+			start_vcn, count, start_lcn,
+			zone == MFT_ZONE ? "MFT" : "DATA");
+
+	lcnbmp_vi = vol->lcnbmp_ino;
+	if (start_vcn < 0 || start_lcn < LCN_HOLE ||
+	    zone < FIRST_ZONE || zone > LAST_ZONE)
+		return ERR_PTR(-EINVAL);
+
+	/* A zero or negative @count is invalid.
*/ + if (count < 0 || !count) + return ERR_PTR(-EINVAL); + + memalloc_flags =3D memalloc_nofs_save(); + + if (!NVolFreeClusterKnown(vol)) + wait_event(vol->free_waitq, NVolFreeClusterKnown(vol)); + free_clusters =3D atomic64_read(&vol->free_clusters); + + /* Take the lcnbmp lock for writing. */ + down_write(&vol->lcnbmp_lock); + if (is_dealloc =3D=3D false) + free_clusters -=3D atomic64_read(&vol->dirty_clusters); + + if (free_clusters < count) { + up_write(&vol->lcnbmp_lock); + return ERR_PTR(-ENOSPC); + } + + /* + * If no specific @start_lcn was requested, use the current data zone + * position, otherwise use the requested @start_lcn but make sure it + * lies outside the mft zone. Also set done_zones to 0 (no zones done) + * and pass depending on whether we are starting inside a zone (1) or + * at the beginning of a zone (2). If requesting from the MFT_ZONE, + * we either start at the current position within the mft zone or at + * the specified position. If the latter is out of bounds then we start + * at the beginning of the MFT_ZONE. + */ + done_zones =3D 0; + pass =3D 1; + /* + * zone_start and zone_end are the current search range. search_zone + * is 1 for mft zone, 2 for data zone 1 (end of mft zone till end of + * volume) and 4 for data zone 2 (start of volume till start of mft + * zone). + */ + has_guess =3D 1; + zone_start =3D start_lcn; + + if (zone_start < 0) { + if (zone =3D=3D DATA_ZONE) + zone_start =3D vol->data1_zone_pos; + else + zone_start =3D vol->mft_zone_pos; + if (!zone_start) { + /* + * Zone starts at beginning of volume which means a + * single pass is sufficient. + */ + pass =3D 2; + } + has_guess =3D 0; + } + + if (!zone_start || zone_start =3D=3D vol->mft_zone_start || + zone_start =3D=3D vol->mft_zone_end) + pass =3D 2; + + if (zone_start < vol->mft_zone_start) { + zone_end =3D vol->mft_zone_start; + search_zone =3D 4; + /* Skip searching the mft zone. 
*/ + done_zones |=3D 1; + } else if (zone_start < vol->mft_zone_end) { + zone_end =3D vol->mft_zone_end; + search_zone =3D 1; + } else { + zone_end =3D vol->nr_clusters; + search_zone =3D 2; + /* Skip searching the mft zone. */ + done_zones |=3D 1; + } + + /* + * bmp_pos is the current bit position inside the bitmap. We use + * bmp_initial_pos to determine whether or not to do a zone switch. + */ + bmp_pos =3D bmp_initial_pos =3D zone_start; + + /* Loop until all clusters are allocated, i.e. clusters =3D=3D 0. */ + clusters =3D count; + rlpos =3D rlsize =3D 0; + mapping =3D lcnbmp_vi->i_mapping; + i_size =3D i_size_read(lcnbmp_vi); + while (1) { + ntfs_debug("Start of outer while loop: done_zones 0x%x, search_zone %i, = pass %i, zone_start 0x%llx, zone_end 0x%llx, bmp_initial_pos 0x%llx, bmp_po= s 0x%llx, rlpos %i, rlsize %i.", + done_zones, search_zone, pass, + zone_start, zone_end, bmp_initial_pos, + bmp_pos, rlpos, rlsize); + /* Loop until we run out of free clusters. */ + last_read_pos =3D bmp_pos >> 3; + ntfs_debug("last_read_pos 0x%llx.", last_read_pos); + if (last_read_pos >=3D i_size) { + ntfs_debug("End of attribute reached. 
Skipping to zone_pass_done."); + goto zone_pass_done; + } + if (likely(folio)) { + if (need_writeback) { + ntfs_debug("Marking page dirty."); + flush_dcache_folio(folio); + folio_mark_dirty(folio); + need_writeback =3D 0; + } + folio_unlock(folio); + ntfs_unmap_folio(folio, buf); + folio =3D NULL; + } + + index =3D last_read_pos >> PAGE_SHIFT; + pg_off =3D last_read_pos & ~PAGE_MASK; + buf_size =3D PAGE_SIZE - pg_off; + if (unlikely(last_read_pos + buf_size > i_size)) + buf_size =3D i_size - last_read_pos; + buf_size <<=3D 3; + lcn =3D bmp_pos & 7; + bmp_pos &=3D ~(s64)7; + + if (vol->lcn_empty_bits_per_page[index] =3D=3D 0) + goto next_bmp_pos; + + folio =3D ntfs_read_mapping_folio(mapping, index); + if (IS_ERR(folio)) { + err =3D PTR_ERR(folio); + ntfs_error(vol->sb, "Failed to map page."); + goto out; + } + + folio_lock(folio); + buf =3D kmap_local_folio(folio, 0) + pg_off; + ntfs_debug("Before inner while loop: buf_size %i, lcn 0x%llx, bmp_pos 0x= %llx, need_writeback %i.", + buf_size, lcn, bmp_pos, need_writeback); + while (lcn < buf_size && lcn + bmp_pos < zone_end) { + byte =3D buf + (lcn >> 3); + ntfs_debug("In inner while loop: buf_size %i, lcn 0x%llx, bmp_pos 0x%ll= x, need_writeback %i, byte ofs 0x%x, *byte 0x%x.", + buf_size, lcn, bmp_pos, need_writeback, + (unsigned int)(lcn >> 3), + (unsigned int)*byte); + bit =3D 1 << (lcn & 7); + ntfs_debug("bit 0x%x.", bit); + + if (has_guess) { + if (*byte & bit) { + if (is_contig =3D=3D true && prev_run_len > 0) + goto done; + + has_guess =3D 0; + break; + } + } else { + lcn =3D max_empty_bit_range(buf, buf_size >> 3); + if (lcn < 0) + break; + has_guess =3D 1; + continue; + } + /* + * Allocate more memory if needed, including space for + * the terminator element. + * ntfs_malloc_nofs() operates on whole pages only. 
+ */ + if ((rlpos + 2) * sizeof(*rl) > rlsize) { + struct runlist_element *rl2; + + ntfs_debug("Reallocating memory."); + if (!rl) + ntfs_debug("First free bit is at s64 0x%llx.", + lcn + bmp_pos); + rl2 =3D ntfs_malloc_nofs(rlsize + (int)PAGE_SIZE); + if (unlikely(!rl2)) { + err =3D -ENOMEM; + ntfs_error(vol->sb, "Failed to allocate memory."); + goto out; + } + memcpy(rl2, rl, rlsize); + ntfs_free(rl); + rl =3D rl2; + rlsize +=3D PAGE_SIZE; + ntfs_debug("Reallocated memory, rlsize 0x%x.", + rlsize); + } + /* Allocate the bitmap bit. */ + *byte |=3D bit; + /* We need to write this bitmap page to disk. */ + need_writeback =3D 1; + ntfs_debug("*byte 0x%x, need_writeback is set.", + (unsigned int)*byte); + ntfs_dec_free_clusters(vol, 1); + ntfs_set_lcn_empty_bits(vol, index, 1, 1); + + /* + * Coalesce with previous run if adjacent LCNs. + * Otherwise, append a new run. + */ + ntfs_debug("Adding run (lcn 0x%llx, len 0x%llx), prev_lcn 0x%llx, lcn 0= x%llx, bmp_pos 0x%llx, prev_run_len 0x%llx, rlpos %i.", + lcn + bmp_pos, 1ULL, prev_lcn, + lcn, bmp_pos, prev_run_len, rlpos); + if (prev_lcn =3D=3D lcn + bmp_pos - prev_run_len && rlpos) { + ntfs_debug("Coalescing to run (lcn 0x%llx, len 0x%llx).", + rl[rlpos - 1].lcn, + rl[rlpos - 1].length); + rl[rlpos - 1].length =3D ++prev_run_len; + ntfs_debug("Run now (lcn 0x%llx, len 0x%llx), prev_run_len 0x%llx.", + rl[rlpos - 1].lcn, + rl[rlpos - 1].length, + prev_run_len); + } else { + if (likely(rlpos)) { + ntfs_debug("Adding new run, (previous run lcn 0x%llx, len 0x%llx).", + rl[rlpos - 1].lcn, rl[rlpos - 1].length); + rl[rlpos].vcn =3D rl[rlpos - 1].vcn + + prev_run_len; + } else { + ntfs_debug("Adding new run, is first run."); + rl[rlpos].vcn =3D start_vcn; + } + rl[rlpos].lcn =3D prev_lcn =3D lcn + bmp_pos; + rl[rlpos].length =3D prev_run_len =3D 1; + rlpos++; + } + /* Done? */ + if (!--clusters) { + s64 tc; +done: + /* + * Update the current zone position. 
Positions + * of already scanned zones have been updated + * during the respective zone switches. + */ + tc =3D lcn + bmp_pos + 1; + ntfs_debug("Done. Updating current zone position, tc 0x%llx, search_zo= ne %i.", + tc, search_zone); + switch (search_zone) { + case 1: + ntfs_debug("Before checks, vol->mft_zone_pos 0x%llx.", + vol->mft_zone_pos); + if (tc >=3D vol->mft_zone_end) { + vol->mft_zone_pos =3D + vol->mft_lcn; + if (!vol->mft_zone_end) + vol->mft_zone_pos =3D 0; + } else if ((bmp_initial_pos >=3D + vol->mft_zone_pos || + tc > vol->mft_zone_pos) + && tc >=3D vol->mft_lcn) + vol->mft_zone_pos =3D tc; + ntfs_debug("After checks, vol->mft_zone_pos 0x%llx.", + vol->mft_zone_pos); + break; + case 2: + ntfs_debug("Before checks, vol->data1_zone_pos 0x%llx.", + vol->data1_zone_pos); + if (tc >=3D vol->nr_clusters) + vol->data1_zone_pos =3D + vol->mft_zone_end; + else if ((bmp_initial_pos >=3D + vol->data1_zone_pos || + tc > vol->data1_zone_pos) + && tc >=3D vol->mft_zone_end) + vol->data1_zone_pos =3D tc; + ntfs_debug("After checks, vol->data1_zone_pos 0x%llx.", + vol->data1_zone_pos); + break; + case 4: + ntfs_debug("Before checks, vol->data2_zone_pos 0x%llx.", + vol->data2_zone_pos); + if (tc >=3D vol->mft_zone_start) + vol->data2_zone_pos =3D 0; + else if (bmp_initial_pos >=3D + vol->data2_zone_pos || + tc > vol->data2_zone_pos) + vol->data2_zone_pos =3D tc; + ntfs_debug("After checks, vol->data2_zone_pos 0x%llx.", + vol->data2_zone_pos); + break; + default: + WARN_ON(1); + } + ntfs_debug("Finished. Going to out."); + goto out; + } + lcn++; + } +next_bmp_pos: + bmp_pos +=3D buf_size; + ntfs_debug("After inner while loop: buf_size 0x%x, lcn 0x%llx, bmp_pos 0= x%llx, need_writeback %i.", + buf_size, lcn, bmp_pos, need_writeback); + if (bmp_pos < zone_end) { + ntfs_debug("Continuing outer while loop, bmp_pos 0x%llx, zone_end 0x%ll= x.", + bmp_pos, zone_end); + continue; + } +zone_pass_done: /* Finished with the current zone pass. 
*/ + ntfs_debug("At zone_pass_done, pass %i.", pass); + if (pass =3D=3D 1) { + /* + * Now do pass 2, scanning the first part of the zone + * we omitted in pass 1. + */ + pass =3D 2; + zone_end =3D zone_start; + switch (search_zone) { + case 1: /* mft_zone */ + zone_start =3D vol->mft_zone_start; + break; + case 2: /* data1_zone */ + zone_start =3D vol->mft_zone_end; + break; + case 4: /* data2_zone */ + zone_start =3D 0; + break; + default: + WARN_ON(1); + } + /* Sanity check. */ + if (zone_end < zone_start) + zone_end =3D zone_start; + bmp_pos =3D zone_start; + ntfs_debug("Continuing outer while loop, pass 2, zone_start 0x%llx, zon= e_end 0x%llx, bmp_pos 0x%llx.", + zone_start, zone_end, bmp_pos); + continue; + } /* pass =3D=3D 2 */ +done_zones_check: + ntfs_debug("At done_zones_check, search_zone %i, done_zones before 0x%x,= done_zones after 0x%x.", + search_zone, done_zones, + done_zones | search_zone); + done_zones |=3D search_zone; + if (done_zones < 7) { + ntfs_debug("Switching zone."); + /* Now switch to the next zone we haven't done yet. */ + pass =3D 1; + switch (search_zone) { + case 1: + ntfs_debug("Switching from mft zone to data1 zone."); + /* Update mft zone position. */ + if (rlpos) { + s64 tc; + + ntfs_debug("Before checks, vol->mft_zone_pos 0x%llx.", + vol->mft_zone_pos); + tc =3D rl[rlpos - 1].lcn + + rl[rlpos - 1].length; + if (tc >=3D vol->mft_zone_end) { + vol->mft_zone_pos =3D + vol->mft_lcn; + if (!vol->mft_zone_end) + vol->mft_zone_pos =3D 0; + } else if ((bmp_initial_pos >=3D + vol->mft_zone_pos || + tc > vol->mft_zone_pos) + && tc >=3D vol->mft_lcn) + vol->mft_zone_pos =3D tc; + ntfs_debug("After checks, vol->mft_zone_pos 0x%llx.", + vol->mft_zone_pos); + } + /* Switch from mft zone to data1 zone. 
*/ +switch_to_data1_zone: search_zone =3D 2; + zone_start =3D bmp_initial_pos =3D + vol->data1_zone_pos; + zone_end =3D vol->nr_clusters; + if (zone_start =3D=3D vol->mft_zone_end) + pass =3D 2; + if (zone_start >=3D zone_end) { + vol->data1_zone_pos =3D zone_start =3D + vol->mft_zone_end; + pass =3D 2; + } + break; + case 2: + ntfs_debug("Switching from data1 zone to data2 zone."); + /* Update data1 zone position. */ + if (rlpos) { + s64 tc; + + ntfs_debug("Before checks, vol->data1_zone_pos 0x%llx.", + vol->data1_zone_pos); + tc =3D rl[rlpos - 1].lcn + + rl[rlpos - 1].length; + if (tc >=3D vol->nr_clusters) + vol->data1_zone_pos =3D + vol->mft_zone_end; + else if ((bmp_initial_pos >=3D + vol->data1_zone_pos || + tc > vol->data1_zone_pos) + && tc >=3D vol->mft_zone_end) + vol->data1_zone_pos =3D tc; + ntfs_debug("After checks, vol->data1_zone_pos 0x%llx.", + vol->data1_zone_pos); + } + /* Switch from data1 zone to data2 zone. */ + search_zone =3D 4; + zone_start =3D bmp_initial_pos =3D + vol->data2_zone_pos; + zone_end =3D vol->mft_zone_start; + if (!zone_start) + pass =3D 2; + if (zone_start >=3D zone_end) { + vol->data2_zone_pos =3D zone_start =3D + bmp_initial_pos =3D 0; + pass =3D 2; + } + break; + case 4: + ntfs_debug("Switching from data2 zone to data1 zone."); + /* Update data2 zone position. */ + if (rlpos) { + s64 tc; + + ntfs_debug("Before checks, vol->data2_zone_pos 0x%llx.", + vol->data2_zone_pos); + tc =3D rl[rlpos - 1].lcn + + rl[rlpos - 1].length; + if (tc >=3D vol->mft_zone_start) + vol->data2_zone_pos =3D 0; + else if (bmp_initial_pos >=3D + vol->data2_zone_pos || + tc > vol->data2_zone_pos) + vol->data2_zone_pos =3D tc; + ntfs_debug("After checks, vol->data2_zone_pos 0x%llx.", + vol->data2_zone_pos); + } + /* Switch from data2 zone to data1 zone. 
*/ + goto switch_to_data1_zone; + default: + WARN_ON(1); + } + ntfs_debug("After zone switch, search_zone %i, pass %i, bmp_initial_pos= 0x%llx, zone_start 0x%llx, zone_end 0x%llx.", + search_zone, pass, + bmp_initial_pos, + zone_start, + zone_end); + bmp_pos =3D zone_start; + if (zone_start =3D=3D zone_end) { + ntfs_debug("Empty zone, going to done_zones_check."); + /* Empty zone. Don't bother searching it. */ + goto done_zones_check; + } + ntfs_debug("Continuing outer while loop."); + continue; + } /* done_zones =3D=3D 7 */ + ntfs_debug("All zones are finished."); + /* + * All zones are finished! If DATA_ZONE, shrink mft zone. If + * MFT_ZONE, we have really run out of space. + */ + mft_zone_size =3D vol->mft_zone_end - vol->mft_zone_start; + ntfs_debug("vol->mft_zone_start 0x%llx, vol->mft_zone_end 0x%llx, mft_zo= ne_size 0x%llx.", + vol->mft_zone_start, vol->mft_zone_end, + mft_zone_size); + if (zone =3D=3D MFT_ZONE || mft_zone_size <=3D 0) { + ntfs_debug("No free clusters left, going to out."); + /* Really no more space left on device. */ + err =3D -ENOSPC; + goto out; + } /* zone =3D=3D DATA_ZONE && mft_zone_size > 0 */ + ntfs_debug("Shrinking mft zone."); + zone_end =3D vol->mft_zone_end; + mft_zone_size >>=3D 1; + if (mft_zone_size > 0) + vol->mft_zone_end =3D vol->mft_zone_start + mft_zone_size; + else /* mft zone and data2 zone no longer exist. 
*/ + vol->data2_zone_pos =3D vol->mft_zone_start =3D + vol->mft_zone_end =3D 0; + if (vol->mft_zone_pos >=3D vol->mft_zone_end) { + vol->mft_zone_pos =3D vol->mft_lcn; + if (!vol->mft_zone_end) + vol->mft_zone_pos =3D 0; + } + bmp_pos =3D zone_start =3D bmp_initial_pos =3D + vol->data1_zone_pos =3D vol->mft_zone_end; + search_zone =3D 2; + pass =3D 2; + done_zones &=3D ~2; + ntfs_debug("After shrinking mft zone, mft_zone_size 0x%llx, vol->mft_zon= e_start 0x%llx, vol->mft_zone_end 0x%llx, vol->mft_zone_pos 0x%llx, search_= zone 2, pass 2, dones_zones 0x%x, zone_start 0x%llx, zone_end 0x%llx, vol->= data1_zone_pos 0x%llx, continuing outer while loop.", + mft_zone_size, vol->mft_zone_start, + vol->mft_zone_end, vol->mft_zone_pos, + done_zones, zone_start, zone_end, + vol->data1_zone_pos); + } + ntfs_debug("After outer while loop."); +out: + ntfs_debug("At out."); + /* Add runlist terminator element. */ + if (likely(rl)) { + rl[rlpos].vcn =3D rl[rlpos - 1].vcn + rl[rlpos - 1].length; + rl[rlpos].lcn =3D is_extension ? LCN_ENOENT : LCN_RL_NOT_MAPPED; + rl[rlpos].length =3D 0; + } + if (likely(folio && !IS_ERR(folio))) { + if (need_writeback) { + ntfs_debug("Marking page dirty."); + flush_dcache_folio(folio); + folio_mark_dirty(folio); + need_writeback =3D 0; + } + folio_unlock(folio); + ntfs_unmap_folio(folio, buf); + } + if (likely(!err)) { + if (is_dealloc =3D=3D true) + ntfs_release_dirty_clusters(vol, rl->length); + up_write(&vol->lcnbmp_lock); + memalloc_nofs_restore(memalloc_flags); + ntfs_debug("Done."); + return rl =3D=3D NULL ? ERR_PTR(-EIO) : rl; + } + if (err !=3D -ENOSPC) + ntfs_error(vol->sb, + "Failed to allocate clusters, aborting (error %i).", + err); + if (rl) { + int err2; + + if (err =3D=3D -ENOSPC) + ntfs_debug("Not enough space to complete allocation, err -ENOSPC, first= free lcn 0x%llx, could allocate up to 0x%llx clusters.", + rl[0].lcn, count - clusters); + /* Deallocate all allocated clusters. 
*/ + ntfs_debug("Attempting rollback..."); + err2 =3D ntfs_cluster_free_from_rl_nolock(vol, rl); + if (err2) { + ntfs_error(vol->sb, + "Failed to rollback (error %i). Leaving inconsistent metadata! Unmount= and run chkdsk.", + err2); + NVolSetErrors(vol); + } + /* Free the runlist. */ + ntfs_free(rl); + } else if (err =3D=3D -ENOSPC) + ntfs_debug("No space left at all, err =3D -ENOSPC, first free lcn =3D 0x= %llx.", + vol->data1_zone_pos); + atomic64_set(&vol->dirty_clusters, 0); + up_write(&vol->lcnbmp_lock); + memalloc_nofs_restore(memalloc_flags); + return ERR_PTR(err); +} + +/** + * __ntfs_cluster_free - free clusters on an ntfs volume + * @ni: ntfs inode whose runlist describes the clusters to free + * @start_vcn: vcn in the runlist of @ni at which to start freeing clusters + * @count: number of clusters to free or -1 for all clusters + * @ctx: active attribute search context if present or NULL if not + * @is_rollback: true if this is a rollback operation + * + * Free @count clusters starting at the cluster @start_vcn in the runlist + * described by the vfs inode @ni. + * + * If @count is -1, all clusters from @start_vcn to the end of the runlist= are + * deallocated. Thus, to completely free all clusters in a runlist, use + * @start_vcn =3D 0 and @count =3D -1. + * + * If @ctx is specified, it is an active search context of @ni and its bas= e mft + * record. This is needed when __ntfs_cluster_free() encounters unmapped + * runlist fragments and allows their mapping. If you do not have the mft + * record mapped, you can specify @ctx as NULL and __ntfs_cluster_free() w= ill + * perform the necessary mapping and unmapping. + * + * Note, __ntfs_cluster_free() saves the state of @ctx on entry and restor= es it + * before returning. Thus, @ctx will be left pointing to the same attribu= te on + * return as on entry. 
However, the actual pointers in @ctx may point to
+ * different memory locations on return, so you must remember to reset any
+ * cached pointers from the @ctx, i.e. after the call to __ntfs_cluster_free(),
+ * you will probably want to do:
+ *	m = ctx->mrec;
+ *	a = ctx->attr;
+ * Assuming you cache ctx->attr in a variable @a of type attr_record * and that
+ * you cache ctx->mrec in a variable @m of type struct mft_record *.
+ *
+ * @is_rollback should always be 'false', it is for internal use to rollback
+ * errors.  You probably want to use ntfs_cluster_free() instead.
+ *
+ * Note, __ntfs_cluster_free() does not modify the runlist, so you have to
+ * remove from the runlist or mark sparse the freed runs later.
+ *
+ * Return the number of deallocated clusters (not counting sparse ones) on
+ * success and -errno on error.
+ *
+ * WARNING: If @ctx is supplied, regardless of whether success or failure is
+ *	returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
+ *	is no longer valid, i.e. you need to either call
+ *	ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
+ *	In that case PTR_ERR(@ctx->mrec) will give you the error code for
+ *	why the mapping of the old inode failed.
+ *
+ * Locking: - The runlist described by @ni must be locked for writing on entry
+ *	      and is locked on return.  Note the runlist may be modified when
+ *	      needed runlist fragments need to be mapped.
+ *	    - The volume lcn bitmap must be unlocked on entry and is unlocked
+ *	      on return.
+ *	    - This function takes the volume lcn bitmap lock for writing and
+ *	      modifies the bitmap contents.
+ *	    - If @ctx is NULL, the base mft record of @ni must not be mapped on
+ *	      entry and it will be left unmapped on return.
+ *	    - If @ctx is not NULL, the base mft record must be mapped on entry
+ *	      and it will be left mapped on return.
+ */
+s64 __ntfs_cluster_free(struct ntfs_inode *ni, const s64 start_vcn, s64 count,
+		struct ntfs_attr_search_ctx *ctx, const bool is_rollback)
+{
+	s64 delta, to_free, total_freed, real_freed;
+	struct ntfs_volume *vol;
+	struct inode *lcnbmp_vi;
+	struct runlist_element *rl;
+	int err;
+	unsigned int memalloc_flags;
+
+	ntfs_debug("Entering for i_ino 0x%lx, start_vcn 0x%llx, count 0x%llx.%s",
+			ni->mft_no, start_vcn, count,
+			is_rollback ? " (rollback)" : "");
+	vol = ni->vol;
+	lcnbmp_vi = vol->lcnbmp_ino;
+	if (start_vcn < 0 || count < -1)
+		return -EINVAL;
+
+	if (!NVolFreeClusterKnown(vol))
+		wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+
+	/*
+	 * Lock the lcn bitmap for writing but only if not rolling back. We
+	 * must hold the lock all the way including through rollback otherwise
+	 * rollback is not possible because once we have cleared a bit and
+	 * dropped the lock, anyone could have set the bit again, thus
+	 * allocating the cluster for another use.
+	 */
+	if (likely(!is_rollback)) {
+		memalloc_flags = memalloc_nofs_save();
+		down_write(&vol->lcnbmp_lock);
+	}
+
+	total_freed = real_freed = 0;
+
+	rl = ntfs_attr_find_vcn_nolock(ni, start_vcn, ctx);
+	if (IS_ERR(rl)) {
+		err = PTR_ERR(rl);
+		if (err == -ENOENT) {
+			if (likely(!is_rollback)) {
+				up_write(&vol->lcnbmp_lock);
+				memalloc_nofs_restore(memalloc_flags);
+			}
+			return 0;
+		}
+
+		if (!is_rollback)
+			ntfs_error(vol->sb,
+				"Failed to find first runlist element (error %d), aborting.",
+				err);
+		goto err_out;
+	}
+	if (unlikely(rl->lcn < LCN_HOLE)) {
+		if (!is_rollback)
+			ntfs_error(vol->sb, "First runlist element has invalid lcn, aborting.");
+		err = -EIO;
+		goto err_out;
+	}
+	/* Find the starting cluster inside the run that needs freeing. */
+	delta = start_vcn - rl->vcn;
+
+	/* The number of clusters in this run that need freeing.
+	 */
+	to_free = rl->length - delta;
+	if (count >= 0 && to_free > count)
+		to_free = count;
+
+	if (likely(rl->lcn >= 0)) {
+		/* Do the actual freeing of the clusters in this run. */
+		err = ntfs_bitmap_set_bits_in_run(lcnbmp_vi, rl->lcn + delta,
+				to_free, likely(!is_rollback) ? 0 : 1);
+		if (unlikely(err)) {
+			if (!is_rollback)
+				ntfs_error(vol->sb,
+					"Failed to clear first run (error %i), aborting.",
+					err);
+			goto err_out;
+		}
+		/* We have freed @to_free real clusters. */
+		real_freed = to_free;
+	}
+	/* Go to the next run and adjust the number of clusters left to free. */
+	++rl;
+	if (count >= 0)
+		count -= to_free;
+
+	/* Keep track of the total "freed" clusters, including sparse ones. */
+	total_freed = to_free;
+	/*
+	 * Loop over the remaining runs, using @count as a capping value, and
+	 * free them.
+	 */
+	for (; rl->length && count != 0; ++rl) {
+		if (unlikely(rl->lcn < LCN_HOLE)) {
+			s64 vcn;
+
+			/* Attempt to map runlist. */
+			vcn = rl->vcn;
+			rl = ntfs_attr_find_vcn_nolock(ni, vcn, ctx);
+			if (IS_ERR(rl)) {
+				err = PTR_ERR(rl);
+				if (!is_rollback)
+					ntfs_error(vol->sb,
+						"Failed to map runlist fragment or failed to find subsequent runlist element.");
+				goto err_out;
+			}
+			if (unlikely(rl->lcn < LCN_HOLE)) {
+				if (!is_rollback)
+					ntfs_error(vol->sb,
+						"Runlist element has invalid lcn (0x%llx).",
+						rl->lcn);
+				err = -EIO;
+				goto err_out;
+			}
+		}
+		/* The number of clusters in this run that need freeing. */
+		to_free = rl->length;
+		if (count >= 0 && to_free > count)
+			to_free = count;
+
+		if (likely(rl->lcn >= 0)) {
+			/* Do the actual freeing of the clusters in the run. */
+			err = ntfs_bitmap_set_bits_in_run(lcnbmp_vi, rl->lcn,
+					to_free, likely(!is_rollback) ? 0 : 1);
+			if (unlikely(err)) {
+				if (!is_rollback)
+					ntfs_error(vol->sb, "Failed to clear subsequent run.");
+				goto err_out;
+			}
+			/* We have freed @to_free real clusters. */
+			real_freed += to_free;
+		}
+		/* Adjust the number of clusters left to free.
+		 */
+		if (count >= 0)
+			count -= to_free;
+
+		/* Update the total done clusters. */
+		total_freed += to_free;
+	}
+	ntfs_inc_free_clusters(vol, real_freed);
+	if (likely(!is_rollback)) {
+		up_write(&vol->lcnbmp_lock);
+		memalloc_nofs_restore(memalloc_flags);
+	}
+
+	WARN_ON(count > 0);
+
+	if (NVolDiscard(vol) && !is_rollback) {
+		s64 total_discarded = 0, rl_off;
+		u32 gran = bdev_discard_granularity(vol->sb->s_bdev);
+
+		rl = ntfs_attr_find_vcn_nolock(ni, start_vcn, ctx);
+		if (IS_ERR(rl))
+			return real_freed;
+		rl_off = start_vcn - rl->vcn;
+		while (rl->length && total_discarded < total_freed) {
+			s64 to_discard = rl->length - rl_off;
+
+			if (to_discard + total_discarded > total_freed)
+				to_discard = total_freed - total_discarded;
+			if (rl->lcn >= 0) {
+				sector_t start_sector, end_sector;
+				int ret;
+
+				start_sector = ALIGN((rl->lcn + rl_off) << vol->cluster_size_bits,
+						gran) >> SECTOR_SHIFT;
+				end_sector = ALIGN_DOWN((rl->lcn + rl_off + to_discard) <<
+						vol->cluster_size_bits, gran) >>
+						SECTOR_SHIFT;
+				if (start_sector < end_sector) {
+					ret = blkdev_issue_discard(vol->sb->s_bdev, start_sector,
+							end_sector - start_sector,
+							GFP_NOFS);
+					if (ret)
+						break;
+				}
+			}
+
+			total_discarded += to_discard;
+			++rl;
+			rl_off = 0;
+		}
+	}
+
+	/* We are done. Return the number of actually freed clusters. */
+	ntfs_debug("Done.");
+	return real_freed;
+err_out:
+	if (is_rollback)
+		return err;
+	/* If no real clusters were freed, no need to rollback. */
+	if (!real_freed) {
+		up_write(&vol->lcnbmp_lock);
+		memalloc_nofs_restore(memalloc_flags);
+		return err;
+	}
+	/*
+	 * Attempt to rollback and if that succeeds just return the error code.
+	 * If rollback fails, set the volume errors flag, emit an error
+	 * message, and return the error code.
+	 */
+	delta = __ntfs_cluster_free(ni, start_vcn, total_freed, ctx, true);
+	if (delta < 0) {
+		ntfs_error(vol->sb,
+			"Failed to rollback (error %i). Leaving inconsistent metadata! "
+			"Unmount and run chkdsk.",
+			(int)delta);
+		NVolSetErrors(vol);
+	}
+	ntfs_dec_free_clusters(vol, delta);
+	up_write(&vol->lcnbmp_lock);
+	memalloc_nofs_restore(memalloc_flags);
+	ntfs_error(vol->sb, "Aborting (error %i).", err);
+	return err;
+}
diff --git a/fs/ntfsplus/runlist.c b/fs/ntfsplus/runlist.c
new file mode 100644
index 000000000000..3dcd797efcc8
--- /dev/null
+++ b/fs/ntfsplus/runlist.c
@@ -0,0 +1,1983 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * NTFS runlist handling code.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2007 Anton Altaparmakov
+ * Copyright (c) 2002-2005 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ *
+ * Part of this file is based on code from the NTFS-3G project
+ * and is copyrighted by the respective authors below:
+ * Copyright (c) 2002-2005 Anton Altaparmakov
+ * Copyright (c) 2002-2005 Richard Russon
+ * Copyright (c) 2002-2008 Szabolcs Szakacsits
+ * Copyright (c) 2004 Yura Pakhuchiy
+ * Copyright (c) 2007-2022 Jean-Pierre Andre
+ */
+
+#include "misc.h"
+#include "ntfs.h"
+#include "attrib.h"
+
+/**
+ * ntfs_rl_mm - runlist memmove
+ *
+ * It is up to the caller to serialize access to the runlist @base.
+ */
+static inline void ntfs_rl_mm(struct runlist_element *base, int dst, int src, int size)
+{
+	if (likely((dst != src) && (size > 0)))
+		memmove(base + dst, base + src, size * sizeof(*base));
+}
+
+/**
+ * ntfs_rl_mc - runlist memory copy
+ *
+ * It is up to the caller to serialize access to the runlists @dstbase and
+ * @srcbase.
+ */
+static inline void ntfs_rl_mc(struct runlist_element *dstbase, int dst,
+		struct runlist_element *srcbase, int src, int size)
+{
+	if (likely(size > 0))
+		memcpy(dstbase + dst, srcbase + src, size * sizeof(*dstbase));
+}
+
+/**
+ * ntfs_rl_realloc - Reallocate memory for runlists
+ * @rl:		original runlist
+ * @old_size:	number of runlist elements in the original runlist @rl
+ * @new_size:	number of runlist elements we need space for
+ *
+ * As the runlists grow, more memory will be required. To prevent the
+ * kernel having to allocate and reallocate large numbers of small bits of
+ * memory, this function returns an entire page of memory.
+ *
+ * It is up to the caller to serialize access to the runlist @rl.
+ *
+ * N.B. If the new allocation doesn't require a different number of pages in
+ * memory, the function will return the original pointer.
+ */
+struct runlist_element *ntfs_rl_realloc(struct runlist_element *rl,
+		int old_size, int new_size)
+{
+	struct runlist_element *new_rl;
+
+	old_size = PAGE_ALIGN(old_size * sizeof(*rl));
+	new_size = PAGE_ALIGN(new_size * sizeof(*rl));
+	if (old_size == new_size)
+		return rl;
+
+	new_rl = ntfs_malloc_nofs(new_size);
+	if (unlikely(!new_rl))
+		return ERR_PTR(-ENOMEM);
+
+	if (likely(rl != NULL)) {
+		if (unlikely(old_size > new_size))
+			old_size = new_size;
+		memcpy(new_rl, rl, old_size);
+		ntfs_free(rl);
+	}
+	return new_rl;
+}
+
+/**
+ * ntfs_rl_realloc_nofail - Reallocate memory for runlists
+ * @rl:		original runlist
+ * @old_size:	number of runlist elements in the original runlist @rl
+ * @new_size:	number of runlist elements we need space for
+ *
+ * As the runlists grow, more memory will be required. To prevent the
+ * kernel having to allocate and reallocate large numbers of small bits of
+ * memory, this function returns an entire page of memory.
+ *
+ * This function guarantees that the allocation will succeed. It will sleep
+ * for as long as it takes to complete the allocation.
+ *
+ * It is up to the caller to serialize access to the runlist @rl.
+ *
+ * N.B. If the new allocation doesn't require a different number of pages in
+ * memory, the function will return the original pointer.
+ */
+static inline struct runlist_element *ntfs_rl_realloc_nofail(struct runlist_element *rl,
+		int old_size, int new_size)
+{
+	struct runlist_element *new_rl;
+
+	old_size = PAGE_ALIGN(old_size * sizeof(*rl));
+	new_size = PAGE_ALIGN(new_size * sizeof(*rl));
+	if (old_size == new_size)
+		return rl;
+
+	new_rl = ntfs_malloc_nofs_nofail(new_size);
+
+	if (likely(rl != NULL)) {
+		if (unlikely(old_size > new_size))
+			old_size = new_size;
+		memcpy(new_rl, rl, old_size);
+		ntfs_free(rl);
+	}
+	return new_rl;
+}
+
+/**
+ * ntfs_are_rl_mergeable - test if two runlists can be joined together
+ * @dst:	original runlist
+ * @src:	new runlist to test for mergeability with @dst
+ *
+ * Test if two runlists can be joined together. For this, their VCNs and LCNs
+ * must be adjacent.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * Return: true	 Success, the runlists can be merged.
+ *	   false Failure, the runlists cannot be merged.
+ */
+static inline bool ntfs_are_rl_mergeable(struct runlist_element *dst,
+		struct runlist_element *src)
+{
+	/* We can merge unmapped regions even if they are misaligned. */
+	if ((dst->lcn == LCN_RL_NOT_MAPPED) && (src->lcn == LCN_RL_NOT_MAPPED))
+		return true;
+	/* If the runs are misaligned, we cannot merge them. */
+	if ((dst->vcn + dst->length) != src->vcn)
+		return false;
+	/* If both runs are non-sparse and contiguous, we can merge them. */
+	if ((dst->lcn >= 0) && (src->lcn >= 0) &&
+			((dst->lcn + dst->length) == src->lcn))
+		return true;
+	/* If we are merging two holes, we can merge them. */
+	if ((dst->lcn == LCN_HOLE) && (src->lcn == LCN_HOLE))
+		return true;
+	/* If we are merging two delalloc runs, we can merge them.
+	 */
+	if ((dst->lcn == LCN_DELALLOC) && (src->lcn == LCN_DELALLOC))
+		return true;
+	/* Cannot merge. */
+	return false;
+}
+
+/**
+ * __ntfs_rl_merge - merge two runlists without testing if they can be merged
+ * @dst:	original, destination runlist
+ * @src:	new runlist to merge with @dst
+ *
+ * Merge the two runlists, writing into the destination runlist @dst. The
+ * caller must make sure the runlists can be merged or this will corrupt the
+ * destination runlist.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ */
+static inline void __ntfs_rl_merge(struct runlist_element *dst, struct runlist_element *src)
+{
+	dst->length += src->length;
+}
+
+/**
+ * ntfs_rl_append - append a runlist after a given element
+ *
+ * Append the runlist @src after element @loc in @dst. Merge the right end of
+ * the new runlist, if necessary. Adjust the size of the hole before the
+ * appended runlist.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * On success, return a pointer to the new, combined, runlist. Note, both
+ * runlists @dst and @src are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @dst but this is irrelevant.)
+ */
+static inline struct runlist_element *ntfs_rl_append(struct runlist_element *dst,
+		int dsize, struct runlist_element *src, int ssize, int loc,
+		size_t *new_size)
+{
+	bool right = false;	/* Right end of @src needs merging. */
+	int marker;		/* End of the inserted runs. */
+
+	/* First, check if the right hand end needs merging. */
+	if ((loc + 1) < dsize)
+		right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
+
+	/* Space required: @dst size + @src size, less one if we merged.
+	 */
+	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - right);
+	if (IS_ERR(dst))
+		return dst;
+
+	*new_size = dsize + ssize - right;
+	/*
+	 * We are guaranteed to succeed from here so can start modifying the
+	 * original runlists.
+	 */
+
+	/* First, merge the right hand end, if necessary. */
+	if (right)
+		__ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
+
+	/* First run after the @src runs that have been inserted. */
+	marker = loc + ssize + 1;
+
+	/* Move the tail of @dst out of the way, then copy in @src. */
+	ntfs_rl_mm(dst, marker, loc + 1 + right, dsize - (loc + 1 + right));
+	ntfs_rl_mc(dst, loc + 1, src, 0, ssize);
+
+	/* Adjust the size of the preceding hole. */
+	dst[loc].length = dst[loc + 1].vcn - dst[loc].vcn;
+
+	/* We may have changed the length of the file, so fix the end marker. */
+	if (dst[marker].lcn == LCN_ENOENT)
+		dst[marker].vcn = dst[marker - 1].vcn + dst[marker - 1].length;
+
+	return dst;
+}
+
+/**
+ * ntfs_rl_insert - insert a runlist into another
+ *
+ * Insert the runlist @src before element @loc in the runlist @dst. Merge the
+ * left end of the new runlist, if necessary. Adjust the size of the hole
+ * after the inserted runlist.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * On success, return a pointer to the new, combined, runlist. Note, both
+ * runlists @dst and @src are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @dst but this is irrelevant.)
+ */
+static inline struct runlist_element *ntfs_rl_insert(struct runlist_element *dst,
+		int dsize, struct runlist_element *src, int ssize, int loc,
+		size_t *new_size)
+{
+	bool left = false;	/* Left end of @src needs merging. */
+	bool disc = false;	/* Discontinuity between @dst and @src. */
+	int marker;		/* End of the inserted runs.
+				 */
+
+	/*
+	 * disc => Discontinuity between the end of @dst and the start of @src.
+	 * This means we might need to insert a "not mapped" run.
+	 */
+	if (loc == 0)
+		disc = (src[0].vcn > 0);
+	else {
+		s64 merged_length;
+
+		left = ntfs_are_rl_mergeable(dst + loc - 1, src);
+
+		merged_length = dst[loc - 1].length;
+		if (left)
+			merged_length += src->length;
+
+		disc = (src[0].vcn > dst[loc - 1].vcn + merged_length);
+	}
+	/*
+	 * Space required: @dst size + @src size, less one if we merged, plus
+	 * one if there was a discontinuity.
+	 */
+	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - left + disc);
+	if (IS_ERR(dst))
+		return dst;
+
+	*new_size = dsize + ssize - left + disc;
+	/*
+	 * We are guaranteed to succeed from here so can start modifying the
+	 * original runlist.
+	 */
+	if (left)
+		__ntfs_rl_merge(dst + loc - 1, src);
+	/*
+	 * First run after the @src runs that have been inserted.
+	 * Nominally, @marker equals @loc + @ssize, i.e. location + number of
+	 * runs in @src. However, if @left, then the first run in @src has
+	 * been merged with one in @dst. And if @disc, then @dst and @src do
+	 * not meet and we need an extra run to fill the gap.
+	 */
+	marker = loc + ssize - left + disc;
+
+	/* Move the tail of @dst out of the way, then copy in @src. */
+	ntfs_rl_mm(dst, marker, loc, dsize - loc);
+	ntfs_rl_mc(dst, loc + disc, src, left, ssize - left);
+
+	/* Adjust the VCN of the first run after the insertion... */
+	dst[marker].vcn = dst[marker - 1].vcn + dst[marker - 1].length;
+	/* ... and the length. */
+	if (dst[marker].lcn == LCN_HOLE || dst[marker].lcn == LCN_RL_NOT_MAPPED ||
+			dst[marker].lcn == LCN_DELALLOC)
+		dst[marker].length = dst[marker + 1].vcn - dst[marker].vcn;
+
+	/* Writing beyond the end of the file and there is a discontinuity.
+	 */
+	if (disc) {
+		if (loc > 0) {
+			dst[loc].vcn = dst[loc - 1].vcn + dst[loc - 1].length;
+			dst[loc].length = dst[loc + 1].vcn - dst[loc].vcn;
+		} else {
+			dst[loc].vcn = 0;
+			dst[loc].length = dst[loc + 1].vcn;
+		}
+		dst[loc].lcn = LCN_RL_NOT_MAPPED;
+	}
+	return dst;
+}
+
+/**
+ * ntfs_rl_replace - overwrite a runlist element with another runlist
+ *
+ * Replace the runlist element @dst at @loc with @src. Merge the left and
+ * right ends of the inserted runlist, if necessary.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * On success, return a pointer to the new, combined, runlist. Note, both
+ * runlists @dst and @src are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @dst but this is irrelevant.)
+ */
+static inline struct runlist_element *ntfs_rl_replace(struct runlist_element *dst,
+		int dsize, struct runlist_element *src, int ssize, int loc,
+		size_t *new_size)
+{
+	int delta;
+	bool left = false;	/* Left end of @src needs merging. */
+	bool right = false;	/* Right end of @src needs merging. */
+	int tail;		/* Start of tail of @dst. */
+	int marker;		/* End of the inserted runs. */
+
+	/* First, see if the left and right ends need merging. */
+	if ((loc + 1) < dsize)
+		right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
+	if (loc > 0)
+		left = ntfs_are_rl_mergeable(dst + loc - 1, src);
+	/*
+	 * Allocate some space. We will need less if the left, right, or both
+	 * ends get merged. The -1 accounts for the run being replaced.
+	 */
+	delta = ssize - 1 - left - right;
+	if (delta > 0) {
+		dst = ntfs_rl_realloc(dst, dsize, dsize + delta);
+		if (IS_ERR(dst))
+			return dst;
+	}
+
+	*new_size = dsize + delta;
+	/*
+	 * We are guaranteed to succeed from here so can start modifying the
+	 * original runlists.
+	 */
+
+	/* First, merge the left and right ends, if necessary.
+	 */
+	if (right)
+		__ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
+	if (left)
+		__ntfs_rl_merge(dst + loc - 1, src);
+	/*
+	 * Offset of the tail of @dst. This needs to be moved out of the way
+	 * to make space for the runs to be copied from @src, i.e. the first
+	 * run of the tail of @dst.
+	 * Nominally, @tail equals @loc + 1, i.e. location, skipping the
+	 * replaced run. However, if @right, then one of @dst's runs is
+	 * already merged into @src.
+	 */
+	tail = loc + right + 1;
+	/*
+	 * First run after the @src runs that have been inserted, i.e. where
+	 * the tail of @dst needs to be moved to.
+	 * Nominally, @marker equals @loc + @ssize, i.e. location + number of
+	 * runs in @src. However, if @left, then the first run in @src has
+	 * been merged with one in @dst.
+	 */
+	marker = loc + ssize - left;
+
+	/* Move the tail of @dst out of the way, then copy in @src. */
+	ntfs_rl_mm(dst, marker, tail, dsize - tail);
+	ntfs_rl_mc(dst, loc, src, left, ssize - left);
+
+	/* We may have changed the length of the file, so fix the end marker. */
+	if (dsize - tail > 0 && dst[marker].lcn == LCN_ENOENT)
+		dst[marker].vcn = dst[marker - 1].vcn + dst[marker - 1].length;
+	return dst;
+}
+
+/**
+ * ntfs_rl_split - insert a runlist into the centre of a hole
+ *
+ * Split the runlist @dst at @loc into two and insert @new in between the two
+ * fragments. No merging of runlists is necessary. Adjust the size of the
+ * holes either side.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * On success, return a pointer to the new, combined, runlist. Note, both
+ * runlists @dst and @src are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @dst but this is irrelevant.)
+ */ +static inline struct runlist_element *ntfs_rl_split(struct runlist_element= *dst, int dsize, + struct runlist_element *src, int ssize, int loc, + size_t *new_size) +{ + /* Space required: @dst size + @src size + one new hole. */ + dst =3D ntfs_rl_realloc(dst, dsize, dsize + ssize + 1); + if (IS_ERR(dst)) + return dst; + + *new_size =3D dsize + ssize + 1; + /* + * We are guaranteed to succeed from here so can start modifying the + * original runlists. + */ + + /* Move the tail of @dst out of the way, then copy in @src. */ + ntfs_rl_mm(dst, loc + 1 + ssize, loc, dsize - loc); + ntfs_rl_mc(dst, loc + 1, src, 0, ssize); + + /* Adjust the size of the holes either size of @src. */ + dst[loc].length =3D dst[loc+1].vcn - dst[loc].vcn; + dst[loc+ssize+1].vcn =3D dst[loc+ssize].vcn + dst[loc+ssize].length; + dst[loc+ssize+1].length =3D dst[loc+ssize+2].vcn - dst[loc+ssize+1].vcn; + + return dst; +} + +/** + * ntfs_runlists_merge - merge two runlists into one + * + * First we sanity check the two runlists @srl and @drl to make sure that = they + * are sensible and can be merged. The runlist @srl must be either after t= he + * runlist @drl or completely within a hole (or unmapped region) in @drl. + * + * It is up to the caller to serialize access to the runlists @drl and @sr= l. + * + * Merging of runlists is necessary in two cases: + * 1. When attribute lists are used and a further extent is being mapped. + * 2. When new clusters are allocated to fill a hole or extend a file. + * + * There are four possible ways @srl can be merged. It can: + * - be inserted at the beginning of a hole, + * - split the hole in two and be inserted between the two fragments, + * - be appended at the end of a hole, or it can + * - replace the whole hole. + * It can also be appended to the end of the runlist, which is just a vari= ant + * of the insert case. + * + * On success, return a pointer to the new, combined, runlist. 
+ * Note, both
+ * runlists @drl and @srl are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @drl but this is irrelevant.)
+ */
+struct runlist_element *ntfs_runlists_merge(struct runlist *d_runlist,
+		struct runlist_element *srl, size_t s_rl_count,
+		size_t *new_rl_count)
+{
+	int di, si;		/* Current index into @[ds]rl. */
+	int sstart;		/* First index with lcn > LCN_RL_NOT_MAPPED. */
+	int dins;		/* Index into @drl at which to insert @srl. */
+	int dend, send;		/* Last index into @[ds]rl. */
+	int dfinal, sfinal;	/* The last index into @[ds]rl with lcn >= LCN_HOLE. */
+	int marker = 0;
+	s64 marker_vcn = 0;
+	struct runlist_element *drl = d_runlist->rl, *rl;
+
+#ifdef DEBUG
+	ntfs_debug("dst:");
+	ntfs_debug_dump_runlist(drl);
+	ntfs_debug("src:");
+	ntfs_debug_dump_runlist(srl);
+#endif
+
+	/* Check for silly calling... */
+	if (unlikely(!srl))
+		return drl;
+	if (IS_ERR(srl) || IS_ERR(drl))
+		return ERR_PTR(-EINVAL);
+
+	if (s_rl_count == 0) {
+		for (; srl[s_rl_count].length; s_rl_count++)
+			;
+		s_rl_count++;
+	}
+
+	/* Check for the case where the first mapping is being done now. */
+	if (unlikely(!drl)) {
+		drl = srl;
+		/* Complete the source runlist if necessary. */
+		if (unlikely(drl[0].vcn)) {
+			/* Scan to the end of the source runlist. */
+			drl = ntfs_rl_realloc(drl, s_rl_count, s_rl_count + 1);
+			if (IS_ERR(drl))
+				return drl;
+			/* Insert start element at the front of the runlist. */
+			ntfs_rl_mm(drl, 1, 0, s_rl_count);
+			drl[0].vcn = 0;
+			drl[0].lcn = LCN_RL_NOT_MAPPED;
+			drl[0].length = drl[1].vcn;
+			s_rl_count++;
+		}
+
+		*new_rl_count = s_rl_count;
+		goto finished;
+	}
+
+	if (d_runlist->count < 1 || s_rl_count < 2)
+		return ERR_PTR(-EINVAL);
+
+	si = di = 0;
+
+	/* Skip any unmapped start element(s) in the source runlist.
+	 */
+	while (srl[si].length && srl[si].lcn < LCN_HOLE)
+		si++;
+
+	/* Can't have an entirely unmapped source runlist. */
+	WARN_ON(!srl[si].length);
+
+	/* Record the starting points. */
+	sstart = si;
+
+	/*
+	 * Skip forward in @drl until we reach the position where @srl needs to
+	 * be inserted. If we reach the end of @drl, @srl just needs to be
+	 * appended to @drl.
+	 */
+	rl = __ntfs_attr_find_vcn_nolock(d_runlist, srl[sstart].vcn);
+	if (IS_ERR(rl))
+		di = (int)d_runlist->count - 1;
+	else
+		di = (int)(rl - d_runlist->rl);
+	dins = di;
+
+	/* Sanity check for illegal overlaps. */
+	if ((drl[di].vcn == srl[si].vcn) && (drl[di].lcn >= 0) &&
+			(srl[si].lcn >= 0)) {
+		ntfs_error(NULL, "Run lists overlap. Cannot merge!");
+		return ERR_PTR(-ERANGE);
+	}
+
+	/* Scan to the end of both runlists in order to know their sizes. */
+	send = (int)s_rl_count - 1;
+	dend = (int)d_runlist->count - 1;
+
+	if (srl[send].lcn == LCN_ENOENT)
+		marker_vcn = srl[marker = send].vcn;
+
+	/* Scan to the last element with lcn >= LCN_HOLE. */
+	for (sfinal = send; sfinal >= 0 && srl[sfinal].lcn < LCN_HOLE; sfinal--)
+		;
+	for (dfinal = dend; dfinal >= 0 && drl[dfinal].lcn < LCN_HOLE; dfinal--)
+		;
+
+	{
+		bool start;
+		bool finish;
+		int ds = dend + 1;	/* Number of elements in drl & srl. */
+		int ss = sfinal - sstart + 1;
+
+		start = ((drl[dins].lcn < LCN_RL_NOT_MAPPED) ||    /* End of file */
+			 (drl[dins].vcn == srl[sstart].vcn));	   /* Start of hole */
+		finish = ((drl[dins].lcn >= LCN_RL_NOT_MAPPED) &&  /* End of file */
+			  ((drl[dins].vcn + drl[dins].length) <=   /* End of hole */
+			   (srl[send - 1].vcn + srl[send - 1].length)));
+
+		/* Or we will lose an end marker.
+		 */
+		if (finish && !drl[dins].length)
+			ss++;
+		if (marker && (drl[dins].vcn + drl[dins].length > srl[send - 1].vcn))
+			finish = false;
+
+		if (start) {
+			if (finish)
+				drl = ntfs_rl_replace(drl, ds, srl + sstart, ss, dins, new_rl_count);
+			else
+				drl = ntfs_rl_insert(drl, ds, srl + sstart, ss, dins, new_rl_count);
+		} else {
+			if (finish)
+				drl = ntfs_rl_append(drl, ds, srl + sstart, ss, dins, new_rl_count);
+			else
+				drl = ntfs_rl_split(drl, ds, srl + sstart, ss, dins, new_rl_count);
+		}
+		if (IS_ERR(drl)) {
+			ntfs_error(NULL, "Merge failed.");
+			return drl;
+		}
+		ntfs_free(srl);
+		if (marker) {
+			ntfs_debug("Triggering marker code.");
+			for (ds = dend; drl[ds].length; ds++)
+				;
+			/* We only need to care if @srl ended after @drl. */
+			if (drl[ds].vcn <= marker_vcn) {
+				int slots = 0;
+
+				if (drl[ds].vcn == marker_vcn) {
+					ntfs_debug("Old marker = 0x%llx, replacing with LCN_ENOENT.",
+							drl[ds].lcn);
+					drl[ds].lcn = LCN_ENOENT;
+					goto finished;
+				}
+				/*
+				 * We need to create an unmapped runlist element in
+				 * @drl or extend an existing one before adding the
+				 * ENOENT terminator.
+				 */
+				if (drl[ds].lcn == LCN_ENOENT) {
+					ds--;
+					slots = 1;
+				}
+				if (drl[ds].lcn != LCN_RL_NOT_MAPPED) {
+					/* Add an unmapped runlist element. */
+					if (!slots) {
+						drl = ntfs_rl_realloc_nofail(drl, ds,
+								ds + 2);
+						slots = 2;
+						*new_rl_count += 2;
+					}
+					ds++;
+					/* Need to set vcn if it isn't set already. */
+					if (slots != 1)
+						drl[ds].vcn = drl[ds - 1].vcn +
+								drl[ds - 1].length;
+					drl[ds].lcn = LCN_RL_NOT_MAPPED;
+					/* We now used up a slot. */
+					slots--;
+				}
+				drl[ds].length = marker_vcn - drl[ds].vcn;
+				/* Finally add the ENOENT terminator. */
+				ds++;
+				if (!slots) {
+					drl = ntfs_rl_realloc_nofail(drl, ds, ds + 1);
+					*new_rl_count += 1;
+				}
+				drl[ds].vcn = marker_vcn;
+				drl[ds].lcn = LCN_ENOENT;
+				drl[ds].length = (s64)0;
+			}
+		}
+	}
+
+finished:
+	/* The merge was completed successfully.
+	 */
+	ntfs_debug("Merged runlist:");
+	ntfs_debug_dump_runlist(drl);
+	return drl;
+}
+
+/**
+ * ntfs_mapping_pairs_decompress - convert mapping pairs array to runlist
+ *
+ * It is up to the caller to serialize access to the runlist @old_rl.
+ *
+ * Decompress the attribute @attr's mapping pairs array into a runlist. On
+ * success, return the decompressed runlist.
+ *
+ * If @old_rl is not NULL, the decompressed runlist is inserted into the
+ * appropriate place in @old_rl and the resultant, combined runlist is
+ * returned. The original @old_rl is deallocated.
+ *
+ * On error, return -errno. @old_rl is left unmodified in that case.
+ */
+struct runlist_element *ntfs_mapping_pairs_decompress(const struct ntfs_volume *vol,
+		const struct attr_record *attr, struct runlist *old_runlist,
+		size_t *new_rl_count)
+{
+	s64 vcn;		/* Current vcn. */
+	s64 lcn;		/* Current lcn. */
+	s64 deltaxcn;		/* Change in [vl]cn. */
+	struct runlist_element *rl, *new_rl;	/* The output runlist. */
+	u8 *buf;		/* Current position in mapping pairs array. */
+	u8 *attr_end;		/* End of attribute. */
+	int rlsize;		/* Size of runlist buffer. */
+	u16 rlpos;		/* Current runlist position in units of struct runlist_elements. */
+	u8 b;			/* Current byte offset in buf. */
+
+#ifdef DEBUG
+	/* Make sure attr exists and is non-resident. */
+	if (!attr || !attr->non_resident ||
+			le64_to_cpu(attr->data.non_resident.lowest_vcn) < 0) {
+		ntfs_error(vol->sb, "Invalid arguments.");
+		return ERR_PTR(-EINVAL);
+	}
+#endif
+	/* Start at vcn = lowest_vcn and lcn 0. */
+	vcn = le64_to_cpu(attr->data.non_resident.lowest_vcn);
+	lcn = 0;
+	/* Get start of the mapping pairs array. */
+	buf = (u8 *)attr +
+			le16_to_cpu(attr->data.non_resident.mapping_pairs_offset);
+	attr_end = (u8 *)attr + le32_to_cpu(attr->length);
+	if (unlikely(buf < (u8 *)attr || buf > attr_end)) {
+		ntfs_error(vol->sb, "Corrupt attribute.");
+		return ERR_PTR(-EIO);
+	}
+
+	/* Current position in runlist array.
+	 */
+	rlpos = 0;
+	/* Allocate first page and set current runlist size to one page. */
+	rl = ntfs_malloc_nofs(rlsize = PAGE_SIZE);
+	if (unlikely(!rl))
+		return ERR_PTR(-ENOMEM);
+	/* Insert unmapped starting element if necessary. */
+	if (vcn) {
+		rl->vcn = 0;
+		rl->lcn = LCN_RL_NOT_MAPPED;
+		rl->length = vcn;
+		rlpos++;
+	}
+	while (buf < attr_end && *buf) {
+		/*
+		 * Allocate more memory if needed, including space for the
+		 * not-mapped and terminator elements. ntfs_malloc_nofs()
+		 * operates on whole pages only.
+		 */
+		if (((rlpos + 3) * sizeof(*rl)) > rlsize) {
+			struct runlist_element *rl2;
+
+			rl2 = ntfs_malloc_nofs(rlsize + (int)PAGE_SIZE);
+			if (unlikely(!rl2)) {
+				ntfs_free(rl);
+				return ERR_PTR(-ENOMEM);
+			}
+			memcpy(rl2, rl, rlsize);
+			ntfs_free(rl);
+			rl = rl2;
+			rlsize += PAGE_SIZE;
+		}
+		/* Enter the current vcn into the current runlist element. */
+		rl[rlpos].vcn = vcn;
+		/*
+		 * Get the change in vcn, i.e. the run length in clusters.
+		 * Doing it this way ensures that we sign-extend negative values.
+		 * A negative run length doesn't make any sense, but hey, I
+		 * didn't make up the NTFS specs and Windows NT4 treats the run
+		 * length as a signed value so that's how it is...
+		 */
+		b = *buf & 0xf;
+		if (b) {
+			if (unlikely(buf + b > attr_end))
+				goto io_error;
+			for (deltaxcn = (s8)buf[b--]; b; b--)
+				deltaxcn = (deltaxcn << 8) + buf[b];
+		} else { /* The length entry is compulsory. */
+			ntfs_error(vol->sb, "Missing length entry in mapping pairs array.");
+			deltaxcn = (s64)-1;
+		}
+		/*
+		 * Assume a negative length to indicate data corruption and
+		 * hence clean-up and return NULL.
+		 */
+		if (unlikely(deltaxcn < 0)) {
+			ntfs_error(vol->sb, "Invalid length in mapping pairs array.");
+			goto err_out;
+		}
+		/*
+		 * Enter the current run length into the current runlist
+		 * element.
+		 */
+		rl[rlpos].length = deltaxcn;
+		/* Increment the current vcn by the current run length.
+		 */
+		vcn += deltaxcn;
+		/*
+		 * There might be no lcn change at all, as is the case for
+		 * sparse clusters on NTFS 3.0+, in which case we set the lcn
+		 * to LCN_HOLE.
+		 */
+		if (!(*buf & 0xf0))
+			rl[rlpos].lcn = LCN_HOLE;
+		else {
+			/* Get the lcn change which really can be negative. */
+			u8 b2 = *buf & 0xf;
+
+			b = b2 + ((*buf >> 4) & 0xf);
+			if (buf + b > attr_end)
+				goto io_error;
+			for (deltaxcn = (s8)buf[b--]; b > b2; b--)
+				deltaxcn = (deltaxcn << 8) + buf[b];
+			/* Change the current lcn to its new value. */
+			lcn += deltaxcn;
+#ifdef DEBUG
+			/*
+			 * On NTFS 1.2-, apparently one can have lcn == -1 to
+			 * indicate a hole. But we haven't verified ourselves
+			 * whether it is really the lcn or the deltaxcn that is
+			 * -1. So if either is found give us a message so we
+			 * can investigate it further!
+			 */
+			if (vol->major_ver < 3) {
+				if (unlikely(deltaxcn == -1))
+					ntfs_error(vol->sb, "lcn delta == -1");
+				if (unlikely(lcn == -1))
+					ntfs_error(vol->sb, "lcn == -1");
+			}
+#endif
+			/* Check lcn is not below -1. */
+			if (unlikely(lcn < -1)) {
+				ntfs_error(vol->sb, "Invalid s64 < -1 in mapping pairs array.");
+				goto err_out;
+			}
+
+			/* chkdsk accepts zero-sized runs only for holes. */
+			if ((lcn != -1) && !rl[rlpos].length) {
+				ntfs_error(vol->sb, "Invalid zero-sized data run.\n");
+				goto err_out;
+			}
+
+			/* Enter the current lcn into the runlist element. */
+			rl[rlpos].lcn = lcn;
+		}
+		/* Get to the next runlist element, skipping zero-sized holes. */
+		if (rl[rlpos].length)
+			rlpos++;
+		/* Increment the buffer position to the next mapping pair. */
+		buf += (*buf & 0xf) + ((*buf >> 4) & 0xf) + 1;
+	}
+	if (unlikely(buf >= attr_end))
+		goto io_error;
+	/*
+	 * If there is a highest_vcn specified, it must be equal to the final
+	 * vcn in the runlist - 1, or something has gone badly wrong.
+	 */
+	deltaxcn = le64_to_cpu(attr->data.non_resident.highest_vcn);
+	if (unlikely(deltaxcn && vcn - 1 != deltaxcn)) {
+mpa_err:
+		ntfs_error(vol->sb, "Corrupt mapping pairs array in non-resident attribute.");
+		goto err_out;
+	}
+	/* Setup not mapped runlist element if this is the base extent. */
+	if (!attr->data.non_resident.lowest_vcn) {
+		s64 max_cluster;
+
+		max_cluster = ((le64_to_cpu(attr->data.non_resident.allocated_size) +
+				vol->cluster_size - 1) >>
+				vol->cluster_size_bits) - 1;
+		/*
+		 * A highest_vcn of zero means this is a single extent
+		 * attribute, so simply terminate the runlist with LCN_ENOENT.
+		 */
+		if (deltaxcn) {
+			/*
+			 * If there is a difference between the highest_vcn and
+			 * the highest cluster, the runlist is either corrupt
+			 * or, more likely, there are more extents following
+			 * this one.
+			 */
+			if (deltaxcn < max_cluster) {
+				ntfs_debug("More extents to follow; deltaxcn = 0x%llx, max_cluster = 0x%llx",
+						deltaxcn, max_cluster);
+				rl[rlpos].vcn = vcn;
+				vcn += rl[rlpos].length = max_cluster -
+						deltaxcn;
+				rl[rlpos].lcn = LCN_RL_NOT_MAPPED;
+				rlpos++;
+			} else if (unlikely(deltaxcn > max_cluster)) {
+				ntfs_error(vol->sb,
+					"Corrupt attribute. deltaxcn = 0x%llx, max_cluster = 0x%llx",
+					deltaxcn, max_cluster);
+				goto mpa_err;
+			}
+		}
+		rl[rlpos].lcn = LCN_ENOENT;
+	} else /* Not the base extent. There may be more extents to follow. */
+		rl[rlpos].lcn = LCN_RL_NOT_MAPPED;
+
+	/* Setup terminating runlist element. */
+	rl[rlpos].vcn = vcn;
+	rl[rlpos].length = (s64)0;
+	/* If no existing runlist was specified, we are done. */
+	if (!old_runlist || !old_runlist->rl) {
+		*new_rl_count = rlpos + 1;
+		ntfs_debug("Mapping pairs array successfully decompressed:");
+		ntfs_debug_dump_runlist(rl);
+		return rl;
+	}
+	/* Now combine the new and old runlists checking for overlaps.
+	 */
+	new_rl = ntfs_runlists_merge(old_runlist, rl, rlpos + 1, new_rl_count);
+	if (!IS_ERR(new_rl))
+		return new_rl;
+	ntfs_free(rl);
+	ntfs_error(vol->sb, "Failed to merge runlists.");
+	return new_rl;
+io_error:
+	ntfs_error(vol->sb, "Corrupt attribute.");
+err_out:
+	ntfs_free(rl);
+	return ERR_PTR(-EIO);
+}
+
+/**
+ * ntfs_rl_vcn_to_lcn - convert a vcn into a lcn given a runlist
+ * @rl:		runlist to use for conversion
+ * @vcn:	vcn to convert
+ *
+ * Convert the virtual cluster number @vcn of an attribute into a logical
+ * cluster number (lcn) of a device using the runlist @rl to map vcns to their
+ * corresponding lcns.
+ *
+ * It is up to the caller to serialize access to the runlist @rl.
+ *
+ * Since lcns must be >= 0, we use negative return codes with special meaning:
+ *
+ * Return code		Meaning / Description
+ * ==================================================
+ * LCN_HOLE		Hole / not allocated on disk.
+ * LCN_RL_NOT_MAPPED	This is part of the runlist which has not been
+ *			inserted into the runlist yet.
+ * LCN_ENOENT		There is no such vcn in the attribute.
+ *
+ * Locking: - The caller must have locked the runlist (for reading or writing).
+ *	    - This function does not touch the lock, nor does it modify the
+ *	      runlist.
+ */
+s64 ntfs_rl_vcn_to_lcn(const struct runlist_element *rl, const s64 vcn)
+{
+	int i;
+
+	/*
+	 * If rl is NULL, assume that we have found an unmapped runlist. The
+	 * caller can then attempt to map it and fail appropriately if
+	 * necessary.
+	 */
+	if (unlikely(!rl))
+		return LCN_RL_NOT_MAPPED;
+
+	/* Catch out of lower bounds vcn.
+	 */
+	if (unlikely(vcn < rl[0].vcn))
+		return LCN_ENOENT;
+
+	for (i = 0; likely(rl[i].length); i++) {
+		if (vcn < rl[i + 1].vcn) {
+			if (likely(rl[i].lcn >= 0))
+				return rl[i].lcn + (vcn - rl[i].vcn);
+			return rl[i].lcn;
+		}
+	}
+	/*
+	 * The terminator element is setup to the correct value, i.e. one of
+	 * LCN_HOLE, LCN_RL_NOT_MAPPED, or LCN_ENOENT.
+	 */
+	if (likely(rl[i].lcn < 0))
+		return rl[i].lcn;
+	/* Just in case... We could replace this with BUG() some day. */
+	return LCN_ENOENT;
+}
+
+/**
+ * ntfs_rl_find_vcn_nolock - find a vcn in a runlist
+ * @rl:		runlist to search
+ * @vcn:	vcn to find
+ *
+ * Find the virtual cluster number @vcn in the runlist @rl and return the
+ * address of the runlist element containing the @vcn on success.
+ *
+ * Return NULL if @rl is NULL or @vcn is in an unmapped part/out of bounds of
+ * the runlist.
+ *
+ * Locking: The runlist must be locked on entry.
+ */
+struct runlist_element *ntfs_rl_find_vcn_nolock(struct runlist_element *rl, const s64 vcn)
+{
+	if (unlikely(!rl || vcn < rl[0].vcn))
+		return NULL;
+	while (likely(rl->length)) {
+		if (unlikely(vcn < rl[1].vcn)) {
+			if (likely(rl->lcn >= LCN_HOLE))
+				return rl;
+			return NULL;
+		}
+		rl++;
+	}
+	if (likely(rl->lcn == LCN_ENOENT))
+		return rl;
+	return NULL;
+}
+
+/**
+ * ntfs_get_nr_significant_bytes - get number of bytes needed to store a number
+ * @n:		number for which to get the number of bytes
+ *
+ * Return the number of bytes required to store @n unambiguously as
+ * a signed number.
+ *
+ * This is used in the context of the mapping pairs array to determine how
+ * many bytes will be needed in the array to store a given logical cluster
+ * number (lcn) or a specific run length.
+ *
+ * Return the number of bytes required. This function cannot fail.
+ */
+static inline int ntfs_get_nr_significant_bytes(const s64 n)
+{
+	s64 l = n;
+	int i;
+	s8 j;
+
+	i = 0;
+	do {
+		l >>= 8;
+		i++;
+	} while (l != 0 && l != -1);
+	j = (n >> 8 * (i - 1)) & 0xff;
+	/* If the sign bit is wrong, we need an extra byte. */
+	if ((n < 0 && j >= 0) || (n > 0 && j < 0))
+		i++;
+	return i;
+}
+
+/**
+ * ntfs_get_size_for_mapping_pairs - get bytes needed for mapping pairs array
+ *
+ * Walk the locked runlist @rl and calculate the size in bytes of the mapping
+ * pairs array corresponding to the runlist @rl, starting at vcn @first_vcn
+ * and finishing with vcn @last_vcn.
+ *
+ * A @last_vcn of -1 means end of runlist and in that case the size of the
+ * mapping pairs array corresponding to the runlist starting at vcn @first_vcn
+ * and finishing at the end of the runlist is determined.
+ *
+ * This for example allows us to allocate a buffer of the right size when
+ * building the mapping pairs array.
+ *
+ * If @rl is NULL, just return 1 (for the single terminator byte).
+ *
+ * Return the calculated size in bytes on success. On error, return -errno.
+ */
+int ntfs_get_size_for_mapping_pairs(const struct ntfs_volume *vol,
+		const struct runlist_element *rl, const s64 first_vcn,
+		const s64 last_vcn, int max_mp_size)
+{
+	s64 prev_lcn;
+	int rls;
+	bool the_end = false;
+
+	if (first_vcn < 0 || last_vcn < -1)
+		return -EINVAL;
+
+	if (last_vcn >= 0 && first_vcn > last_vcn)
+		return -EINVAL;
+
+	if (!rl) {
+		WARN_ON(first_vcn);
+		WARN_ON(last_vcn > 0);
+		return 1;
+	}
+	if (max_mp_size <= 0)
+		max_mp_size = INT_MAX;
+	/* Skip to runlist element containing @first_vcn. */
+	while (rl->length && first_vcn >= rl[1].vcn)
+		rl++;
+	if (unlikely((!rl->length && first_vcn > rl->vcn) ||
+			first_vcn < rl->vcn))
+		return -EINVAL;
+	prev_lcn = 0;
+	/* Always need the terminating zero byte. */
+	rls = 1;
+	/* Do the first partial run if present.
+	 */
+	if (first_vcn > rl->vcn) {
+		s64 delta, length = rl->length;
+
+		/* We know rl->length != 0 already. */
+		if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
+			goto err_out;
+		/*
+		 * If @last_vcn is given and finishes inside this run, cap the
+		 * run length.
+		 */
+		if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
+			s64 s1 = last_vcn + 1;
+
+			if (unlikely(rl[1].vcn > s1))
+				length = s1 - rl->vcn;
+			the_end = true;
+		}
+		delta = first_vcn - rl->vcn;
+		/* Header byte + length. */
+		rls += 1 + ntfs_get_nr_significant_bytes(length - delta);
+		/*
+		 * If the logical cluster number (lcn) denotes a hole and we
+		 * are on NTFS 3.0+, we don't store it at all, i.e. we need
+		 * zero space. On earlier NTFS versions we just store the lcn.
+		 * Note: this assumes that on NTFS 1.2-, holes are stored with
+		 * an lcn of -1 and not a delta_lcn of -1 (unless both are -1).
+		 */
+		if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
+			prev_lcn = rl->lcn;
+			if (likely(rl->lcn >= 0))
+				prev_lcn += delta;
+			/* Change in lcn. */
+			rls += ntfs_get_nr_significant_bytes(prev_lcn);
+		}
+		/* Go to next runlist element. */
+		rl++;
+	}
+	/* Do the full runs. */
+	for (; rl->length && !the_end; rl++) {
+		s64 length = rl->length;
+
+		if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
+			goto err_out;
+		/*
+		 * If @last_vcn is given and finishes inside this run, cap the
+		 * run length.
+		 */
+		if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
+			s64 s1 = last_vcn + 1;
+
+			if (unlikely(rl[1].vcn > s1))
+				length = s1 - rl->vcn;
+			the_end = true;
+		}
+		/* Header byte + length. */
+		rls += 1 + ntfs_get_nr_significant_bytes(length);
+		/*
+		 * If the logical cluster number (lcn) denotes a hole and we
+		 * are on NTFS 3.0+, we don't store it at all, i.e. we need
+		 * zero space. On earlier NTFS versions we just store the lcn.
+		 * Note: this assumes that on NTFS 1.2-, holes are stored with
+		 * an lcn of -1 and not a delta_lcn of -1 (unless both are -1).
+		 */
+		if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
+			/* Change in lcn. */
+			rls += ntfs_get_nr_significant_bytes(rl->lcn -
+					prev_lcn);
+			prev_lcn = rl->lcn;
+		}
+
+		if (rls > max_mp_size)
+			break;
+	}
+	return rls;
+err_out:
+	if (rl->lcn == LCN_RL_NOT_MAPPED)
+		rls = -EINVAL;
+	else
+		rls = -EIO;
+	return rls;
+}
+
+/**
+ * ntfs_write_significant_bytes - write the significant bytes of a number
+ * @dst:	destination buffer to write to
+ * @dst_max:	pointer to last byte of destination buffer for bounds checking
+ * @n:		number whose significant bytes to write
+ *
+ * Store in @dst the minimum bytes of the number @n which are required to
+ * identify @n unambiguously as a signed number, taking care not to exceed
+ * @dst_max, the maximum position within @dst to which we are allowed to
+ * write.
+ *
+ * This is used when building the mapping pairs array of a runlist to compress
+ * a given logical cluster number (lcn) or a specific run length to the
+ * minimum size possible.
+ *
+ * Return the number of bytes written on success. On error, i.e. the
+ * destination buffer @dst is too small, return -ENOSPC.
+ */
+static inline int ntfs_write_significant_bytes(s8 *dst, const s8 *dst_max,
+		const s64 n)
+{
+	s64 l = n;
+	int i;
+	s8 j;
+
+	i = 0;
+	do {
+		if (unlikely(dst > dst_max))
+			goto err_out;
+		*dst++ = l & 0xffll;
+		l >>= 8;
+		i++;
+	} while (l != 0 && l != -1);
+	j = (n >> 8 * (i - 1)) & 0xff;
+	/* If the sign bit is wrong, we need an extra byte.
+	 */
+	if (n < 0 && j >= 0) {
+		if (unlikely(dst > dst_max))
+			goto err_out;
+		i++;
+		*dst = (s8)-1;
+	} else if (n > 0 && j < 0) {
+		if (unlikely(dst > dst_max))
+			goto err_out;
+		i++;
+		*dst = (s8)0;
+	}
+	return i;
+err_out:
+	return -ENOSPC;
+}
+
+/**
+ * ntfs_mapping_pairs_build - build the mapping pairs array from a runlist
+ *
+ * Create the mapping pairs array from the locked runlist @rl, starting at vcn
+ * @first_vcn and finishing with vcn @last_vcn and save the array in @dst.
+ * @dst_len is the size of @dst in bytes and it should be at least equal to
+ * the value obtained by calling ntfs_get_size_for_mapping_pairs().
+ *
+ * A @last_vcn of -1 means end of runlist and in that case the mapping pairs
+ * array corresponding to the runlist starting at vcn @first_vcn and finishing
+ * at the end of the runlist is created.
+ *
+ * If @rl is NULL, just write a single terminator byte to @dst.
+ *
+ * On success or -ENOSPC error, if @stop_vcn is not NULL, *@stop_vcn is set to
+ * the first vcn outside the destination buffer. Note that on error, @dst has
+ * been filled with all the mapping pairs that will fit, thus it can be
+ * treated as partial success, in that a new attribute extent needs to be
+ * created or the next extent has to be used and the mapping pairs build has
+ * to be continued with @first_vcn set to *@stop_vcn.
+ */
+int ntfs_mapping_pairs_build(const struct ntfs_volume *vol, s8 *dst,
+		const int dst_len, const struct runlist_element *rl,
+		const s64 first_vcn, const s64 last_vcn, s64 *const stop_vcn,
+		struct runlist_element **stop_rl, unsigned int *de_cluster_count)
+{
+	s64 prev_lcn;
+	s8 *dst_max, *dst_next;
+	int err = -ENOSPC;
+	bool the_end = false;
+	s8 len_len, lcn_len;
+	unsigned int de_cnt = 0;
+
+	if (first_vcn < 0 || last_vcn < -1 || dst_len < 1)
+		return -EINVAL;
+	if (last_vcn >= 0 && first_vcn > last_vcn)
+		return -EINVAL;
+
+	if (!rl) {
+		WARN_ON(first_vcn || last_vcn > 0);
+		if (stop_vcn)
+			*stop_vcn = 0;
+		/* Terminator byte. */
+		*dst = 0;
+		return 0;
+	}
+	/* Skip to runlist element containing @first_vcn. */
+	while (rl->length && first_vcn >= rl[1].vcn)
+		rl++;
+	if (unlikely((!rl->length && first_vcn > rl->vcn) ||
+			first_vcn < rl->vcn))
+		return -EINVAL;
+	/*
+	 * @dst_max is used for bounds checking in
+	 * ntfs_write_significant_bytes().
+	 */
+	dst_max = dst + dst_len - 1;
+	prev_lcn = 0;
+	/* Do the first partial run if present. */
+	if (first_vcn > rl->vcn) {
+		s64 delta, length = rl->length;
+
+		/* We know rl->length != 0 already. */
+		if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
+			goto err_out;
+		/*
+		 * If @last_vcn is given and finishes inside this run, cap the
+		 * run length.
+		 */
+		if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
+			s64 s1 = last_vcn + 1;
+
+			if (unlikely(rl[1].vcn > s1))
+				length = s1 - rl->vcn;
+			the_end = true;
+		}
+		delta = first_vcn - rl->vcn;
+		/* Write length. */
+		len_len = ntfs_write_significant_bytes(dst + 1, dst_max,
+				length - delta);
+		if (unlikely(len_len < 0))
+			goto size_err;
+		/*
+		 * If the logical cluster number (lcn) denotes a hole and we
+		 * are on NTFS 3.0+, we don't store it at all, i.e. we need
+		 * zero space. On earlier NTFS versions we just write the lcn
+		 * change.
+		 */
+		if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
+			prev_lcn = rl->lcn;
+			if (likely(rl->lcn >= 0))
+				prev_lcn += delta;
+			/* Write change in lcn. */
+			lcn_len = ntfs_write_significant_bytes(dst + 1 +
+					len_len, dst_max, prev_lcn);
+			if (unlikely(lcn_len < 0))
+				goto size_err;
+		} else
+			lcn_len = 0;
+		dst_next = dst + len_len + lcn_len + 1;
+		if (unlikely(dst_next > dst_max))
+			goto size_err;
+		/* Update header byte. */
+		*dst = lcn_len << 4 | len_len;
+		/* Position at next mapping pairs array element. */
+		dst = dst_next;
+		/* Go to next runlist element. */
+		rl++;
+	}
+	/* Do the full runs. */
+	for (; rl->length && !the_end; rl++) {
+		s64 length = rl->length;
+
+		if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
+			goto err_out;
+		/*
+		 * If @last_vcn is given and finishes inside this run, cap the
+		 * run length.
+		 */
+		if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
+			s64 s1 = last_vcn + 1;
+
+			if (unlikely(rl[1].vcn > s1))
+				length = s1 - rl->vcn;
+			the_end = true;
+		}
+		/* Write length. */
+		len_len = ntfs_write_significant_bytes(dst + 1, dst_max,
+				length);
+		if (unlikely(len_len < 0))
+			goto size_err;
+		/*
+		 * If the logical cluster number (lcn) denotes a hole and we
+		 * are on NTFS 3.0+, we don't store it at all, i.e. we need
+		 * zero space. On earlier NTFS versions we just write the lcn
+		 * change.
+		 */
+		if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
+			/* Write change in lcn. */
+			lcn_len = ntfs_write_significant_bytes(dst + 1 +
+					len_len, dst_max, rl->lcn - prev_lcn);
+			if (unlikely(lcn_len < 0))
+				goto size_err;
+			prev_lcn = rl->lcn;
+		} else {
+			if (rl->lcn == LCN_DELALLOC)
+				de_cnt += rl->length;
+			lcn_len = 0;
+		}
+		dst_next = dst + len_len + lcn_len + 1;
+		if (unlikely(dst_next > dst_max))
+			goto size_err;
+		/* Update header byte. */
+		*dst = lcn_len << 4 | len_len;
+		/* Position at next mapping pairs array element. */
+		dst = dst_next;
+	}
+	/* Success.
+	 */
+	if (de_cluster_count)
+		*de_cluster_count = de_cnt;
+	err = 0;
+size_err:
+	/* Set stop vcn. */
+	if (stop_vcn)
+		*stop_vcn = rl->vcn;
+	if (stop_rl)
+		*stop_rl = (struct runlist_element *)rl;
+	/* Add terminator byte. */
+	*dst = 0;
+	return err;
+err_out:
+	if (rl->lcn == LCN_RL_NOT_MAPPED)
+		err = -EINVAL;
+	else
+		err = -EIO;
+	return err;
+}
+
+/**
+ * ntfs_rl_truncate_nolock - truncate a runlist starting at a specified vcn
+ * @vol:	ntfs volume (needed for error output)
+ * @runlist:	runlist to truncate
+ * @new_length:	the new length of the runlist in VCNs
+ *
+ * Truncate the runlist described by @runlist as well as the memory buffer
+ * holding the runlist elements to a length of @new_length VCNs.
+ *
+ * If @new_length lies within the runlist, the runlist elements with VCNs of
+ * @new_length and above are discarded. As a special case if @new_length is
+ * zero, the runlist is discarded and set to NULL.
+ *
+ * If @new_length lies beyond the runlist, a sparse runlist element is added
+ * to the end of the runlist @runlist or, if the last runlist element is a
+ * sparse one already, this is extended.
+ *
+ * Note, no checking is done for unmapped runlist elements. It is assumed
+ * that the caller has mapped any elements that need to be mapped already.
+ *
+ * Return 0 on success and -errno on error.
+ */
+int ntfs_rl_truncate_nolock(const struct ntfs_volume *vol, struct runlist *const runlist,
+		const s64 new_length)
+{
+	struct runlist_element *rl;
+	int old_size;
+
+	ntfs_debug("Entering for new_length 0x%llx.", (long long)new_length);
+
+	if (!runlist || new_length < 0)
+		return -EINVAL;
+
+	rl = runlist->rl;
+	if (new_length < rl->vcn)
+		return -EINVAL;
+
+	/* Find @new_length in the runlist. */
+	while (likely(rl->length && new_length >= rl[1].vcn))
+		rl++;
+	/*
+	 * If not at the end of the runlist we need to shrink it.
+	 * If at the end of the runlist we need to expand it.
+	 */
+	if (rl->length) {
+		struct runlist_element *trl;
+		bool is_end;
+
+		ntfs_debug("Shrinking runlist.");
+		/* Determine the runlist size. */
+		trl = rl + 1;
+		while (likely(trl->length))
+			trl++;
+		old_size = trl - runlist->rl + 1;
+		/* Truncate the run. */
+		rl->length = new_length - rl->vcn;
+		/*
+		 * If a run was partially truncated, make the following runlist
+		 * element a terminator.
+		 */
+		is_end = false;
+		if (rl->length) {
+			rl++;
+			if (!rl->length)
+				is_end = true;
+			rl->vcn = new_length;
+			rl->length = 0;
+		}
+		rl->lcn = LCN_ENOENT;
+		runlist->count = rl - runlist->rl + 1;
+		/* Reallocate memory if necessary. */
+		if (!is_end) {
+			int new_size = rl - runlist->rl + 1;
+
+			rl = ntfs_rl_realloc(runlist->rl, old_size, new_size);
+			if (IS_ERR(rl))
+				ntfs_warning(vol->sb,
+					"Failed to shrink runlist buffer. This just wastes a bit of memory temporarily so we ignore it and return success.");
+			else
+				runlist->rl = rl;
+		}
+	} else if (likely(/* !rl->length && */ new_length > rl->vcn)) {
+		ntfs_debug("Expanding runlist.");
+		/*
+		 * If there is a previous runlist element and it is a sparse
+		 * one, extend it. Otherwise we need to add a new, sparse
+		 * runlist element.
+		 */
+		if ((rl > runlist->rl) && ((rl - 1)->lcn == LCN_HOLE))
+			(rl - 1)->length = new_length - (rl - 1)->vcn;
+		else {
+			/* Determine the runlist size. */
+			old_size = rl - runlist->rl + 1;
+			/* Reallocate memory if necessary. */
+			rl = ntfs_rl_realloc(runlist->rl, old_size,
+					old_size + 1);
+			if (IS_ERR(rl)) {
+				ntfs_error(vol->sb, "Failed to expand runlist buffer, aborting.");
+				return PTR_ERR(rl);
+			}
+			runlist->rl = rl;
+			/*
+			 * Set @rl to the same runlist element in the new
+			 * runlist as before in the old runlist.
+			 */
+			rl += old_size - 1;
+			/* Add a new, sparse runlist element. */
+			rl->lcn = LCN_HOLE;
+			rl->length = new_length - rl->vcn;
+			/* Add a new terminator runlist element.
+			 */
+			rl++;
+			rl->length = 0;
+			runlist->count = old_size + 1;
+		}
+		rl->vcn = new_length;
+		rl->lcn = LCN_ENOENT;
+	} else /* if (unlikely(!rl->length && new_length == rl->vcn)) */ {
+		/* Runlist already has same size as requested. */
+		rl->lcn = LCN_ENOENT;
+	}
+	ntfs_debug("Done.");
+	return 0;
+}
+
+/**
+ * ntfs_rl_sparse - check whether a runlist has sparse regions or not
+ * @rl:		runlist to check
+ *
+ * Return 1 if it has, 0 if not, and -errno on error.
+ */
+int ntfs_rl_sparse(struct runlist_element *rl)
+{
+	struct runlist_element *rlc;
+
+	if (!rl)
+		return -EINVAL;
+
+	for (rlc = rl; rlc->length; rlc++)
+		if (rlc->lcn < 0) {
+			if (rlc->lcn != LCN_HOLE && rlc->lcn != LCN_DELALLOC) {
+				pr_err("%s: bad runlist", __func__);
+				return -EINVAL;
+			}
+			return 1;
+		}
+	return 0;
+}
+
+/**
+ * ntfs_rl_get_compressed_size - calculate length of non-sparse regions
+ * @vol:	ntfs volume (needed for cluster size)
+ * @rl:		runlist to calculate for
+ *
+ * Return the compressed size or -errno on error.
+ */
+s64 ntfs_rl_get_compressed_size(struct ntfs_volume *vol, struct runlist_element *rl)
+{
+	struct runlist_element *rlc;
+	s64 ret = 0;
+
+	if (!rl)
+		return -EINVAL;
+
+	for (rlc = rl; rlc->length; rlc++) {
+		if (rlc->lcn < 0) {
+			if (rlc->lcn != LCN_HOLE && rlc->lcn != LCN_DELALLOC) {
+				ntfs_error(vol->sb, "%s: bad runlist, rlc->lcn : %lld",
+						__func__, rlc->lcn);
+				return -EINVAL;
+			}
+		} else
+			ret += rlc->length;
+	}
+	return ret << vol->cluster_size_bits;
+}
+
+static inline bool ntfs_rle_lcn_contiguous(struct runlist_element *left_rle,
+		struct runlist_element *right_rle)
+{
+	if (left_rle->lcn > LCN_HOLE &&
+	    left_rle->lcn + left_rle->length == right_rle->lcn)
+		return true;
+	else if (left_rle->lcn == LCN_HOLE && right_rle->lcn == LCN_HOLE)
+		return true;
+	else
+		return false;
+}
+
+static inline bool ntfs_rle_contain(struct runlist_element *rle, s64 vcn)
+{
+	if (rle->length > 0 &&
+	    vcn >= rle->vcn && vcn < rle->vcn + rle->length)
+		return true;
+	else
+		return false;
+}
+
+struct runlist_element *ntfs_rl_insert_range(struct runlist_element *dst_rl, int dst_cnt,
+		struct runlist_element *src_rl, int src_cnt,
+		size_t *new_rl_cnt)
+{
+	struct runlist_element *i_rl, *new_rl, *src_rl_origin = src_rl;
+	struct runlist_element dst_rl_split;
+	s64 start_vcn = src_rl[0].vcn;
+	int new_1st_cnt, new_2nd_cnt, new_3rd_cnt, new_cnt;
+
+	if (!dst_rl || !src_rl || !new_rl_cnt)
+		return ERR_PTR(-EINVAL);
+	if (dst_cnt <= 0 || src_cnt <= 0)
+		return ERR_PTR(-EINVAL);
+	if (!(dst_rl[dst_cnt - 1].lcn == LCN_ENOENT &&
+	      dst_rl[dst_cnt - 1].length == 0) ||
+	    src_rl[src_cnt - 1].lcn < LCN_HOLE)
+		return ERR_PTR(-EINVAL);
+
+	start_vcn = src_rl[0].vcn;
+
+	i_rl = ntfs_rl_find_vcn_nolock(dst_rl, start_vcn);
+	if (!i_rl ||
+	    (i_rl->lcn == LCN_ENOENT && i_rl->vcn != start_vcn) ||
+	    (i_rl->lcn != LCN_ENOENT && !ntfs_rle_contain(i_rl, start_vcn)))
+		return ERR_PTR(-EINVAL);
+
+	new_1st_cnt = (int)(i_rl - dst_rl);
+	if (new_1st_cnt > dst_cnt)
+		return ERR_PTR(-EINVAL);
+	new_3rd_cnt = dst_cnt - new_1st_cnt;
+	if (new_3rd_cnt < 1)
+		return ERR_PTR(-EINVAL);
+
+	if (i_rl[0].vcn != start_vcn) {
+		if (i_rl[0].lcn == LCN_HOLE && src_rl[0].lcn == LCN_HOLE)
+			goto merge_src_rle;
+
+		/* Split @i_rl[0] and create @dst_rl_split. */
+		dst_rl_split.vcn = i_rl[0].vcn;
+		dst_rl_split.length = start_vcn - i_rl[0].vcn;
+		dst_rl_split.lcn = i_rl[0].lcn;
+
+		i_rl[0].vcn = start_vcn;
+		i_rl[0].length -= dst_rl_split.length;
+		i_rl[0].lcn += dst_rl_split.length;
+	} else {
+		struct runlist_element *dst_rle, *src_rle;
+merge_src_rle:
+
+		/* Do not split @i_rl[0]. */
+		dst_rl_split.lcn = LCN_ENOENT;
+
+		/* Merge @src_rl's first run and @i_rl[0]'s left run if possible. */
+		dst_rle = &dst_rl[new_1st_cnt - 1];
+		src_rle = &src_rl[0];
+		if (new_1st_cnt > 0 && ntfs_rle_lcn_contiguous(dst_rle, src_rle)) {
+			WARN_ON(dst_rle->vcn + dst_rle->length != src_rle->vcn);
+			dst_rle->length += src_rle->length;
+			src_rl++;
+			src_cnt--;
+		} else {
+			/* Merge @src_rl's last run and @i_rl[0]'s right if possible. */
+			dst_rle = &dst_rl[new_1st_cnt];
+			src_rle = &src_rl[src_cnt - 1];
+
+			if (ntfs_rle_lcn_contiguous(dst_rle, src_rle)) {
+				dst_rle->length += src_rle->length;
+				src_cnt--;
+			}
+		}
+	}
+
+	new_2nd_cnt = src_cnt;
+	new_cnt = new_1st_cnt + new_2nd_cnt + new_3rd_cnt;
+	new_cnt += dst_rl_split.lcn >= LCN_HOLE ?
+			1 : 0;
+	new_rl = ntfs_malloc_nofs(new_cnt * sizeof(*new_rl));
+	if (!new_rl)
+		return ERR_PTR(-ENOMEM);
+
+	/* Copy the @dst_rl's first half to @new_rl. */
+	ntfs_rl_mc(new_rl, 0, dst_rl, 0, new_1st_cnt);
+	if (dst_rl_split.lcn >= LCN_HOLE) {
+		ntfs_rl_mc(new_rl, new_1st_cnt, &dst_rl_split, 0, 1);
+		new_1st_cnt++;
+	}
+	/* Copy the @src_rl to @new_rl. */
+	ntfs_rl_mc(new_rl, new_1st_cnt, src_rl, 0, new_2nd_cnt);
+	/* Copy the @dst_rl's second half to @new_rl. */
+	if (new_3rd_cnt >= 1) {
+		struct runlist_element *rl, *rl_3rd;
+		int dst_1st_cnt = dst_rl_split.lcn >= LCN_HOLE ?
+				new_1st_cnt - 1 : new_1st_cnt;
+
+		ntfs_rl_mc(new_rl, new_1st_cnt + new_2nd_cnt,
+				dst_rl, dst_1st_cnt, new_3rd_cnt);
+		/* Update vcn of the @dst_rl's second half runs to reflect the
+		 * appended @src_rl.
+		 */
+		if (new_1st_cnt + new_2nd_cnt == 0) {
+			rl_3rd = &new_rl[new_1st_cnt + new_2nd_cnt + 1];
+			rl = &new_rl[new_1st_cnt + new_2nd_cnt];
+		} else {
+			rl_3rd = &new_rl[new_1st_cnt + new_2nd_cnt];
+			rl = &new_rl[new_1st_cnt + new_2nd_cnt - 1];
+		}
+		do {
+			rl_3rd->vcn = rl->vcn + rl->length;
+			if (rl_3rd->length <= 0)
+				break;
+			rl = rl_3rd;
+			rl_3rd++;
+		} while (1);
+	}
+	*new_rl_cnt = new_1st_cnt + new_2nd_cnt + new_3rd_cnt;
+
+	ntfs_free(dst_rl);
+	ntfs_free(src_rl_origin);
+	return new_rl;
+}
+
+struct runlist_element *ntfs_rl_punch_hole(struct runlist_element *dst_rl, int dst_cnt,
+		s64 start_vcn, s64 len,
+		struct runlist_element **punch_rl,
+		size_t *new_rl_cnt)
+{
+	struct runlist_element *s_rl, *e_rl, *new_rl, *dst_3rd_rl, hole_rl[1];
+	s64 end_vcn;
+	int new_1st_cnt, dst_3rd_cnt, new_cnt, punch_cnt, merge_cnt;
+	bool begin_split, end_split, one_split_3;
+
+	if (dst_cnt < 2 ||
+	    !(dst_rl[dst_cnt - 1].lcn == LCN_ENOENT &&
+	      dst_rl[dst_cnt - 1].length == 0))
+		return ERR_PTR(-EINVAL);
+
+	end_vcn = min(start_vcn + len - 1,
+			dst_rl[dst_cnt - 2].vcn + dst_rl[dst_cnt - 2].length - 1);
+
+	s_rl = ntfs_rl_find_vcn_nolock(dst_rl, start_vcn);
+	if (!s_rl ||
+	    s_rl->lcn <= LCN_ENOENT ||
+	    !ntfs_rle_contain(s_rl, start_vcn))
+		return ERR_PTR(-EINVAL);
+
+	begin_split = s_rl->vcn != start_vcn ? true : false;
+
+	e_rl = ntfs_rl_find_vcn_nolock(dst_rl, end_vcn);
+	if (!e_rl ||
+	    e_rl->lcn <= LCN_ENOENT ||
+	    !ntfs_rle_contain(e_rl, end_vcn))
+		return ERR_PTR(-EINVAL);
+
+	end_split = e_rl->vcn + e_rl->length - 1 != end_vcn ? true : false;
+
+	/* @s_rl has to be split into left, punched hole, and right. */
+	one_split_3 = e_rl == s_rl && begin_split && end_split ? true : false;
+
+	punch_cnt = (int)(e_rl - s_rl) + 1;
+
+	*punch_rl = ntfs_malloc_nofs((punch_cnt + 1) * sizeof(struct runlist_element));
+	if (!*punch_rl)
+		return ERR_PTR(-ENOMEM);
+
+	new_cnt = dst_cnt - (int)(e_rl - s_rl + 1) + 3;
+	new_rl = ntfs_malloc_nofs(new_cnt * sizeof(struct runlist_element));
+	if (!new_rl) {
+		ntfs_free(*punch_rl);
+		*punch_rl = NULL;
+		return ERR_PTR(-ENOMEM);
+	}
+
+	new_1st_cnt = (int)(s_rl - dst_rl) + 1;
+	ntfs_rl_mc(*punch_rl, 0, dst_rl, new_1st_cnt - 1, punch_cnt);
+
+	(*punch_rl)[punch_cnt].lcn = LCN_ENOENT;
+	(*punch_rl)[punch_cnt].length = 0;
+
+	if (!begin_split)
+		new_1st_cnt--;
+	dst_3rd_rl = e_rl;
+	dst_3rd_cnt = (int)(&dst_rl[dst_cnt - 1] - e_rl) + 1;
+	if (!end_split) {
+		dst_3rd_rl++;
+		dst_3rd_cnt--;
+	}
+
+	/* Copy the 1st part of @dst_rl into @new_rl. */
+	ntfs_rl_mc(new_rl, 0, dst_rl, 0, new_1st_cnt);
+	if (begin_split) {
+		/* The @e_rl has to be split and copied into the last of
+		 * @new_rl and the first of @punch_rl.
+		 */
+		s64 first_cnt = start_vcn - dst_rl[new_1st_cnt - 1].vcn;
+
+		if (new_1st_cnt)
+			new_rl[new_1st_cnt - 1].length = first_cnt;
+
+		(*punch_rl)[0].vcn = start_vcn;
+		(*punch_rl)[0].length -= first_cnt;
+		if ((*punch_rl)[0].lcn > LCN_HOLE)
+			(*punch_rl)[0].lcn += first_cnt;
+	}
+
+	/* Copy a hole into @new_rl. */
+	hole_rl[0].vcn = start_vcn;
+	hole_rl[0].length = (s64)len;
+	hole_rl[0].lcn = LCN_HOLE;
+	ntfs_rl_mc(new_rl, new_1st_cnt, hole_rl, 0, 1);
+
+	/* Copy the 3rd part of @dst_rl into @new_rl. */
+	ntfs_rl_mc(new_rl, new_1st_cnt + 1, dst_3rd_rl, 0, dst_3rd_cnt);
+	if (end_split) {
+		/* The @e_rl has to be split and copied into the first of
+		 * @new_rl and the last of @punch_rl.
+		 */
+		s64 first_cnt = end_vcn - dst_3rd_rl[0].vcn + 1;
+
+		new_rl[new_1st_cnt + 1].vcn = end_vcn + 1;
+		new_rl[new_1st_cnt + 1].length -= first_cnt;
+		if (new_rl[new_1st_cnt + 1].lcn > LCN_HOLE)
+			new_rl[new_1st_cnt + 1].lcn += first_cnt;
+
+		if (one_split_3)
+			(*punch_rl)[punch_cnt - 1].length -=
+				new_rl[new_1st_cnt + 1].length;
+		else
+			(*punch_rl)[punch_cnt - 1].length = first_cnt;
+	}
+
+	/* Merge left and hole, or hole and right in @new_rl, if left or right
+	 * consists of holes.
+	 */
+	merge_cnt = 0;
+	if (new_1st_cnt > 0 && new_rl[new_1st_cnt - 1].lcn == LCN_HOLE) {
+		/* Merge right and hole. */
+		s_rl = &new_rl[new_1st_cnt - 1];
+		s_rl->length += s_rl[1].length;
+		merge_cnt = 1;
+		/* Merge left and right. */
+		if (new_1st_cnt + 1 < new_cnt &&
+		    new_rl[new_1st_cnt + 1].lcn == LCN_HOLE) {
+			s_rl->length += s_rl[2].length;
+			merge_cnt++;
+		}
+	} else if (new_1st_cnt + 1 < new_cnt &&
+		   new_rl[new_1st_cnt + 1].lcn == LCN_HOLE) {
+		/* Merge left and hole. */
+		s_rl = &new_rl[new_1st_cnt];
+		s_rl->length += s_rl[1].length;
+		merge_cnt = 1;
+	}
+	if (merge_cnt) {
+		struct runlist_element *d_rl, *src_rl;
+
+		d_rl = s_rl + 1;
+		src_rl = s_rl + 1 + merge_cnt;
+		ntfs_rl_mm(new_rl, (int)(d_rl - new_rl), (int)(src_rl - new_rl),
+				(int)(&new_rl[new_cnt - 1] - src_rl) + 1);
+	}
+
+	(*punch_rl)[punch_cnt].vcn = (*punch_rl)[punch_cnt - 1].vcn +
+			(*punch_rl)[punch_cnt - 1].length;
+
+	/* punch_cnt elements of dst are replaced with one hole. */
+	*new_rl_cnt = dst_cnt - (punch_cnt - (int)begin_split - (int)end_split) +
+			1 - merge_cnt;
+	ntfs_free(dst_rl);
+	return new_rl;
+}
+
dst_cnt,
+		s64 start_vcn, s64 len,
+		struct runlist_element **punch_rl,
+		size_t *new_rl_cnt)
+{
+	struct runlist_element *s_rl, *e_rl, *new_rl, *dst_3rd_rl;
+	s64 end_vcn;
+	int new_1st_cnt, dst_3rd_cnt, new_cnt, punch_cnt, merge_cnt, i;
+	bool begin_split, end_split, one_split_3;
+
+	if (dst_cnt < 2 ||
+	    !(dst_rl[dst_cnt - 1].lcn == LCN_ENOENT &&
+	      dst_rl[dst_cnt - 1].length == 0))
+		return ERR_PTR(-EINVAL);
+
+	end_vcn = min(start_vcn + len - 1,
+		      dst_rl[dst_cnt - 1].vcn - 1);
+
+	s_rl = ntfs_rl_find_vcn_nolock(dst_rl, start_vcn);
+	if (!s_rl ||
+	    s_rl->lcn <= LCN_ENOENT ||
+	    !ntfs_rle_contain(s_rl, start_vcn))
+		return ERR_PTR(-EINVAL);
+
+	begin_split = s_rl->vcn != start_vcn;
+
+	e_rl = ntfs_rl_find_vcn_nolock(dst_rl, end_vcn);
+	if (!e_rl ||
+	    e_rl->lcn <= LCN_ENOENT ||
+	    !ntfs_rle_contain(e_rl, end_vcn))
+		return ERR_PTR(-EINVAL);
+
+	end_split = e_rl->vcn + e_rl->length - 1 != end_vcn;
+
+	/* @s_rl has to be split into left, collapsed, and right */
+	one_split_3 = e_rl == s_rl && begin_split && end_split;
+
+	punch_cnt = (int)(e_rl - s_rl) + 1;
+	*punch_rl = ntfs_malloc_nofs((punch_cnt + 1) * sizeof(struct runlist_element));
+	if (!*punch_rl)
+		return ERR_PTR(-ENOMEM);
+
+	new_cnt = dst_cnt - (int)(e_rl - s_rl + 1) + 3;
+	new_rl = ntfs_malloc_nofs(new_cnt * sizeof(struct runlist_element));
+	if (!new_rl) {
+		ntfs_free(*punch_rl);
+		*punch_rl = NULL;
+		return ERR_PTR(-ENOMEM);
+	}
+
+	new_1st_cnt = (int)(s_rl - dst_rl) + 1;
+	ntfs_rl_mc(*punch_rl, 0, dst_rl, new_1st_cnt - 1, punch_cnt);
+	(*punch_rl)[punch_cnt].lcn = LCN_ENOENT;
+	(*punch_rl)[punch_cnt].length = 0;
+
+	if (!begin_split)
+		new_1st_cnt--;
+	dst_3rd_rl = e_rl;
+	dst_3rd_cnt = (int)(&dst_rl[dst_cnt - 1] - e_rl) + 1;
+	if (!end_split) {
+		dst_3rd_rl++;
+		dst_3rd_cnt--;
+	}
+
+	/* Copy the 1st part of @dst_rl into @new_rl */
+	ntfs_rl_mc(new_rl, 0, dst_rl, 0, new_1st_cnt);
+	if (begin_split) {
+		/* @s_rl has to be split and copied into the last of @new_rl
+		 * and the first of @punch_rl
+		 */
+		s64 first_cnt = start_vcn - dst_rl[new_1st_cnt - 1].vcn;
+
+		new_rl[new_1st_cnt - 1].length = first_cnt;
+
+		(*punch_rl)[0].vcn = start_vcn;
+		(*punch_rl)[0].length -= first_cnt;
+		if ((*punch_rl)[0].lcn > LCN_HOLE)
+			(*punch_rl)[0].lcn += first_cnt;
+	}
+
+	/* Copy the 3rd part of @dst_rl into @new_rl */
+	ntfs_rl_mc(new_rl, new_1st_cnt, dst_3rd_rl, 0, dst_3rd_cnt);
+	if (end_split) {
+		/* @e_rl has to be split and copied into the first of
+		 * @new_rl and the last of @punch_rl
+		 */
+		s64 first_cnt = end_vcn - dst_3rd_rl[0].vcn + 1;
+
+		new_rl[new_1st_cnt].vcn = end_vcn + 1;
+		new_rl[new_1st_cnt].length -= first_cnt;
+		if (new_rl[new_1st_cnt].lcn > LCN_HOLE)
+			new_rl[new_1st_cnt].lcn += first_cnt;
+
+		if (one_split_3)
+			(*punch_rl)[punch_cnt - 1].length -=
+				new_rl[new_1st_cnt].length;
+		else
+			(*punch_rl)[punch_cnt - 1].length = first_cnt;
+	}
+
+	/* Adjust vcn */
+	if (new_1st_cnt == 0)
+		new_rl[new_1st_cnt].vcn = 0;
+	for (i =
new_1st_cnt == 0 ? 1 : new_1st_cnt; new_rl[i].length; i++)
+		new_rl[i].vcn = new_rl[i - 1].vcn + new_rl[i - 1].length;
+	new_rl[i].vcn = new_rl[i - 1].vcn + new_rl[i - 1].length;
+
+	/* Merge the two runs adjacent to the collapsed range in @new_rl,
+	 * if their LCNs are now contiguous.
+	 */
+	merge_cnt = 0;
+	i = new_1st_cnt == 0 ? 1 : new_1st_cnt;
+	if (ntfs_rle_lcn_contiguous(&new_rl[i - 1], &new_rl[i])) {
+		/* Merge left and right runs */
+		s_rl = &new_rl[i - 1];
+		s_rl->length += s_rl[1].length;
+		merge_cnt = 1;
+	}
+	if (merge_cnt) {
+		struct runlist_element *d_rl, *src_rl;
+
+		d_rl = s_rl + 1;
+		src_rl = s_rl + 1 + merge_cnt;
+		ntfs_rl_mm(new_rl, (int)(d_rl - new_rl), (int)(src_rl - new_rl),
+			   (int)(&new_rl[new_cnt - 1] - src_rl) + 1);
+	}
+
+	(*punch_rl)[punch_cnt].vcn = (*punch_rl)[punch_cnt - 1].vcn +
+		(*punch_rl)[punch_cnt - 1].length;
+
+	/* punch_cnt elements of dst are extracted */
+	*new_rl_cnt = dst_cnt - (punch_cnt - (int)begin_split - (int)end_split) -
+		merge_cnt;
+
+	ntfs_free(dst_rl);
+	return new_rl;
+}
-- 
2.25.1

From: Namjae Jeon
Subject: [PATCH v2 09/11] ntfsplus: add reparse and ea operations
Date: Thu, 27 Nov 2025 13:59:42 +0900
Message-Id: <20251127045944.26009-10-linkinjeon@kernel.org>
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>
References: <20251127045944.26009-1-linkinjeon@kernel.org>

This adds the implementation of reparse and ea operations for ntfsplus.
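For reference, the on-disk $EA layout this patch walks can be modeled in userspace roughly as below. This is a hedged sketch, not an official header: `struct ea_entry` and the two helpers mirror the patch's `struct ea_attr`, its `ea_packed_size()` ("4 header bytes + name length + 1 + value length"), and the `ALIGN(..., 4)` stored size used when iterating via `next_entry_offset`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the patch's struct ea_attr: a 4-byte
 * header (flags, name length, value length preceded by the 32-bit offset
 * to the next entry), then the name, a NUL byte, and the value bytes. */
struct ea_entry {
	uint32_t next_entry_offset;	/* distance to the next entry, 4-aligned */
	uint8_t  flags;			/* e.g. NEED_EA */
	uint8_t  name_length;		/* name length, excluding the NUL */
	uint16_t value_length;		/* value length in bytes */
	char     name[];		/* name, NUL, then value */
};

/* "Packed" size as accounted in $EA_INFORMATION's ea_length:
 * 4 header bytes + name + NUL + value = 5 + name_len + value_len. */
static size_t ea_packed_size(const struct ea_entry *ea)
{
	return 5 + ea->name_length + ea->value_length;
}

/* Stored (unpacked) size used for next_entry_offset: full header,
 * name, NUL, and value, rounded up to a 4-byte boundary. */
static size_t ea_stored_size(const struct ea_entry *ea)
{
	size_t raw = offsetof(struct ea_entry, name) +
		     ea->name_length + 1 + ea->value_length;

	return (raw + 3) & ~(size_t)3;
}
```

For a WSL-style `$LXMOD` entry (6-byte name, 4-byte mode value) the packed size is 15 bytes while the stored size is 20; keeping the two counts separate appears to be why the patch tracks both `ea_length` and `ea_query_length` in $EA_INFORMATION.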
Signed-off-by: Namjae Jeon
---
 fs/ntfsplus/ea.c      | 931 ++++++++++++++++++++++++++++++++++++++++++
 fs/ntfsplus/reparse.c | 550 +++++++++++++++++++++++++
 2 files changed, 1481 insertions(+)
 create mode 100644 fs/ntfsplus/ea.c
 create mode 100644 fs/ntfsplus/reparse.c

diff --git a/fs/ntfsplus/ea.c b/fs/ntfsplus/ea.c
new file mode 100644
index 000000000000..5658aa0e21ea
--- /dev/null
+++ b/fs/ntfsplus/ea.c
@@ -0,0 +1,931 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * Processing of EAs
+ *
+ * Part of this file is based on code from the NTFS-3G project.
+ *
+ * Copyright (c) 2014-2021 Jean-Pierre Andre
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include
+#include
+#include
+#include
+
+#include "layout.h"
+#include "attrib.h"
+#include "index.h"
+#include "dir.h"
+#include "ea.h"
+#include "misc.h"
+
+static int ntfs_write_ea(struct ntfs_inode *ni, int type, char *value, s64 ea_off,
+		s64 ea_size, bool need_truncate)
+{
+	struct inode *ea_vi;
+	int err = 0;
+	s64 written;
+
+	ea_vi = ntfs_attr_iget(VFS_I(ni), type, AT_UNNAMED, 0);
+	if (IS_ERR(ea_vi))
+		return PTR_ERR(ea_vi);
+
+	written = ntfs_inode_attr_pwrite(ea_vi, ea_off, ea_size, value, false);
+	if (written != ea_size) {
+		err = -EIO;
+	} else {
+		struct ntfs_inode *ea_ni = NTFS_I(ea_vi);
+
+		if (need_truncate && ea_ni->data_size > ea_off + ea_size)
+			ntfs_attr_truncate(ea_ni, ea_off + ea_size);
+		mark_mft_record_dirty(ni);
+	}
+
+	iput(ea_vi);
+	return err;
+}
+
+static int ntfs_ea_lookup(char *ea_buf, s64 ea_buf_size, const char *name,
+		int name_len, s64 *ea_offset, s64 *ea_size)
+{
+	const struct ea_attr *p_ea;
+	s64 offset;
+	unsigned int next;
+
+	if (ea_buf_size < sizeof(struct ea_attr))
+		goto out;
+
+	offset = 0;
+	do {
+		p_ea = (const struct ea_attr *)&ea_buf[offset];
+		next = le32_to_cpu(p_ea->next_entry_offset);
+
+		if (offset + next > ea_buf_size ||
+		    ((1 + p_ea->ea_name_length) > (ea_buf_size - offset)))
+			break;
+
+		if (p_ea->ea_name_length == name_len &&
!memcmp(p_ea->ea_name, name, name_len)) { + *ea_offset =3D offset; + if (next) + *ea_size =3D next; + else { + unsigned int ea_len =3D 1 + p_ea->ea_name_length + + le16_to_cpu(p_ea->ea_value_length); + + if ((ea_buf_size - offset) < ea_len) + goto out; + + *ea_size =3D ALIGN(struct_size(p_ea, ea_name, + 1 + p_ea->ea_name_length + + le16_to_cpu(p_ea->ea_value_length)), 4); + } + + if (ea_buf_size < *ea_offset + *ea_size) + goto out; + + return 0; + } + offset +=3D next; + } while (next > 0 && offset < ea_buf_size && + sizeof(struct ea_attr) < (ea_buf_size - offset)); + +out: + return -ENOENT; +} + +/* + * Return the existing EA + * + * The EA_INFORMATION is not examined and the consistency of the + * existing EA is not checked. + * + * If successful, the full attribute is returned unchanged + * and its size is returned. + * If the designated buffer is too small, the needed size is + * returned, and the buffer is left unchanged. + * If there is an error, a negative value is returned and errno + * is set according to the error. 
+ */ +static int ntfs_get_ea(struct inode *inode, const char *name, size_t name_= len, + void *buffer, size_t size) +{ + struct ntfs_inode *ni =3D NTFS_I(inode); + const struct ea_attr *p_ea; + char *ea_buf; + s64 ea_off, ea_size, all_ea_size, ea_info_size; + int err; + unsigned short int ea_value_len, ea_info_qlen; + struct ea_information *p_ea_info; + + if (!NInoHasEA(ni)) + return -ENODATA; + + p_ea_info =3D ntfs_attr_readall(ni, AT_EA_INFORMATION, NULL, 0, + &ea_info_size); + if (!p_ea_info || ea_info_size !=3D sizeof(struct ea_information)) { + ntfs_free(p_ea_info); + return -ENODATA; + } + + ea_info_qlen =3D le16_to_cpu(p_ea_info->ea_query_length); + ntfs_free(p_ea_info); + + ea_buf =3D ntfs_attr_readall(ni, AT_EA, NULL, 0, &all_ea_size); + if (!ea_buf) + return -ENODATA; + + err =3D ntfs_ea_lookup(ea_buf, ea_info_qlen, name, name_len, &ea_off, + &ea_size); + if (!err) { + p_ea =3D (struct ea_attr *)&ea_buf[ea_off]; + ea_value_len =3D le16_to_cpu(p_ea->ea_value_length); + if (!buffer) { + ntfs_free(ea_buf); + return ea_value_len; + } + + if (ea_value_len > size) { + err =3D -ERANGE; + goto free_ea_buf; + } + + memcpy(buffer, &p_ea->ea_name[p_ea->ea_name_length + 1], + ea_value_len); + ntfs_free(ea_buf); + return ea_value_len; + } + + err =3D -ENODATA; +free_ea_buf: + ntfs_free(ea_buf); + return err; +} + +static inline int ea_packed_size(const struct ea_attr *p_ea) +{ + /* + * 4 bytes for header (flags and lengths) + name length + 1 + + * value length. + */ + return 5 + p_ea->ea_name_length + le16_to_cpu(p_ea->ea_value_length); +} + +/* + * Set a new EA, and set EA_INFORMATION accordingly + * + * This is roughly the same as ZwSetEaFile() on Windows, however + * the "offset to next" of the last EA should not be cleared. + * + * Consistency of the new EA is first checked. + * + * EA_INFORMATION is set first, and it is restored to its former + * state if setting EA fails. 
+ */ +static int ntfs_set_ea(struct inode *inode, const char *name, size_t name_= len, + const void *value, size_t val_size, int flags, + __le16 *packed_ea_size) +{ + struct ntfs_inode *ni =3D NTFS_I(inode); + struct ea_information *p_ea_info =3D NULL; + int ea_packed, err =3D 0; + struct ea_attr *p_ea; + unsigned short int ea_info_qsize; + char *ea_buf =3D NULL; + size_t new_ea_size =3D ALIGN(struct_size(p_ea, ea_name, 1 + name_len + va= l_size), 4); + s64 ea_off, ea_info_size, all_ea_size, ea_size; + + if (name_len > 255) + return -ENAMETOOLONG; + + if (ntfs_attr_exist(ni, AT_EA_INFORMATION, AT_UNNAMED, 0)) { + p_ea_info =3D ntfs_attr_readall(ni, AT_EA_INFORMATION, NULL, 0, + &ea_info_size); + if (!p_ea_info || ea_info_size !=3D sizeof(struct ea_information)) + goto out; + + ea_buf =3D ntfs_attr_readall(ni, AT_EA, NULL, 0, &all_ea_size); + if (!ea_buf) { + ea_info_qsize =3D 0; + ntfs_free(p_ea_info); + goto create_ea_info; + } + + ea_info_qsize =3D le32_to_cpu(p_ea_info->ea_query_length); + } else { +create_ea_info: + p_ea_info =3D ntfs_malloc_nofs(sizeof(struct ea_information)); + if (!p_ea_info) + return -ENOMEM; + + ea_info_qsize =3D 0; + err =3D ntfs_attr_add(ni, AT_EA_INFORMATION, AT_UNNAMED, 0, + (char *)p_ea_info, sizeof(struct ea_information)); + if (err) + goto out; + + if (ntfs_attr_exist(ni, AT_EA, AT_UNNAMED, 0)) { + err =3D ntfs_attr_remove(ni, AT_EA, AT_UNNAMED, 0); + if (err) + goto out; + } + + goto alloc_new_ea; + } + + if (ea_info_qsize > all_ea_size) { + err =3D -EIO; + goto out; + } + + err =3D ntfs_ea_lookup(ea_buf, ea_info_qsize, name, name_len, &ea_off, + &ea_size); + if (ea_info_qsize && !err) { + if (flags & XATTR_CREATE) { + err =3D -EEXIST; + goto out; + } + + p_ea =3D (struct ea_attr *)(ea_buf + ea_off); + + if (val_size && + le16_to_cpu(p_ea->ea_value_length) =3D=3D val_size && + !memcmp(p_ea->ea_name + p_ea->ea_name_length + 1, value, + val_size)) + goto out; + + le16_add_cpu(&p_ea_info->ea_length, 0 - ea_packed_size(p_ea)); + + if 
(p_ea->flags & NEED_EA) + le16_add_cpu(&p_ea_info->need_ea_count, -1); + + memmove((char *)p_ea, (char *)p_ea + ea_size, ea_info_qsize - (ea_off + = ea_size)); + ea_info_qsize -=3D ea_size; + p_ea_info->ea_query_length =3D cpu_to_le16(ea_info_qsize); + + err =3D ntfs_write_ea(ni, AT_EA_INFORMATION, (char *)p_ea_info, 0, + sizeof(struct ea_information), false); + if (err) + goto out; + + err =3D ntfs_write_ea(ni, AT_EA, ea_buf, 0, ea_info_qsize, true); + if (err) + goto out; + + if ((flags & XATTR_REPLACE) && !val_size) { + /* Remove xattr. */ + goto out; + } + } else { + if (flags & XATTR_REPLACE) { + err =3D -ENODATA; + goto out; + } + } + ntfs_free(ea_buf); + +alloc_new_ea: + ea_buf =3D kzalloc(new_ea_size, GFP_NOFS); + if (!ea_buf) { + err =3D -ENOMEM; + goto out; + } + + /* + * EA and REPARSE_POINT compatibility not checked any more, + * required by Windows 10, but having both may lead to + * problems with earlier versions. + */ + p_ea =3D (struct ea_attr *)ea_buf; + memcpy(p_ea->ea_name, name, name_len); + p_ea->ea_name_length =3D name_len; + p_ea->ea_name[name_len] =3D 0; + memcpy(p_ea->ea_name + name_len + 1, value, val_size); + p_ea->ea_value_length =3D cpu_to_le16(val_size); + p_ea->next_entry_offset =3D cpu_to_le32(new_ea_size); + + ea_packed =3D le16_to_cpu(p_ea_info->ea_length) + ea_packed_size(p_ea); + p_ea_info->ea_length =3D cpu_to_le16(ea_packed); + p_ea_info->ea_query_length =3D cpu_to_le32(ea_info_qsize + new_ea_size); + + if (ea_packed > 0xffff || + ntfs_attr_size_bounds_check(ni->vol, AT_EA, new_ea_size)) { + err =3D -EFBIG; + goto out; + } + + /* + * no EA or EA_INFORMATION : add them + */ + if (!ntfs_attr_exist(ni, AT_EA, AT_UNNAMED, 0)) { + err =3D ntfs_attr_add(ni, AT_EA, AT_UNNAMED, 0, (char *)p_ea, + new_ea_size); + if (err) + goto out; + } else { + err =3D ntfs_write_ea(ni, AT_EA, (char *)p_ea, ea_info_qsize, + new_ea_size, false); + if (err) + goto out; + } + + err =3D ntfs_write_ea(ni, AT_EA_INFORMATION, (char *)p_ea_info, 0, + 
sizeof(struct ea_information), false); + if (err) + goto out; + + if (packed_ea_size) + *packed_ea_size =3D p_ea_info->ea_length; + mark_mft_record_dirty(ni); +out: + if (ea_info_qsize > 0) + NInoSetHasEA(ni); + else + NInoClearHasEA(ni); + + ntfs_free(ea_buf); + ntfs_free(p_ea_info); + + return err; +} + +/* + * Check for the presence of an EA "$LXDEV" (used by WSL) + * and return its value as a device address + */ +int ntfs_ea_get_wsl_inode(struct inode *inode, dev_t *rdevp, unsigned int = flags) +{ + int err; + __le32 v; + + if (!(flags & NTFS_VOL_UID)) { + /* Load uid to lxuid EA */ + err =3D ntfs_get_ea(inode, "$LXUID", sizeof("$LXUID") - 1, &v, + sizeof(v)); + if (err < 0) + return err; + i_uid_write(inode, le32_to_cpu(v)); + } + + if (!(flags & NTFS_VOL_UID)) { + /* Load gid to lxgid EA */ + err =3D ntfs_get_ea(inode, "$LXGID", sizeof("$LXGID") - 1, &v, + sizeof(v)); + if (err < 0) + return err; + i_gid_write(inode, le32_to_cpu(v)); + } + + /* Load mode to lxmod EA */ + err =3D ntfs_get_ea(inode, "$LXMOD", sizeof("$LXMOD") - 1, &v, sizeof(v)); + if (err > 0) { + inode->i_mode =3D le32_to_cpu(v); + } else { + /* Everyone gets all permissions. 
*/ + inode->i_mode |=3D 0777; + } + + /* Load mode to lxdev EA */ + err =3D ntfs_get_ea(inode, "$LXDEV", sizeof("$LXDEV") - 1, &v, sizeof(v)); + if (err > 0) + *rdevp =3D le32_to_cpu(v); + err =3D 0; + + return err; +} + +int ntfs_ea_set_wsl_inode(struct inode *inode, dev_t rdev, __le16 *ea_size, + unsigned int flags) +{ + __le32 v; + int err; + + if (flags & NTFS_EA_UID) { + /* Store uid to lxuid EA */ + v =3D cpu_to_le32(i_uid_read(inode)); + err =3D ntfs_set_ea(inode, "$LXUID", sizeof("$LXUID") - 1, &v, + sizeof(v), 0, ea_size); + if (err) + return err; + } + + if (flags & NTFS_EA_GID) { + /* Store gid to lxgid EA */ + v =3D cpu_to_le32(i_gid_read(inode)); + err =3D ntfs_set_ea(inode, "$LXGID", sizeof("$LXGID") - 1, &v, + sizeof(v), 0, ea_size); + if (err) + return err; + } + + if (flags & NTFS_EA_MODE) { + /* Store mode to lxmod EA */ + v =3D cpu_to_le32(inode->i_mode); + err =3D ntfs_set_ea(inode, "$LXMOD", sizeof("$LXMOD") - 1, &v, + sizeof(v), 0, ea_size); + if (err) + return err; + } + + if (rdev) { + v =3D cpu_to_le32(rdev); + err =3D ntfs_set_ea(inode, "$LXDEV", sizeof("$LXDEV") - 1, &v, sizeof(v), + 0, ea_size); + } + + return err; +} + +ssize_t ntfsp_listxattr(struct dentry *dentry, char *buffer, size_t size) +{ + struct inode *inode =3D d_inode(dentry); + struct ntfs_inode *ni =3D NTFS_I(inode); + const struct ea_attr *p_ea; + s64 offset, ea_buf_size, ea_info_size; + int next, err =3D 0, ea_size; + unsigned int ea_info_qsize; + char *ea_buf =3D NULL; + ssize_t ret =3D 0; + struct ea_information *ea_info; + + if (!NInoHasEA(ni)) + return 0; + + mutex_lock(&NTFS_I(inode)->mrec_lock); + ea_info =3D ntfs_attr_readall(ni, AT_EA_INFORMATION, NULL, 0, + &ea_info_size); + if (!ea_info || ea_info_size !=3D sizeof(struct ea_information)) + goto out; + + ea_info_qsize =3D le16_to_cpu(ea_info->ea_query_length); + + ea_buf =3D ntfs_attr_readall(ni, AT_EA, NULL, 0, &ea_buf_size); + if (!ea_buf) + goto out; + + if (ea_info_qsize > ea_buf_size) + goto out; + + if 
(ea_buf_size < sizeof(struct ea_attr)) + goto out; + + offset =3D 0; + do { + p_ea =3D (const struct ea_attr *)&ea_buf[offset]; + next =3D le32_to_cpu(p_ea->next_entry_offset); + if (next) + ea_size =3D next; + else + ea_size =3D ALIGN(struct_size(p_ea, ea_name, + 1 + p_ea->ea_name_length + + le16_to_cpu(p_ea->ea_value_length)), + 4); + if (buffer) { + if (offset + ea_size > ea_info_qsize) + break; + + if (ret + p_ea->ea_name_length + 1 > size) { + err =3D -ERANGE; + goto out; + } + + if (p_ea->ea_name_length + 1 > (ea_info_qsize - offset)) + break; + + memcpy(buffer + ret, p_ea->ea_name, p_ea->ea_name_length); + buffer[ret + p_ea->ea_name_length] =3D 0; + } + + ret +=3D p_ea->ea_name_length + 1; + offset +=3D ea_size; + } while (next > 0 && offset < ea_info_qsize && + sizeof(struct ea_attr) < (ea_info_qsize - offset)); + +out: + mutex_unlock(&NTFS_I(inode)->mrec_lock); + ntfs_free(ea_info); + ntfs_free(ea_buf); + + return err ? err : ret; +} + +// clang-format off +#define SYSTEM_DOS_ATTRIB "system.dos_attrib" +#define SYSTEM_NTFS_ATTRIB "system.ntfs_attrib" +#define SYSTEM_NTFS_ATTRIB_BE "system.ntfs_attrib_be" +// clang-format on + +static int ntfs_getxattr(const struct xattr_handler *handler, + struct dentry *unused, struct inode *inode, const char *name, + void *buffer, size_t size) +{ + struct ntfs_inode *ni =3D NTFS_I(inode); + int err; + + if (NVolShutdown(ni->vol)) + return -EIO; + + if (!strcmp(name, SYSTEM_DOS_ATTRIB)) { + if (!buffer) { + err =3D sizeof(u8); + } else if (size < sizeof(u8)) { + err =3D -ENODATA; + } else { + err =3D sizeof(u8); + *(u8 *)buffer =3D ni->flags; + } + goto out; + } + + if (!strcmp(name, SYSTEM_NTFS_ATTRIB) || + !strcmp(name, SYSTEM_NTFS_ATTRIB_BE)) { + if (!buffer) { + err =3D sizeof(u32); + } else if (size < sizeof(u32)) { + err =3D -ENODATA; + } else { + err =3D sizeof(u32); + *(u32 *)buffer =3D le32_to_cpu(ni->flags); + if (!strcmp(name, SYSTEM_NTFS_ATTRIB_BE)) + *(__be32 *)buffer =3D cpu_to_be32(*(u32 *)buffer); + } + 
goto out; + } + + mutex_lock(&ni->mrec_lock); + err =3D ntfs_get_ea(inode, name, strlen(name), buffer, size); + mutex_unlock(&ni->mrec_lock); + +out: + return err; +} + +static int ntfs_new_attr_flags(struct ntfs_inode *ni, __le32 fattr) +{ + struct ntfs_attr_search_ctx *ctx; + struct mft_record *m; + struct attr_record *a; + __le16 new_aflags; + int mp_size, mp_ofs, name_ofs, arec_size, err; + + m =3D map_mft_record(ni); + if (IS_ERR(m)) + return PTR_ERR(m); + + ctx =3D ntfs_attr_get_search_ctx(ni, m); + if (!ctx) { + err =3D -ENOMEM; + goto err_out; + } + + err =3D ntfs_attr_lookup(ni->type, ni->name, ni->name_len, + CASE_SENSITIVE, 0, NULL, 0, ctx); + if (err) { + err =3D -EINVAL; + goto err_out; + } + + a =3D ctx->attr; + new_aflags =3D ctx->attr->flags; + + if (fattr & FILE_ATTR_SPARSE_FILE) + new_aflags |=3D ATTR_IS_SPARSE; + else + new_aflags &=3D ~ATTR_IS_SPARSE; + + if (fattr & FILE_ATTR_COMPRESSED) + new_aflags |=3D ATTR_IS_COMPRESSED; + else + new_aflags &=3D ~ATTR_IS_COMPRESSED; + + if (new_aflags =3D=3D a->flags) + return 0; + + if ((new_aflags & (ATTR_IS_SPARSE | ATTR_IS_COMPRESSED)) =3D=3D + (ATTR_IS_SPARSE | ATTR_IS_COMPRESSED)) { + pr_err("file can't be sparsed and compressed\n"); + err =3D -EOPNOTSUPP; + goto err_out; + } + + if (!a->non_resident) + goto out; + + if (a->data.non_resident.data_size) { + pr_err("Can't change sparsed/compressed for non-empty file"); + err =3D -EOPNOTSUPP; + goto err_out; + } + + if (new_aflags & (ATTR_IS_SPARSE | ATTR_IS_COMPRESSED)) + name_ofs =3D (offsetof(struct attr_record, + data.non_resident.compressed_size) + + sizeof(a->data.non_resident.compressed_size) + 7) & ~7; + else + name_ofs =3D (offsetof(struct attr_record, + data.non_resident.compressed_size) + 7) & ~7; + + mp_size =3D ntfs_get_size_for_mapping_pairs(ni->vol, ni->runlist.rl, 0, -= 1, -1); + if (unlikely(mp_size < 0)) { + err =3D mp_size; + ntfs_debug("Failed to get size for mapping pairs array, error code %i.\n= ", err); + goto err_out; + } + + 
mp_ofs =3D (name_ofs + a->name_length * sizeof(__le16) + 7) & ~7; + arec_size =3D (mp_ofs + mp_size + 7) & ~7; + + err =3D ntfs_attr_record_resize(m, a, arec_size); + if (unlikely(err)) + goto err_out; + + if (new_aflags & (ATTR_IS_SPARSE | ATTR_IS_COMPRESSED)) { + a->data.non_resident.compression_unit =3D 0; + if (new_aflags & ATTR_IS_COMPRESSED || ni->vol->major_ver < 3) + a->data.non_resident.compression_unit =3D 4; + a->data.non_resident.compressed_size =3D 0; + ni->itype.compressed.size =3D 0; + if (a->data.non_resident.compression_unit) { + ni->itype.compressed.block_size =3D 1U << + (a->data.non_resident.compression_unit + + ni->vol->cluster_size_bits); + ni->itype.compressed.block_size_bits =3D + ffs(ni->itype.compressed.block_size) - + 1; + ni->itype.compressed.block_clusters =3D 1U << + a->data.non_resident.compression_unit; + } else { + ni->itype.compressed.block_size =3D 0; + ni->itype.compressed.block_size_bits =3D 0; + ni->itype.compressed.block_clusters =3D 0; + } + + if (new_aflags & ATTR_IS_SPARSE) { + NInoSetSparse(ni); + ni->flags |=3D FILE_ATTR_SPARSE_FILE; + } + + if (new_aflags & ATTR_IS_COMPRESSED) { + NInoSetCompressed(ni); + ni->flags |=3D FILE_ATTR_COMPRESSED; + VFS_I(ni)->i_mapping->a_ops =3D &ntfs_compressed_aops; + } + } else { + ni->flags &=3D ~(FILE_ATTR_SPARSE_FILE | FILE_ATTR_COMPRESSED); + a->data.non_resident.compression_unit =3D 0; + VFS_I(ni)->i_mapping->a_ops =3D &ntfs_normal_aops; + NInoClearSparse(ni); + NInoClearCompressed(ni); + } + + a->name_offset =3D cpu_to_le16(name_ofs); + a->data.non_resident.mapping_pairs_offset =3D cpu_to_le16(mp_ofs); + +out: + a->flags =3D new_aflags; + mark_mft_record_dirty(ctx->ntfs_ino); +err_out: + ntfs_attr_put_search_ctx(ctx); + unmap_mft_record(ni); + return err; +} + +static int ntfs_setxattr(const struct xattr_handler *handler, + struct mnt_idmap *idmap, struct dentry *unused, + struct inode *inode, const char *name, const void *value, + size_t size, int flags) +{ + struct ntfs_inode *ni 
=3D NTFS_I(inode); + int err; + __le32 fattr; + + if (NVolShutdown(ni->vol)) + return -EIO; + + if (!strcmp(name, SYSTEM_DOS_ATTRIB)) { + if (sizeof(u8) !=3D size) + goto out; + fattr =3D cpu_to_le32(*(u8 *)value); + goto set_fattr; + } + + if (!strcmp(name, SYSTEM_NTFS_ATTRIB) || + !strcmp(name, SYSTEM_NTFS_ATTRIB_BE)) { + if (size !=3D sizeof(u32)) + goto out; + if (!strcmp(name, SYSTEM_NTFS_ATTRIB_BE)) + fattr =3D cpu_to_le32(be32_to_cpu(*(__be32 *)value)); + else + fattr =3D cpu_to_le32(*(u32 *)value); + + if (S_ISREG(inode->i_mode)) { + mutex_lock(&ni->mrec_lock); + err =3D ntfs_new_attr_flags(ni, fattr); + mutex_unlock(&ni->mrec_lock); + if (err) + goto out; + } + +set_fattr: + if (S_ISDIR(inode->i_mode)) + fattr |=3D FILE_ATTR_DIRECTORY; + else + fattr &=3D ~FILE_ATTR_DIRECTORY; + + if (ni->flags !=3D fattr) { + ni->flags =3D fattr; + if (fattr & FILE_ATTR_READONLY) + inode->i_mode &=3D ~0222; + else + inode->i_mode |=3D 0222; + NInoSetFileNameDirty(ni); + mark_inode_dirty(inode); + } + err =3D 0; + goto out; + } + + mutex_lock(&ni->mrec_lock); + err =3D ntfs_set_ea(inode, name, strlen(name), value, size, flags, NULL); + mutex_unlock(&ni->mrec_lock); + +out: + inode_set_ctime_current(inode); + mark_inode_dirty(inode); + return err; +} + +static bool ntfs_xattr_user_list(struct dentry *dentry) +{ + return true; +} + +// clang-format off +static const struct xattr_handler ntfs_other_xattr_handler =3D { + .prefix =3D "", + .get =3D ntfs_getxattr, + .set =3D ntfs_setxattr, + .list =3D ntfs_xattr_user_list, +}; + +const struct xattr_handler * const ntfsp_xattr_handlers[] =3D { + &ntfs_other_xattr_handler, + NULL, +}; +// clang-format on + +#ifdef CONFIG_NTFSPLUS_FS_POSIX_ACL +struct posix_acl *ntfsp_get_acl(struct mnt_idmap *idmap, struct dentry *de= ntry, + int type) +{ + struct inode *inode =3D d_inode(dentry); + struct ntfs_inode *ni =3D NTFS_I(inode); + const char *name; + size_t name_len; + struct posix_acl *acl; + int err; + void *buf; + + /* Allocate 
PATH_MAX bytes. */ + buf =3D __getname(); + if (!buf) + return ERR_PTR(-ENOMEM); + + /* Possible values of 'type' was already checked above. */ + if (type =3D=3D ACL_TYPE_ACCESS) { + name =3D XATTR_NAME_POSIX_ACL_ACCESS; + name_len =3D sizeof(XATTR_NAME_POSIX_ACL_ACCESS) - 1; + } else { + name =3D XATTR_NAME_POSIX_ACL_DEFAULT; + name_len =3D sizeof(XATTR_NAME_POSIX_ACL_DEFAULT) - 1; + } + + mutex_lock(&ni->mrec_lock); + err =3D ntfs_get_ea(inode, name, name_len, buf, PATH_MAX); + mutex_unlock(&ni->mrec_lock); + + /* Translate extended attribute to acl. */ + if (err >=3D 0) + acl =3D posix_acl_from_xattr(&init_user_ns, buf, err); + else if (err =3D=3D -ENODATA) + acl =3D NULL; + else + acl =3D ERR_PTR(err); + + if (!IS_ERR(acl)) + set_cached_acl(inode, type, acl); + + __putname(buf); + + return acl; +} + +static noinline int ntfs_set_acl_ex(struct mnt_idmap *idmap, + struct inode *inode, struct posix_acl *acl, + int type, bool init_acl) +{ + const char *name; + size_t size, name_len; + void *value; + int err; + int flags; + umode_t mode; + + if (S_ISLNK(inode->i_mode)) + return -EOPNOTSUPP; + + mode =3D inode->i_mode; + switch (type) { + case ACL_TYPE_ACCESS: + /* Do not change i_mode if we are in init_acl */ + if (acl && !init_acl) { + err =3D posix_acl_update_mode(idmap, inode, &mode, &acl); + if (err) + return err; + } + name =3D XATTR_NAME_POSIX_ACL_ACCESS; + name_len =3D sizeof(XATTR_NAME_POSIX_ACL_ACCESS) - 1; + break; + + case ACL_TYPE_DEFAULT: + if (!S_ISDIR(inode->i_mode)) + return acl ? -EACCES : 0; + name =3D XATTR_NAME_POSIX_ACL_DEFAULT; + name_len =3D sizeof(XATTR_NAME_POSIX_ACL_DEFAULT) - 1; + break; + + default: + return -EINVAL; + } + + if (!acl) { + /* Remove xattr if it can be presented via mode. 
*/ + size =3D 0; + value =3D NULL; + flags =3D XATTR_REPLACE; + } else { + size =3D posix_acl_xattr_size(acl->a_count); + value =3D kmalloc(size, GFP_NOFS); + if (!value) + return -ENOMEM; + err =3D posix_acl_to_xattr(&init_user_ns, acl, value, size); + if (err < 0) + goto out; + flags =3D 0; + } + + mutex_lock(&NTFS_I(inode)->mrec_lock); + err =3D ntfs_set_ea(inode, name, name_len, value, size, flags, NULL); + mutex_unlock(&NTFS_I(inode)->mrec_lock); + if (err =3D=3D -ENODATA && !size) + err =3D 0; /* Removing non existed xattr. */ + if (!err) { + set_cached_acl(inode, type, acl); + inode->i_mode =3D mode; + inode_set_ctime_current(inode); + mark_inode_dirty(inode); + } + +out: + kfree(value); + + return err; +} + +int ntfsp_set_acl(struct mnt_idmap *idmap, struct dentry *dentry, + struct posix_acl *acl, int type) +{ + return ntfs_set_acl_ex(idmap, d_inode(dentry), acl, type, false); +} + +int ntfsp_init_acl(struct mnt_idmap *idmap, struct inode *inode, + struct inode *dir) +{ + struct posix_acl *default_acl, *acl; + int err; + + err =3D posix_acl_create(dir, &inode->i_mode, &default_acl, &acl); + if (err) + return err; + + if (default_acl) { + err =3D ntfs_set_acl_ex(idmap, inode, default_acl, + ACL_TYPE_DEFAULT, true); + posix_acl_release(default_acl); + } else { + inode->i_default_acl =3D NULL; + } + + if (acl) { + if (!err) + err =3D ntfs_set_acl_ex(idmap, inode, acl, + ACL_TYPE_ACCESS, true); + posix_acl_release(acl); + } else { + inode->i_acl =3D NULL; + } + + return err; +} +#endif diff --git a/fs/ntfsplus/reparse.c b/fs/ntfsplus/reparse.c new file mode 100644 index 000000000000..ff46ef07178a --- /dev/null +++ b/fs/ntfsplus/reparse.c @@ -0,0 +1,550 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/** + * Processing of reparse points + * + * Part of this file is based on code from the NTFS-3G project. + * + * Copyright (c) 2008-2021 Jean-Pierre Andre + * Copyright (c) 2025 LG Electronics Co., Ltd. 
+ */ + +#include "ntfs.h" +#include "layout.h" +#include "attrib.h" +#include "inode.h" +#include "dir.h" +#include "volume.h" +#include "mft.h" +#include "index.h" +#include "lcnalloc.h" +#include "reparse.h" +#include "misc.h" + +struct WSL_LINK_REPARSE_DATA { + __le32 type; + char link[]; +}; + +struct REPARSE_INDEX { /* index entry in $Extend/$Reparse */ + struct index_entry_header header; + struct reparse_index_key key; + __le32 filling; +}; + +__le16 reparse_index_name[] =3D { cpu_to_le16('$'), + cpu_to_le16('R') }; + +/* + * Do some sanity checks on reparse data + * + * Microsoft reparse points have an 8-byte header whereas + * non-Microsoft reparse points have a 24-byte header. In each case, + * 'reparse_data_length' must equal the number of non-header bytes. + * + * If the reparse data looks like a junction point or symbolic + * link, more checks can be done. + */ +static bool valid_reparse_data(struct ntfs_inode *ni, + const struct reparse_point *reparse_attr, size_t size) +{ + bool ok; + const struct WSL_LINK_REPARSE_DATA *wsl_reparse_data; + + ok =3D ni && reparse_attr && (size >=3D sizeof(struct reparse_point)) && + (reparse_attr->reparse_tag !=3D IO_REPARSE_TAG_RESERVED_ZERO) && + (((size_t)le16_to_cpu(reparse_attr->reparse_data_length) + + sizeof(struct reparse_point) + + ((reparse_attr->reparse_tag & IO_REPARSE_TAG_IS_MICROSOFT) ? 
+ 0 : sizeof(struct guid))) =3D=3D size); + if (ok) { + switch (reparse_attr->reparse_tag) { + case IO_REPARSE_TAG_LX_SYMLINK: + wsl_reparse_data =3D (const struct WSL_LINK_REPARSE_DATA *) + reparse_attr->reparse_data; + if ((le16_to_cpu(reparse_attr->reparse_data_length) <=3D + sizeof(wsl_reparse_data->type)) || + (wsl_reparse_data->type !=3D cpu_to_le32(2))) + ok =3D false; + break; + case IO_REPARSE_TAG_AF_UNIX: + case IO_REPARSE_TAG_LX_FIFO: + case IO_REPARSE_TAG_LX_CHR: + case IO_REPARSE_TAG_LX_BLK: + if (reparse_attr->reparse_data_length || + !(ni->flags & FILE_ATTRIBUTE_RECALL_ON_OPEN)) + ok =3D false; + break; + default: + break; + } + } + return ok; +} + +static unsigned int ntfs_reparse_tag_mode(struct reparse_point *reparse_at= tr) +{ + unsigned int mode =3D 0; + + switch (reparse_attr->reparse_tag) { + case IO_REPARSE_TAG_SYMLINK: + case IO_REPARSE_TAG_LX_SYMLINK: + mode =3D S_IFLNK; + break; + case IO_REPARSE_TAG_AF_UNIX: + mode =3D S_IFSOCK; + break; + case IO_REPARSE_TAG_LX_FIFO: + mode =3D S_IFIFO; + break; + case IO_REPARSE_TAG_LX_CHR: + mode =3D S_IFCHR; + break; + case IO_REPARSE_TAG_LX_BLK: + mode =3D S_IFBLK; + } + + return mode; +} + +/* + * Get the target for symbolic link + */ +unsigned int ntfs_make_symlink(struct ntfs_inode *ni) +{ + s64 attr_size =3D 0; + unsigned int lth; + struct reparse_point *reparse_attr; + struct WSL_LINK_REPARSE_DATA *wsl_link_data; + unsigned int mode =3D 0; + + reparse_attr =3D ntfs_attr_readall(ni, AT_REPARSE_POINT, NULL, 0, + &attr_size); + if (reparse_attr && attr_size && + valid_reparse_data(ni, reparse_attr, attr_size)) { + switch (reparse_attr->reparse_tag) { + case IO_REPARSE_TAG_LX_SYMLINK: + wsl_link_data =3D (struct WSL_LINK_REPARSE_DATA *)reparse_attr->reparse= _data; + if (wsl_link_data->type =3D=3D cpu_to_le32(2)) { + lth =3D le16_to_cpu(reparse_attr->reparse_data_length) - + sizeof(wsl_link_data->type); + ni->target =3D ntfs_malloc_nofs(lth + 1); + if (ni->target) { + memcpy(ni->target, 
wsl_link_data->link, lth); + ni->target[lth] =3D 0; + mode =3D ntfs_reparse_tag_mode(reparse_attr); + } + } + break; + default: + mode =3D ntfs_reparse_tag_mode(reparse_attr); + } + } else + ni->flags &=3D ~FILE_ATTR_REPARSE_POINT; + + if (reparse_attr) + ntfs_free(reparse_attr); + + return mode; +} + +unsigned int ntfs_reparse_tag_dt_types(struct ntfs_volume *vol, unsigned l= ong mref) +{ + s64 attr_size =3D 0; + struct reparse_point *reparse_attr; + unsigned int dt_type =3D DT_UNKNOWN; + struct inode *vi; + + vi =3D ntfs_iget(vol->sb, mref); + if (IS_ERR(vi)) + return PTR_ERR(vi); + + reparse_attr =3D (struct reparse_point *)ntfs_attr_readall(NTFS_I(vi), + AT_REPARSE_POINT, NULL, 0, &attr_size); + + if (reparse_attr && attr_size) { + switch (reparse_attr->reparse_tag) { + case IO_REPARSE_TAG_SYMLINK: + case IO_REPARSE_TAG_LX_SYMLINK: + dt_type =3D DT_LNK; + break; + case IO_REPARSE_TAG_AF_UNIX: + dt_type =3D DT_SOCK; + break; + case IO_REPARSE_TAG_LX_FIFO: + dt_type =3D DT_FIFO; + break; + case IO_REPARSE_TAG_LX_CHR: + dt_type =3D DT_CHR; + break; + case IO_REPARSE_TAG_LX_BLK: + dt_type =3D DT_BLK; + } + } + + if (reparse_attr) + ntfs_free(reparse_attr); + + iput(vi); + return dt_type; +} + +/* + * Set the index for new reparse data + */ +static int set_reparse_index(struct ntfs_inode *ni, struct ntfs_index_cont= ext *xr, + __le32 reparse_tag) +{ + struct REPARSE_INDEX indx; + u64 file_id_cpu; + __le64 file_id; + + file_id_cpu =3D MK_MREF(ni->mft_no, ni->seq_no); + file_id =3D cpu_to_le64(file_id_cpu); + indx.header.data.vi.data_offset =3D + cpu_to_le16(sizeof(struct index_entry_header) + sizeof(struct reparse_in= dex_key)); + indx.header.data.vi.data_length =3D 0; + indx.header.data.vi.reservedV =3D 0; + indx.header.length =3D cpu_to_le16(sizeof(struct REPARSE_INDEX)); + indx.header.key_length =3D cpu_to_le16(sizeof(struct reparse_index_key)); + indx.header.flags =3D 0; + indx.header.reserved =3D 0; + indx.key.reparse_tag =3D reparse_tag; + /* danger on 
processors which require proper alignment! */ + memcpy(&indx.key.file_id, &file_id, 8); + indx.filling =3D 0; + ntfs_index_ctx_reinit(xr); + + return ntfs_ie_add(xr, (struct index_entry *)&indx); +} + +/* + * Remove a reparse data index entry if attribute present + */ +static int remove_reparse_index(struct inode *rp, struct ntfs_index_contex= t *xr, + __le32 *preparse_tag) +{ + struct reparse_index_key key; + u64 file_id_cpu; + __le64 file_id; + s64 size; + struct ntfs_inode *ni =3D NTFS_I(rp); + int err =3D 0, ret =3D ni->data_size; + + if (ni->data_size =3D=3D 0) + return 0; + + /* read the existing reparse_tag */ + size =3D ntfs_inode_attr_pread(rp, 0, 4, (char *)preparse_tag); + if (size !=3D 4) + return -ENODATA; + + file_id_cpu =3D MK_MREF(ni->mft_no, ni->seq_no); + file_id =3D cpu_to_le64(file_id_cpu); + key.reparse_tag =3D *preparse_tag; + /* danger on processors which require proper alignment! */ + memcpy(&key.file_id, &file_id, 8); + if (!ntfs_index_lookup(&key, sizeof(struct reparse_index_key), xr)) { + err =3D ntfs_index_rm(xr); + if (err) + ret =3D err; + } + return ret; +} + +/* + * Open the $Extend/$Reparse file and its index + */ +static struct ntfs_index_context *open_reparse_index(struct ntfs_volume *v= ol) +{ + struct ntfs_index_context *xr =3D NULL; + u64 mref; + __le16 *uname; + struct ntfs_name *name =3D NULL; + int uname_len; + struct inode *vi, *dir_vi; + + /* do not use path_name_to inode - could reopen root */ + dir_vi =3D ntfs_iget(vol->sb, FILE_Extend); + if (IS_ERR(dir_vi)) + return NULL; + + uname_len =3D ntfs_nlstoucs(vol, "$Reparse", 8, &uname, + NTFS_MAX_NAME_LEN); + if (uname_len < 0) { + iput(dir_vi); + return NULL; + } + + mutex_lock_nested(&NTFS_I(dir_vi)->mrec_lock, NTFS_REPARSE_MUTEX_PARENT); + mref =3D ntfs_lookup_inode_by_name(NTFS_I(dir_vi), uname, uname_len, + &name); + mutex_unlock(&NTFS_I(dir_vi)->mrec_lock); + kfree(name); + kmem_cache_free(ntfs_name_cache, uname); + if (IS_ERR_MREF(mref)) + goto put_dir_vi; + + vi =3D 
ntfs_iget(vol->sb, MREF(mref)); + if (IS_ERR(vi)) + goto put_dir_vi; + + xr =3D ntfs_index_ctx_get(NTFS_I(vi), reparse_index_name, 2); + if (!xr) + iput(vi); +put_dir_vi: + iput(dir_vi); + return xr; +} + + +/* + * Update the reparse data and index + * + * The reparse data attribute should have been created, and + * an existing index is expected if there is an existing value. + * + */ +static int update_reparse_data(struct ntfs_inode *ni, struct ntfs_index_co= ntext *xr, + char *value, size_t size) +{ + struct inode *rp_inode; + int err =3D 0; + s64 written; + int oldsize; + __le32 reparse_tag; + struct ntfs_inode *rp_ni; + + rp_inode =3D ntfs_attr_iget(VFS_I(ni), AT_REPARSE_POINT, AT_UNNAMED, 0); + if (IS_ERR(rp_inode)) + return -EINVAL; + rp_ni =3D NTFS_I(rp_inode); + + /* remove the existing reparse data */ + oldsize =3D remove_reparse_index(rp_inode, xr, &reparse_tag); + if (oldsize < 0) { + err =3D oldsize; + goto put_rp_inode; + } + + /* overwrite value if any */ + written =3D ntfs_inode_attr_pwrite(rp_inode, 0, size, value, false); + if (written !=3D size) { + ntfs_error(ni->vol->sb, "Failed to update reparse data\n"); + err =3D -EIO; + goto put_rp_inode; + } + + if (set_reparse_index(ni, xr, ((const struct reparse_point *)value)->repa= rse_tag) && + oldsize > 0) { + /* + * If cannot index, try to remove the reparse + * data and log the error. There will be an + * inconsistency if removal fails. + */ + ntfs_attr_rm(rp_ni); + ntfs_error(ni->vol->sb, + "Failed to index reparse data. 
Possible corruption.\n"); + } + + mark_mft_record_dirty(ni); +put_rp_inode: + iput(rp_inode); + + return err; +} + +/* + * Delete a reparse index entry + */ +int ntfs_delete_reparse_index(struct ntfs_inode *ni) +{ + struct inode *vi; + struct ntfs_index_context *xr; + struct ntfs_inode *xrni; + __le32 reparse_tag; + int err =3D 0; + + if (!(ni->flags & FILE_ATTR_REPARSE_POINT)) + return 0; + + vi =3D ntfs_attr_iget(VFS_I(ni), AT_REPARSE_POINT, AT_UNNAMED, 0); + if (IS_ERR(vi)) + return PTR_ERR(vi); + + /* + * Read the existing reparse data (the tag is enough) + * and un-index it. + */ + xr =3D open_reparse_index(ni->vol); + if (xr) { + xrni =3D xr->idx_ni; + mutex_lock_nested(&xrni->mrec_lock, NTFS_REPARSE_MUTEX_PARENT); + err =3D remove_reparse_index(vi, xr, &reparse_tag); + if (err < 0) { + ntfs_index_ctx_put(xr); + mutex_unlock(&xrni->mrec_lock); + iput(VFS_I(xrni)); + goto out; + } + mark_mft_record_dirty(xrni); + ntfs_index_ctx_put(xr); + mutex_unlock(&xrni->mrec_lock); + iput(VFS_I(xrni)); + } + + ni->flags &=3D ~FILE_ATTR_REPARSE_POINT; + NInoSetFileNameDirty(ni); + mark_mft_record_dirty(ni); + +out: + iput(vi); + return err; +} + +/* + * Set the reparse data from an extended attribute + */ +static int ntfs_set_ntfs_reparse_data(struct ntfs_inode *ni, char *value, size_t size) +{ + int err =3D 0; + struct ntfs_inode *xrni; + struct ntfs_index_context *xr; + + if (!ni) + return -EINVAL; + + /* + * Reparse data compatibility with EA is not checked + * any more; it is required by Windows 10, but may + * lead to problems with earlier versions.
+ */ + if (valid_reparse_data(ni, (const struct reparse_point *)value, size) =3D= =3D false) + return -EINVAL; + + xr =3D open_reparse_index(ni->vol); + if (!xr) + return -EINVAL; + xrni =3D xr->idx_ni; + + if (!ntfs_attr_exist(ni, AT_REPARSE_POINT, AT_UNNAMED, 0)) { + u8 dummy =3D 0; + + /* + * no reparse data attribute : add one, + * apparently, this does not feed the new value in + * Note : NTFS version must be >=3D 3 + */ + if (ni->vol->major_ver < 3) { + err =3D -EOPNOTSUPP; + ntfs_index_ctx_put(xr); + goto out; + } + + err =3D ntfs_attr_add(ni, AT_REPARSE_POINT, AT_UNNAMED, 0, &dummy, 0); + if (err) { + ntfs_index_ctx_put(xr); + goto out; + } + ni->flags |=3D FILE_ATTR_REPARSE_POINT; + NInoSetFileNameDirty(ni); + mark_mft_record_dirty(ni); + } + + /* update value and index */ + mutex_lock_nested(&xrni->mrec_lock, NTFS_REPARSE_MUTEX_PARENT); + err =3D update_reparse_data(ni, xr, value, size); + if (err) { + ni->flags &=3D ~FILE_ATTR_REPARSE_POINT; + NInoSetFileNameDirty(ni); + mark_mft_record_dirty(ni); + } + ntfs_index_ctx_put(xr); + mutex_unlock(&xrni->mrec_lock); + +out: + if (!err) + mark_mft_record_dirty(xrni); + iput(VFS_I(xrni)); + + return err; +} + +/* + * Set reparse data for a WSL type symlink + */ +int ntfs_reparse_set_wsl_symlink(struct ntfs_inode *ni, + const __le16 *target, int target_len) +{ + int err =3D 0; + int len; + int reparse_len; + unsigned char *utarget =3D NULL; + struct reparse_point *reparse; + struct WSL_LINK_REPARSE_DATA *data; + + utarget =3D (char *)NULL; + len =3D ntfs_ucstonls(ni->vol, target, target_len, &utarget, 0); + if (len <=3D 0) + return -EINVAL; + + reparse_len =3D sizeof(struct reparse_point) + sizeof(data->type) + len; + reparse =3D (struct reparse_point *)ntfs_malloc_nofs(reparse_len); + if (!reparse) { + err =3D -ENOMEM; + ntfs_free(utarget); + } else { + data =3D (struct WSL_LINK_REPARSE_DATA *)reparse->reparse_data; + reparse->reparse_tag =3D IO_REPARSE_TAG_LX_SYMLINK; + reparse->reparse_data_length =3D + 
cpu_to_le16(sizeof(data->type) + len); + reparse->reserved =3D 0; + data->type =3D cpu_to_le32(2); + memcpy(data->link, utarget, len); + err =3D ntfs_set_ntfs_reparse_data(ni, + (char *)reparse, reparse_len); + ntfs_free(reparse); + if (!err) + ni->target =3D utarget; + } + return err; +} + +/* + * Set reparse data for a WSL special file other than a symlink + * (socket, fifo, character or block device) + */ +int ntfs_reparse_set_wsl_not_symlink(struct ntfs_inode *ni, mode_t mode) +{ + int err; + int len; + int reparse_len; + __le32 reparse_tag; + struct reparse_point *reparse; + + len =3D 0; + if (S_ISSOCK(mode)) + reparse_tag =3D IO_REPARSE_TAG_AF_UNIX; + else if (S_ISFIFO(mode)) + reparse_tag =3D IO_REPARSE_TAG_LX_FIFO; + else if (S_ISCHR(mode)) + reparse_tag =3D IO_REPARSE_TAG_LX_CHR; + else if (S_ISBLK(mode)) + reparse_tag =3D IO_REPARSE_TAG_LX_BLK; + else + return -EOPNOTSUPP; + + reparse_len =3D sizeof(struct reparse_point) + len; + reparse =3D (struct reparse_point *)ntfs_malloc_nofs(reparse_len); + if (!reparse) + err =3D -ENOMEM; + else { + reparse->reparse_tag =3D reparse_tag; + reparse->reparse_data_length =3D cpu_to_le16(len); + reparse->reserved =3D cpu_to_le16(0); + err =3D ntfs_set_ntfs_reparse_data(ni, (char *)reparse, + reparse_len); + ntfs_free(reparse); + } + + return err; +} --=20 2.25.1 From nobody Mon Dec 1 22:02:17 2025 From: Namjae Jeon To: viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, hch@lst.de, tytso@mit.edu, willy@infradead.org, jack@suse.cz, djwong@kernel.org, josef@toxicpanda.com, sandeen@sandeen.net, rgoldwyn@suse.com, xiang@kernel.org, dsterba@suse.com, pali@kernel.org, ebiggers@kernel.org, neil@brown.name, amir73il@gmail.com Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, iamjoonsoo.kim@lge.com, cheol.lee@lge.com, jay.sim@lge.com, gunho.lee@lge.com, Namjae Jeon Subject: [PATCH v2 10/11] ntfsplus: add misc operations Date: Thu, 27 Nov 2025 13:59:43 +0900 Message-Id: <20251127045944.26009-11-linkinjeon@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org> References: <20251127045944.26009-1-linkinjeon@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id:
List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable This adds the implementation of misc operations for ntfsplus. Signed-off-by: Namjae Jeon --- fs/ntfsplus/collate.c | 178 ++++++++++ fs/ntfsplus/logfile.c | 770 ++++++++++++++++++++++++++++++++++++++++++ fs/ntfsplus/misc.c | 213 ++++++++++++ fs/ntfsplus/unistr.c | 473 ++++++++++++++++++++++++++ fs/ntfsplus/upcase.c | 73 ++++ 5 files changed, 1707 insertions(+) create mode 100644 fs/ntfsplus/collate.c create mode 100644 fs/ntfsplus/logfile.c create mode 100644 fs/ntfsplus/misc.c create mode 100644 fs/ntfsplus/unistr.c create mode 100644 fs/ntfsplus/upcase.c diff --git a/fs/ntfsplus/collate.c b/fs/ntfsplus/collate.c new file mode 100644 index 000000000000..82aeab3a434c --- /dev/null +++ b/fs/ntfsplus/collate.c @@ -0,0 +1,178 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * NTFS kernel collation handling. Part of the Linux-NTFS project. + * + * Copyright (c) 2004 Anton Altaparmakov + * + * Part of this file is based on code from the NTFS-3G project. 
+ * and is copyrighted by the respective authors below: + * Copyright (c) 2004 Anton Altaparmakov + * Copyright (c) 2005 Yura Pakhuchiy + */ + +#include "collate.h" +#include "misc.h" +#include "ntfs.h" + +static int ntfs_collate_binary(struct ntfs_volume *vol, + const void *data1, const int data1_len, + const void *data2, const int data2_len) +{ + int rc; + + ntfs_debug("Entering."); + rc =3D memcmp(data1, data2, min(data1_len, data2_len)); + if (!rc && (data1_len !=3D data2_len)) { + if (data1_len < data2_len) + rc =3D -1; + else + rc =3D 1; + } + ntfs_debug("Done, returning %i", rc); + return rc; +} + +static int ntfs_collate_ntofs_ulong(struct ntfs_volume *vol, + const void *data1, const int data1_len, + const void *data2, const int data2_len) +{ + int rc; + u32 d1, d2; + + ntfs_debug("Entering."); + + if (data1_len !=3D data2_len || data1_len !=3D 4) + return -EINVAL; + + d1 =3D le32_to_cpup(data1); + d2 =3D le32_to_cpup(data2); + if (d1 < d2) + rc =3D -1; + else { + if (d1 =3D=3D d2) + rc =3D 0; + else + rc =3D 1; + } + ntfs_debug("Done, returning %i", rc); + return rc; +} + +/** + * ntfs_collate_ntofs_ulongs - Which of two le32 arrays should be listed f= irst + * + * Returns: -1, 0 or 1 depending of how the arrays compare + */ +static int ntfs_collate_ntofs_ulongs(struct ntfs_volume *vol, + const void *data1, const int data1_len, + const void *data2, const int data2_len) +{ + int rc; + int len; + const __le32 *p1, *p2; + u32 d1, d2; + + ntfs_debug("Entering."); + if ((data1_len !=3D data2_len) || (data1_len <=3D 0) || (data1_len & 3)) { + ntfs_error(vol->sb, "data1_len or data2_len not valid\n"); + return -1; + } + + p1 =3D (const __le32 *)data1; + p2 =3D (const __le32 *)data2; + len =3D data1_len; + do { + d1 =3D le32_to_cpup(p1); + p1++; + d2 =3D le32_to_cpup(p2); + p2++; + } while ((d1 =3D=3D d2) && ((len -=3D 4) > 0)); + if (d1 < d2) + rc =3D -1; + else { + if (d1 =3D=3D d2) + rc =3D 0; + else + rc =3D 1; + } + ntfs_debug("Done, returning %i.", rc); + 
return rc; +} + +/** + * ntfs_collate_file_name - Which of two filenames should be listed first + */ +static int ntfs_collate_file_name(struct ntfs_volume *vol, + const void *data1, const int __always_unused data1_len, + const void *data2, const int __always_unused data2_len) +{ + int rc; + + ntfs_debug("Entering.\n"); + rc =3D ntfs_file_compare_values(data1, data2, -2, + IGNORE_CASE, vol->upcase, vol->upcase_len); + if (!rc) + rc =3D ntfs_file_compare_values(data1, data2, + -2, CASE_SENSITIVE, vol->upcase, vol->upcase_len); + ntfs_debug("Done, returning %i.\n", rc); + return rc; +} + +typedef int (*ntfs_collate_func_t)(struct ntfs_volume *, const void *, const int, + const void *, const int); + +static ntfs_collate_func_t ntfs_do_collate0x0[3] =3D { + ntfs_collate_binary, + ntfs_collate_file_name, + NULL/*ntfs_collate_unicode_string*/, +}; + +static ntfs_collate_func_t ntfs_do_collate0x1[4] =3D { + ntfs_collate_ntofs_ulong, + NULL/*ntfs_collate_ntofs_sid*/, + NULL/*ntfs_collate_ntofs_security_hash*/, + ntfs_collate_ntofs_ulongs, +}; + +/** + * ntfs_collate - collate two data items using a specified collation rule + * @vol: ntfs volume to which the data items belong + * @cr: collation rule to use when comparing the items + * @data1: first data item to collate + * @data1_len: length in bytes of @data1 + * @data2: second data item to collate + * @data2_len: length in bytes of @data2 + * + * Collate the two data items @data1 and @data2 using the collation rule @cr + * and return -1, 0, or 1 if @data1 is found, respectively, to collate before, + * to match, or to collate after @data2. + * + * For speed we use the collation rule @cr as an index into two tables of + * function pointers to call the appropriate collation function.
+ */ +int ntfs_collate(struct ntfs_volume *vol, __le32 cr, + const void *data1, const int data1_len, + const void *data2, const int data2_len) +{ + int i; + + ntfs_debug("Entering."); + + if (cr !=3D COLLATION_BINARY && cr !=3D COLLATION_NTOFS_ULONG && + cr !=3D COLLATION_FILE_NAME && cr !=3D COLLATION_NTOFS_ULONGS) + return -EINVAL; + + i =3D le32_to_cpu(cr); + if (i < 0) + return -1; + if (i <=3D 0x02) + return ntfs_do_collate0x0[i](vol, data1, data1_len, + data2, data2_len); + if (i < 0x10) + return -1; + i -=3D 0x10; + if (likely(i <=3D 3)) + return ntfs_do_collate0x1[i](vol, data1, data1_len, + data2, data2_len); + return 0; +} diff --git a/fs/ntfsplus/logfile.c b/fs/ntfsplus/logfile.c new file mode 100644 index 000000000000..f13cf1456708 --- /dev/null +++ b/fs/ntfsplus/logfile.c @@ -0,0 +1,770 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * NTFS kernel journal handling. Part of the Linux-NTFS project. + * + * Copyright (c) 2002-2007 Anton Altaparmakov + */ + +#include + +#include "attrib.h" +#include "aops.h" +#include "logfile.h" +#include "misc.h" +#include "ntfs.h" + +/** + * ntfs_check_restart_page_header - check the page header for consistency + * @vi: LogFile inode to which the restart page header belongs + * @rp: restart page header to check + * @pos: position in @vi at which the restart page header resides + * + * Check the restart page header @rp for consistency and return 'true' if = it is + * consistent and 'false' otherwise. + * + * This function only needs NTFS_BLOCK_SIZE bytes in @rp, i.e. it does not + * require the full restart page. 
+ */ +static bool ntfs_check_restart_page_header(struct inode *vi, + struct restart_page_header *rp, s64 pos) +{ + u32 logfile_system_page_size, logfile_log_page_size; + u16 ra_ofs, usa_count, usa_ofs, usa_end =3D 0; + bool have_usa =3D true; + + ntfs_debug("Entering."); + /* + * If the system or log page sizes are smaller than the ntfs block size + * or either is not a power of 2 we cannot handle this log file. + */ + logfile_system_page_size =3D le32_to_cpu(rp->system_page_size); + logfile_log_page_size =3D le32_to_cpu(rp->log_page_size); + if (logfile_system_page_size < NTFS_BLOCK_SIZE || + logfile_log_page_size < NTFS_BLOCK_SIZE || + logfile_system_page_size & + (logfile_system_page_size - 1) || + !is_power_of_2(logfile_log_page_size)) { + ntfs_error(vi->i_sb, "LogFile uses unsupported page size."); + return false; + } + /* + * We must be either at !pos (1st restart page) or at pos =3D system page + * size (2nd restart page). + */ + if (pos && pos !=3D logfile_system_page_size) { + ntfs_error(vi->i_sb, "Found restart area in incorrect position in LogFil= e."); + return false; + } + /* We only know how to handle version 1.1. */ + if (le16_to_cpu(rp->major_ver) !=3D 1 || + le16_to_cpu(rp->minor_ver) !=3D 1) { + ntfs_error(vi->i_sb, + "LogFile version %i.%i is not supported. (This driver supports version= 1.1 only.)", + (int)le16_to_cpu(rp->major_ver), + (int)le16_to_cpu(rp->minor_ver)); + return false; + } + /* + * If chkdsk has been run the restart page may not be protected by an + * update sequence array. + */ + if (ntfs_is_chkd_record(rp->magic) && !le16_to_cpu(rp->usa_count)) { + have_usa =3D false; + goto skip_usa_checks; + } + /* Verify the size of the update sequence array. 
*/ + usa_count =3D 1 + (logfile_system_page_size >> NTFS_BLOCK_SIZE_BITS); + if (usa_count !=3D le16_to_cpu(rp->usa_count)) { + ntfs_error(vi->i_sb, + "LogFile restart page specifies inconsistent update sequence array coun= t."); + return false; + } + /* Verify the position of the update sequence array. */ + usa_ofs =3D le16_to_cpu(rp->usa_ofs); + usa_end =3D usa_ofs + usa_count * sizeof(u16); + if (usa_ofs < sizeof(struct restart_page_header) || + usa_end > NTFS_BLOCK_SIZE - sizeof(u16)) { + ntfs_error(vi->i_sb, + "LogFile restart page specifies inconsistent update sequence array offs= et."); + return false; + } +skip_usa_checks: + /* + * Verify the position of the restart area. It must be: + * - aligned to 8-byte boundary, + * - after the update sequence array, and + * - within the system page size. + */ + ra_ofs =3D le16_to_cpu(rp->restart_area_offset); + if (ra_ofs & 7 || (have_usa ? ra_ofs < usa_end : + ra_ofs < sizeof(struct restart_page_header)) || + ra_ofs > logfile_system_page_size) { + ntfs_error(vi->i_sb, + "LogFile restart page specifies inconsistent restart area offset."); + return false; + } + /* + * Only restart pages modified by chkdsk are allowed to have chkdsk_lsn + * set. + */ + if (!ntfs_is_chkd_record(rp->magic) && le64_to_cpu(rp->chkdsk_lsn)) { + ntfs_error(vi->i_sb, + "LogFile restart page is not modified by chkdsk but a chkdsk LSN is spe= cified."); + return false; + } + ntfs_debug("Done."); + return true; +} + +/** + * ntfs_check_restart_area - check the restart area for consistency + * @vi: LogFile inode to which the restart page belongs + * @rp: restart page whose restart area to check + * + * Check the restart area of the restart page @rp for consistency and retu= rn + * 'true' if it is consistent and 'false' otherwise. + * + * This function assumes that the restart page header has already been + * consistency checked. + * + * This function only needs NTFS_BLOCK_SIZE bytes in @rp, i.e. it does not + * require the full restart page. 
+ */ +static bool ntfs_check_restart_area(struct inode *vi, struct restart_page_= header *rp) +{ + u64 file_size; + struct restart_area *ra; + u16 ra_ofs, ra_len, ca_ofs; + u8 fs_bits; + + ntfs_debug("Entering."); + ra_ofs =3D le16_to_cpu(rp->restart_area_offset); + ra =3D (struct restart_area *)((u8 *)rp + ra_ofs); + /* + * Everything before ra->file_size must be before the first word + * protected by an update sequence number. This ensures that it is + * safe to access ra->client_array_offset. + */ + if (ra_ofs + offsetof(struct restart_area, file_size) > + NTFS_BLOCK_SIZE - sizeof(u16)) { + ntfs_error(vi->i_sb, + "LogFile restart area specifies inconsistent file offset."); + return false; + } + /* + * Now that we can access ra->client_array_offset, make sure everything + * up to the log client array is before the first word protected by an + * update sequence number. This ensures we can access all of the + * restart area elements safely. Also, the client array offset must be + * aligned to an 8-byte boundary. + */ + ca_ofs =3D le16_to_cpu(ra->client_array_offset); + if (((ca_ofs + 7) & ~7) !=3D ca_ofs || + ra_ofs + ca_ofs > NTFS_BLOCK_SIZE - sizeof(u16)) { + ntfs_error(vi->i_sb, + "LogFile restart area specifies inconsistent client array offset."); + return false; + } + /* + * The restart area must end within the system page size both when + * calculated manually and as specified by ra->restart_area_length. + * Also, the calculated length must not exceed the specified length. 
+ */ + ra_len =3D ca_ofs + le16_to_cpu(ra->log_clients) * + sizeof(struct log_client_record); + if (ra_ofs + ra_len > le32_to_cpu(rp->system_page_size) || + ra_ofs + le16_to_cpu(ra->restart_area_length) > + le32_to_cpu(rp->system_page_size) || + ra_len > le16_to_cpu(ra->restart_area_length)) { + ntfs_error(vi->i_sb, + "LogFile restart area is out of bounds of the system page size specified by the restart page header and/or the specified restart area length is inconsistent."); + return false; + } + /* + * The ra->client_free_list and ra->client_in_use_list must be either + * LOGFILE_NO_CLIENT or less than ra->log_clients or they are + * overflowing the client array. + */ + if ((ra->client_free_list !=3D LOGFILE_NO_CLIENT && + le16_to_cpu(ra->client_free_list) >=3D + le16_to_cpu(ra->log_clients)) || + (ra->client_in_use_list !=3D LOGFILE_NO_CLIENT && + le16_to_cpu(ra->client_in_use_list) >=3D + le16_to_cpu(ra->log_clients))) { + ntfs_error(vi->i_sb, + "LogFile restart area specifies overflowing client free and/or in use lists."); + return false; + } + /* + * Check ra->seq_number_bits against ra->file_size for consistency. + * We cannot just use ffs() because the file size is not a power of 2. + */ + file_size =3D le64_to_cpu(ra->file_size); + fs_bits =3D 0; + while (file_size) { + file_size >>=3D 1; + fs_bits++; + } + if (le32_to_cpu(ra->seq_number_bits) !=3D 67 - fs_bits) { + ntfs_error(vi->i_sb, + "LogFile restart area specifies inconsistent sequence number bits."); + return false; + } + /* The log record header length must be a multiple of 8. */ + if (((le16_to_cpu(ra->log_record_header_length) + 7) & ~7) !=3D + le16_to_cpu(ra->log_record_header_length)) { + ntfs_error(vi->i_sb, + "LogFile restart area specifies inconsistent log record header length."); + return false; + } + /* Ditto for the log page data offset.
*/ + if (((le16_to_cpu(ra->log_page_data_offset) + 7) & ~7) !=3D + le16_to_cpu(ra->log_page_data_offset)) { + ntfs_error(vi->i_sb, + "LogFile restart area specifies inconsistent log page data offset."); + return false; + } + ntfs_debug("Done."); + return true; +} + +/** + * ntfs_check_log_client_array - check the log client array for consistency + * @vi: LogFile inode to which the restart page belongs + * @rp: restart page whose log client array to check + * + * Check the log client array of the restart page @rp for consistency and + * return 'true' if it is consistent and 'false' otherwise. + * + * This function assumes that the restart page header and the restart area have + * already been consistency checked. + * + * Unlike ntfs_check_restart_page_header() and ntfs_check_restart_area(), this + * function needs @rp->system_page_size bytes in @rp, i.e. it requires the full + * restart page and the page must be multi sector transfer deprotected. + */ +static bool ntfs_check_log_client_array(struct inode *vi, + struct restart_page_header *rp) +{ + struct restart_area *ra; + struct log_client_record *ca, *cr; + u16 nr_clients, idx; + bool in_free_list, idx_is_first; + + ntfs_debug("Entering."); + ra =3D (struct restart_area *)((u8 *)rp + le16_to_cpu(rp->restart_area_offset)); + ca =3D (struct log_client_record *)((u8 *)ra + + le16_to_cpu(ra->client_array_offset)); + /* + * Check the ra->client_free_list first and then check the + * ra->client_in_use_list. Check each of the log client records in + * each of the lists and check that the array does not overflow the + * ra->log_clients value. Also keep track of the number of records + * visited as there cannot be more than ra->log_clients records and + * that way we detect eventual loops within a list.
+	 */
+	nr_clients = le16_to_cpu(ra->log_clients);
+	idx = le16_to_cpu(ra->client_free_list);
+	in_free_list = true;
+check_list:
+	for (idx_is_first = true; idx != LOGFILE_NO_CLIENT_CPU; nr_clients--,
+			idx = le16_to_cpu(cr->next_client)) {
+		if (!nr_clients || idx >= le16_to_cpu(ra->log_clients))
+			goto err_out;
+		/* Set @cr to the current log client record. */
+		cr = ca + idx;
+		/* The first log client record must not have a prev_client. */
+		if (idx_is_first) {
+			if (cr->prev_client != LOGFILE_NO_CLIENT)
+				goto err_out;
+			idx_is_first = false;
+		}
+	}
+	/* Switch to and check the in use list if we just did the free list. */
+	if (in_free_list) {
+		in_free_list = false;
+		idx = le16_to_cpu(ra->client_in_use_list);
+		goto check_list;
+	}
+	ntfs_debug("Done.");
+	return true;
+err_out:
+	ntfs_error(vi->i_sb, "LogFile log client array is corrupt.");
+	return false;
+}
+
+/**
+ * ntfs_check_and_load_restart_page - check the restart page for consistency
+ * @vi:	LogFile inode to which the restart page belongs
+ * @rp:	restart page to check
+ * @pos:	position in @vi at which the restart page resides
+ * @wrp:	[OUT] copy of the multi sector transfer deprotected restart page
+ * @lsn:	[OUT] set to the current logfile lsn on success
+ *
+ * Check the restart page @rp for consistency and return 0 if it is consistent
+ * and -errno otherwise.  The restart page may have been modified by chkdsk in
+ * which case its magic is CHKD instead of RSTR.
+ *
+ * This function only needs NTFS_BLOCK_SIZE bytes in @rp, i.e. it does not
+ * require the full restart page.
+ *
+ * If @wrp is not NULL, on success, *@wrp will point to a buffer containing a
+ * copy of the complete multi sector transfer deprotected page.  On failure,
+ * *@wrp is undefined.
+ *
+ * Similarly, if @lsn is not NULL, on success *@lsn will be set to the current
+ * logfile lsn according to this restart page.  On failure, *@lsn is undefined.
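[Not part of the patch: the loop-detection idea used by ntfs_check_log_client_array() — cap the walk at the number of array slots so a cycle in the next_client links cannot hang the mount — can be sketched independently. Hypothetical encoding: 0xffff plays the role of LOGFILE_NO_CLIENT_CPU.]

```c
#include <stdint.h>

#define NO_CLIENT 0xffffU	/* stand-in for LOGFILE_NO_CLIENT_CPU */

/*
 * Walk a singly linked list embedded in an array of @nr_slots entries.
 * Returns 1 if the list terminates cleanly, 0 if an index is out of
 * bounds or the walk visits more nodes than there are slots, which can
 * only happen if the links form a cycle.
 */
int list_is_sane(const uint16_t *next, uint16_t nr_slots, uint16_t head)
{
	uint16_t budget = nr_slots;	/* at most nr_slots visits allowed */
	uint16_t idx = head;

	while (idx != NO_CLIENT) {
		if (!budget || idx >= nr_slots)
			return 0;
		budget--;
		idx = next[idx];
	}
	return 1;
}
```

The budget counter makes corruption detection O(nr_slots) without any extra memory, which is why the kernel code decrements nr_clients on every step rather than marking visited records.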
+ *
+ * The following error codes are defined:
+ *	-EINVAL	- The restart page is inconsistent.
+ *	-ENOMEM	- Not enough memory to load the restart page.
+ *	-EIO	- Failed to read from LogFile.
+ */
+static int ntfs_check_and_load_restart_page(struct inode *vi,
+		struct restart_page_header *rp, s64 pos, struct restart_page_header **wrp,
+		s64 *lsn)
+{
+	struct restart_area *ra;
+	struct restart_page_header *trp;
+	int size, err;
+
+	ntfs_debug("Entering.");
+	/* Check the restart page header for consistency. */
+	if (!ntfs_check_restart_page_header(vi, rp, pos)) {
+		/* Error output already done inside the function. */
+		return -EINVAL;
+	}
+	/* Check the restart area for consistency. */
+	if (!ntfs_check_restart_area(vi, rp)) {
+		/* Error output already done inside the function. */
+		return -EINVAL;
+	}
+	ra = (struct restart_area *)((u8 *)rp + le16_to_cpu(rp->restart_area_offset));
+	/*
+	 * Allocate a buffer to store the whole restart page so we can multi
+	 * sector transfer deprotect it.
+	 */
+	trp = ntfs_malloc_nofs(le32_to_cpu(rp->system_page_size));
+	if (!trp) {
+		ntfs_error(vi->i_sb, "Failed to allocate memory for LogFile restart page buffer.");
+		return -ENOMEM;
+	}
+	/*
+	 * Read the whole of the restart page into the buffer.  If it fits
+	 * completely inside @rp, just copy it from there.  Otherwise map all
+	 * the required pages and copy the data from them.
+	 */
+	size = PAGE_SIZE - (pos & ~PAGE_MASK);
+	if (size >= le32_to_cpu(rp->system_page_size)) {
+		memcpy(trp, rp, le32_to_cpu(rp->system_page_size));
+	} else {
+		pgoff_t idx;
+		struct folio *folio;
+		int have_read, to_read;
+
+		/* First copy what we already have in @rp. */
+		memcpy(trp, rp, size);
+		/* Copy the remaining data one page at a time.
+		 */
+		have_read = size;
+		to_read = le32_to_cpu(rp->system_page_size) - size;
+		idx = (pos + size) >> PAGE_SHIFT;
+		do {
+			folio = ntfs_read_mapping_folio(vi->i_mapping, idx);
+			if (IS_ERR(folio)) {
+				ntfs_error(vi->i_sb, "Error mapping LogFile page (index %lu).",
+						idx);
+				err = PTR_ERR(folio);
+				if (err != -EIO && err != -ENOMEM)
+					err = -EIO;
+				goto err_out;
+			}
+			size = min_t(int, to_read, PAGE_SIZE);
+			memcpy((u8 *)trp + have_read, folio_address(folio), size);
+			folio_put(folio);
+			have_read += size;
+			to_read -= size;
+			idx++;
+		} while (to_read > 0);
+	}
+	/*
+	 * Perform the multi sector transfer deprotection on the buffer if the
+	 * restart page is protected.
+	 */
+	if ((!ntfs_is_chkd_record(trp->magic) || le16_to_cpu(trp->usa_count)) &&
+	    post_read_mst_fixup((struct ntfs_record *)trp,
+			le32_to_cpu(rp->system_page_size))) {
+		/*
+		 * A multi sector transfer error was detected.  We only need to
+		 * abort if the restart page contents exceed the multi sector
+		 * transfer fixup of the first sector.
+		 */
+		if (le16_to_cpu(rp->restart_area_offset) +
+				le16_to_cpu(ra->restart_area_length) >
+				NTFS_BLOCK_SIZE - sizeof(u16)) {
+			ntfs_error(vi->i_sb,
+				"Multi sector transfer error detected in LogFile restart page.");
+			err = -EINVAL;
+			goto err_out;
+		}
+	}
+	/*
+	 * If the restart page is modified by chkdsk or there are no active
+	 * logfile clients, the logfile is consistent.  Otherwise, we need to
+	 * check the log client records for consistency, too.
+	 */
+	err = 0;
+	if (ntfs_is_rstr_record(rp->magic) &&
+	    ra->client_in_use_list != LOGFILE_NO_CLIENT) {
+		if (!ntfs_check_log_client_array(vi, trp)) {
+			err = -EINVAL;
+			goto err_out;
+		}
+	}
+	if (lsn) {
+		if (ntfs_is_rstr_record(rp->magic))
+			*lsn = le64_to_cpu(ra->current_lsn);
+		else /* if (ntfs_is_chkd_record(rp->magic)) */
+			*lsn = le64_to_cpu(rp->chkdsk_lsn);
+	}
+	ntfs_debug("Done.");
+	if (wrp)
+		*wrp = trp;
+	else {
+err_out:
+		ntfs_free(trp);
+	}
+	return err;
+}
+
+/**
+ * ntfs_check_logfile - check the journal for consistency
+ * @log_vi:	struct inode of loaded journal LogFile to check
+ * @rp:		[OUT] on success this is a copy of the current restart page
+ *
+ * Check the LogFile journal for consistency and return 'true' if it is
+ * consistent and 'false' if not.  On success, the current restart page is
+ * returned in *@rp.  Caller must call ntfs_free(*@rp) when finished with it.
+ *
+ * At present we only check the two restart pages and ignore the log record
+ * pages.
+ *
+ * Note that the MstProtected flag is not set on the LogFile inode and hence
+ * when reading pages they are not deprotected.  This is because we do not know
+ * if the LogFile was created on a system with a different page size to ours
+ * yet and mst deprotection would fail if our page size is smaller.
+ */
+bool ntfs_check_logfile(struct inode *log_vi, struct restart_page_header **rp)
+{
+	s64 size, pos;
+	s64 rstr1_lsn, rstr2_lsn;
+	struct ntfs_volume *vol = NTFS_SB(log_vi->i_sb);
+	struct address_space *mapping = log_vi->i_mapping;
+	struct folio *folio = NULL;
+	u8 *kaddr = NULL;
+	struct restart_page_header *rstr1_ph = NULL;
+	struct restart_page_header *rstr2_ph = NULL;
+	int log_page_size, err;
+	bool logfile_is_empty = true;
+	u8 log_page_bits;
+
+	ntfs_debug("Entering.");
+	/* An empty LogFile must have been clean before it got emptied.
+	 */
+	if (NVolLogFileEmpty(vol))
+		goto is_empty;
+	size = i_size_read(log_vi);
+	/* Make sure the file doesn't exceed the maximum allowed size. */
+	if (size > MaxLogFileSize)
+		size = MaxLogFileSize;
+	/*
+	 * Truncate size to a multiple of the page cache size or the default
+	 * log page size if the page cache size is between the default log page
+	 * size and twice that.
+	 */
+	if (DefaultLogPageSize <= PAGE_SIZE &&
+	    PAGE_SIZE <= DefaultLogPageSize * 2)
+		log_page_size = DefaultLogPageSize;
+	else
+		log_page_size = PAGE_SIZE;
+	/*
+	 * Use ntfs_ffs() instead of ffs() to enable the compiler to
+	 * optimize log_page_size and log_page_bits into constants.
+	 */
+	log_page_bits = ntfs_ffs(log_page_size) - 1;
+	size &= ~(s64)(log_page_size - 1);
+	/*
+	 * Ensure the log file is big enough to store at least the two restart
+	 * pages and the minimum number of log record pages.
+	 */
+	if (size < log_page_size * 2 || (size - log_page_size * 2) >>
+			log_page_bits < MinLogRecordPages) {
+		ntfs_error(vol->sb, "LogFile is too small.");
+		return false;
+	}
+	/*
+	 * Read through the file looking for a restart page.  Since the restart
+	 * page header is at the beginning of a page we only need to search at
+	 * what could be the beginning of a page (for each page size) rather
+	 * than scanning the whole file byte by byte.  If all potential places
+	 * contain empty and uninitialized records, the log file can be assumed
+	 * to be empty.
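[Not part of the patch: the size sanity math above relies on log_page_size being a power of two, so `size &= ~(s64)(log_page_size - 1)` rounds the file size down to a whole number of log pages. A user-space sketch of that mask; 4096 in the usage below is just an example page size.]

```c
#include <stdint.h>

/*
 * Round @size down to a multiple of @page_size, which must be a power
 * of two -- the same mask ntfs_check_logfile() applies to the LogFile
 * size before checking it against the minimum layout.
 */
int64_t round_down_to_page(int64_t size, int64_t page_size)
{
	/* page_size - 1 is an all-ones low-bit mask for a power of two. */
	return size & ~(page_size - 1);
}
```

The masking trick only works for power-of-two sizes; a non-power-of-two page size would require a division and multiplication instead.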
+	 */
+	for (pos = 0; pos < size; pos <<= 1) {
+		pgoff_t idx = pos >> PAGE_SHIFT;
+
+		if (!folio || folio->index != idx) {
+			if (folio)
+				ntfs_unmap_folio(folio, kaddr);
+			folio = ntfs_read_mapping_folio(mapping, idx);
+			if (IS_ERR(folio)) {
+				ntfs_error(vol->sb, "Error mapping LogFile page (index %lu).",
+						idx);
+				goto err_out;
+			}
+		}
+		kaddr = (u8 *)kmap_local_folio(folio, 0) + (pos & ~PAGE_MASK);
+		/*
+		 * A non-empty block means the logfile is not empty while an
+		 * empty block after a non-empty block has been encountered
+		 * means we are done.
+		 */
+		if (!ntfs_is_empty_recordp((__le32 *)kaddr))
+			logfile_is_empty = false;
+		else if (!logfile_is_empty)
+			break;
+		/*
+		 * A log record page means there cannot be a restart page after
+		 * this so no need to continue searching.
+		 */
+		if (ntfs_is_rcrd_recordp((__le32 *)kaddr))
+			break;
+		/* If not a (modified by chkdsk) restart page, continue. */
+		if (!ntfs_is_rstr_recordp((__le32 *)kaddr) &&
+		    !ntfs_is_chkd_recordp((__le32 *)kaddr)) {
+			if (!pos)
+				pos = NTFS_BLOCK_SIZE >> 1;
+			continue;
+		}
+		/*
+		 * Check the (modified by chkdsk) restart page for consistency
+		 * and get a copy of the complete multi sector transfer
+		 * deprotected restart page.
+		 */
+		err = ntfs_check_and_load_restart_page(log_vi,
+				(struct restart_page_header *)kaddr, pos,
+				!rstr1_ph ? &rstr1_ph : &rstr2_ph,
+				!rstr1_ph ? &rstr1_lsn : &rstr2_lsn);
+		if (!err) {
+			/*
+			 * If we have now found the first (modified by chkdsk)
+			 * restart page, continue looking for the second one.
+			 */
+			if (!pos) {
+				pos = NTFS_BLOCK_SIZE >> 1;
+				continue;
+			}
+			/*
+			 * We have now found the second (modified by chkdsk)
+			 * restart page, so we can stop looking.
+			 */
+			break;
+		}
+		/*
+		 * Error output already done inside the function.  Note, we do
+		 * not abort if the restart page was invalid as we might still
+		 * find a valid one further in the file.
+		 */
+		if (err != -EINVAL) {
+			ntfs_unmap_folio(folio, kaddr);
+			goto err_out;
+		}
+		/* Continue looking. */
+		if (!pos)
+			pos = NTFS_BLOCK_SIZE >> 1;
+	}
+	if (folio)
+		ntfs_unmap_folio(folio, kaddr);
+	if (logfile_is_empty) {
+		NVolSetLogFileEmpty(vol);
+is_empty:
+		ntfs_debug("Done.  (LogFile is empty.)");
+		return true;
+	}
+	if (!rstr1_ph) {
+		ntfs_error(vol->sb,
+			"Did not find any restart pages in LogFile and it was not empty.");
+		return false;
+	}
+	/* If both restart pages were found, use the more recent one. */
+	if (rstr2_ph) {
+		/*
+		 * If the second restart area is more recent, switch to it.
+		 * Otherwise just throw it away.
+		 */
+		if (rstr2_lsn > rstr1_lsn) {
+			ntfs_debug("Using second restart page as it is more recent.");
+			ntfs_free(rstr1_ph);
+			rstr1_ph = rstr2_ph;
+			/* rstr1_lsn = rstr2_lsn; */
+		} else {
+			ntfs_debug("Using first restart page as it is more recent.");
+			ntfs_free(rstr2_ph);
+		}
+		rstr2_ph = NULL;
+	}
+	/* All consistency checks passed. */
+	if (rp)
+		*rp = rstr1_ph;
+	else
+		ntfs_free(rstr1_ph);
+	ntfs_debug("Done.");
+	return true;
+err_out:
+	if (rstr1_ph)
+		ntfs_free(rstr1_ph);
+	return false;
+}
+
+/**
+ * ntfs_empty_logfile - empty the contents of the LogFile journal
+ * @log_vi:	struct inode of loaded journal LogFile to empty
+ *
+ * Empty the contents of the LogFile journal @log_vi and return 'true' on
+ * success and 'false' on error.
+ *
+ * This function assumes that the LogFile journal has already been consistency
+ * checked by a call to ntfs_check_logfile() and that ntfs_is_logfile_clean()
+ * has been used to ensure that the LogFile is clean.
+ */
+bool ntfs_empty_logfile(struct inode *log_vi)
+{
+	s64 vcn, end_vcn;
+	struct ntfs_inode *log_ni = NTFS_I(log_vi);
+	struct ntfs_volume *vol = log_ni->vol;
+	struct super_block *sb = vol->sb;
+	struct runlist_element *rl;
+	unsigned long flags;
+	int err;
+	bool should_wait = true;
+	char *empty_buf = NULL;
+	struct file_ra_state *ra = NULL;
+
+	ntfs_debug("Entering.");
+	if (NVolLogFileEmpty(vol)) {
+		ntfs_debug("Done.");
+		return true;
+	}
+
+	/*
+	 * We cannot use ntfs_attr_set() because we may still be in the middle
+	 * of a mount operation.  Thus we do the emptying by hand by first
+	 * zapping the page cache pages for the LogFile/DATA attribute and
+	 * then emptying each of the buffers in each of the clusters specified
+	 * by the runlist by hand.
+	 */
+	vcn = 0;
+	read_lock_irqsave(&log_ni->size_lock, flags);
+	end_vcn = (log_ni->initialized_size + vol->cluster_size_mask) >>
+			vol->cluster_size_bits;
+	read_unlock_irqrestore(&log_ni->size_lock, flags);
+	truncate_inode_pages(log_vi->i_mapping, 0);
+	down_write(&log_ni->runlist.lock);
+	rl = log_ni->runlist.rl;
+	if (unlikely(!rl || vcn < rl->vcn || !rl->length)) {
+map_vcn:
+		err = ntfs_map_runlist_nolock(log_ni, vcn, NULL);
+		if (err) {
+			ntfs_error(sb, "Failed to map runlist fragment (error %d).", -err);
+			goto err;
+		}
+		rl = log_ni->runlist.rl;
+	}
+	/* Seek to the runlist element containing @vcn. */
+	while (rl->length && vcn >= rl[1].vcn)
+		rl++;
+
+	err = -ENOMEM;
+	empty_buf = ntfs_malloc_nofs(vol->cluster_size);
+	if (!empty_buf)
+		goto err;
+
+	memset(empty_buf, 0xff, vol->cluster_size);
+
+	ra = kzalloc(sizeof(*ra), GFP_NOFS);
+	if (!ra)
+		goto err;
+
+	file_ra_state_init(ra, sb->s_bdev->bd_mapping);
+	do {
+		s64 lcn;
+		loff_t start, end;
+		s64 len;
+
+		/*
+		 * If this run is not mapped map it now and start again as the
+		 * runlist will have been updated.
+		 */
+		lcn = rl->lcn;
+		if (unlikely(lcn == LCN_RL_NOT_MAPPED)) {
+			vcn = rl->vcn;
+			ntfs_free(empty_buf);
+			empty_buf = NULL;
+			kfree(ra);
+			ra = NULL;
+			goto map_vcn;
+		}
+		/* If this run is not valid abort with an error. */
+		if (unlikely(!rl->length || lcn < LCN_HOLE))
+			goto rl_err;
+		/* Skip holes. */
+		if (lcn == LCN_HOLE)
+			continue;
+		start = lcn << vol->cluster_size_bits;
+		len = rl->length;
+		if (rl[1].vcn > end_vcn)
+			len = end_vcn - rl->vcn;
+		end = (lcn + len) << vol->cluster_size_bits;
+
+		page_cache_sync_readahead(sb->s_bdev->bd_mapping, ra, NULL,
+				start >> PAGE_SHIFT, (end - start) >> PAGE_SHIFT);
+
+		do {
+			err = ntfs_dev_write(sb, empty_buf, start,
+					vol->cluster_size, should_wait);
+			if (err) {
+				ntfs_error(sb, "ntfs_dev_write failed, err : %d\n", err);
+				goto io_err;
+			}
+
+			/*
+			 * Submit the buffer and wait for i/o to complete but
+			 * only for the first buffer so we do not miss really
+			 * serious i/o errors.  Once the first buffer has
+			 * completed ignore errors afterwards as we can assume
+			 * that if one buffer worked all of them will work.
+			 */
+			if (should_wait)
+				should_wait = false;
+			start += vol->cluster_size;
+		} while (start < end);
+	} while ((++rl)->vcn < end_vcn);
+	up_write(&log_ni->runlist.lock);
+	ntfs_free(empty_buf);
+	kfree(ra);
+	truncate_inode_pages(log_vi->i_mapping, 0);
+	/* Set the flag so we do not have to do it again on remount. */
+	NVolSetLogFileEmpty(vol);
+	ntfs_debug("Done.");
+	return true;
+io_err:
+	ntfs_error(sb, "Failed to write buffer.  Unmount and run chkdsk.");
+	goto dirty_err;
+rl_err:
+	ntfs_error(sb, "Runlist is corrupt.  Unmount and run chkdsk.");
+dirty_err:
+	NVolSetErrors(vol);
+	err = -EIO;
+err:
+	ntfs_free(empty_buf);
+	kfree(ra);
+	up_write(&log_ni->runlist.lock);
+	ntfs_error(sb, "Failed to fill LogFile with 0xff bytes (error %d).",
+			-err);
+	return false;
+}
diff --git a/fs/ntfsplus/misc.c b/fs/ntfsplus/misc.c
new file mode 100644
index 000000000000..d4d63c74db99
--- /dev/null
+++ b/fs/ntfsplus/misc.c
@@ -0,0 +1,213 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel debug support.  Part of the Linux-NTFS project.
+ *
+ * Copyright (C) 1997 Martin von Löwis, Régis Duchesne
+ * Copyright (c) 2001-2005 Anton Altaparmakov
+ */
+
+#include <linux/kernel.h>
+#ifdef CONFIG_SYSCTL
+#include <linux/proc_fs.h>
+#include <linux/sysctl.h>
+#endif
+
+#include "misc.h"
+
+/**
+ * __ntfs_warning - output a warning to the syslog
+ * @function:	name of function outputting the warning
+ * @sb:		super block of mounted ntfs filesystem
+ * @fmt:	warning string containing format specifications
+ * @...:	a variable number of arguments specified in @fmt
+ *
+ * Outputs a warning to the syslog for the mounted ntfs filesystem described
+ * by @sb.
+ *
+ * @fmt and the corresponding @... is printf style format string containing
+ * the warning string and the corresponding format arguments, respectively.
+ *
+ * @function is the name of the function from which __ntfs_warning is being
+ * called.
+ *
+ * Note, you should be using debug.h::ntfs_warning(@sb, @fmt, @...) instead
+ * as this provides the @function parameter automatically.
+ */
+void __ntfs_warning(const char *function, const struct super_block *sb,
+		const char *fmt, ...)
+{
+	struct va_format vaf;
+	va_list args;
+	int flen = 0;
+
+	if (function)
+		flen = strlen(function);
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+#ifndef DEBUG
+	if (sb)
+		pr_warn_ratelimited("(device %s): %s(): %pV\n",
+				sb->s_id, flen ? function : "", &vaf);
+	else
+		pr_warn_ratelimited("%s(): %pV\n", flen ? function : "", &vaf);
+#else
+	if (sb)
+		pr_warn("(device %s): %s(): %pV\n",
+				sb->s_id, flen ? function : "", &vaf);
+	else
+		pr_warn("%s(): %pV\n", flen ? function : "", &vaf);
+#endif
+	va_end(args);
+}
+
+/**
+ * __ntfs_error - output an error to the syslog
+ * @function:	name of function outputting the error
+ * @sb:		super block of mounted ntfs filesystem
+ * @fmt:	error string containing format specifications
+ * @...:	a variable number of arguments specified in @fmt
+ *
+ * Outputs an error to the syslog for the mounted ntfs filesystem described
+ * by @sb.
+ *
+ * @fmt and the corresponding @... is printf style format string containing
+ * the error string and the corresponding format arguments, respectively.
+ *
+ * @function is the name of the function from which __ntfs_error is being
+ * called.
+ *
+ * Note, you should be using debug.h::ntfs_error(@sb, @fmt, @...) instead
+ * as this provides the @function parameter automatically.
+ */
+void __ntfs_error(const char *function, struct super_block *sb,
+		const char *fmt, ...)
+{
+	struct va_format vaf;
+	va_list args;
+	int flen = 0;
+
+	if (function)
+		flen = strlen(function);
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+#ifndef DEBUG
+	if (sb)
+		pr_err_ratelimited("(device %s): %s(): %pV\n",
+				sb->s_id, flen ? function : "", &vaf);
+	else
+		pr_err_ratelimited("%s(): %pV\n", flen ? function : "", &vaf);
+#else
+	if (sb)
+		pr_err("(device %s): %s(): %pV\n",
+				sb->s_id, flen ? function : "", &vaf);
+	else
+		pr_err("%s(): %pV\n", flen ? function : "", &vaf);
+#endif
+	va_end(args);
+
+	if (sb)
+		ntfs_handle_error(sb);
+}
+
+#ifdef DEBUG
+
+/* If 1, output debug messages, and if 0, don't. */
+int debug_msgs;
+
+void __ntfs_debug(const char *file, int line, const char *function,
+		const char *fmt, ...)
+{
+	struct va_format vaf;
+	va_list args;
+	int flen = 0;
+
+	if (!debug_msgs)
+		return;
+	if (function)
+		flen = strlen(function);
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	pr_debug("(%s, %d): %s(): %pV", file, line, flen ? function : "", &vaf);
+	va_end(args);
+}
+
+/* Dump a runlist.  Caller has to provide synchronisation for @rl. */
+void ntfs_debug_dump_runlist(const struct runlist_element *rl)
+{
+	int i;
+	const char *lcn_str[5] = { "LCN_DELALLOC     ", "LCN_HOLE         ",
+			"LCN_RL_NOT_MAPPED", "LCN_ENOENT       ",
+			"LCN_unknown      " };
+
+	if (!debug_msgs)
+		return;
+	pr_debug("Dumping runlist (values in hex):\n");
+	if (!rl) {
+		pr_debug("Run list not present.\n");
+		return;
+	}
+	pr_debug("VCN              LCN               Run length\n");
+	for (i = 0; ; i++) {
+		s64 lcn = (rl + i)->lcn;
+
+		if (lcn < (s64)0) {
+			int index = -lcn - 1;
+
+			if (index > -LCN_ENOENT - 1)
+				index = 4;
+			pr_debug("%-16Lx %s %-16Lx%s\n",
+					(long long)(rl + i)->vcn, lcn_str[index],
+					(long long)(rl + i)->length,
+					(rl + i)->length ? "" :
+					" (runlist end)");
+		} else
+			pr_debug("%-16Lx %-16Lx  %-16Lx%s\n",
+					(long long)(rl + i)->vcn,
+					(long long)(rl + i)->lcn,
+					(long long)(rl + i)->length,
+					(rl + i)->length ? "" :
+					" (runlist end)");
+		if (!(rl + i)->length)
+			break;
+	}
+}
+
+#ifdef CONFIG_SYSCTL
+/* Definition of the ntfs sysctl. */
+static const struct ctl_table ntfs_sysctls[] = {
+	{
+		.procname	= "ntfs-debug",
+		.data		= &debug_msgs,		/* Data pointer and size. */
+		.maxlen		= sizeof(debug_msgs),
+		.mode		= 0644,			/* Mode, proc handler. */
+		.proc_handler	= proc_dointvec
+	},
+	{}
+};
+
+/* Storage for the sysctls header. */
+static struct ctl_table_header *sysctls_root_table;
+
+/**
+ * ntfs_sysctl - add or remove the debug sysctl
+ * @add:	add (1) or remove (0) the sysctl
+ *
+ * Add or remove the debug sysctl.  Return 0 on success or -errno on error.
+ */
+int ntfs_sysctl(int add)
+{
+	if (add) {
+		sysctls_root_table = register_sysctl("fs", ntfs_sysctls);
+		if (!sysctls_root_table)
+			return -ENOMEM;
+	} else {
+		unregister_sysctl_table(sysctls_root_table);
+		sysctls_root_table = NULL;
+	}
+	return 0;
+}
+#endif /* CONFIG_SYSCTL */
+#endif
diff --git a/fs/ntfsplus/unistr.c b/fs/ntfsplus/unistr.c
new file mode 100644
index 000000000000..810fdb2ab218
--- /dev/null
+++ b/fs/ntfsplus/unistr.c
@@ -0,0 +1,473 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS Unicode string handling.  Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2006 Anton Altaparmakov
+ */
+
+#include "ntfs.h"
+#include "misc.h"
+
+/*
+ * IMPORTANT
+ * =========
+ *
+ * All these routines assume that the Unicode characters are in little endian
+ * encoding inside the strings!!!
+ */
+
+/*
+ * This is used by the name collation functions to quickly determine what
+ * characters are (in)valid.
+ */
+static const u8 legal_ansi_char_array[0x40] = {
+	0x00, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
+	0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
+
+	0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
+	0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
+
+	0x17, 0x07, 0x18, 0x17, 0x17, 0x17, 0x17, 0x17,
+	0x17, 0x17, 0x18, 0x16, 0x16, 0x17, 0x07, 0x00,
+
+	0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17,
+	0x17, 0x17, 0x04, 0x16, 0x18, 0x16, 0x18, 0x18,
+};
+
+/**
+ * ntfs_are_names_equal - compare two Unicode names for equality
+ * @s1:			name to compare to @s2
+ * @s1_len:		length in Unicode characters of @s1
+ * @s2:			name to compare to @s1
+ * @s2_len:		length in Unicode characters of @s2
+ * @ic:			ignore case bool
+ * @upcase:		upcase table (only if @ic == IGNORE_CASE)
+ * @upcase_size:	length in Unicode characters of @upcase (if present)
+ *
+ * Compare the names @s1 and @s2 and return 'true' (1) if the names are
+ * identical, or 'false' (0) if they are not identical.  If @ic is IGNORE_CASE,
+ * the @upcase table is used to perform a case insensitive comparison.
+ */
+bool ntfs_are_names_equal(const __le16 *s1, size_t s1_len,
+		const __le16 *s2, size_t s2_len, const u32 ic,
+		const __le16 *upcase, const u32 upcase_size)
+{
+	if (s1_len != s2_len)
+		return false;
+	if (ic == CASE_SENSITIVE)
+		return !ntfs_ucsncmp(s1, s2, s1_len);
+	return !ntfs_ucsncasecmp(s1, s2, s1_len, upcase, upcase_size);
+}
+
+/**
+ * ntfs_collate_names - collate two Unicode names
+ * @name1:	first Unicode name to compare
+ * @name1_len:	first Unicode name length
+ * @name2:	second Unicode name to compare
+ * @name2_len:	second Unicode name length
+ * @err_val:	if @name1 contains an invalid character return this value
+ * @ic:		either CASE_SENSITIVE or IGNORE_CASE
+ * @upcase:	upcase table (ignored if @ic is CASE_SENSITIVE)
+ * @upcase_len:	upcase table size (ignored if @ic is CASE_SENSITIVE)
+ *
+ * ntfs_collate_names collates two Unicode names and returns:
+ *
+ *  -1 if the first name collates before the second one,
+ *   0 if the names match,
+ *   1 if the second name collates before the first one, or
+ * @err_val if an invalid character is found in @name1 during the comparison.
+ *
+ * The following characters are considered invalid: '"', '*', '<', '>' and '?'.
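[Not part of the patch: bit 3 (value 8) of each legal_ansi_char_array entry encodes exactly the characters the comment above calls invalid. A stand-alone check, with the table copied verbatim from the patch:]

```c
/*
 * Copy of the patch's table; bit 3 (value 8) marks characters that are
 * invalid in an NTFS name during collation.
 */
static const unsigned char legal_ansi_char_array[0x40] = {
	0x00, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
	0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
	0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
	0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
	0x17, 0x07, 0x18, 0x17, 0x17, 0x17, 0x17, 0x17,
	0x17, 0x17, 0x18, 0x16, 0x16, 0x17, 0x07, 0x00,
	0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17,
	0x17, 0x17, 0x04, 0x16, 0x18, 0x16, 0x18, 0x18,
};

/* Mirrors the "c1 < 64 && legal_ansi_char_array[c1] & 8" test used by
 * ntfs_collate_names(); code points >= 64 are never flagged invalid. */
int ntfs_char_is_invalid(unsigned int c)
{
	return c < 64 && (legal_ansi_char_array[c] & 8);
}
```

Keeping the rule in a 64-entry lookup table makes the per-character check a single load and mask instead of a chain of comparisons.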
+ */
+int ntfs_collate_names(const __le16 *name1, const u32 name1_len,
+		const __le16 *name2, const u32 name2_len,
+		const int err_val, const u32 ic,
+		const __le16 *upcase, const u32 upcase_len)
+{
+	u32 cnt, min_len;
+	u16 c1, c2;
+
+	min_len = name1_len;
+	if (name1_len > name2_len)
+		min_len = name2_len;
+	for (cnt = 0; cnt < min_len; ++cnt) {
+		c1 = le16_to_cpu(*name1++);
+		c2 = le16_to_cpu(*name2++);
+		if (ic) {
+			if (c1 < upcase_len)
+				c1 = le16_to_cpu(upcase[c1]);
+			if (c2 < upcase_len)
+				c2 = le16_to_cpu(upcase[c2]);
+		}
+		if (c1 < 64 && legal_ansi_char_array[c1] & 8)
+			return err_val;
+		if (c1 < c2)
+			return -1;
+		if (c1 > c2)
+			return 1;
+	}
+	if (name1_len < name2_len)
+		return -1;
+	if (name1_len == name2_len)
+		return 0;
+	/* name1_len > name2_len */
+	c1 = le16_to_cpu(*name1);
+	if (c1 < 64 && legal_ansi_char_array[c1] & 8)
+		return err_val;
+	return 1;
+}
+
+/**
+ * ntfs_ucsncmp - compare two little endian Unicode strings
+ * @s1:		first string
+ * @s2:		second string
+ * @n:		maximum unicode characters to compare
+ *
+ * Compare the first @n characters of the Unicode strings @s1 and @s2.
+ * The strings are in little endian format and the appropriate le16_to_cpu()
+ * conversion is performed on non-little endian machines.
+ *
+ * The function returns an integer less than, equal to, or greater than zero
+ * if @s1 (or the first @n Unicode characters thereof) is found, respectively,
+ * to be less than, to match, or be greater than @s2.
+ */
+int ntfs_ucsncmp(const __le16 *s1, const __le16 *s2, size_t n)
+{
+	u16 c1, c2;
+	size_t i;
+
+	for (i = 0; i < n; ++i) {
+		c1 = le16_to_cpu(s1[i]);
+		c2 = le16_to_cpu(s2[i]);
+		if (c1 < c2)
+			return -1;
+		if (c1 > c2)
+			return 1;
+		if (!c1)
+			break;
+	}
+	return 0;
+}
+
+/**
+ * ntfs_ucsncasecmp - compare two little endian Unicode strings, ignoring case
+ * @s1:			first string
+ * @s2:			second string
+ * @n:			maximum unicode characters to compare
+ * @upcase:		upcase table
+ * @upcase_size:	upcase table size in Unicode characters
+ *
+ * Compare the first @n characters of the Unicode strings @s1 and @s2,
+ * ignoring case.  The strings are in little endian format and the appropriate
+ * le16_to_cpu() conversion is performed on non-little endian machines.
+ *
+ * Each character is uppercased using the @upcase table before the comparison.
+ *
+ * The function returns an integer less than, equal to, or greater than zero
+ * if @s1 (or the first @n Unicode characters thereof) is found, respectively,
+ * to be less than, to match, or be greater than @s2.
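[Not part of the patch: because the code units are little endian u16 values, a bytewise memcmp() would order any character above U+00FF incorrectly; the comparison has to be per 16-bit code unit after le16 conversion. A user-space sketch with le16_to_cpu() open-coded, since the kernel helper is not available outside the kernel:]

```c
#include <stdint.h>
#include <stddef.h>

/* Open-coded le16_to_cpu() for a 2-byte little endian code unit. */
static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Same contract as ntfs_ucsncmp(): returns <0, 0 or >0 and stops at a
 * NUL code unit.  @s1 and @s2 are raw little endian UTF-16 byte buffers.
 */
int ucsncmp_sketch(const uint8_t *s1, const uint8_t *s2, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint16_t c1 = le16(s1 + 2 * i);
		uint16_t c2 = le16(s2 + 2 * i);

		if (c1 < c2)
			return -1;
		if (c1 > c2)
			return 1;
		if (!c1)
			break;
	}
	return 0;
}
```

Note that U+0101 (bytes 01 01) must sort after U+00FF (bytes FF 00), whereas memcmp() on the raw bytes would say the opposite.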
+ */
+int ntfs_ucsncasecmp(const __le16 *s1, const __le16 *s2, size_t n,
+		const __le16 *upcase, const u32 upcase_size)
+{
+	size_t i;
+	u16 c1, c2;
+
+	for (i = 0; i < n; ++i) {
+		c1 = le16_to_cpu(s1[i]);
+		if (c1 < upcase_size)
+			c1 = le16_to_cpu(upcase[c1]);
+		c2 = le16_to_cpu(s2[i]);
+		if (c2 < upcase_size)
+			c2 = le16_to_cpu(upcase[c2]);
+		if (c1 < c2)
+			return -1;
+		if (c1 > c2)
+			return 1;
+		if (!c1)
+			break;
+	}
+	return 0;
+}
+
+int ntfs_file_compare_values(const struct file_name_attr *file_name_attr1,
+		const struct file_name_attr *file_name_attr2,
+		const int err_val, const u32 ic,
+		const __le16 *upcase, const u32 upcase_len)
+{
+	return ntfs_collate_names((__le16 *)&file_name_attr1->file_name,
+			file_name_attr1->file_name_length,
+			(__le16 *)&file_name_attr2->file_name,
+			file_name_attr2->file_name_length,
+			err_val, ic, upcase, upcase_len);
+}
+
+/**
+ * ntfs_nlstoucs - convert NLS string to little endian Unicode string
+ *
+ * Convert the input string @ins, which is in whatever format the loaded NLS
+ * map dictates, into a little endian, 2-byte Unicode string.
+ *
+ * This function allocates the string and the caller is responsible for
+ * calling kmem_cache_free(ntfs_name_cache, *@outs); when finished with it.
+ *
+ * On success the function returns the number of Unicode characters written to
+ * the output string *@outs (>= 0), not counting the terminating Unicode NULL
+ * character.  *@outs is set to the allocated output string buffer.
+ *
+ * On error, a negative number corresponding to the error code is returned.  In
+ * that case the output string is not allocated.  Both *@outs and *@outs_len
+ * are then undefined.
+ *
+ * This might look a bit odd due to fast path optimization...
+ */
+int ntfs_nlstoucs(const struct ntfs_volume *vol, const char *ins,
+		const int ins_len, __le16 **outs, int max_name_len)
+{
+	struct nls_table *nls = vol->nls_map;
+	__le16 *ucs;
+	wchar_t wc;
+	int i, o, wc_len;
+
+	/* We do not trust outside sources. */
+	if (likely(ins)) {
+		if (max_name_len > NTFS_MAX_NAME_LEN)
+			ucs = kvmalloc((max_name_len + 2) * sizeof(__le16),
+					GFP_NOFS | __GFP_ZERO);
+		else
+			ucs = kmem_cache_alloc(ntfs_name_cache, GFP_NOFS);
+		if (likely(ucs)) {
+			if (vol->nls_utf8) {
+				o = utf8s_to_utf16s(ins, ins_len,
+						UTF16_LITTLE_ENDIAN,
+						ucs,
+						max_name_len + 2);
+				if (o < 0 || o > max_name_len) {
+					wc_len = o;
+					goto name_err;
+				}
+			} else {
+				for (i = o = 0; i < ins_len; i += wc_len) {
+					wc_len = nls->char2uni(ins + i, ins_len - i,
+							&wc);
+					if (likely(wc_len >= 0 &&
+							o < max_name_len)) {
+						if (likely(wc)) {
+							ucs[o++] = cpu_to_le16(wc);
+							continue;
+						} /* else if (!wc) */
+						break;
+					}
+
+					goto name_err;
+				}
+			}
+			ucs[o] = 0;
+			*outs = ucs;
+			return o;
+		} /* else if (!ucs) */
+		ntfs_debug("Failed to allocate buffer for converted name from ntfs_name_cache.");
+		return -ENOMEM;
+	} /* else if (!ins) */
+	ntfs_error(vol->sb, "Received NULL pointer.");
+	return -EINVAL;
+name_err:
+	if (max_name_len > NTFS_MAX_NAME_LEN)
+		kvfree(ucs);
+	else
+		kmem_cache_free(ntfs_name_cache, ucs);
+	if (wc_len < 0) {
+		ntfs_debug("Name using character set %s contains characters that cannot be converted to Unicode.",
+				nls->charset);
+		i = -EILSEQ;
+	} else {
+		ntfs_debug("Name is too long (maximum length for a name on NTFS is %d Unicode characters).",
+				max_name_len);
+		i = -ENAMETOOLONG;
+	}
+	return i;
+}
+
+/**
+ * ntfs_ucstonls - convert little endian Unicode string to NLS string
+ * @vol:	ntfs volume which we are working with
+ * @ins:	input Unicode string buffer
+ * @ins_len:	length of input string in Unicode characters
+ * @outs:	on return contains the (allocated) output NLS string buffer
+ * @outs_len:	length of output string buffer in bytes
+ *
+ * Convert the input little endian, 2-byte Unicode string @ins, of length
+ * @ins_len into the string format dictated by the loaded NLS.
+ *
+ * If *@outs is NULL, this function allocates the string and the caller is
+ * responsible for calling kfree(*@outs); when finished with it.  In this case
+ * @outs_len is ignored and can be 0.
+ *
+ * On success the function returns the number of bytes written to the output
+ * string *@outs (>= 0), not counting the terminating NULL byte.  If the output
+ * string buffer was allocated, *@outs is set to it.
+ *
+ * On error, a negative number corresponding to the error code is returned.  In
+ * that case the output string is not allocated.  The contents of *@outs are
+ * then undefined.
+ *
+ * This might look a bit odd due to fast path optimization...
+ */
+int ntfs_ucstonls(const struct ntfs_volume *vol, const __le16 *ins,
+		const int ins_len, unsigned char **outs, int outs_len)
+{
+	struct nls_table *nls = vol->nls_map;
+	unsigned char *ns;
+	int i, o, ns_len, wc;
+
+	/* We don't trust outside sources. */
+	if (ins) {
+		ns = *outs;
+		ns_len = outs_len;
+		if (ns && !ns_len) {
+			wc = -ENAMETOOLONG;
+			goto conversion_err;
+		}
+		if (!ns) {
+			ns_len = ins_len * NLS_MAX_CHARSET_SIZE;
+			ns = kmalloc(ns_len + 1, GFP_NOFS);
+			if (!ns)
+				goto mem_err_out;
+		}
+
+		if (vol->nls_utf8) {
+			o = utf16s_to_utf8s((const wchar_t *)ins, ins_len,
+					UTF16_LITTLE_ENDIAN, ns, ns_len);
+			if (o >= ns_len) {
+				wc = -ENAMETOOLONG;
+				goto conversion_err;
+			}
+			goto done;
+		}
+
+		for (i = o = 0; i < ins_len; i++) {
+retry:
+			wc = nls->uni2char(le16_to_cpu(ins[i]), ns + o,
+					ns_len - o);
+			if (wc > 0) {
+				o += wc;
+				continue;
+			} else if (!wc)
+				break;
+			else if (wc == -ENAMETOOLONG && ns != *outs) {
+				unsigned char *tc;
+				/* Grow in multiples of 64 bytes.
*/ + tc =3D kmalloc((ns_len + 64) & + ~63, GFP_NOFS); + if (tc) { + memcpy(tc, ns, ns_len); + ns_len =3D ((ns_len + 64) & ~63) - 1; + kfree(ns); + ns =3D tc; + goto retry; + } /* No memory so goto conversion_error; */ + } /* wc < 0, real error. */ + goto conversion_err; + } +done: + ns[o] =3D 0; + *outs =3D ns; + return o; + } /* else (!ins) */ + ntfs_error(vol->sb, "Received NULL pointer."); + return -EINVAL; +conversion_err: + ntfs_error(vol->sb, + "Unicode name contains characters that cannot be converted to character = set %s. You might want to try to use the mount option nls=3Dutf8.", + nls->charset); + if (ns !=3D *outs) + kfree(ns); + if (wc !=3D -ENAMETOOLONG) + wc =3D -EILSEQ; + return wc; +mem_err_out: + ntfs_error(vol->sb, "Failed to allocate name!"); + return -ENOMEM; +} + +/** + * ntfs_ucsnlen - determine the length of a little endian Unicode string + * @s: pointer to Unicode string + * @maxlen: maximum length of string @s + * + * Return the number of Unicode characters in the little endian Unicode + * string @s up to a maximum of maxlen Unicode characters, not including + * the terminating (__le16)'\0'. If there is no (__le16)'\0' between @s + * and @s + @maxlen, @maxlen is returned. + * + * This function never looks beyond @s + @maxlen. + */ +static u32 ntfs_ucsnlen(const __le16 *s, u32 maxlen) +{ + u32 i; + + for (i =3D 0; i < maxlen; i++) { + if (!le16_to_cpu(s[i])) + break; + } + return i; +} + +/** + * ntfs_ucsndup - duplicate little endian Unicode string + * @s: pointer to Unicode string + * @maxlen: maximum length of string @s + * + * Return a pointer to a new little endian Unicode string which is a dupli= cate + * of the string s. Memory for the new string is obtained with ntfs_mallo= c(3), + * and can be freed with free(3). + * + * A maximum of @maxlen Unicode characters are copied and a terminating + * (__le16)'\0' little endian Unicode character is added. + * + * This function never looks beyond @s + @maxlen. 
+ * + * Return a pointer to the new little endian Unicode string on success and= NULL + * on failure with errno set to the error code. + */ +__le16 *ntfs_ucsndup(const __le16 *s, u32 maxlen) +{ + __le16 *dst; + u32 len; + + len =3D ntfs_ucsnlen(s, maxlen); + dst =3D ntfs_malloc_nofs((len + 1) * sizeof(__le16)); + if (dst) { + memcpy(dst, s, len * sizeof(__le16)); + dst[len] =3D cpu_to_le16(L'\0'); + } + return dst; +} + +/** + * ntfs_names_are_equal - compare two Unicode names for equality + * @s1: name to compare to @s2 + * @s1_len: length in Unicode characters of @s1 + * @s2: name to compare to @s1 + * @s2_len: length in Unicode characters of @s2 + * @ic: ignore case bool + * @upcase: upcase table (only if @ic =3D=3D IGNORE_CASE) + * @upcase_size: length in Unicode characters of @upcase (if presen= t) + * + * Compare the names @s1 and @s2 and return TRUE (1) if the names are + * identical, or FALSE (0) if they are not identical. If @ic is IGNORE_CAS= E, + * the @upcase table is used to perform a case insensitive comparison. + */ +bool ntfs_names_are_equal(const __le16 *s1, size_t s1_len, + const __le16 *s2, size_t s2_len, + const u32 ic, + const __le16 *upcase, const u32 upcase_size) +{ + if (s1_len !=3D s2_len) + return false; + if (!s1_len) + return true; + if (ic =3D=3D CASE_SENSITIVE) + return ntfs_ucsncmp(s1, s2, s1_len) ? false : true; + return ntfs_ucsncasecmp(s1, s2, s1_len, upcase, upcase_size) ? false : tr= ue; +} diff --git a/fs/ntfsplus/upcase.c b/fs/ntfsplus/upcase.c new file mode 100644 index 000000000000..a2b8e56edeff --- /dev/null +++ b/fs/ntfsplus/upcase.c @@ -0,0 +1,73 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Generate the full NTFS Unicode upcase table in little endian. + * Part of the Linux-NTFS project. 
+ * + * Copyright (c) 2001 Richard Russon + * Copyright (c) 2001-2006 Anton Altaparmakov + */ + +#include "misc.h" +#include "ntfs.h" + +__le16 *generate_default_upcase(void) +{ + static const int uc_run_table[][3] =3D { /* Start, End, Add */ + {0x0061, 0x007B, -32}, {0x0451, 0x045D, -80}, {0x1F70, 0x1F72, 74}, + {0x00E0, 0x00F7, -32}, {0x045E, 0x0460, -80}, {0x1F72, 0x1F76, 86}, + {0x00F8, 0x00FF, -32}, {0x0561, 0x0587, -48}, {0x1F76, 0x1F78, 100}, + {0x0256, 0x0258, -205}, {0x1F00, 0x1F08, 8}, {0x1F78, 0x1F7A, 128}, + {0x028A, 0x028C, -217}, {0x1F10, 0x1F16, 8}, {0x1F7A, 0x1F7C, 112}, + {0x03AC, 0x03AD, -38}, {0x1F20, 0x1F28, 8}, {0x1F7C, 0x1F7E, 126}, + {0x03AD, 0x03B0, -37}, {0x1F30, 0x1F38, 8}, {0x1FB0, 0x1FB2, 8}, + {0x03B1, 0x03C2, -32}, {0x1F40, 0x1F46, 8}, {0x1FD0, 0x1FD2, 8}, + {0x03C2, 0x03C3, -31}, {0x1F51, 0x1F52, 8}, {0x1FE0, 0x1FE2, 8}, + {0x03C3, 0x03CC, -32}, {0x1F53, 0x1F54, 8}, {0x1FE5, 0x1FE6, 7}, + {0x03CC, 0x03CD, -64}, {0x1F55, 0x1F56, 8}, {0x2170, 0x2180, -16}, + {0x03CD, 0x03CF, -63}, {0x1F57, 0x1F58, 8}, {0x24D0, 0x24EA, -26}, + {0x0430, 0x0450, -32}, {0x1F60, 0x1F68, 8}, {0xFF41, 0xFF5B, -32}, + {0} + }; + + static const int uc_dup_table[][2] =3D { /* Start, End */ + {0x0100, 0x012F}, {0x01A0, 0x01A6}, {0x03E2, 0x03EF}, {0x04CB, 0x04CC}, + {0x0132, 0x0137}, {0x01B3, 0x01B7}, {0x0460, 0x0481}, {0x04D0, 0x04EB}, + {0x0139, 0x0149}, {0x01CD, 0x01DD}, {0x0490, 0x04BF}, {0x04EE, 0x04F5}, + {0x014A, 0x0178}, {0x01DE, 0x01EF}, {0x04BF, 0x04BF}, {0x04F8, 0x04F9}, + {0x0179, 0x017E}, {0x01F4, 0x01F5}, {0x04C1, 0x04C4}, {0x1E00, 0x1E95}, + {0x018B, 0x018B}, {0x01FA, 0x0218}, {0x04C7, 0x04C8}, {0x1EA0, 0x1EF9}, + {0} + }; + + static const int uc_word_table[][2] =3D { /* Offset, Value */ + {0x00FF, 0x0178}, {0x01AD, 0x01AC}, {0x01F3, 0x01F1}, {0x0269, 0x0196}, + {0x0183, 0x0182}, {0x01B0, 0x01AF}, {0x0253, 0x0181}, {0x026F, 0x019C}, + {0x0185, 0x0184}, {0x01B9, 0x01B8}, {0x0254, 0x0186}, {0x0272, 0x019D}, + {0x0188, 0x0187}, {0x01BD, 0x01BC}, 
{0x0259, 0x018F}, {0x0275, 0x019F}, + {0x018C, 0x018B}, {0x01C6, 0x01C4}, {0x025B, 0x0190}, {0x0283, 0x01A9}, + {0x0192, 0x0191}, {0x01C9, 0x01C7}, {0x0260, 0x0193}, {0x0288, 0x01AE}, + {0x0199, 0x0198}, {0x01CC, 0x01CA}, {0x0263, 0x0194}, {0x0292, 0x01B7}, + {0x01A8, 0x01A7}, {0x01DD, 0x018E}, {0x0268, 0x0197}, + {0} + }; + + int i, r; + __le16 *uc; + + uc =3D ntfs_malloc_nofs(default_upcase_len * sizeof(__le16)); + if (!uc) + return uc; + memset(uc, 0, default_upcase_len * sizeof(__le16)); + /* Generate the little endian Unicode upcase table used by ntfs. */ + for (i =3D 0; i < default_upcase_len; i++) + uc[i] =3D cpu_to_le16(i); + for (r =3D 0; uc_run_table[r][0]; r++) + for (i =3D uc_run_table[r][0]; i < uc_run_table[r][1]; i++) + le16_add_cpu(&uc[i], uc_run_table[r][2]); + for (r =3D 0; uc_dup_table[r][0]; r++) + for (i =3D uc_dup_table[r][0]; i < uc_dup_table[r][1]; i +=3D 2) + le16_add_cpu(&uc[i + 1], -1); + for (r =3D 0; uc_word_table[r][0]; r++) + uc[uc_word_table[r][0]] =3D cpu_to_le16(uc_word_table[r][1]); + return uc; +} --=20 2.25.1 From nobody Mon Dec 1 22:02:17 2025 Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2B88130F92B for ; Thu, 27 Nov 2025 05:01:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764219694; cv=none; b=RVXjP21MVKwYPx9zStkHQNoLSxQ3GmUFpMDFbehrJs7SIi3safz8ppHO/EQPwykWcHhVOhQLj2dCTkqQHEDKoN6V9UxFDKBXbDSAZ79hzOtq2zMOhqObPslnVYlcdLWJfOfQPrO5CSFytO4mgRxnREwdArwhYCWCnZKggnGboj8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764219694; c=relaxed/simple; bh=n5b/MQre1QxLtg9Zn1TXSpWNvwaGErlBIH1V1qOd5wk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
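[Editor's note] The run-table technique in generate_default_upcase() above can be sketched in plain user-space C. This is a simplified, hypothetical toy covering only the ASCII lowercase run (the kernel code applies the full run, duplicate, and word tables and stores little endian values via cpu_to_le16()); `toy_upcase` is an illustrative name, not part of the driver:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Toy upcase table builder using the same run-delta idea as
 * generate_default_upcase(): start from the identity mapping, then
 * add a signed delta to every code point inside each [start, end) run.
 * Only the ASCII lowercase run is included here.
 */
static uint16_t *toy_upcase(size_t len)
{
	static const int run[][3] = { /* Start, End, Add */
		{0x0061, 0x007B, -32},	/* 'a'..'z' -> 'A'..'Z' */
		{0}
	};
	uint16_t *uc = malloc(len * sizeof(*uc));

	if (!uc)
		return NULL;
	for (size_t i = 0; i < len; i++)
		uc[i] = (uint16_t)i;	/* identity by default */
	for (int r = 0; run[r][0]; r++)
		for (int i = run[r][0]; i < run[r][1] && (size_t)i < len; i++)
			uc[i] = (uint16_t)(uc[i] + run[r][2]);
	return uc;
}
```

The full NTFS table is built the same way, just with many more runs plus the duplicate and word exception tables applied afterwards.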
From: Namjae Jeon
Subject: [PATCH v2 11/11] ntfsplus: add Kconfig and Makefile
Date: Thu, 27 Nov 2025 13:59:44 +0900
Message-Id: <20251127045944.26009-12-linkinjeon@kernel.org>
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>
References: <20251127045944.26009-1-linkinjeon@kernel.org>

This adds the Kconfig and Makefile for ntfsplus.
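[Editor's note] Assuming the option names introduced by this patch, a minimal .config fragment that builds the driver as a module might look like the following sketch (the debug and ACL options are optional and shown for illustration):

```text
CONFIG_NTFSPLUS_FS=m
# Optional, from the same Kconfig:
CONFIG_NTFSPLUS_DEBUG=y
CONFIG_NTFSPLUS_FS_POSIX_ACL=y
```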
Signed-off-by: Namjae Jeon
---
 fs/Kconfig           |  1 +
 fs/Makefile          |  1 +
 fs/ntfsplus/Kconfig  | 45 ++++++++++++++++++++++++++++++++++++++++++++
 fs/ntfsplus/Makefile | 18 ++++++++++++++++++
 4 files changed, 65 insertions(+)
 create mode 100644 fs/ntfsplus/Kconfig
 create mode 100644 fs/ntfsplus/Makefile

diff --git a/fs/Kconfig b/fs/Kconfig
index 0bfdaecaa877..70d596b99c8b 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -153,6 +153,7 @@ menu "DOS/FAT/EXFAT/NT Filesystems"
 source "fs/fat/Kconfig"
 source "fs/exfat/Kconfig"
 source "fs/ntfs3/Kconfig"
+source "fs/ntfsplus/Kconfig"
 
 endmenu
 endif # BLOCK
diff --git a/fs/Makefile b/fs/Makefile
index e3523ab2e587..2e2473451508 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -91,6 +91,7 @@ obj-y				+= unicode/
 obj-$(CONFIG_SMBFS)		+= smb/
 obj-$(CONFIG_HPFS_FS)		+= hpfs/
 obj-$(CONFIG_NTFS3_FS)		+= ntfs3/
+obj-$(CONFIG_NTFSPLUS_FS)	+= ntfsplus/
 obj-$(CONFIG_UFS_FS)		+= ufs/
 obj-$(CONFIG_EFS_FS)		+= efs/
 obj-$(CONFIG_JFFS2_FS)		+= jffs2/
diff --git a/fs/ntfsplus/Kconfig b/fs/ntfsplus/Kconfig
new file mode 100644
index 000000000000..c13cd06720e7
--- /dev/null
+++ b/fs/ntfsplus/Kconfig
@@ -0,0 +1,45 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config NTFSPLUS_FS
+	tristate "NTFS+ file system support"
+	select NLS
+	help
+	  NTFS is the file system of Microsoft Windows NT, 2000, XP and 2003.
+	  This allows you to mount devices formatted with the NTFS file system.
+
+	  To compile this as a module, choose M here: the module will be called
+	  ntfsplus.
+
+config NTFSPLUS_DEBUG
+	bool "NTFS+ debugging support"
+	depends on NTFSPLUS_FS
+	help
+	  If you are experiencing any problems with the NTFS file system, say
+	  Y here. This will cause additional consistency checks to be
+	  performed by the driver as well as additional debugging messages to
+	  be written to the system log. Note that debugging messages are
+	  disabled by default. To enable them, supply the option debug_msgs=1
+	  at the kernel command line when booting the kernel or as an option
+	  to insmod when loading the ntfsplus module. Once the driver is
+	  active, you can enable debugging messages by doing (as root):
+	  echo 1 > /proc/sys/fs/ntfs-debug
+	  Replacing the "1" with "0" would disable debug messages.
+
+	  If you leave debugging messages disabled, this results in little
+	  overhead, but enabling debug messages results in a very significant
+	  slowdown of the system.
+
+	  When reporting bugs, please try to have available a full dump of
+	  debugging messages while the misbehaviour was occurring.
+
+config NTFSPLUS_FS_POSIX_ACL
+	bool "NTFS+ POSIX Access Control Lists"
+	depends on NTFSPLUS_FS
+	select FS_POSIX_ACL
+	help
+	  POSIX Access Control Lists (ACLs) support additional access rights
+	  for users and groups beyond the standard owner/group/world scheme,
+	  and this option selects support for ACLs specifically for NTFS
+	  filesystems.
+	  NOTE: this is a Linux-only feature. Windows will ignore these ACLs.
+
+	  If you don't know what Access Control Lists are, say N.
diff --git a/fs/ntfsplus/Makefile b/fs/ntfsplus/Makefile
new file mode 100644
index 000000000000..1e7e830dbeec
--- /dev/null
+++ b/fs/ntfsplus/Makefile
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the ntfsplus filesystem support.
+#
+
+# to check robot warnings
+ccflags-y += -Wint-to-pointer-cast \
+	$(call cc-option,-Wunused-but-set-variable,-Wunused-const-variable) \
+	$(call cc-option,-Wold-style-declaration,-Wout-of-line-declaration)
+
+obj-$(CONFIG_NTFSPLUS_FS) += ntfsplus.o
+
+ntfsplus-y := aops.o attrib.o collate.o misc.o dir.o file.o index.o inode.o \
+	mft.o mst.o namei.o runlist.o super.o unistr.o attrlist.o ea.o \
+	upcase.o bitmap.o lcnalloc.o logfile.o reparse.o compress.o \
+	ntfs_iomap.o
+
+ccflags-$(CONFIG_NTFSPLUS_DEBUG) += -DDEBUG
-- 
2.25.1
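[Editor's note] The upcase-table comparison that ntfs_names_are_equal() performs for IGNORE_CASE lookups (patch 10 of this series) can be sketched in user-space C. This is a hedged illustration, not the driver's code: `names_equal_ci` is a hypothetical helper, and the table used in testing is an ASCII-only stand-in for the volume's full $UpCase table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Case-insensitive equality of two UTF-16 code-unit strings of equal
 * length, mirroring the ntfs_ucsncasecmp() idea: fold each code unit
 * through the upcase table before comparing. Units at or beyond
 * upcase_size compare as-is, as they have no table entry.
 */
static bool names_equal_ci(const uint16_t *s1, const uint16_t *s2,
			   size_t len, const uint16_t *upcase,
			   size_t upcase_size)
{
	for (size_t i = 0; i < len; i++) {
		uint16_t c1 = s1[i], c2 = s2[i];

		if (c1 < upcase_size)
			c1 = upcase[c1];
		if (c2 < upcase_size)
			c2 = upcase[c2];
		if (c1 != c2)
			return false;
	}
	return true;
}
```

In the driver, the table is either read from the volume's $UpCase file or built by generate_default_upcase(), so both sides of the comparison fold through exactly the same mapping.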