From nobody Sat Feb  7 18:26:59 2026
Received: from mail-pl1-f175.google.com (mail-pl1-f175.google.com
 [209.85.214.175])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9ECB12D9780
	for <linux-kernel@vger.kernel.org>; Thu, 27 Nov 2025 05:00:31 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.214.175
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1764219643; cv=none;
 b=W8YRV6vWNvjmtkYrDFtSiZrMxFvsXxk1kQawfujgDCzBnKreAioZtPqbmHmWK76WqL1/OMQknjOREasIhmdqii+t++z++hln8BhLIkO7VcdtlBFX8FCKNJ4rt/OkpVrkvh5w1i42Gn5i/a0TzwQLR67GXeGzFSFyndxCU3ttut0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1764219643; c=relaxed/simple;
	bh=n0JRXr92h3lpVQaSxPHtFuJlO8b9O3BW1Iea7A6/ALw=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	 MIME-Version;
 b=Yk59dHTNQF2QGV27wu/QqkGmWU7i/ygM2HYxZ7rMxCgXnqHPPJrksfY8+T/3sEa6QNpLO1VV5R4EuZRH86G1zdTXtrQNFwpEf1dFNPlKcGeYOaspJKNGrYXx7WwjjdoZIKfw1Qb5ERhTmIZqKu2PAeck94QO03gxX2ZRx2ztIEY=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=fail (p=quarantine dis=none) header.from=kernel.org;
 spf=pass smtp.mailfrom=gmail.com; arc=none smtp.client-ip=209.85.214.175
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=fail (p=quarantine dis=none) header.from=kernel.org
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=gmail.com
Received: by mail-pl1-f175.google.com with SMTP id
 d9443c01a7336-297d4a56f97so7198255ad.1
        for <linux-kernel@vger.kernel.org>;
 Wed, 26 Nov 2025 21:00:31 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1764219631; x=1764824431;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from
         :to:cc:subject:date:message-id:reply-to;
        bh=bUjy3b3BR3skAPU1G4VO+s6kZAbKlWZdK5uKQynd2z8=;
        b=ix9ewSFIna3RipeRzqD9jltCaSYdCVXfRfCd5HNx6ub41IPQSjOwXOVoQG26IXarkO
         dLDCGy5TGytZyap9U5cNh+WL4RZphubPR9oOjdT5RRdPNwfGXyFMFDkVhCYYQKRyZWP4
         oWyplxrri5rXud0WPC+Q3PC9Pm2ToNTWGsbc48TIiDsvAM4jBbNZkw1SngmfHFW3kXZ7
         EMEJvthKdRML3BqeIVChwyE/uzvOyKEYj5Rwd2q/0snMQrFjIqZmoKXgWU+QQQTSkl/O
         mvfw+Qjl5A7qh23MFKtkMb5iE+WMly+h+SmQc8DA1mEmOxhs1ELTVAodeCemopwJmsi5
         N7VA==
X-Forwarded-Encrypted: i=1;
 AJvYcCX7QKD5usZAf0FEhB2eRboxG6LDau6JsCB7GnrQNlU3+Blv/OuPJEkAOek5eYKbipLj9nJQdJxzt3vBEmI=@vger.kernel.org
X-Gm-Message-State: AOJu0YzPy7dYpfjLW1wWLWcS6lU0kPAfyrj178V2mIHJ1lvMnBuG6Ogb
	3cYODLOwTab02XciUu5JhjNHNmpLfAB0FJS8CdUDeJc96eG7qNOywb9C
X-Gm-Gg: ASbGnctW/8jP/05L+Rfyp98ViSn4mWGa30pXyYEtmOK9vcv1sLkMslFQJlpXyyZB38z
	A4Gx2W4ULO+PCi14ijb5hAy1lHQ1oImv4zmjOzZMRWsWeLtNnqiZNlAP62IJjVXQ4twSoWO2X5W
	lUy+bINqy+o9/ULonj5AFl+KufRfeu/+ZOtwe80BJAAj0jWulmFnZnmX6yFuHhybizRZDAv+hg4
	8uPccbns5ZitbOyahYcFlCrbPwJwp2CDKBe6ld/F74kTqsya5/9i1LxXG6e7RYqhXvPpGasphms
	kLNLTcGY9PpGlH/JRnkJVB6wycrSeZFKt9oUDxB9q+Y94pnUcscySvMugK+VU419mSqag2dGaoc
	tNxaMZRKOIDXINEC1CKMaZBcPryrkIICeAPJ+D2sOuNG6mY5KYfd3Zee3K7O7LT21P0+d6WLwYy
	Qc5aWLZJDVDMpVGCuyWDoevA4SFg==
X-Google-Smtp-Source: 
 AGHT+IEXgReFwdVm/JnEV/3B/f6UKTP7M0DtsiED3ek+j0y8xlEr/mQ71VtkNI4WpckhfQ4RD4dxSA==
X-Received: by 2002:a17:903:1209:b0:298:3e3a:ae6 with SMTP id
 d9443c01a7336-29b6c6ae64fmr235647345ad.48.1764219628276;
        Wed, 26 Nov 2025 21:00:28 -0800 (PST)
Received: from localhost.localdomain ([1.227.206.162])
        by smtp.gmail.com with ESMTPSA id
 d9443c01a7336-29bceb54454sm2719825ad.84.2025.11.26.21.00.21
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Wed, 26 Nov 2025 21:00:26 -0800 (PST)
From: Namjae Jeon <linkinjeon@kernel.org>
To: viro@zeniv.linux.org.uk,
	brauner@kernel.org,
	hch@infradead.org,
	hch@lst.de,
	tytso@mit.edu,
	willy@infradead.org,
	jack@suse.cz,
	djwong@kernel.org,
	josef@toxicpanda.com,
	sandeen@sandeen.net,
	rgoldwyn@suse.com,
	xiang@kernel.org,
	dsterba@suse.com,
	pali@kernel.org,
	ebiggers@kernel.org,
	neil@brown.name,
	amir73il@gmail.com
Cc: linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	iamjoonsoo.kim@lge.com,
	cheol.lee@lge.com,
	jay.sim@lge.com,
	gunho.lee@lge.com,
	Namjae Jeon <linkinjeon@kernel.org>,
	Hyunchul Lee <hyc.lee@gmail.com>
Subject: [PATCH v2 03/11] ntfsplus: add inode operations
Date: Thu, 27 Nov 2025 13:59:36 +0900
Message-Id: <20251127045944.26009-4-linkinjeon@kernel.org>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20251127045944.26009-1-linkinjeon@kernel.org>
References: <20251127045944.26009-1-linkinjeon@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This adds the implementation of inode operations for ntfsplus.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
 fs/ntfsplus/inode.c | 3729 +++++++++++++++++++++++++++++++++++++++++++
 fs/ntfsplus/mft.c   | 2698 +++++++++++++++++++++++++++++++
 fs/ntfsplus/mst.c   |  195 +++
 fs/ntfsplus/namei.c | 1677 +++++++++++++++++++
 4 files changed, 8299 insertions(+)
 create mode 100644 fs/ntfsplus/inode.c
 create mode 100644 fs/ntfsplus/mft.c
 create mode 100644 fs/ntfsplus/mst.c
 create mode 100644 fs/ntfsplus/namei.c

diff --git a/fs/ntfsplus/inode.c b/fs/ntfsplus/inode.c
new file mode 100644
index 000000000000..f577ef28ba69
--- /dev/null
+++ b/fs/ntfsplus/inode.c
@@ -0,0 +1,3729 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * NTFS kernel inode handling.
+ *
+ * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include <linux/writeback.h>
+#include <linux/seq_file.h>
+
+#include "lcnalloc.h"
+#include "misc.h"
+#include "ntfs.h"
+#include "index.h"
+#include "attrlist.h"
+#include "reparse.h"
+#include "ea.h"
+#include "attrib.h"
+#include "ntfs_iomap.h"
+
+/**
+ * ntfs_test_inode - compare two (possibly fake) inodes for equality
+ * @vi:		vfs inode which to test
+ * @data:	data which is being tested with
+ *
+ * Compare the ntfs attribute embedded in the ntfs specific part of the vfs
+ * inode @vi for equality with the ntfs attribute @data.
+ *
+ * If searching for the normal file/directory inode, set @na->type to AT_U=
NUSED.
+ * @na->name and @na->name_len are then ignored.
+ *
+ * Return 1 if the attributes match and 0 if not.
+ *
+ * NOTE: This function runs with the inode_hash_lock spin lock held so it =
is not
+ * allowed to sleep.
+ */
+int ntfs_test_inode(struct inode *vi, void *data)
+{
+	struct ntfs_attr *na =3D (struct ntfs_attr *)data;
+	struct ntfs_inode *ni;
+
+	if (vi->i_ino !=3D na->mft_no)
+		return 0;
+
+	ni =3D NTFS_I(vi);
+
+	/* If !NInoAttr(ni), @vi is a normal file or directory inode. */
+	if (likely(!NInoAttr(ni))) {
+		/* If not looking for a normal inode this is a mismatch. */
+		if (unlikely(na->type !=3D AT_UNUSED))
+			return 0;
+	} else {
+		/* A fake inode describing an attribute. */
+		if (ni->type !=3D na->type)
+			return 0;
+		if (ni->name_len !=3D na->name_len)
+			return 0;
+		if (na->name_len && memcmp(ni->name, na->name,
+				na->name_len * sizeof(__le16)))
+			return 0;
+		if (!ni->ext.base_ntfs_ino)
+			return 0;
+	}
+
+	/* Match! */
+	return 1;
+}
+
+/**
+ * ntfs_init_locked_inode - initialize an inode
+ * @vi:		vfs inode to initialize
+ * @data:	data which to initialize @vi to
+ *
+ * Initialize the vfs inode @vi with the values from the ntfs attribute @d=
ata in
+ * order to enable ntfs_test_inode() to do its work.
+ *
+ * If initializing the normal file/directory inode, set @na->type to AT_UN=
USED.
+ * In that case, @na->name and @na->name_len should be set to NULL and 0,
+ * respectively. Although that is not strictly necessary as
+ * ntfs_read_locked_inode() will fill them in later.
+ *
+ * Return 0 on success and error.
+ *
+ * NOTE: This function runs with the inode->i_lock spin lock held so it is=
 not
+ * allowed to sleep. (Hence the GFP_ATOMIC allocation.)
+ */
+static int ntfs_init_locked_inode(struct inode *vi, void *data)
+{
+	struct ntfs_attr *na =3D (struct ntfs_attr *)data;
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+
+	vi->i_ino =3D na->mft_no;
+
+	if (na->type =3D=3D AT_INDEX_ALLOCATION)
+		NInoSetMstProtected(ni);
+	else
+		ni->type =3D na->type;
+
+	ni->name =3D na->name;
+	ni->name_len =3D na->name_len;
+	ni->folio =3D NULL;
+	atomic_set(&ni->count, 1);
+
+	/* If initializing a normal inode, we are done. */
+	if (likely(na->type =3D=3D AT_UNUSED))
+		return 0;
+
+	/* It is a fake inode. */
+	NInoSetAttr(ni);
+
+	/*
+	 * We have I30 global constant as an optimization as it is the name
+	 * in >99.9% of named attributes! The other <0.1% incur a GFP_ATOMIC
+	 * allocation but that is ok. And most attributes are unnamed anyway,
+	 * thus the fraction of named attributes with name !=3D I30 is actually
+	 * absolutely tiny.
+	 */
+	if (na->name_len && na->name !=3D I30) {
+		unsigned int i;
+
+		i =3D na->name_len * sizeof(__le16);
+		ni->name =3D kmalloc(i + sizeof(__le16), GFP_ATOMIC);
+		if (!ni->name)
+			return -ENOMEM;
+		memcpy(ni->name, na->name, i);
+		ni->name[na->name_len] =3D 0;
+	}
+	return 0;
+}
+
+static int ntfs_read_locked_inode(struct inode *vi);
+static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode=
 *vi);
+static int ntfs_read_locked_index_inode(struct inode *base_vi,
+		struct inode *vi);
+
+/**
+ * ntfs_iget - obtain a struct inode corresponding to a specific normal in=
ode
+ * @sb:		super block of mounted volume
+ * @mft_no:	mft record number / inode number to obtain
+ *
+ * Obtain the struct inode corresponding to a specific normal inode (i.e. a
+ * file or directory).
+ *
+ * If the inode is in the cache, it is just returned with an increased
+ * reference count. Otherwise, a new struct inode is allocated and initial=
ized,
+ * and finally ntfs_read_locked_inode() is called to read in the inode and
+ * fill in the remainder of the inode structure.
+ *
+ * Return the struct inode on success. Check the return value with IS_ERR(=
) and
+ * if true, the function failed and the error code is obtained from PTR_ER=
R().
+ */
+struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no)
+{
+	struct inode *vi;
+	int err;
+	struct ntfs_attr na;
+
+	na.mft_no =3D mft_no;
+	na.type =3D AT_UNUSED;
+	na.name =3D NULL;
+	na.name_len =3D 0;
+
+	vi =3D iget5_locked(sb, mft_no, ntfs_test_inode,
+			ntfs_init_locked_inode, &na);
+	if (unlikely(!vi))
+		return ERR_PTR(-ENOMEM);
+
+	err =3D 0;
+
+	/* If this is a freshly allocated inode, need to read it now. */
+	if (vi->i_state & I_NEW) {
+		err =3D ntfs_read_locked_inode(vi);
+		unlock_new_inode(vi);
+	}
+	/*
+	 * There is no point in keeping bad inodes around if the failure was
+	 * due to ENOMEM. We want to be able to retry again later.
+	 */
+	if (unlikely(err =3D=3D -ENOMEM)) {
+		iput(vi);
+		vi =3D ERR_PTR(err);
+	}
+	return vi;
+}
+
+/**
+ * ntfs_attr_iget - obtain a struct inode corresponding to an attribute
+ * @base_vi:	vfs base inode containing the attribute
+ * @type:	attribute type
+ * @name:	Unicode name of the attribute (NULL if unnamed)
+ * @name_len:	length of @name in Unicode characters (0 if unnamed)
+ *
+ * Obtain the (fake) struct inode corresponding to the attribute specified=
 by
+ * @type, @name, and @name_len, which is present in the base mft record
+ * specified by the vfs inode @base_vi.
+ *
+ * If the attribute inode is in the cache, it is just returned with an
+ * increased reference count. Otherwise, a new struct inode is allocated a=
nd
+ * initialized, and finally ntfs_read_locked_attr_inode() is called to rea=
d the
+ * attribute and fill in the inode structure.
+ *
+ * Note, for index allocation attributes, you need to use ntfs_index_iget()
+ * instead of ntfs_attr_iget() as working with indices is a lot more compl=
ex.
+ *
+ * Return the struct inode of the attribute inode on success. Check the re=
turn
+ * value with IS_ERR() and if true, the function failed and the error code=
 is
+ * obtained from PTR_ERR().
+ */
+struct inode *ntfs_attr_iget(struct inode *base_vi, __le32 type,
+		__le16 *name, u32 name_len)
+{
+	struct inode *vi;
+	int err;
+	struct ntfs_attr na;
+
+	/* Make sure no one calls ntfs_attr_iget() for indices. */
+	WARN_ON(type =3D=3D AT_INDEX_ALLOCATION);
+
+	na.mft_no =3D base_vi->i_ino;
+	na.type =3D type;
+	na.name =3D name;
+	na.name_len =3D name_len;
+
+	vi =3D iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
+			ntfs_init_locked_inode, &na);
+	if (unlikely(!vi))
+		return ERR_PTR(-ENOMEM);
+	err =3D 0;
+
+	/* If this is a freshly allocated inode, need to read it now. */
+	if (vi->i_state & I_NEW) {
+		err =3D ntfs_read_locked_attr_inode(base_vi, vi);
+		unlock_new_inode(vi);
+	}
+	/*
+	 * There is no point in keeping bad attribute inodes around. This also
+	 * simplifies things in that we never need to check for bad attribute
+	 * inodes elsewhere.
+	 */
+	if (unlikely(err)) {
+		iput(vi);
+		vi =3D ERR_PTR(err);
+	}
+	return vi;
+}
+
+/**
+ * ntfs_index_iget - obtain a struct inode corresponding to an index
+ * @base_vi:	vfs base inode containing the index related attributes
+ * @name:	Unicode name of the index
+ * @name_len:	length of @name in Unicode characters
+ *
+ * Obtain the (fake) struct inode corresponding to the index specified by =
@name
+ * and @name_len, which is present in the base mft record specified by the=
 vfs
+ * inode @base_vi.
+ *
+ * If the index inode is in the cache, it is just returned with an increas=
ed
+ * reference count.  Otherwise, a new struct inode is allocated and
+ * initialized, and finally ntfs_read_locked_index_inode() is called to re=
ad
+ * the index related attributes and fill in the inode structure.
+ *
+ * Return the struct inode of the index inode on success. Check the return
+ * value with IS_ERR() and if true, the function failed and the error code=
 is
+ * obtained from PTR_ERR().
+ */
+struct inode *ntfs_index_iget(struct inode *base_vi, __le16 *name,
+		u32 name_len)
+{
+	struct inode *vi;
+	int err;
+	struct ntfs_attr na;
+
+	na.mft_no =3D base_vi->i_ino;
+	na.type =3D AT_INDEX_ALLOCATION;
+	na.name =3D name;
+	na.name_len =3D name_len;
+
+	vi =3D iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
+			ntfs_init_locked_inode, &na);
+	if (unlikely(!vi))
+		return ERR_PTR(-ENOMEM);
+
+	err =3D 0;
+
+	/* If this is a freshly allocated inode, need to read it now. */
+	if (vi->i_state & I_NEW) {
+		err =3D ntfs_read_locked_index_inode(base_vi, vi);
+		unlock_new_inode(vi);
+	}
+	/*
+	 * There is no point in keeping bad index inodes around.  This also
+	 * simplifies things in that we never need to check for bad index
+	 * inodes elsewhere.
+	 */
+	if (unlikely(err)) {
+		iput(vi);
+		vi =3D ERR_PTR(err);
+	}
+	return vi;
+}
+
+struct inode *ntfs_alloc_big_inode(struct super_block *sb)
+{
+	struct ntfs_inode *ni;
+
+	ntfs_debug("Entering.");
+	ni =3D alloc_inode_sb(sb, ntfs_big_inode_cache, GFP_NOFS);
+	if (likely(ni !=3D NULL)) {
+		ni->state =3D 0;
+		ni->type =3D 0;
+		ni->mft_no =3D 0;
+		return VFS_I(ni);
+	}
+	ntfs_error(sb, "Allocation of NTFS big inode structure failed.");
+	return NULL;
+}
+
+void ntfs_free_big_inode(struct inode *inode)
+{
+	kmem_cache_free(ntfs_big_inode_cache, NTFS_I(inode));
+}
+
+static int ntfs_non_resident_dealloc_clusters(struct ntfs_inode *ni)
+{
+	struct super_block *sb =3D ni->vol->sb;
+	struct ntfs_attr_search_ctx *actx;
+	int err =3D 0;
+
+	actx =3D ntfs_attr_get_search_ctx(ni, NULL);
+	if (!actx)
+		return -ENOMEM;
+	WARN_ON(actx->mrec->link_count !=3D 0);
+
+	/**
+	 * ntfs_truncate_vfs cannot be called in evict() context due
+	 * to some limitations, which are the @ni vfs inode is marked
+	 * with I_FREEING, and etc.
+	 */
+	if (NInoRunlistDirty(ni)) {
+		err =3D ntfs_cluster_free_from_rl(ni->vol, ni->runlist.rl);
+		if (err)
+			ntfs_error(sb,
+					"Failed to free clusters. Leaving inconsistent metadata.\n");
+	}
+
+	while ((err =3D ntfs_attrs_walk(actx)) =3D=3D 0) {
+		if (actx->attr->non_resident &&
+				(!NInoRunlistDirty(ni) || actx->attr->type !=3D AT_DATA)) {
+			struct runlist_element *rl;
+			size_t new_rl_count;
+
+			rl =3D ntfs_mapping_pairs_decompress(ni->vol, actx->attr, NULL,
+					&new_rl_count);
+			if (IS_ERR(rl)) {
+				err =3D PTR_ERR(rl);
+				ntfs_error(sb,
+					   "Failed to decompress runlist. Leaving inconsistent metadata.\n");
+				continue;
+			}
+
+			err =3D ntfs_cluster_free_from_rl(ni->vol, rl);
+			if (err)
+				ntfs_error(sb,
+					   "Failed to free attribute clusters. Leaving inconsistent metadata.=
\n");
+			ntfs_free(rl);
+		}
+	}
+
+	ntfs_release_dirty_clusters(ni->vol, ni->i_dealloc_clusters);
+	ntfs_attr_put_search_ctx(actx);
+	return err;
+}
+
+int ntfs_drop_big_inode(struct inode *inode)
+{
+	struct ntfs_inode *ni =3D NTFS_I(inode);
+
+	if (!inode_unhashed(inode) && inode->i_state & I_SYNC) {
+		if (ni->type =3D=3D AT_DATA || ni->type =3D=3D AT_INDEX_ALLOCATION) {
+			if (!inode->i_nlink) {
+				struct ntfs_inode *ni =3D NTFS_I(inode);
+
+				if (ni->data_size =3D=3D 0)
+					return 0;
+
+				/* To avoid evict_inode call simultaneously */
+				atomic_inc(&inode->i_count);
+				spin_unlock(&inode->i_lock);
+
+				truncate_setsize(VFS_I(ni), 0);
+				ntfs_truncate_vfs(VFS_I(ni), 0, 1);
+
+				sb_start_intwrite(inode->i_sb);
+				i_size_write(inode, 0);
+				ni->allocated_size =3D ni->initialized_size =3D ni->data_size =3D 0;
+
+				truncate_inode_pages_final(inode->i_mapping);
+				sb_end_intwrite(inode->i_sb);
+
+				spin_lock(&inode->i_lock);
+				atomic_dec(&inode->i_count);
+			}
+			return 0;
+		} else if (ni->type =3D=3D AT_INDEX_ROOT)
+			return 0;
+	}
+
+	return inode_generic_drop(inode);
+}
+
+static inline struct ntfs_inode *ntfs_alloc_extent_inode(void)
+{
+	struct ntfs_inode *ni;
+
+	ntfs_debug("Entering.");
+	ni =3D kmem_cache_alloc(ntfs_inode_cache, GFP_NOFS);
+	if (likely(ni !=3D NULL)) {
+		ni->state =3D 0;
+		return ni;
+	}
+	ntfs_error(NULL, "Allocation of NTFS inode structure failed.");
+	return NULL;
+}
+
+static void ntfs_destroy_extent_inode(struct ntfs_inode *ni)
+{
+	ntfs_debug("Entering.");
+
+	if (!atomic_dec_and_test(&ni->count))
+		WARN_ON(1);
+	if (ni->folio)
+		ntfs_unmap_folio(ni->folio, NULL);
+	kfree(ni->mrec);
+	kmem_cache_free(ntfs_inode_cache, ni);
+}
+
+static struct lock_class_key attr_inode_mrec_lock_class;
+static struct lock_class_key attr_list_inode_mrec_lock_class;
+
+/*
+ * The attribute runlist lock has separate locking rules from the
+ * normal runlist lock, so split the two lock-classes:
+ */
+static struct lock_class_key attr_list_rl_lock_class;
+
+/**
+ * __ntfs_init_inode - initialize ntfs specific part of an inode
+ * @sb:		super block of mounted volume
+ * @ni:		freshly allocated ntfs inode which to initialize
+ *
+ * Initialize an ntfs inode to defaults.
+ *
+ * NOTE: ni->mft_no, ni->state, ni->type, ni->name, and ni->name_len are l=
eft
+ * untouched. Make sure to initialize them elsewhere.
+ */
+void __ntfs_init_inode(struct super_block *sb, struct ntfs_inode *ni)
+{
+	ntfs_debug("Entering.");
+	rwlock_init(&ni->size_lock);
+	ni->initialized_size =3D ni->allocated_size =3D 0;
+	ni->seq_no =3D 0;
+	atomic_set(&ni->count, 1);
+	ni->vol =3D NTFS_SB(sb);
+	ntfs_init_runlist(&ni->runlist);
+	ni->lcn_seek_trunc =3D LCN_RL_NOT_MAPPED;
+	mutex_init(&ni->mrec_lock);
+	if (ni->type =3D=3D AT_ATTRIBUTE_LIST) {
+		lockdep_set_class(&ni->mrec_lock,
+				  &attr_list_inode_mrec_lock_class);
+		lockdep_set_class(&ni->runlist.lock,
+				  &attr_list_rl_lock_class);
+	} else if (NInoAttr(ni)) {
+		lockdep_set_class(&ni->mrec_lock,
+				  &attr_inode_mrec_lock_class);
+	}
+
+	ni->folio =3D NULL;
+	ni->folio_ofs =3D 0;
+	ni->mrec =3D NULL;
+	ni->attr_list_size =3D 0;
+	ni->attr_list =3D NULL;
+	ni->itype.index.block_size =3D 0;
+	ni->itype.index.vcn_size =3D 0;
+	ni->itype.index.collation_rule =3D 0;
+	ni->itype.index.block_size_bits =3D 0;
+	ni->itype.index.vcn_size_bits =3D 0;
+	mutex_init(&ni->extent_lock);
+	ni->nr_extents =3D 0;
+	ni->ext.base_ntfs_ino =3D NULL;
+	ni->flags =3D 0;
+	ni->mft_lcn[0] =3D LCN_RL_NOT_MAPPED;
+	ni->mft_lcn_count =3D 0;
+	ni->target =3D NULL;
+	ni->i_dealloc_clusters =3D 0;
+}
+
+/*
+ * Extent inodes get MFT-mapped in a nested way, while the base inode
+ * is still mapped. Teach this nesting to the lock validator by creating
+ * a separate class for nested inode's mrec_lock's:
+ */
+static struct lock_class_key extent_inode_mrec_lock_key;
+
+inline struct ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
+		unsigned long mft_no)
+{
+	struct ntfs_inode *ni =3D ntfs_alloc_extent_inode();
+
+	ntfs_debug("Entering.");
+	if (likely(ni !=3D NULL)) {
+		__ntfs_init_inode(sb, ni);
+		lockdep_set_class(&ni->mrec_lock, &extent_inode_mrec_lock_key);
+		ni->mft_no =3D mft_no;
+		ni->type =3D AT_UNUSED;
+		ni->name =3D NULL;
+		ni->name_len =3D 0;
+	}
+	return ni;
+}
+
+/**
+ * ntfs_is_extended_system_file - check if a file is in the $Extend direct=
ory
+ * @ctx:	initialized attribute search context
+ *
+ * Search all file name attributes in the inode described by the attribute
+ * search context @ctx and check if any of the names are in the $Extend sy=
stem
+ * directory.
+ *
+ * Return values:
+ *	   1: file is in $Extend directory
+ *	   0: file is not in $Extend directory
+ *    -errno: failed to determine if the file is in the $Extend directory
+ */
+static int ntfs_is_extended_system_file(struct ntfs_attr_search_ctx *ctx)
+{
+	int nr_links, err;
+
+	/* Restart search. */
+	ntfs_attr_reinit_search_ctx(ctx);
+
+	/* Get number of hard links. */
+	nr_links =3D le16_to_cpu(ctx->mrec->link_count);
+
+	/* Loop through all hard links. */
+	while (!(err =3D ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0,
+			ctx))) {
+		struct file_name_attr *file_name_attr;
+		struct attr_record *attr =3D ctx->attr;
+		u8 *p, *p2;
+
+		nr_links--;
+		/*
+		 * Maximum sanity checking as we are called on an inode that
+		 * we suspect might be corrupt.
+		 */
+		p =3D (u8 *)attr + le32_to_cpu(attr->length);
+		if (p < (u8 *)ctx->mrec || (u8 *)p > (u8 *)ctx->mrec +
+				le32_to_cpu(ctx->mrec->bytes_in_use)) {
+err_corrupt_attr:
+			ntfs_error(ctx->ntfs_ino->vol->sb,
+					"Corrupt file name attribute. You should run chkdsk.");
+			return -EIO;
+		}
+		if (attr->non_resident) {
+			ntfs_error(ctx->ntfs_ino->vol->sb,
+					"Non-resident file name. You should run chkdsk.");
+			return -EIO;
+		}
+		if (attr->flags) {
+			ntfs_error(ctx->ntfs_ino->vol->sb,
+					"File name with invalid flags. You should run chkdsk.");
+			return -EIO;
+		}
+		if (!(attr->data.resident.flags & RESIDENT_ATTR_IS_INDEXED)) {
+			ntfs_error(ctx->ntfs_ino->vol->sb,
+					"Unindexed file name. You should run chkdsk.");
+			return -EIO;
+		}
+		file_name_attr =3D (struct file_name_attr *)((u8 *)attr +
+				le16_to_cpu(attr->data.resident.value_offset));
+		p2 =3D (u8 *)file_name_attr + le32_to_cpu(attr->data.resident.value_leng=
th);
+		if (p2 < (u8 *)attr || p2 > p)
+			goto err_corrupt_attr;
+		/* This attribute is ok, but is it in the $Extend directory? */
+		if (MREF_LE(file_name_attr->parent_directory) =3D=3D FILE_Extend) {
+			unsigned char *s;
+
+			s =3D ntfs_attr_name_get(ctx->ntfs_ino->vol,
+					file_name_attr->file_name,
+					file_name_attr->file_name_length);
+			if (!s)
+				return 1;
+			if (!strcmp("$Reparse", s)) {
+				ntfs_attr_name_free(&s);
+				return 2; /* it's reparse point file */
+			}
+			ntfs_attr_name_free(&s);
+			return 1;	/* YES, it's an extended system file. */
+		}
+	}
+	if (unlikely(err !=3D -ENOENT))
+		return err;
+	if (unlikely(nr_links)) {
+		ntfs_error(ctx->ntfs_ino->vol->sb,
+			"Inode hard link count doesn't match number of name attributes. You sho=
uld run chkdsk.");
+		return -EIO;
+	}
+	return 0;	/* NO, it is not an extended system file. */
+}
+
+static struct lock_class_key ntfs_dir_inval_lock_key;
+
+void ntfs_set_vfs_operations(struct inode *inode, mode_t mode, dev_t dev)
+{
+	if (S_ISDIR(mode)) {
+		if (!NInoAttr(NTFS_I(inode))) {
+			inode->i_op =3D &ntfs_dir_inode_ops;
+			inode->i_fop =3D &ntfs_dir_ops;
+		}
+		if (NInoMstProtected(NTFS_I(inode)))
+			inode->i_mapping->a_ops =3D &ntfs_mst_aops;
+		else
+			inode->i_mapping->a_ops =3D &ntfs_normal_aops;
+		lockdep_set_class(&inode->i_mapping->invalidate_lock,
+				  &ntfs_dir_inval_lock_key);
+	} else if (S_ISLNK(mode)) {
+		inode->i_op =3D &ntfs_symlink_inode_operations;
+		inode->i_mapping->a_ops =3D &ntfs_normal_aops;
+	} else if (S_ISCHR(mode) || S_ISBLK(mode) || S_ISFIFO(mode) || S_ISSOCK(m=
ode)) {
+		inode->i_op =3D &ntfsp_special_inode_operations;
+		init_special_inode(inode, inode->i_mode, dev);
+	} else {
+		if (!NInoAttr(NTFS_I(inode))) {
+			inode->i_op =3D &ntfs_file_inode_ops;
+			inode->i_fop =3D &ntfs_file_ops;
+		}
+		if (NInoMstProtected(NTFS_I(inode)))
+			inode->i_mapping->a_ops =3D &ntfs_mst_aops;
+		else if (NInoCompressed(NTFS_I(inode)))
+			inode->i_mapping->a_ops =3D &ntfs_compressed_aops;
+		else
+			inode->i_mapping->a_ops =3D &ntfs_normal_aops;
+	}
+}
+
+__le16 R[3] =3D { cpu_to_le16('$'), cpu_to_le16('R'), 0 };
+
+/**
+ * ntfs_read_locked_inode - read an inode from its device
+ * @vi:		inode to read
+ *
+ * ntfs_read_locked_inode() is called from ntfs_iget() to read the inode
+ * described by @vi into memory from the device.
+ *
+ * The only fields in @vi that we need to/can look at when the function is
+ * called are i_sb, pointing to the mounted device's super block, and i_in=
o,
+ * the number of the inode to load.
+ *
+ * ntfs_read_locked_inode() maps, pins and locks the mft record number i_i=
no
+ * for reading and sets up the necessary @vi fields as well as initializing
+ * the ntfs inode.
+ *
+ * Q: What locks are held when the function is called?
+ * A: i_state has I_NEW set, hence the inode is locked, also
+ *    i_count is set to 1, so it is not going to go away
+ *    i_flags is set to 0 and we have no business touching it.  Only an io=
ctl()
+ *    is allowed to write to them. We should of course be honouring them b=
ut
+ *    we need to do that using the IS_* macros defined in include/linux/fs=
.h.
+ *    In any case ntfs_read_locked_inode() has nothing to do with i_flags.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_read_locked_inode(struct inode *vi)
+{
+	struct ntfs_volume *vol =3D NTFS_SB(vi->i_sb);
+	struct ntfs_inode *ni;
+	struct mft_record *m;
+	struct attr_record *a;
+	struct standard_information *si;
+	struct ntfs_attr_search_ctx *ctx;
+	int err =3D 0;
+	__le16 *name =3D I30;
+	unsigned int name_len =3D 4, flags =3D 0;
+	int extend_sys =3D 0;
+	dev_t dev =3D 0;
+	bool vol_err =3D true;
+
+	ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
+
+	if (uid_valid(vol->uid)) {
+		vi->i_uid =3D vol->uid;
+		flags |=3D NTFS_VOL_UID;
+	} else
+		vi->i_uid =3D GLOBAL_ROOT_UID;
+
+	if (gid_valid(vol->gid)) {
+		vi->i_gid =3D vol->gid;
+		flags |=3D NTFS_VOL_GID;
+	} else
+		vi->i_gid =3D GLOBAL_ROOT_GID;
+
+	vi->i_mode =3D 0777;
+
+	/*
+	 * Initialize the ntfs specific part of @vi special casing
+	 * FILE_MFT which we need to do at mount time.
+	 */
+	if (vi->i_ino !=3D FILE_MFT)
+		ntfs_init_big_inode(vi);
+	ni =3D NTFS_I(vi);
+
+	m =3D map_mft_record(ni);
+	if (IS_ERR(m)) {
+		err =3D PTR_ERR(m);
+		goto err_out;
+	}
+
+	ctx =3D ntfs_attr_get_search_ctx(ni, m);
+	if (!ctx) {
+		err =3D -ENOMEM;
+		goto unm_err_out;
+	}
+
+	if (!(m->flags & MFT_RECORD_IN_USE)) {
+		err =3D -ENOENT;
+		vol_err =3D false;
+		goto unm_err_out;
+	}
+
+	if (m->base_mft_record) {
+		ntfs_error(vi->i_sb, "Inode is an extent inode!");
+		goto unm_err_out;
+	}
+
+	/* Transfer information from mft record into vfs and ntfs inodes. */
+	vi->i_generation =3D ni->seq_no =3D le16_to_cpu(m->sequence_number);
+
+	if (le16_to_cpu(m->link_count) < 1) {
+		ntfs_error(vi->i_sb, "Inode link count is 0!");
+		goto unm_err_out;
+	}
+	set_nlink(vi, le16_to_cpu(m->link_count));
+
+	/* If read-only, no one gets write permissions. */
+	if (IS_RDONLY(vi))
+		vi->i_mode &=3D ~0222;
+
+	/*
+	 * Find the standard information attribute in the mft record. At this
+	 * stage we haven't setup the attribute list stuff yet, so this could
+	 * in fact fail if the standard information is in an extent record, but
+	 * I don't think this actually ever happens.
+	 */
+	ntfs_attr_reinit_search_ctx(ctx);
+	err =3D ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0, 0, 0, NULL, 0,
+			ctx);
+	if (unlikely(err)) {
+		if (err =3D=3D -ENOENT)
+			ntfs_error(vi->i_sb, "$STANDARD_INFORMATION attribute is missing.");
+		goto unm_err_out;
+	}
+	a =3D ctx->attr;
+	/* Get the standard information attribute value. */
+	if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
+			+ le32_to_cpu(a->data.resident.value_length) >
+			(u8 *)ctx->mrec + vol->mft_record_size) {
+		ntfs_error(vi->i_sb, "Corrupt standard information attribute in inode.");
+		goto unm_err_out;
+	}
+	si =3D (struct standard_information *)((u8 *)a +
+			le16_to_cpu(a->data.resident.value_offset));
+
+	/* Transfer information from the standard information into vi. */
+	/*
+	 * Note: The i_?times do not quite map perfectly onto the NTFS times,
+	 * but they are close enough, and in the end it doesn't really matter
+	 * that much...
+	 */
+	/*
+	 * mtime is the last change of the data within the file. Not changed
+	 * when only metadata is changed, e.g. a rename doesn't affect mtime.
+	 */
+	ni->i_crtime =3D ntfs2utc(si->creation_time);
+
+	inode_set_mtime_to_ts(vi, ntfs2utc(si->last_data_change_time));
+	/*
+	 * ctime is the last change of the metadata of the file. This obviously
+	 * always changes, when mtime is changed. ctime can be changed on its
+	 * own, mtime is then not changed, e.g. when a file is renamed.
+	 */
+	inode_set_ctime_to_ts(vi, ntfs2utc(si->last_mft_change_time));
+	/*
+	 * Last access to the data within the file. Not changed during a rename
+	 * for example but changed whenever the file is written to.
+	 */
+	inode_set_atime_to_ts(vi, ntfs2utc(si->last_access_time));
+	ni->flags =3D si->file_attributes;
+
+	/* Find the attribute list attribute if present. */
+	ntfs_attr_reinit_search_ctx(ctx);
+	err =3D ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx);
+	if (err) {
+		if (unlikely(err !=3D -ENOENT)) {
+			ntfs_error(vi->i_sb, "Failed to lookup attribute list attribute.");
+			goto unm_err_out;
+		}
+	} else {
+		if (vi->i_ino =3D=3D FILE_MFT)
+			goto skip_attr_list_load;
+		ntfs_debug("Attribute list found in inode 0x%lx.", vi->i_ino);
+		NInoSetAttrList(ni);
+		a =3D ctx->attr;
+		if (a->flags & ATTR_COMPRESSION_MASK) {
+			ntfs_error(vi->i_sb,
+				"Attribute list attribute is compressed.");
+			goto unm_err_out;
+		}
+		if (a->flags & ATTR_IS_ENCRYPTED ||
+				a->flags & ATTR_IS_SPARSE) {
+			if (a->non_resident) {
+				ntfs_error(vi->i_sb,
+					"Non-resident attribute list attribute is encrypted/sparse.");
+				goto unm_err_out;
+			}
+			ntfs_warning(vi->i_sb,
+				"Resident attribute list attribute in inode 0x%lx is marked encrypted/=
sparse which is not true.  However, Windows allows this and chkdsk does not=
 detect or correct it so we will just ignore the invalid flags and pretend =
they are not set.",
+				vi->i_ino);
+		}
+		/* Now allocate memory for the attribute list. */
+		ni->attr_list_size =3D (u32)ntfs_attr_size(a);
+		if (!ni->attr_list_size) {
+			ntfs_error(vi->i_sb, "Attr_list_size is zero");
+			goto unm_err_out;
+		}
+		ni->attr_list =3D ntfs_malloc_nofs(ni->attr_list_size);
+		if (!ni->attr_list) {
+			ntfs_error(vi->i_sb,
+				"Not enough memory to allocate buffer for attribute list.");
+			err =3D -ENOMEM;
+			goto unm_err_out;
+		}
+		if (a->non_resident) {
+			NInoSetAttrListNonResident(ni);
+			if (a->data.non_resident.lowest_vcn) {
+				ntfs_error(vi->i_sb, "Attribute list has non zero lowest_vcn.");
+				goto unm_err_out;
+			}
+
+			/* Now load the attribute list. */
+			err =3D load_attribute_list(ni, ni->attr_list, ni->attr_list_size);
+			if (err) {
+				ntfs_error(vi->i_sb, "Failed to load attribute list attribute.");
+				goto unm_err_out;
+			}
+		} else /* if (!a->non_resident) */ {
+			if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
+					+ le32_to_cpu(
+					a->data.resident.value_length) >
+					(u8 *)ctx->mrec + vol->mft_record_size) {
+				ntfs_error(vi->i_sb, "Corrupt attribute list in inode.");
+				goto unm_err_out;
+			}
+			/* Now copy the attribute list. */
+			memcpy(ni->attr_list, (u8 *)a + le16_to_cpu(
+					a->data.resident.value_offset),
+					le32_to_cpu(
+					a->data.resident.value_length));
+		}
+	}
+skip_attr_list_load:
+	err =3D ntfs_attr_lookup(AT_EA_INFORMATION, NULL, 0, 0, 0, NULL, 0, ctx);
+	if (!err)
+		NInoSetHasEA(ni);
+
+	ntfs_ea_get_wsl_inode(vi, &dev, flags);
+
+	if (m->flags & MFT_RECORD_IS_DIRECTORY) {
+		vi->i_mode |=3D S_IFDIR;
+		/*
+		 * Apply the directory permissions mask set in the mount
+		 * options.
+		 */
+		vi->i_mode &=3D ~vol->dmask;
+		/* Things break without this kludge! */
+		if (vi->i_nlink > 1)
+			set_nlink(vi, 1);
+	} else {
+		if (ni->flags & FILE_ATTR_REPARSE_POINT) {
+			unsigned int mode;
+
+			mode =3D ntfs_make_symlink(ni);
+			if (mode)
+				vi->i_mode |=3D mode;
+			else {
+				vi->i_mode &=3D ~S_IFLNK;
+				vi->i_mode |=3D S_IFREG;
+			}
+		} else
+			vi->i_mode |=3D S_IFREG;
+		/* Apply the file permissions mask set in the mount options. */
+		vi->i_mode &=3D ~vol->fmask;
+	}
+
+	/*
+	 * If an attribute list is present we now have the attribute list value
+	 * in ntfs_ino->attr_list and it is ntfs_ino->attr_list_size bytes.
+	 */
+	if (S_ISDIR(vi->i_mode)) {
+		struct index_root *ir;
+		u8 *ir_end, *index_end;
+
+view_index_meta:
+		/* It is a directory, find index root attribute. */
+		ntfs_attr_reinit_search_ctx(ctx);
+		err =3D ntfs_attr_lookup(AT_INDEX_ROOT, name, name_len, CASE_SENSITIVE,
+				0, NULL, 0, ctx);
+		if (unlikely(err)) {
+			if (err =3D=3D -ENOENT)
+				ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is missing.");
+			goto unm_err_out;
+		}
+		a =3D ctx->attr;
+		/* Set up the state. */
+		if (unlikely(a->non_resident)) {
+			ntfs_error(vol->sb,
+				"$INDEX_ROOT attribute is not resident.");
+			goto unm_err_out;
+		}
+		/* Ensure the attribute name is placed before the value. */
+		if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=3D
+				le16_to_cpu(a->data.resident.value_offset)))) {
+			ntfs_error(vol->sb,
+				"$INDEX_ROOT attribute name is placed after the attribute value.");
+			goto unm_err_out;
+		}
+		/*
+		 * Compressed/encrypted index root just means that the newly
+		 * created files in that directory should be created compressed/
+		 * encrypted. However index root cannot be both compressed and
+		 * encrypted.
+		 */
+		if (a->flags & ATTR_COMPRESSION_MASK) {
+			NInoSetCompressed(ni);
+			ni->flags |=3D FILE_ATTR_COMPRESSED;
+		}
+		if (a->flags & ATTR_IS_ENCRYPTED) {
+			if (a->flags & ATTR_COMPRESSION_MASK) {
+				ntfs_error(vi->i_sb, "Found encrypted and compressed attribute.");
+				goto unm_err_out;
+			}
+			NInoSetEncrypted(ni);
+			ni->flags |=3D FILE_ATTR_ENCRYPTED;
+		}
+		if (a->flags & ATTR_IS_SPARSE) {
+			NInoSetSparse(ni);
+			ni->flags |=3D FILE_ATTR_SPARSE_FILE;
+		}
+		ir =3D (struct index_root *)((u8 *)a +
+				le16_to_cpu(a->data.resident.value_offset));
+		ir_end =3D (u8 *)ir + le32_to_cpu(a->data.resident.value_length);
+		if (ir_end > (u8 *)ctx->mrec + vol->mft_record_size) {
+			ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is corrupt.");
+			goto unm_err_out;
+		}
+		index_end =3D (u8 *)&ir->index +
+				le32_to_cpu(ir->index.index_length);
+		if (index_end > ir_end) {
+			ntfs_error(vi->i_sb, "Directory index is corrupt.");
+			goto unm_err_out;
+		}
+
+		if (extend_sys) {
+			if (ir->type) {
+				ntfs_error(vi->i_sb, "Indexed attribute is not zero.");
+				goto unm_err_out;
+			}
+		} else {
+			if (ir->type !=3D AT_FILE_NAME) {
+				ntfs_error(vi->i_sb, "Indexed attribute is not $FILE_NAME.");
+				goto unm_err_out;
+			}
+
+			if (ir->collation_rule !=3D COLLATION_FILE_NAME) {
+				ntfs_error(vi->i_sb,
+					"Index collation rule is not COLLATION_FILE_NAME.");
+				goto unm_err_out;
+			}
+		}
+
+		ni->itype.index.collation_rule =3D ir->collation_rule;
+		ni->itype.index.block_size =3D le32_to_cpu(ir->index_block_size);
+		if (ni->itype.index.block_size &
+				(ni->itype.index.block_size - 1)) {
+			ntfs_error(vi->i_sb, "Index block size (%u) is not a power of two.",
+					ni->itype.index.block_size);
+			goto unm_err_out;
+		}
+		if (ni->itype.index.block_size > PAGE_SIZE) {
+			ntfs_error(vi->i_sb,
+				"Index block size (%u) > PAGE_SIZE (%ld) is not supported.",
+				ni->itype.index.block_size,
+				PAGE_SIZE);
+			err =3D -EOPNOTSUPP;
+			goto unm_err_out;
+		}
+		if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) {
+			ntfs_error(vi->i_sb,
+				"Index block size (%u) < NTFS_BLOCK_SIZE (%i) is not supported.",
+				ni->itype.index.block_size,
+				NTFS_BLOCK_SIZE);
+			err =3D -EOPNOTSUPP;
+			goto unm_err_out;
+		}
+		ni->itype.index.block_size_bits =3D
+				ffs(ni->itype.index.block_size) - 1;
+		/* Determine the size of a vcn in the directory index. */
+		if (vol->cluster_size <=3D ni->itype.index.block_size) {
+			ni->itype.index.vcn_size =3D vol->cluster_size;
+			ni->itype.index.vcn_size_bits =3D vol->cluster_size_bits;
+		} else {
+			ni->itype.index.vcn_size =3D vol->sector_size;
+			ni->itype.index.vcn_size_bits =3D vol->sector_size_bits;
+		}
+
+		/* Setup the index allocation attribute, even if not present. */
+		ni->type =3D AT_INDEX_ROOT;
+		ni->name =3D name;
+		ni->name_len =3D name_len;
+		vi->i_size =3D ni->initialized_size =3D ni->data_size =3D
+			le32_to_cpu(a->data.resident.value_length);
+		ni->allocated_size =3D (ni->data_size + 7) & ~7;
+		/* We are done with the mft record, so we release it. */
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(ni);
+		m =3D NULL;
+		ctx =3D NULL;
+		/* Setup the operations for this inode. */
+		ntfs_set_vfs_operations(vi, S_IFDIR, 0);
+		if (ir->index.flags & LARGE_INDEX)
+			NInoSetIndexAllocPresent(ni);
+	} else {
+		/* It is a file. */
+		ntfs_attr_reinit_search_ctx(ctx);
+
+		/* Setup the data attribute, even if not present. */
+		ni->type =3D AT_DATA;
+		ni->name =3D AT_UNNAMED;
+		ni->name_len =3D 0;
+
+		/* Find first extent of the unnamed data attribute. */
+		err =3D ntfs_attr_lookup(AT_DATA, NULL, 0, 0, 0, NULL, 0, ctx);
+		if (unlikely(err)) {
+			vi->i_size =3D ni->initialized_size =3D
+					ni->allocated_size =3D 0;
+			if (err !=3D -ENOENT) {
+				ntfs_error(vi->i_sb, "Failed to lookup $DATA attribute.");
+				goto unm_err_out;
+			}
+			/*
+			 * FILE_Secure does not have an unnamed $DATA
+			 * attribute, so we special case it here.
+			 */
+			if (vi->i_ino =3D=3D FILE_Secure)
+				goto no_data_attr_special_case;
+			/*
+			 * Most if not all the system files in the $Extend
+			 * system directory do not have unnamed data
+			 * attributes so we need to check if the parent
+			 * directory of the file is FILE_Extend and if it is
+			 * ignore this error. To do this we need to get the
+			 * name of this inode from the mft record as the name
+			 * contains the back reference to the parent directory.
+			 */
+			extend_sys =3D ntfs_is_extended_system_file(ctx);
+			if (extend_sys > 0) {
+				if (m->flags & MFT_RECORD_IS_VIEW_INDEX &&
+				    extend_sys =3D=3D 2) {
+					name =3D R;
+					name_len =3D 2;
+					goto view_index_meta;
+				}
+				goto no_data_attr_special_case;
+			}
+
+			err =3D extend_sys;
+			ntfs_error(vi->i_sb, "$DATA attribute is missing, err : %d", err);
+			goto unm_err_out;
+		}
+		a =3D ctx->attr;
+		/* Setup the state. */
+		if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
+			if (a->flags & ATTR_COMPRESSION_MASK) {
+				NInoSetCompressed(ni);
+				ni->flags |=3D FILE_ATTR_COMPRESSED;
+				if (vol->cluster_size > 4096) {
+					ntfs_error(vi->i_sb,
+						"Found compressed data but compression is disabled due to cluster si=
ze (%i) > 4kiB.",
+						vol->cluster_size);
+					goto unm_err_out;
+				}
+				if ((a->flags & ATTR_COMPRESSION_MASK)
+						!=3D ATTR_IS_COMPRESSED) {
+					ntfs_error(vi->i_sb,
+						"Found unknown compression method or corrupt file.");
+					goto unm_err_out;
+				}
+			}
+			if (a->flags & ATTR_IS_SPARSE) {
+				NInoSetSparse(ni);
+				ni->flags |=3D FILE_ATTR_SPARSE_FILE;
+			}
+		}
+		if (a->flags & ATTR_IS_ENCRYPTED) {
+			if (NInoCompressed(ni)) {
+				ntfs_error(vi->i_sb, "Found encrypted and compressed data.");
+				goto unm_err_out;
+			}
+			NInoSetEncrypted(ni);
+			ni->flags |=3D FILE_ATTR_ENCRYPTED;
+		}
+		if (a->non_resident) {
+			NInoSetNonResident(ni);
+			if (NInoCompressed(ni) || NInoSparse(ni)) {
+				if (NInoCompressed(ni) &&
+				    a->data.non_resident.compression_unit !=3D 4) {
+					ntfs_error(vi->i_sb,
+						"Found non-standard compression unit (%u instead of 4).  Cannot hand=
le this.",
+						a->data.non_resident.compression_unit);
+					err =3D -EOPNOTSUPP;
+					goto unm_err_out;
+				}
+
+				if (NInoSparse(ni) &&
+				    a->data.non_resident.compression_unit &&
+				    a->data.non_resident.compression_unit !=3D
+				     vol->sparse_compression_unit) {
+					ntfs_error(vi->i_sb,
+						   "Found non-standard compression unit (%u instead of 0 or %d).  Ca=
nnot handle this.",
+						   a->data.non_resident.compression_unit,
+						   vol->sparse_compression_unit);
+					err =3D -EOPNOTSUPP;
+					goto unm_err_out;
+				}
+
+
+				if (a->data.non_resident.compression_unit) {
+					ni->itype.compressed.block_size =3D 1U <<
+							(a->data.non_resident.compression_unit +
+							vol->cluster_size_bits);
+					ni->itype.compressed.block_size_bits =3D
+							ffs(ni->itype.compressed.block_size) - 1;
+					ni->itype.compressed.block_clusters =3D
+							1U << a->data.non_resident.compression_unit;
+				} else {
+					ni->itype.compressed.block_size =3D 0;
+					ni->itype.compressed.block_size_bits =3D
+							0;
+					ni->itype.compressed.block_clusters =3D
+							0;
+				}
+				ni->itype.compressed.size =3D le64_to_cpu(
+						a->data.non_resident.compressed_size);
+			}
+			if (a->data.non_resident.lowest_vcn) {
+				ntfs_error(vi->i_sb,
+					"First extent of $DATA attribute has non zero lowest_vcn.");
+				goto unm_err_out;
+			}
+			vi->i_size =3D ni->data_size =3D le64_to_cpu(a->data.non_resident.data_=
size);
+			ni->initialized_size =3D le64_to_cpu(a->data.non_resident.initialized_s=
ize);
+			ni->allocated_size =3D le64_to_cpu(a->data.non_resident.allocated_size);
+		} else { /* Resident attribute. */
+			vi->i_size =3D ni->data_size =3D ni->initialized_size =3D le32_to_cpu(
+					a->data.resident.value_length);
+			ni->allocated_size =3D le32_to_cpu(a->length) -
+					le16_to_cpu(
+					a->data.resident.value_offset);
+			if (vi->i_size > ni->allocated_size) {
+				ntfs_error(vi->i_sb,
+					"Resident data attribute is corrupt (size exceeds allocation).");
+				goto unm_err_out;
+			}
+		}
+no_data_attr_special_case:
+		/* We are done with the mft record, so we release it. */
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(ni);
+		m =3D NULL;
+		ctx =3D NULL;
+		/* Setup the operations for this inode. */
+		ntfs_set_vfs_operations(vi, vi->i_mode, dev);
+	}
+
+	if (NVolSysImmutable(vol) && (ni->flags & FILE_ATTR_SYSTEM) &&
+	    !S_ISFIFO(vi->i_mode) && !S_ISSOCK(vi->i_mode) && !S_ISLNK(vi->i_mode=
))
+		vi->i_flags |=3D S_IMMUTABLE;
+
+	/*
+	 * The number of 512-byte blocks used on disk (for stat). This is in so
+	 * far inaccurate as it doesn't account for any named streams or other
+	 * special non-resident attributes, but that is how Windows works, too,
+	 * so we are at least consistent with Windows, if not entirely
+	 * consistent with the Linux Way. Doing it the Linux Way would cause a
+	 * significant slowdown as it would involve iterating over all
+	 * attributes in the mft record and adding the allocated/compressed
+	 * sizes of all non-resident attributes present to give us the Linux
+	 * correct size that should go into i_blocks (after division by 512).
+	 */
+	if (S_ISREG(vi->i_mode) && (NInoCompressed(ni) || NInoSparse(ni)))
+		vi->i_blocks =3D ni->itype.compressed.size >> 9;
+	else
+		vi->i_blocks =3D ni->allocated_size >> 9;
+
+	ntfs_debug("Done.");
+	return 0;
+unm_err_out:
+	if (!err)
+		err =3D -EIO;
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (m)
+		unmap_mft_record(ni);
+err_out:
+	if (err !=3D -EOPNOTSUPP && err !=3D -ENOMEM && vol_err =3D=3D true) {
+		ntfs_error(vol->sb,
+			"Failed with error code %i.  Marking corrupt inode 0x%lx as bad.  Run c=
hkdsk.",
+			err, vi->i_ino);
+		NVolSetErrors(vol);
+	}
+	return err;
+}
+
+/**
+ * ntfs_read_locked_attr_inode - read an attribute inode from its base ino=
de
+ * @base_vi:	base inode
+ * @vi:		attribute inode to read
+ *
+ * ntfs_read_locked_attr_inode() is called from ntfs_attr_iget() to read t=
he
+ * attribute inode described by @vi into memory from the base mft record
+ * described by @base_ni.
+ *
+ * ntfs_read_locked_attr_inode() maps, pins and locks the base inode for
+ * reading and looks up the attribute described by @vi before setting up t=
he
+ * necessary fields in @vi as well as initializing the ntfs inode.
+ *
+ * Q: What locks are held when the function is called?
+ * A: i_state has I_NEW set, hence the inode is locked, also
+ *    i_count is set to 1, so it is not going to go away
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Note this cannot be called for AT_INDEX_ALLOCATION.
+ */
+static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode=
 *vi)
+{
+	struct ntfs_volume *vol =3D NTFS_SB(vi->i_sb);
+	struct ntfs_inode *ni =3D NTFS_I(vi), *base_ni =3D NTFS_I(base_vi);
+	struct mft_record *m;
+	struct attr_record *a;
+	struct ntfs_attr_search_ctx *ctx;
+	int err =3D 0;
+
+	ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
+
+	ntfs_init_big_inode(vi);
+
+	/* Just mirror the values from the base inode. */
+	vi->i_uid	=3D base_vi->i_uid;
+	vi->i_gid	=3D base_vi->i_gid;
+	set_nlink(vi, base_vi->i_nlink);
+	inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
+	inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
+	inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
+	vi->i_generation =3D ni->seq_no =3D base_ni->seq_no;
+
+	/* Set inode type to zero but preserve permissions. */
+	vi->i_mode	=3D base_vi->i_mode & ~S_IFMT;
+
+	m =3D map_mft_record(base_ni);
+	if (IS_ERR(m)) {
+		err =3D PTR_ERR(m);
+		goto err_out;
+	}
+	ctx =3D ntfs_attr_get_search_ctx(base_ni, m);
+	if (!ctx) {
+		err =3D -ENOMEM;
+		goto unm_err_out;
+	}
+	/* Find the attribute. */
+	err =3D ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err))
+		goto unm_err_out;
+	a =3D ctx->attr;
+	if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
+		if (a->flags & ATTR_COMPRESSION_MASK) {
+			NInoSetCompressed(ni);
+			ni->flags |=3D FILE_ATTR_COMPRESSED;
+			if ((ni->type !=3D AT_DATA) || (ni->type =3D=3D AT_DATA &&
+					ni->name_len)) {
+				ntfs_error(vi->i_sb,
+					   "Found compressed non-data or named data attribute.");
+				goto unm_err_out;
+			}
+			if (vol->cluster_size > 4096) {
+				ntfs_error(vi->i_sb,
+					"Found compressed attribute but compression is disabled due to cluste=
r size (%i) > 4kiB.",
+					vol->cluster_size);
+				goto unm_err_out;
+			}
+			if ((a->flags & ATTR_COMPRESSION_MASK) !=3D
+					ATTR_IS_COMPRESSED) {
+				ntfs_error(vi->i_sb, "Found unknown compression method.");
+				goto unm_err_out;
+			}
+		}
+		/*
+		 * The compressed/sparse flag set in an index root just means
+		 * to compress all files.
+		 */
+		if (NInoMstProtected(ni) && ni->type !=3D AT_INDEX_ROOT) {
+			ntfs_error(vi->i_sb,
+				"Found mst protected attribute but the attribute is %s.",
+				NInoCompressed(ni) ? "compressed" : "sparse");
+			goto unm_err_out;
+		}
+		if (a->flags & ATTR_IS_SPARSE) {
+			NInoSetSparse(ni);
+			ni->flags |=3D FILE_ATTR_SPARSE_FILE;
+		}
+	}
+	if (a->flags & ATTR_IS_ENCRYPTED) {
+		if (NInoCompressed(ni)) {
+			ntfs_error(vi->i_sb, "Found encrypted and compressed data.");
+			goto unm_err_out;
+		}
+		/*
+		 * The encryption flag set in an index root just means to
+		 * encrypt all files.
+		 */
+		if (NInoMstProtected(ni) && ni->type !=3D AT_INDEX_ROOT) {
+			ntfs_error(vi->i_sb,
+				"Found mst protected attribute but the attribute is encrypted.");
+			goto unm_err_out;
+		}
+		if (ni->type !=3D AT_DATA) {
+			ntfs_error(vi->i_sb,
+				"Found encrypted non-data attribute.");
+			goto unm_err_out;
+		}
+		NInoSetEncrypted(ni);
+		ni->flags |=3D FILE_ATTR_ENCRYPTED;
+	}
+	if (!a->non_resident) {
+		/* Ensure the attribute name is placed before the value. */
+		if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=3D
+				le16_to_cpu(a->data.resident.value_offset)))) {
+			ntfs_error(vol->sb,
+				"Attribute name is placed after the attribute value.");
+			goto unm_err_out;
+		}
+		if (NInoMstProtected(ni)) {
+			ntfs_error(vi->i_sb,
+				"Found mst protected attribute but the attribute is resident.");
+			goto unm_err_out;
+		}
+		vi->i_size =3D ni->initialized_size =3D ni->data_size =3D le32_to_cpu(
+				a->data.resident.value_length);
+		ni->allocated_size =3D le32_to_cpu(a->length) -
+				le16_to_cpu(a->data.resident.value_offset);
+		if (vi->i_size > ni->allocated_size) {
+			ntfs_error(vi->i_sb,
+				"Resident attribute is corrupt (size exceeds allocation).");
+			goto unm_err_out;
+		}
+	} else {
+		NInoSetNonResident(ni);
+		/*
+		 * Ensure the attribute name is placed before the mapping pairs
+		 * array.
+		 */
+		if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=3D
+				le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset)))) {
+			ntfs_error(vol->sb,
+				"Attribute name is placed after the mapping pairs array.");
+			goto unm_err_out;
+		}
+		if (NInoCompressed(ni) || NInoSparse(ni)) {
+			if (NInoCompressed(ni) && a->data.non_resident.compression_unit !=3D 4)=
 {
+				ntfs_error(vi->i_sb,
+					"Found non-standard compression unit (%u instead of 4).  Cannot handl=
e this.",
+					a->data.non_resident.compression_unit);
+				err =3D -EOPNOTSUPP;
+				goto unm_err_out;
+			}
+			if (a->data.non_resident.compression_unit) {
+				ni->itype.compressed.block_size =3D 1U <<
+						(a->data.non_resident.compression_unit +
+						vol->cluster_size_bits);
+				ni->itype.compressed.block_size_bits =3D
+						ffs(ni->itype.compressed.block_size) - 1;
+				ni->itype.compressed.block_clusters =3D 1U <<
+						a->data.non_resident.compression_unit;
+			} else {
+				ni->itype.compressed.block_size =3D 0;
+				ni->itype.compressed.block_size_bits =3D 0;
+				ni->itype.compressed.block_clusters =3D 0;
+			}
+			ni->itype.compressed.size =3D le64_to_cpu(
+					a->data.non_resident.compressed_size);
+		}
+		if (a->data.non_resident.lowest_vcn) {
+			ntfs_error(vi->i_sb, "First extent of attribute has non-zero lowest_vcn=
.");
+			goto unm_err_out;
+		}
+		vi->i_size =3D ni->data_size =3D le64_to_cpu(a->data.non_resident.data_s=
ize);
+		ni->initialized_size =3D le64_to_cpu(a->data.non_resident.initialized_si=
ze);
+		ni->allocated_size =3D le64_to_cpu(a->data.non_resident.allocated_size);
+	}
+	vi->i_mapping->a_ops =3D &ntfs_normal_aops;
+	if (NInoMstProtected(ni))
+		vi->i_mapping->a_ops =3D &ntfs_mst_aops;
+	else if (NInoCompressed(ni))
+		vi->i_mapping->a_ops =3D &ntfs_compressed_aops;
+	if ((NInoCompressed(ni) || NInoSparse(ni)) && ni->type !=3D AT_INDEX_ROOT)
+		vi->i_blocks =3D ni->itype.compressed.size >> 9;
+	else
+		vi->i_blocks =3D ni->allocated_size >> 9;
+	/*
+	 * Make sure the base inode does not go away and attach it to the
+	 * attribute inode.
+	 */
+	if (!igrab(base_vi)) {
+		err =3D -ENOENT;
+		goto unm_err_out;
+	}
+	ni->ext.base_ntfs_ino =3D base_ni;
+	ni->nr_extents =3D -1;
+
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(base_ni);
+
+	ntfs_debug("Done.");
+	return 0;
+
+unm_err_out:
+	if (!err)
+		err =3D -EIO;
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(base_ni);
+err_out:
+	if (err !=3D -ENOENT)
+		ntfs_error(vol->sb,
+			"Failed with error code %i while reading attribute inode (mft_no 0x%lx,=
 type 0x%x, name_len %i).  Marking corrupt inode and base inode 0x%lx as ba=
d.  Run chkdsk.",
+			err, vi->i_ino, ni->type, ni->name_len,
+			base_vi->i_ino);
+	if (err !=3D -ENOENT && err !=3D -ENOMEM)
+		NVolSetErrors(vol);
+	return err;
+}
+
+/**
+ * ntfs_read_locked_index_inode - read an index inode from its base inode
+ * @base_vi:	base inode
+ * @vi:		index inode to read
+ *
+ * ntfs_read_locked_index_inode() is called from ntfs_index_iget() to read=
 the
+ * index inode described by @vi into memory from the base mft record descr=
ibed
+ * by @base_ni.
+ *
+ * ntfs_read_locked_index_inode() maps, pins and locks the base inode for
+ * reading and looks up the attributes relating to the index described by =
@vi
+ * before setting up the necessary fields in @vi as well as initializing t=
he
+ * ntfs inode.
+ *
+ * Note, index inodes are essentially attribute inodes (NInoAttr() is true)
+ * with the attribute type set to AT_INDEX_ALLOCATION.  Apart from that, t=
hey
+ * are setup like directory inodes since directories are a special case of
+ * indices ao they need to be treated in much the same way.  Most importan=
tly,
+ * for small indices the index allocation attribute might not actually exi=
st.
+ * However, the index root attribute always exists but this does not need =
to
+ * have an inode associated with it and this is why we define a new inode =
type
+ * index.  Also, like for directories, we need to have an attribute inode =
for
+ * the bitmap attribute corresponding to the index allocation attribute an=
d we
+ * can store this in the appropriate field of the inode, just like we do f=
or
+ * normal directory inodes.
+ *
+ * Q: What locks are held when the function is called?
+ * A: i_state has I_NEW set, hence the inode is locked, also
+ *    i_count is set to 1, so it is not going to go away
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_read_locked_index_inode(struct inode *base_vi, struct inod=
e *vi)
+{
+	loff_t bvi_size;
+	struct ntfs_volume *vol =3D NTFS_SB(vi->i_sb);
+	struct ntfs_inode *ni =3D NTFS_I(vi), *base_ni =3D NTFS_I(base_vi), *bni;
+	struct inode *bvi;
+	struct mft_record *m;
+	struct attr_record *a;
+	struct ntfs_attr_search_ctx *ctx;
+	struct index_root *ir;
+	u8 *ir_end, *index_end;
+	int err =3D 0;
+
+	ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
+	lockdep_assert_held(&base_ni->mrec_lock);
+
+	ntfs_init_big_inode(vi);
+	/* Just mirror the values from the base inode. */
+	vi->i_uid	=3D base_vi->i_uid;
+	vi->i_gid	=3D base_vi->i_gid;
+	set_nlink(vi, base_vi->i_nlink);
+	inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
+	inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
+	inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
+	vi->i_generation =3D ni->seq_no =3D base_ni->seq_no;
+	/* Set inode type to zero but preserve permissions. */
+	vi->i_mode	=3D base_vi->i_mode & ~S_IFMT;
+	/* Map the mft record for the base inode. */
+	m =3D map_mft_record(base_ni);
+	if (IS_ERR(m)) {
+		err =3D PTR_ERR(m);
+		goto err_out;
+	}
+	ctx =3D ntfs_attr_get_search_ctx(base_ni, m);
+	if (!ctx) {
+		err =3D -ENOMEM;
+		goto unm_err_out;
+	}
+	/* Find the index root attribute. */
+	err =3D ntfs_attr_lookup(AT_INDEX_ROOT, ni->name, ni->name_len,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err)) {
+		if (err =3D=3D -ENOENT)
+			ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is missing.");
+		goto unm_err_out;
+	}
+	a =3D ctx->attr;
+	/* Set up the state. */
+	if (unlikely(a->non_resident)) {
+		ntfs_error(vol->sb, "$INDEX_ROOT attribute is not resident.");
+		goto unm_err_out;
+	}
+	/* Ensure the attribute name is placed before the value. */
+	if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=3D
+			le16_to_cpu(a->data.resident.value_offset)))) {
+		ntfs_error(vol->sb,
+			"$INDEX_ROOT attribute name is placed after the attribute value.");
+		goto unm_err_out;
+	}
+
+	ir =3D (struct index_root *)((u8 *)a + le16_to_cpu(a->data.resident.value=
_offset));
+	ir_end =3D (u8 *)ir + le32_to_cpu(a->data.resident.value_length);
+	if (ir_end > (u8 *)ctx->mrec + vol->mft_record_size) {
+		ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is corrupt.");
+		goto unm_err_out;
+	}
+	index_end =3D (u8 *)&ir->index + le32_to_cpu(ir->index.index_length);
+	if (index_end > ir_end) {
+		ntfs_error(vi->i_sb, "Index is corrupt.");
+		goto unm_err_out;
+	}
+
+	ni->itype.index.collation_rule =3D ir->collation_rule;
+	ntfs_debug("Index collation rule is 0x%x.",
+			le32_to_cpu(ir->collation_rule));
+	ni->itype.index.block_size =3D le32_to_cpu(ir->index_block_size);
+	if (!is_power_of_2(ni->itype.index.block_size)) {
+		ntfs_error(vi->i_sb, "Index block size (%u) is not a power of two.",
+				ni->itype.index.block_size);
+		goto unm_err_out;
+	}
+	if (ni->itype.index.block_size > PAGE_SIZE) {
+		ntfs_error(vi->i_sb, "Index block size (%u) > PAGE_SIZE (%ld) is not sup=
ported.",
+				ni->itype.index.block_size, PAGE_SIZE);
+		err =3D -EOPNOTSUPP;
+		goto unm_err_out;
+	}
+	if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) {
+		ntfs_error(vi->i_sb,
+				"Index block size (%u) < NTFS_BLOCK_SIZE (%i) is not supported.",
+				ni->itype.index.block_size, NTFS_BLOCK_SIZE);
+		err =3D -EOPNOTSUPP;
+		goto unm_err_out;
+	}
+	ni->itype.index.block_size_bits =3D ffs(ni->itype.index.block_size) - 1;
+	/* Determine the size of a vcn in the index. */
+	if (vol->cluster_size <=3D ni->itype.index.block_size) {
+		ni->itype.index.vcn_size =3D vol->cluster_size;
+		ni->itype.index.vcn_size_bits =3D vol->cluster_size_bits;
+	} else {
+		ni->itype.index.vcn_size =3D vol->sector_size;
+		ni->itype.index.vcn_size_bits =3D vol->sector_size_bits;
+	}
+
+	/* Find index allocation attribute. */
+	ntfs_attr_reinit_search_ctx(ctx);
+	err =3D ntfs_attr_lookup(AT_INDEX_ALLOCATION, ni->name, ni->name_len,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err)) {
+		if (err =3D=3D -ENOENT) {
+			/* No index allocation. */
+			vi->i_size =3D ni->initialized_size =3D ni->allocated_size =3D 0;
+			/* We are done with the mft record, so we release it. */
+			ntfs_attr_put_search_ctx(ctx);
+			unmap_mft_record(base_ni);
+			m =3D NULL;
+			ctx =3D NULL;
+			goto skip_large_index_stuff;
+		} else
+			ntfs_error(vi->i_sb, "Failed to lookup $INDEX_ALLOCATION attribute.");
+		goto unm_err_out;
+	}
+	NInoSetIndexAllocPresent(ni);
+	NInoSetNonResident(ni);
+	ni->type =3D AT_INDEX_ALLOCATION;
+
+	a =3D ctx->attr;
+	if (!a->non_resident) {
+		ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is resident.");
+		goto unm_err_out;
+	}
+	/*
+	 * Ensure the attribute name is placed before the mapping pairs array.
+	 */
+	if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=3D
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset)))) {
+		ntfs_error(vol->sb,
+			"$INDEX_ALLOCATION attribute name is placed after the mapping pairs arr=
ay.");
+		goto unm_err_out;
+	}
+	if (a->flags & ATTR_IS_ENCRYPTED) {
+		ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is encrypted.");
+		goto unm_err_out;
+	}
+	if (a->flags & ATTR_IS_SPARSE) {
+		ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is sparse.");
+		goto unm_err_out;
+	}
+	if (a->flags & ATTR_COMPRESSION_MASK) {
+		ntfs_error(vi->i_sb,
+			"$INDEX_ALLOCATION attribute is compressed.");
+		goto unm_err_out;
+	}
+	if (a->data.non_resident.lowest_vcn) {
+		ntfs_error(vi->i_sb,
+			"First extent of $INDEX_ALLOCATION attribute has non zero lowest_vcn.");
+		goto unm_err_out;
+	}
+	vi->i_size =3D ni->data_size =3D le64_to_cpu(a->data.non_resident.data_si=
ze);
+	ni->initialized_size =3D le64_to_cpu(a->data.non_resident.initialized_siz=
e);
+	ni->allocated_size =3D le64_to_cpu(a->data.non_resident.allocated_size);
+	/*
+	 * We are done with the mft record, so we release it.  Otherwise
+	 * we would deadlock in ntfs_attr_iget().
+	 */
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(base_ni);
+	m =3D NULL;
+	ctx =3D NULL;
+	/* Get the index bitmap attribute inode. */
+	bvi =3D ntfs_attr_iget(base_vi, AT_BITMAP, ni->name, ni->name_len);
+	if (IS_ERR(bvi)) {
+		ntfs_error(vi->i_sb, "Failed to get bitmap attribute.");
+		err =3D PTR_ERR(bvi);
+		goto unm_err_out;
+	}
+	bni =3D NTFS_I(bvi);
+	if (NInoCompressed(bni) || NInoEncrypted(bni) ||
+			NInoSparse(bni)) {
+		ntfs_error(vi->i_sb,
+			"$BITMAP attribute is compressed and/or encrypted and/or sparse.");
+		goto iput_unm_err_out;
+	}
+	/* Consistency check bitmap size vs. index allocation size. */
+	bvi_size =3D i_size_read(bvi);
+	if ((bvi_size << 3) < (vi->i_size >> ni->itype.index.block_size_bits)) {
+		ntfs_error(vi->i_sb,
+			"Index bitmap too small (0x%llx) for index allocation (0x%llx).",
+			bvi_size << 3, vi->i_size);
+		goto iput_unm_err_out;
+	}
+	iput(bvi);
+skip_large_index_stuff:
+	/* Setup the operations for this index inode. */
+	ntfs_set_vfs_operations(vi, S_IFDIR, 0);
+	vi->i_blocks =3D ni->allocated_size >> 9;
+	/*
+	 * Make sure the base inode doesn't go away and attach it to the
+	 * index inode.
+	 */
+	if (!igrab(base_vi))
+		goto unm_err_out;
+	ni->ext.base_ntfs_ino =3D base_ni;
+	ni->nr_extents =3D -1;
+
+	ntfs_debug("Done.");
+	return 0;
+iput_unm_err_out:
+	iput(bvi);
+unm_err_out:
+	if (!err)
+		err =3D -EIO;
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (m)
+		unmap_mft_record(base_ni);
+err_out:
+	ntfs_error(vi->i_sb,
+		"Failed with error code %i while reading index inode (mft_no 0x%lx, name=
_len %i.",
+		err, vi->i_ino, ni->name_len);
+	if (err !=3D -EOPNOTSUPP && err !=3D -ENOMEM)
+		NVolSetErrors(vol);
+	return err;
+}
+
+/**
+ * load_attribute_list_mount - load an attribute list into memory
+ * @vol:		ntfs volume from which to read
+ * @runlist:		runlist of the attribute list
+ * @al_start:		destination buffer
+ * @size:		size of the destination buffer in bytes
+ * @initialized_size:	initialized size of the attribute list
+ *
+ * Walk the runlist @runlist and load all clusters from it copying them in=
to
+ * the linear buffer @al. The maximum number of bytes copied to @al is @si=
ze
+ * bytes. Note, @size does not need to be a multiple of the cluster size. =
If
+ * @initialized_size is less than @size, the region in @al between
+ * @initialized_size and @size will be zeroed and not read from disk.
+ *
+ * Return 0 on success or -errno on error.
+ */
+static int load_attribute_list_mount(struct ntfs_volume *vol,
+		struct runlist_element *rl, u8 *al_start, const s64 size,
+		const s64 initialized_size)
+{
+	s64 lcn;
+	u8 *al =3D al_start;
+	u8 *al_end =3D al + initialized_size;
+	struct super_block *sb;
+	int err =3D 0;
+	loff_t rl_byte_off, rl_byte_len;
+
+	ntfs_debug("Entering.");
+	if (!vol || !rl || !al || size <=3D 0 || initialized_size < 0 ||
+			initialized_size > size)
+		return -EINVAL;
+	if (!initialized_size) {
+		memset(al, 0, size);
+		return 0;
+	}
+	sb =3D vol->sb;
+
+	/* Read all clusters specified by the runlist one run at a time. */
+	while (rl->length) {
+		lcn =3D ntfs_rl_vcn_to_lcn(rl, rl->vcn);
+		ntfs_debug("Reading vcn =3D 0x%llx, lcn =3D 0x%llx.",
+				(unsigned long long)rl->vcn,
+				(unsigned long long)lcn);
+		/* The attribute list cannot be sparse. */
+		if (lcn < 0) {
+			ntfs_error(sb, "ntfs_rl_vcn_to_lcn() failed. Cannot read attribute list=
.");
+			goto err_out;
+		}
+
+		rl_byte_off =3D lcn << vol->cluster_size_bits;
+		rl_byte_len =3D rl->length << vol->cluster_size_bits;
+
+		if (al + rl_byte_len > al_end)
+			rl_byte_len =3D al_end - al;
+
+		err =3D ntfs_dev_read(sb, al, rl_byte_off, rl_byte_len);
+		if (err) {
+			ntfs_error(sb, "Cannot read attribute list.");
+			goto err_out;
+		}
+
+		if (al + rl_byte_len >=3D al_end) {
+			if (initialized_size < size)
+				goto initialize;
+			goto done;
+		}
+
+		al +=3D rl_byte_len;
+		rl++;
+	}
+	if (initialized_size < size) {
+initialize:
+		memset(al_start + initialized_size, 0, size - initialized_size);
+	}
+done:
+	return err;
+	/* Real overflow! */
+	ntfs_error(sb, "Attribute list buffer overflow. Read attribute list is tr=
uncated.");
+err_out:
+	err =3D -EIO;
+	goto done;
+}
+/*
+ * The MFT inode has special locking, so teach the lock validator
+ * about this by splitting off the locking rules of the MFT from
+ * the locking rules of other inodes. The MFT inode can never be
+ * accessed from the VFS side (or even internally), only by the
+ * map_mft functions.
+ */
+static struct lock_class_key mft_ni_runlist_lock_key, mft_ni_mrec_lock_key;
+
+/**
+ * ntfs_read_inode_mount - special read_inode for mount time use only
+ * @vi:		inode to read
+ *
+ * Read inode FILE_MFT at mount time, only called with super_block lock
+ * held from within the read_super() code path.
+ *
+ * This function exists because when it is called the page cache for $MFT/=
$DATA
+ * is not initialized and hence we cannot get at the contents of mft recor=
ds
+ * by calling map_mft_record*().
+ *
+ * Further it needs to cope with the circular references problem, i.e. can=
not
+ * load any attributes other than $ATTRIBUTE_LIST until $DATA is loaded, b=
ecause
+ * we do not know where the other extent mft records are yet and again, be=
cause
+ * we cannot call map_mft_record*() yet.  Obviously this applies only when=
 an
+ * attribute list is actually present in $MFT inode.
+ *
+ * We solve these problems by starting with the $DATA attribute before any=
thing
+ * else and iterating using ntfs_attr_lookup($DATA) over all extents.  As =
each
+ * extent is found, we ntfs_mapping_pairs_decompress() including the impli=
ed
+ * ntfs_runlists_merge().  Each step of the iteration necessarily provides
+ * sufficient information for the next step to complete.
+ *
+ * This should work but there are two possible pit falls (see inline comme=
nts
+ * below), but only time will tell if they are real pits or just smoke...
+ */
+int ntfs_read_inode_mount(struct inode *vi)
+{
+	s64 next_vcn, last_vcn, highest_vcn;
+	struct super_block *sb =3D vi->i_sb;
+	struct ntfs_volume *vol =3D NTFS_SB(sb);
+	struct ntfs_inode *ni;
+	struct mft_record *m =3D NULL;
+	struct attr_record *a;
+	struct ntfs_attr_search_ctx *ctx;
+	unsigned int i, nr_blocks;
+	int err;
+	size_t new_rl_count;
+
+	ntfs_debug("Entering.");
+
+	/* Initialize the ntfs specific part of @vi. */
+	ntfs_init_big_inode(vi);
+
+	ni =3D NTFS_I(vi);
+
+	/* Setup the data attribute. It is special as it is mst protected. */
+	NInoSetNonResident(ni);
+	NInoSetMstProtected(ni);
+	NInoSetSparseDisabled(ni);
+	ni->type =3D AT_DATA;
+	ni->name =3D AT_UNNAMED;
+	ni->name_len =3D 0;
+	/*
+	 * This sets up our little cheat allowing us to reuse the async read io
+	 * completion handler for directories.
+	 */
+	ni->itype.index.block_size =3D vol->mft_record_size;
+	ni->itype.index.block_size_bits =3D vol->mft_record_size_bits;
+
+	/* Very important! Needed to be able to call map_mft_record*(). */
+	vol->mft_ino =3D vi;
+
+	/* Allocate enough memory to read the first mft record. */
+	if (vol->mft_record_size > 64 * 1024) {
+		ntfs_error(sb, "Unsupported mft record size %i (max 64kiB).",
+				vol->mft_record_size);
+		goto err_out;
+	}
+
+	i =3D vol->mft_record_size;
+	if (i < sb->s_blocksize)
+		i =3D sb->s_blocksize;
+
+	m =3D (struct mft_record *)ntfs_malloc_nofs(i);
+	if (!m) {
+		ntfs_error(sb, "Failed to allocate buffer for $MFT record 0.");
+		goto err_out;
+	}
+
+	/* Determine the first block of the $MFT/$DATA attribute. */
+	nr_blocks =3D vol->mft_record_size >> sb->s_blocksize_bits;
+	if (!nr_blocks)
+		nr_blocks =3D 1;
+
+	/* Load $MFT/$DATA's first mft record. */
+	err =3D ntfs_dev_read(sb, m, vol->mft_lcn << vol->cluster_size_bits, i);
+	if (err) {
+		ntfs_error(sb, "Device read failed.");
+		goto err_out;
+	}
+
+	if (le32_to_cpu(m->bytes_allocated) !=3D vol->mft_record_size) {
+		ntfs_error(sb, "Incorrect mft record size %u in superblock, should be %u=
.",
+				le32_to_cpu(m->bytes_allocated), vol->mft_record_size);
+		goto err_out;
+	}
+
+	/* Apply the mst fixups. */
+	if (post_read_mst_fixup((struct ntfs_record *)m, vol->mft_record_size)) {
+		ntfs_error(sb, "MST fixup failed. $MFT is corrupt.");
+		goto err_out;
+	}
+
+	if (ntfs_mft_record_check(vol, m, FILE_MFT)) {
+		ntfs_error(sb, "ntfs_mft_record_check failed. $MFT is corrupt.");
+		goto err_out;
+	}
+
+	/* Need this to sanity check attribute list references to $MFT. */
+	vi->i_generation =3D ni->seq_no =3D le16_to_cpu(m->sequence_number);
+
+	/* Provides read_folio() for map_mft_record(). */
+	vi->i_mapping->a_ops =3D &ntfs_mst_aops;
+
+	ctx =3D ntfs_attr_get_search_ctx(ni, m);
+	if (!ctx) {
+		err =3D -ENOMEM;
+		goto err_out;
+	}
+
+	/* Find the attribute list attribute if present. */
+	err =3D ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx);
+	if (err) {
+		if (unlikely(err !=3D -ENOENT)) {
+			ntfs_error(sb,
+				"Failed to lookup attribute list attribute. You should run chkdsk.");
+			goto put_err_out;
+		}
+	} else /* if (!err) */ {
+		struct attr_list_entry *al_entry, *next_al_entry;
+		u8 *al_end;
+		static const char *es =3D "  Not allowed.  $MFT is corrupt.  You should =
run chkdsk.";
+
+		ntfs_debug("Attribute list attribute found in $MFT.");
+		NInoSetAttrList(ni);
+		a =3D ctx->attr;
+		if (a->flags & ATTR_COMPRESSION_MASK) {
+			ntfs_error(sb,
+				"Attribute list attribute is compressed.%s",
+				es);
+			goto put_err_out;
+		}
+		if (a->flags & ATTR_IS_ENCRYPTED ||
+				a->flags & ATTR_IS_SPARSE) {
+			if (a->non_resident) {
+				ntfs_error(sb,
+					"Non-resident attribute list attribute is encrypted/sparse.%s",
+					es);
+				goto put_err_out;
+			}
+			ntfs_warning(sb,
+				"Resident attribute list attribute in $MFT system file is marked encry=
pted/sparse which is not true.  However, Windows allows this and chkdsk doe=
s not detect or correct it so we will just ignore the invalid flags and pre=
tend they are not set.");
+		}
+		/* Now allocate memory for the attribute list. */
+		ni->attr_list_size =3D (u32)ntfs_attr_size(a);
+		if (!ni->attr_list_size) {
+			ntfs_error(sb, "Attr_list_size is zero");
+			goto put_err_out;
+		}
+		ni->attr_list =3D ntfs_malloc_nofs(ni->attr_list_size);
+		if (!ni->attr_list) {
+			ntfs_error(sb, "Not enough memory to allocate buffer for attribute list=
.");
+			goto put_err_out;
+		}
+		if (a->non_resident) {
+			struct runlist_element *rl;
+			size_t new_rl_count;
+
+			NInoSetAttrListNonResident(ni);
+			if (a->data.non_resident.lowest_vcn) {
+				ntfs_error(sb,
+					"Attribute list has non zero lowest_vcn. $MFT is corrupt. You should =
run chkdsk.");
+				goto put_err_out;
+			}
+
+			rl =3D ntfs_mapping_pairs_decompress(vol, a, NULL, &new_rl_count);
+			if (IS_ERR(rl)) {
+				err =3D PTR_ERR(rl);
+				ntfs_error(sb,
+					   "Mapping pairs decompression failed with error code %i.",
+					   -err);
+				goto put_err_out;
+			}
+
+			err =3D load_attribute_list_mount(vol, rl, ni->attr_list, ni->attr_list=
_size,
+					le64_to_cpu(a->data.non_resident.initialized_size));
+			ntfs_free(rl);
+			if (err) {
+				ntfs_error(sb,
+					   "Failed to load attribute list with error code %i.",
+					   -err);
+				goto put_err_out;
+			}
+		} else /* if (!ctx.attr->non_resident) */ {
+			if ((u8 *)a + le16_to_cpu(
+					a->data.resident.value_offset) +
+					le32_to_cpu(a->data.resident.value_length) >
+					(u8 *)ctx->mrec + vol->mft_record_size) {
+				ntfs_error(sb, "Corrupt attribute list attribute.");
+				goto put_err_out;
+			}
+			/* Now copy the attribute list. */
+			memcpy(ni->attr_list, (u8 *)a + le16_to_cpu(
+					a->data.resident.value_offset),
+					le32_to_cpu(a->data.resident.value_length));
+		}
+		/* The attribute list is now setup in memory. */
+		al_entry =3D (struct attr_list_entry *)ni->attr_list;
+		al_end =3D (u8 *)al_entry + ni->attr_list_size;
+		for (;; al_entry =3D next_al_entry) {
+			/* Out of bounds check. */
+			if ((u8 *)al_entry < ni->attr_list ||
+					(u8 *)al_entry > al_end)
+				goto em_put_err_out;
+			/* Catch the end of the attribute list. */
+			if ((u8 *)al_entry =3D=3D al_end)
+				goto em_put_err_out;
+			if (!al_entry->length)
+				goto em_put_err_out;
+			if ((u8 *)al_entry + 6 > al_end ||
+			    (u8 *)al_entry + le16_to_cpu(al_entry->length) > al_end)
+				goto em_put_err_out;
+			next_al_entry =3D (struct attr_list_entry *)((u8 *)al_entry +
+					le16_to_cpu(al_entry->length));
+			if (le32_to_cpu(al_entry->type) > le32_to_cpu(AT_DATA))
+				goto em_put_err_out;
+			if (al_entry->type !=3D AT_DATA)
+				continue;
+			/* We want an unnamed attribute. */
+			if (al_entry->name_length)
+				goto em_put_err_out;
+			/* Want the first entry, i.e. lowest_vcn =3D=3D 0. */
+			if (al_entry->lowest_vcn)
+				goto em_put_err_out;
+			/* First entry has to be in the base mft record. */
+			if (MREF_LE(al_entry->mft_reference) !=3D vi->i_ino) {
+				/* MFT references do not match, logic fails. */
+				ntfs_error(sb,
+					"BUG: The first $DATA extent of $MFT is not in the base mft record.");
+				goto put_err_out;
+			} else {
+				/* Sequence numbers must match. */
+				if (MSEQNO_LE(al_entry->mft_reference) !=3D
+						ni->seq_no)
+					goto em_put_err_out;
+				/* Got it. All is ok. We can stop now. */
+				break;
+			}
+		}
+	}
+
+	ntfs_attr_reinit_search_ctx(ctx);
+
+	/* Now load all attribute extents. */
+	a =3D NULL;
+	next_vcn =3D last_vcn =3D highest_vcn =3D 0;
+	while (!(err =3D ntfs_attr_lookup(AT_DATA, NULL, 0, 0, next_vcn, NULL, 0,
+			ctx))) {
+		struct runlist_element *nrl;
+
+		/* Cache the current attribute. */
+		a =3D ctx->attr;
+		/* $MFT must be non-resident. */
+		if (!a->non_resident) {
+			ntfs_error(sb,
+				"$MFT must be non-resident but a resident extent was found. $MFT is co=
rrupt. Run chkdsk.");
+			goto put_err_out;
+		}
+		/* $MFT must be uncompressed and unencrypted. */
+		if (a->flags & ATTR_COMPRESSION_MASK ||
+				a->flags & ATTR_IS_ENCRYPTED ||
+				a->flags & ATTR_IS_SPARSE) {
+			ntfs_error(sb,
+				"$MFT must be uncompressed, non-sparse, and unencrypted but a compress=
ed/sparse/encrypted extent was found. $MFT is corrupt. Run chkdsk.");
+			goto put_err_out;
+		}
+		/*
+		 * Decompress the mapping pairs array of this extent and merge
+		 * the result into the existing runlist. No need for locking
+		 * as we have exclusive access to the inode at this time and we
+		 * are a mount in progress task, too.
+		 */
+		nrl =3D ntfs_mapping_pairs_decompress(vol, a, &ni->runlist,
+						    &new_rl_count);
+		if (IS_ERR(nrl)) {
+			ntfs_error(sb,
+				"ntfs_mapping_pairs_decompress() failed with error code %ld.",
+				PTR_ERR(nrl));
+			goto put_err_out;
+		}
+		ni->runlist.rl =3D nrl;
+		ni->runlist.count =3D new_rl_count;
+
+		/* Are we in the first extent? */
+		if (!next_vcn) {
+			if (a->data.non_resident.lowest_vcn) {
+				ntfs_error(sb,
+					"First extent of $DATA attribute has non zero lowest_vcn. $MFT is cor=
rupt. You should run chkdsk.");
+				goto put_err_out;
+			}
+			/* Get the last vcn in the $DATA attribute. */
+			last_vcn =3D le64_to_cpu(a->data.non_resident.allocated_size) >>
+				vol->cluster_size_bits;
+			/* Fill in the inode size. */
+			vi->i_size =3D le64_to_cpu(a->data.non_resident.data_size);
+			ni->initialized_size =3D le64_to_cpu(a->data.non_resident.initialized_s=
ize);
+			ni->allocated_size =3D le64_to_cpu(a->data.non_resident.allocated_size);
+			/*
+			 * Verify the number of mft records does not exceed
+			 * 2^32 - 1.
+			 */
+			if ((vi->i_size >> vol->mft_record_size_bits) >=3D
+					(1ULL << 32)) {
+				ntfs_error(sb, "$MFT is too big! Aborting.");
+				goto put_err_out;
+			}
+			/*
+			 * We have got the first extent of the runlist for
+			 * $MFT which means it is now relatively safe to call
+			 * the normal ntfs_read_inode() function.
+			 * Complete reading the inode, this will actually
+			 * re-read the mft record for $MFT, this time entering
+			 * it into the page cache with which we complete the
+			 * kick start of the volume. It should be safe to do
+			 * this now as the first extent of $MFT/$DATA is
+			 * already known and we would hope that we don't need
+			 * further extents in order to find the other
+			 * attributes belonging to $MFT. Only time will tell if
+			 * this is really the case. If not we will have to play
+			 * magic at this point, possibly duplicating a lot of
+			 * ntfs_read_inode() at this point. We will need to
+			 * ensure we do enough of its work to be able to call
+			 * ntfs_read_inode() on extents of $MFT/$DATA. But lets
+			 * hope this never happens...
+			 */
+			err =3D ntfs_read_locked_inode(vi);
+			if (err) {
+				ntfs_error(sb, "ntfs_read_inode() of $MFT failed.\n");
+				ntfs_attr_put_search_ctx(ctx);
+				/* Revert to the safe super operations. */
+				ntfs_free(m);
+				return -1;
+			}
+			/*
+			 * Re-initialize some specifics about $MFT's inode as
+			 * ntfs_read_inode() will have set up the default ones.
+			 */
+			/* Set uid and gid to root. */
+			vi->i_uid =3D GLOBAL_ROOT_UID;
+			vi->i_gid =3D GLOBAL_ROOT_GID;
+			/* Regular file. No access for anyone. */
+			vi->i_mode =3D S_IFREG;
+			/* No VFS initiated operations allowed for $MFT. */
+			vi->i_op =3D &ntfs_empty_inode_ops;
+			vi->i_fop =3D &ntfs_empty_file_ops;
+		}
+
+		/* Get the lowest vcn for the next extent. */
+		highest_vcn =3D le64_to_cpu(a->data.non_resident.highest_vcn);
+		next_vcn =3D highest_vcn + 1;
+
+		/* Only one extent or error, which we catch below. */
+		if (next_vcn <=3D 0)
+			break;
+
+		/* Avoid endless loops due to corruption. */
+		if (next_vcn < le64_to_cpu(a->data.non_resident.lowest_vcn)) {
+			ntfs_error(sb, "$MFT has corrupt attribute list attribute. Run chkdsk."=
);
+			goto put_err_out;
+		}
+	}
+	if (err !=3D -ENOENT) {
+		ntfs_error(sb, "Failed to lookup $MFT/$DATA attribute extent. Run chkdsk=
.\n");
+		goto put_err_out;
+	}
+	if (!a) {
+		ntfs_error(sb, "$MFT/$DATA attribute not found. $MFT is corrupt. Run chk=
dsk.");
+		goto put_err_out;
+	}
+	if (highest_vcn && highest_vcn !=3D last_vcn - 1) {
+		ntfs_error(sb, "Failed to load the complete runlist for $MFT/$DATA. Run =
chkdsk.");
+		ntfs_debug("highest_vcn =3D 0x%llx, last_vcn - 1 =3D 0x%llx",
+				(unsigned long long)highest_vcn,
+				(unsigned long long)last_vcn - 1);
+		goto put_err_out;
+	}
+	ntfs_attr_put_search_ctx(ctx);
+	ntfs_debug("Done.");
+	ntfs_free(m);
+
+	/*
+	 * Split the locking rules of the MFT inode from the
+	 * locking rules of other inodes:
+	 */
+	lockdep_set_class(&ni->runlist.lock, &mft_ni_runlist_lock_key);
+	lockdep_set_class(&ni->mrec_lock, &mft_ni_mrec_lock_key);
+
+	return 0;
+
+em_put_err_out:
+	ntfs_error(sb,
+		"Couldn't find first extent of $DATA attribute in attribute list. $MFT i=
s corrupt. Run chkdsk.");
+put_err_out:
+	ntfs_attr_put_search_ctx(ctx);
+err_out:
+	ntfs_error(sb, "Failed. Marking inode as bad.");
+	ntfs_free(m);
+	return -1;
+}
+
+static void __ntfs_clear_inode(struct ntfs_inode *ni)
+{
+	/* Free all alocated memory. */
+	if (NInoNonResident(ni) && ni->runlist.rl) {
+		ntfs_free(ni->runlist.rl);
+		ni->runlist.rl =3D NULL;
+	}
+
+	if (ni->attr_list) {
+		ntfs_free(ni->attr_list);
+		ni->attr_list =3D NULL;
+	}
+
+	if (ni->name_len && ni->name !=3D I30 &&
+	    ni->name !=3D reparse_index_name &&
+	    ni->name !=3D R) {
+		WARN_ON(!ni->name);
+		kfree(ni->name);
+	}
+}
+
+void ntfs_clear_extent_inode(struct ntfs_inode *ni)
+{
+	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+
+	WARN_ON(NInoAttr(ni));
+	WARN_ON(ni->nr_extents !=3D -1);
+
+	__ntfs_clear_inode(ni);
+	ntfs_destroy_extent_inode(ni);
+}
+
+static int ntfs_delete_base_inode(struct ntfs_inode *ni)
+{
+	struct super_block *sb =3D ni->vol->sb;
+	int err;
+
+	if (NInoAttr(ni) || ni->nr_extents =3D=3D -1)
+		return 0;
+
+	err =3D ntfs_non_resident_dealloc_clusters(ni);
+
+	/*
+	 * Deallocate extent mft records and free extent inodes.
+	 * No need to lock as no one else has a reference.
+	 */
+	while (ni->nr_extents) {
+		err =3D ntfs_mft_record_free(ni->vol, *(ni->ext.extent_ntfs_inos));
+		if (err)
+			ntfs_error(sb,
+				"Failed to free extent MFT record. Leaving inconsistent metadata.\n");
+		ntfs_inode_close(*(ni->ext.extent_ntfs_inos));
+	}
+
+	/* Deallocate base mft record */
+	err =3D ntfs_mft_record_free(ni->vol, ni);
+	if (err)
+		ntfs_error(sb, "Failed to free base MFT record. Leaving inconsistent met=
adata.\n");
+	return err;
+}
+
+/**
+ * ntfs_evict_big_inode - clean up the ntfs specific part of an inode
+ * @vi:		vfs inode pending annihilation
+ *
+ * When the VFS is going to remove an inode from memory, ntfs_clear_big_in=
ode()
+ * is called, which deallocates all memory belonging to the NTFS specific =
part
+ * of the inode and returns.
+ *
+ * If the MFT record is dirty, we commit it before doing anything else.
+ */
+void ntfs_evict_big_inode(struct inode *vi)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+
+	truncate_inode_pages_final(&vi->i_data);
+
+	if (!vi->i_nlink) {
+		if (!NInoAttr(ni)) {
+			/* Never called with extent inodes */
+			WARN_ON(ni->nr_extents =3D=3D -1);
+			ntfs_delete_base_inode(ni);
+		}
+		goto release;
+	}
+
+	if (NInoDirty(ni)) {
+		/* Committing the inode also commits all extent inodes. */
+		ntfs_commit_inode(vi);
+
+		if (NInoDirty(ni)) {
+			ntfs_debug("Failed to commit dirty inode 0x%lx.  Losing data!",
+				   vi->i_ino);
+			NInoClearAttrListDirty(ni);
+			NInoClearDirty(ni);
+		}
+	}
+
+	/* No need to lock at this stage as no one else has a reference. */
+	if (ni->nr_extents > 0) {
+		int i;
+
+		for (i =3D 0; i < ni->nr_extents; i++) {
+			if (ni->ext.extent_ntfs_inos[i])
+				ntfs_clear_extent_inode(ni->ext.extent_ntfs_inos[i]);
+		}
+		ni->nr_extents =3D 0;
+		ntfs_free(ni->ext.extent_ntfs_inos);
+	}
+
+release:
+	clear_inode(vi);
+	__ntfs_clear_inode(ni);
+
+	if (NInoAttr(ni)) {
+		/* Release the base inode if we are holding it. */
+		if (ni->nr_extents =3D=3D -1) {
+			iput(VFS_I(ni->ext.base_ntfs_ino));
+			ni->nr_extents =3D 0;
+			ni->ext.base_ntfs_ino =3D NULL;
+		}
+	}
+
+	if (!atomic_dec_and_test(&ni->count))
+		WARN_ON(1);
+	if (ni->folio)
+		ntfs_unmap_folio(ni->folio, NULL);
+	kfree(ni->mrec);
+	ntfs_free(ni->target);
+}
+
+/**
+ * ntfs_show_options - show mount options in /proc/mounts
+ * @sf:		seq_file in which to write our mount options
+ * @root:	root of the mounted tree whose mount options to display
+ *
+ * Called by the VFS once for each mounted ntfs volume when someone reads
+ * /proc/mounts in order to display the NTFS specific mount options of each
+ * mount. The mount options of fs specified by @root are written to the se=
q file
+ * @sf and success is returned.
+ */
+int ntfs_show_options(struct seq_file *sf, struct dentry *root)
+{
+	struct ntfs_volume *vol =3D NTFS_SB(root->d_sb);
+	int i;
+
+	if (uid_valid(vol->uid))
+		seq_printf(sf, ",uid=3D%i", from_kuid_munged(&init_user_ns, vol->uid));
+	if (gid_valid(vol->gid))
+		seq_printf(sf, ",gid=3D%i", from_kgid_munged(&init_user_ns, vol->gid));
+	if (vol->fmask =3D=3D vol->dmask)
+		seq_printf(sf, ",umask=3D0%o", vol->fmask);
+	else {
+		seq_printf(sf, ",fmask=3D0%o", vol->fmask);
+		seq_printf(sf, ",dmask=3D0%o", vol->dmask);
+	}
+	seq_printf(sf, ",iocharset=3D%s", vol->nls_map->charset);
+	if (NVolCaseSensitive(vol))
+		seq_puts(sf, ",case_sensitive");
+	else
+		seq_puts(sf, ",nocase");
+	if (NVolShowSystemFiles(vol))
+		seq_puts(sf, ",show_sys_files,showmeta");
+	for (i =3D 0; on_errors_arr[i].val; i++) {
+		if (on_errors_arr[i].val =3D=3D vol->on_errors)
+			seq_printf(sf, ",errors=3D%s", on_errors_arr[i].str);
+	}
+	seq_printf(sf, ",mft_zone_multiplier=3D%i", vol->mft_zone_multiplier);
+	if (NVolSysImmutable(vol))
+		seq_puts(sf, ",sys_immutable");
+	if (!NVolShowHiddenFiles(vol))
+		seq_puts(sf, ",nohidden");
+	if (NVolHideDotFiles(vol))
+		seq_puts(sf, ",hide_dot_files");
+	if (NVolCheckWindowsNames(vol))
+		seq_puts(sf, ",windows_names");
+	if (NVolDiscard(vol))
+		seq_puts(sf, ",discard");
+	if (NVolDisableSparse(vol))
+		seq_puts(sf, ",disable_sparse");
+	if (vol->sb->s_flags & SB_POSIXACL)
+		seq_puts(sf, ",acl");
+	return 0;
+}
+
+int ntfs_extend_initialized_size(struct inode *vi, const loff_t offset,
+		const loff_t new_size)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	loff_t old_init_size;
+	unsigned long flags;
+	int err;
+
+	read_lock_irqsave(&ni->size_lock, flags);
+	old_init_size =3D ni->initialized_size;
+	read_unlock_irqrestore(&ni->size_lock, flags);
+
+	if (!NInoNonResident(ni))
+		return -EINVAL;
+	if (old_init_size >=3D new_size)
+		return 0;
+
+	err =3D ntfs_attr_map_whole_runlist(ni);
+	if (err)
+		return err;
+
+	if (!NInoCompressed(ni) && old_init_size < offset) {
+		err =3D iomap_zero_range(vi, old_init_size,
+				       offset - old_init_size,
+				       NULL, &ntfs_read_iomap_ops,
+				       &ntfs_iomap_folio_ops, NULL);
+		if (err)
+			return err;
+	}
+
+
+	mutex_lock(&ni->mrec_lock);
+	err =3D ntfs_attr_set_initialized_size(ni, new_size);
+	mutex_unlock(&ni->mrec_lock);
+	if (err)
+		truncate_setsize(vi, old_init_size);
+	return err;
+}
+
+int ntfs_truncate_vfs(struct inode *vi, loff_t new_size, loff_t i_size)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	int err;
+
+	mutex_lock(&ni->mrec_lock);
+	err =3D __ntfs_attr_truncate_vfs(ni, new_size, i_size);
+	mutex_unlock(&ni->mrec_lock);
+	if (err < 0)
+		return err;
+
+	inode_set_mtime_to_ts(vi, inode_set_ctime_current(vi));
+	return 0;
+}
+
+/**
+ * ntfs_inode_sync_standard_information - update standard information attr=
ibute
+ * @vi:	inode to update standard information
+ * @m:	mft record
+ *
+ * Return 0 on success or -errno on error.
+ */
+static int ntfs_inode_sync_standard_information(struct inode *vi, struct m=
ft_record *m)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	struct ntfs_attr_search_ctx *ctx;
+	struct standard_information *si;
+	__le64 nt;
+	int err =3D 0;
+	bool modified =3D false;
+
+	/* Update the access times in the standard information attribute. */
+	ctx =3D ntfs_attr_get_search_ctx(ni, m);
+	if (unlikely(!ctx))
+		return -ENOMEM;
+	err =3D ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err)) {
+		ntfs_attr_put_search_ctx(ctx);
+		return err;
+	}
+	si =3D (struct standard_information *)((u8 *)ctx->attr +
+			le16_to_cpu(ctx->attr->data.resident.value_offset));
+	if (si->file_attributes !=3D ni->flags) {
+		si->file_attributes =3D ni->flags;
+		modified =3D true;
+	}
+
+	/* Update the creation times if they have changed. */
+	nt =3D utc2ntfs(ni->i_crtime);
+	if (si->creation_time !=3D nt) {
+		ntfs_debug("Updating creation time for inode 0x%lx: old =3D 0x%llx, new =
=3D 0x%llx",
+				vi->i_ino, le64_to_cpu(si->creation_time),
+				le64_to_cpu(nt));
+		si->creation_time =3D nt;
+		modified =3D true;
+	}
+
+	/* Update the access times if they have changed. */
+	nt =3D utc2ntfs(inode_get_mtime(vi));
+	if (si->last_data_change_time !=3D nt) {
+		ntfs_debug("Updating mtime for inode 0x%lx: old =3D 0x%llx, new =3D 0x%l=
lx",
+				vi->i_ino, le64_to_cpu(si->last_data_change_time),
+				le64_to_cpu(nt));
+		si->last_data_change_time =3D nt;
+		modified =3D true;
+	}
+
+	nt =3D utc2ntfs(inode_get_ctime(vi));
+	if (si->last_mft_change_time !=3D nt) {
+		ntfs_debug("Updating ctime for inode 0x%lx: old =3D 0x%llx, new =3D 0x%l=
lx",
+				vi->i_ino, le64_to_cpu(si->last_mft_change_time),
+				le64_to_cpu(nt));
+		si->last_mft_change_time =3D nt;
+		modified =3D true;
+	}
+	nt =3D utc2ntfs(inode_get_atime(vi));
+	if (si->last_access_time !=3D nt) {
+		ntfs_debug("Updating atime for inode 0x%lx: old =3D 0x%llx, new =3D 0x%l=
lx",
+				vi->i_ino,
+				le64_to_cpu(si->last_access_time),
+				le64_to_cpu(nt));
+		si->last_access_time =3D nt;
+		modified =3D true;
+	}
+
+	/*
+	 * If we just modified the standard information attribute we need to
+	 * mark the mft record it is in dirty.  We do this manually so that
+	 * mark_inode_dirty() is not called which would redirty the inode and
+	 * hence result in an infinite loop of trying to write the inode.
+	 * There is no need to mark the base inode nor the base mft record
+	 * dirty, since we are going to write this mft record below in any case
+	 * and the base mft record may actually not have been modified so it
+	 * might not need to be written out.
+	 * NOTE: It is not a problem when the inode for $MFT itself is being
+	 * written out as mark_ntfs_record_dirty() will only set I_DIRTY_PAGES
+	 * on the $MFT inode and hence ntfs_write_inode() will not be
+	 * re-invoked because of it which in turn is ok since the dirtied mft
+	 * record will be cleaned and written out to disk below, i.e. before
+	 * this function returns.
+	 */
+	if (modified)
+		NInoSetDirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+
+	return err;
+}
+
+/**
+ * ntfs_inode_sync_filename - update FILE_NAME attributes
+ * @ni:	ntfs inode to update FILE_NAME attributes
+ *
+ * Update all FILE_NAME attributes for inode @ni in the index.
+ *
+ * Return 0 on success or error.
+ */
+int ntfs_inode_sync_filename(struct ntfs_inode *ni)
+{
+	struct inode *index_vi;
+	struct super_block *sb =3D VFS_I(ni)->i_sb;
+	struct ntfs_attr_search_ctx *ctx =3D NULL;
+	struct ntfs_index_context *ictx;
+	struct ntfs_inode *index_ni;
+	struct file_name_attr *fn;
+	struct file_name_attr *fnx;
+	struct reparse_point *rpp;
+	__le32 reparse_tag;
+	int err =3D 0;
+	unsigned long flags;
+
+	ntfs_debug("Entering for inode %lld\n", (long long)ni->mft_no);
+
+	ctx =3D ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx)
+		return -ENOMEM;
+
+	/* Collect the reparse tag, if any */
+	reparse_tag =3D cpu_to_le32(0);
+	if (ni->flags & FILE_ATTR_REPARSE_POINT) {
+		if (!ntfs_attr_lookup(AT_REPARSE_POINT, NULL,
+					0, CASE_SENSITIVE, 0, NULL, 0, ctx)) {
+			rpp =3D (struct reparse_point *)((u8 *)ctx->attr +
+					le16_to_cpu(ctx->attr->data.resident.value_offset));
+			reparse_tag =3D rpp->reparse_tag;
+		}
+		ntfs_attr_reinit_search_ctx(ctx);
+	}
+
+	/* Walk through all FILE_NAME attributes and update them. */
+	while (!(err =3D ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0, c=
tx))) {
+		fn =3D (struct file_name_attr *)((u8 *)ctx->attr +
+				le16_to_cpu(ctx->attr->data.resident.value_offset));
+		if (MREF_LE(fn->parent_directory) =3D=3D ni->mft_no)
+			continue;
+
+		index_vi =3D ntfs_iget(sb, MREF_LE(fn->parent_directory));
+		if (IS_ERR(index_vi)) {
+			ntfs_error(sb, "Failed to open inode %lld with index",
+					(long long)MREF_LE(fn->parent_directory));
+			continue;
+		}
+
+		index_ni =3D NTFS_I(index_vi);
+
+		mutex_lock_nested(&index_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+		if (NInoBeingDeleted(ni)) {
+			iput(index_vi);
+			mutex_unlock(&index_ni->mrec_lock);
+			continue;
+		}
+
+		ictx =3D ntfs_index_ctx_get(index_ni, I30, 4);
+		if (!ictx) {
+			ntfs_error(sb, "Failed to get index ctx, inode %lld",
+					(long long)index_ni->mft_no);
+			iput(index_vi);
+			mutex_unlock(&index_ni->mrec_lock);
+			continue;
+		}
+
+		err =3D ntfs_index_lookup(fn, sizeof(struct file_name_attr), ictx);
+		if (err) {
+			ntfs_debug("Index lookup failed, inode %lld",
+					(long long)index_ni->mft_no);
+			ntfs_index_ctx_put(ictx);
+			iput(index_vi);
+			mutex_unlock(&index_ni->mrec_lock);
+			continue;
+		}
+		/* Update flags and file size. */
+		fnx =3D (struct file_name_attr *)ictx->data;
+		fnx->file_attributes =3D
+			(fnx->file_attributes & ~FILE_ATTR_VALID_FLAGS) |
+			(ni->flags & FILE_ATTR_VALID_FLAGS);
+		if (ctx->mrec->flags & MFT_RECORD_IS_DIRECTORY)
+			fnx->data_size =3D fnx->allocated_size =3D 0;
+		else {
+			read_lock_irqsave(&ni->size_lock, flags);
+			if (NInoSparse(ni) || NInoCompressed(ni))
+				fnx->allocated_size =3D cpu_to_le64(ni->itype.compressed.size);
+			else
+				fnx->allocated_size =3D cpu_to_le64(ni->allocated_size);
+			fnx->data_size =3D cpu_to_le64(ni->data_size);
+
+			/*
+			 * The file name record has also to be fixed if some
+			 * attribute update implied the unnamed data to be
+			 * made non-resident
+			 */
+			fn->allocated_size =3D fnx->allocated_size;
+			fn->data_size =3D fnx->data_size;
+			read_unlock_irqrestore(&ni->size_lock, flags);
+		}
+
+		/* update or clear the reparse tag in the index */
+		fnx->type.rp.reparse_point_tag =3D reparse_tag;
+		fnx->creation_time =3D fn->creation_time;
+		fnx->last_data_change_time =3D fn->last_data_change_time;
+		fnx->last_mft_change_time =3D fn->last_mft_change_time;
+		fnx->last_access_time =3D fn->last_access_time;
+		ntfs_index_entry_mark_dirty(ictx);
+		ntfs_icx_ib_sync_write(ictx);
+		NInoSetDirty(ctx->ntfs_ino);
+		ntfs_index_ctx_put(ictx);
+		mutex_unlock(&index_ni->mrec_lock);
+		iput(index_vi);
+	}
+	/* Check for real error occurred. */
+	if (err !=3D -ENOENT) {
+		ntfs_error(sb, "Attribute lookup failed, err : %d, inode %lld", err,
+				(long long)ni->mft_no);
+	} else
+		err =3D 0;
+
+	ntfs_attr_put_search_ctx(ctx);
+	return err;
+}
+
+/**
+ * __ntfs_write_inode - write out a dirty inode
+ * @vi:		inode to write out
+ * @sync:	if true, write out synchronously
+ *
+ * Write out a dirty inode to disk including any extent inodes if present.
+ *
+ * If @sync is true, commit the inode to disk and wait for io completion. =
 This
+ * is done using write_mft_record().
+ *
+ * If @sync is false, just schedule the write to happen but do not wait fo=
r i/o
+ * completion.
+ *
+ * Return 0 on success and -errno on error.
+ */
+int __ntfs_write_inode(struct inode *vi, int sync)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	struct mft_record *m;
+	int err =3D 0;
+	bool need_iput =3D false;
+
+	ntfs_debug("Entering for %sinode 0x%lx.", NInoAttr(ni) ? "attr " : "",
+			vi->i_ino);
+
+	if (NVolShutdown(ni->vol))
+		return -EIO;
+
+	/*
+	 * Dirty attribute inodes are written via their real inodes so just
+	 * clean them here.  Access time updates are taken care off when the
+	 * real inode is written.
+	 */
+	if (NInoAttr(ni) || ni->nr_extents =3D=3D -1) {
+		NInoClearDirty(ni);
+		ntfs_debug("Done.");
+		return 0;
+	}
+
+	/* igrab prevents vi from being evicted while mrec_lock is hold. */
+	if (igrab(vi) !=3D NULL)
+		need_iput =3D true;
+
+	mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+	/* Map, pin, and lock the mft record belonging to the inode. */
+	m =3D map_mft_record(ni);
+	if (IS_ERR(m)) {
+		mutex_unlock(&ni->mrec_lock);
+		err =3D PTR_ERR(m);
+		goto err_out;
+	}
+
+	if (NInoNonResident(ni) && NInoRunlistDirty(ni)) {
+		down_write(&ni->runlist.lock);
+		err =3D ntfs_attr_update_mapping_pairs(ni, 0);
+		if (!err)
+			NInoClearRunlistDirty(ni);
+		up_write(&ni->runlist.lock);
+	}
+
+	err =3D ntfs_inode_sync_standard_information(vi, m);
+	if (err)
+		goto unm_err_out;
+
+	/*
+	 * when being umounted and inodes are evicted, write_inode()
+	 * is called with all inodes being marked with I_FREEING.
+	 * then ntfs_inode_sync_filename() waits infinitly because
+	 * of ntfs_iget. This situation happens only where sync_filesysem()
+	 * from umount fails because of a disk unplug and etc.
+	 * the absent of SB_ACTIVE means umounting.
+	 */
+	if ((vi->i_sb->s_flags & SB_ACTIVE) && NInoTestClearFileNameDirty(ni))
+		ntfs_inode_sync_filename(ni);
+
+	/* Now the access times are updated, write the base mft record. */
+	if (NInoDirty(ni)) {
+		err =3D write_mft_record(ni, m, sync);
+		if (err)
+			ntfs_error(vi->i_sb, "write_mft_record failed, err : %d\n", err);
+	}
+	unmap_mft_record(ni);
+
+	/* Write all attached extent mft records. */
+	mutex_lock(&ni->extent_lock);
+	if (ni->nr_extents > 0) {
+		struct ntfs_inode **extent_nis =3D ni->ext.extent_ntfs_inos;
+		int i;
+
+		ntfs_debug("Writing %i extent inodes.", ni->nr_extents);
+		for (i =3D 0; i < ni->nr_extents; i++) {
+			struct ntfs_inode *tni =3D extent_nis[i];
+
+			if (NInoDirty(tni)) {
+				struct mft_record *tm;
+				int ret;
+
+				mutex_lock(&tni->mrec_lock);
+				tm =3D map_mft_record(tni);
+				if (IS_ERR(tm)) {
+					mutex_unlock(&tni->mrec_lock);
+					if (!err || err =3D=3D -ENOMEM)
+						err =3D PTR_ERR(tm);
+					continue;
+				}
+
+				ret =3D write_mft_record(tni, tm, sync);
+				unmap_mft_record(tni);
+				mutex_unlock(&tni->mrec_lock);
+
+				if (unlikely(ret)) {
+					if (!err || err =3D=3D -ENOMEM)
+						err =3D ret;
+				}
+			}
+		}
+	}
+	mutex_unlock(&ni->extent_lock);
+	mutex_unlock(&ni->mrec_lock);
+
+	if (unlikely(err))
+		goto err_out;
+	if (need_iput)
+		iput(vi);
+	ntfs_debug("Done.");
+	return 0;
+unm_err_out:
+	unmap_mft_record(ni);
+	mutex_unlock(&ni->mrec_lock);
+err_out:
+	if (err =3D=3D -ENOMEM)
+		mark_inode_dirty(vi);
+	else {
+		ntfs_error(vi->i_sb, "Failed (error %i):  Run chkdsk.", -err);
+		NVolSetErrors(ni->vol);
+	}
+	if (need_iput)
+		iput(vi);
+	return err;
+}
+
+/**
+ * ntfs_extent_inode_open - load an extent inode and attach it to its base
+ * @base_ni:	base ntfs inode
+ * @mref:	mft reference of the extent inode to load (in little endian)
+ *
+ * First check if the extent inode @mref is already attached to the base n=
tfs
+ * inode @base_ni, and if so, return a pointer to the attached extent inod=
e.
+ *
+ * If the extent inode is not already attached to the base inode, allocate=
 an
+ * ntfs_inode structure and initialize it for the given inode @mref. @mref
+ * specifies the inode number / mft record to read, including the sequence
+ * number, which can be 0 if no sequence number checking is to be performe=
d.
+ *
+ * Then, allocate a buffer for the mft record, read the mft record from the
+ * volume @base_ni->vol, and attach it to the ntfs_inode structure (->mrec=
).
+ * The mft record is mst deprotected and sanity checked for validity and we
+ * abort if deprotection or checks fail.
+ *
+ * Finally attach the ntfs inode to its base inode @base_ni and return a
+ * pointer to the ntfs_inode structure on success or NULL on error, with e=
rrno
+ * set to the error code.
+ *
+ * Note, extent inodes are never closed directly. They are automatically
+ * disposed off by the closing of the base inode.
+ */
+static struct ntfs_inode *ntfs_extent_inode_open(struct ntfs_inode *base_n=
i,
+		const __le64 mref)
+{
+	u64 mft_no =3D MREF_LE(mref);
+	struct ntfs_inode *ni =3D NULL;
+	struct ntfs_inode **extent_nis;
+	int i;
+	struct mft_record *ni_mrec;
+	struct super_block *sb;
+
+	if (!base_ni)
+		return NULL;
+
+	sb =3D base_ni->vol->sb;
+	ntfs_debug("Opening extent inode %lld (base mft record %lld).\n",
+			(unsigned long long)mft_no,
+			(unsigned long long)base_ni->mft_no);
+
+	/* Is the extent inode already open and attached to the base inode? */
+	if (base_ni->nr_extents > 0) {
+		extent_nis =3D base_ni->ext.extent_ntfs_inos;
+		for (i =3D 0; i < base_ni->nr_extents; i++) {
+			u16 seq_no;
+
+			ni =3D extent_nis[i];
+			if (mft_no !=3D ni->mft_no)
+				continue;
+			ni_mrec =3D map_mft_record(ni);
+			if (IS_ERR(ni_mrec)) {
+				ntfs_error(sb, "failed to map mft record for %lu",
+						ni->mft_no);
+				goto out;
+			}
+			/* Verify the sequence number if given. */
+			seq_no =3D MSEQNO_LE(mref);
+			if (seq_no &&
+			    seq_no !=3D le16_to_cpu(ni_mrec->sequence_number)) {
+				ntfs_error(sb, "Found stale extent mft reference mft=3D%lld",
+						(long long)ni->mft_no);
+				unmap_mft_record(ni);
+				goto out;
+			}
+			unmap_mft_record(ni);
+			goto out;
+		}
+	}
+	/* Wasn't there, we need to load the extent inode. */
+	ni =3D ntfs_new_extent_inode(base_ni->vol->sb, mft_no);
+	if (!ni)
+		goto out;
+
+	ni->seq_no =3D (u16)MSEQNO_LE(mref);
+	ni->nr_extents =3D -1;
+	ni->ext.base_ntfs_ino =3D base_ni;
+	/* Attach extent inode to base inode, reallocating memory if needed. */
+	if (!(base_ni->nr_extents & 3)) {
+		i =3D (base_ni->nr_extents + 4) * sizeof(struct ntfs_inode *);
+
+		extent_nis =3D ntfs_malloc_nofs(i);
+		if (!extent_nis)
+			goto err_out;
+		if (base_ni->nr_extents) {
+			memcpy(extent_nis, base_ni->ext.extent_ntfs_inos,
+					i - 4 * sizeof(struct ntfs_inode *));
+			ntfs_free(base_ni->ext.extent_ntfs_inos);
+		}
+		base_ni->ext.extent_ntfs_inos =3D extent_nis;
+	}
+	base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] =3D ni;
+
+out:
+	ntfs_debug("\n");
+	return ni;
+err_out:
+	ntfs_destroy_ext_inode(ni);
+	ni =3D NULL;
+	goto out;
+}
+
+/**
+ * ntfs_inode_attach_all_extents - attach all extents for target inode
+ * @ni:		opened ntfs inode for which perform attach
+ *
+ * Return 0 on success and error.
+ */
+int ntfs_inode_attach_all_extents(struct ntfs_inode *ni)
+{
+	struct attr_list_entry *ale;
+	u64 prev_attached =3D 0;
+
+	if (!ni) {
+		ntfs_debug("Invalid arguments.\n");
+		return -EINVAL;
+	}
+
+	if (NInoAttr(ni))
+		ni =3D ni->ext.base_ntfs_ino;
+
+	ntfs_debug("Entering for inode 0x%llx.\n", (long long) ni->mft_no);
+
+	/* Inode haven't got attribute list, thus nothing to attach. */
+	if (!NInoAttrList(ni))
+		return 0;
+
+	if (!ni->attr_list) {
+		ntfs_debug("Corrupt in-memory struct.\n");
+		return -EINVAL;
+	}
+
+	/* Walk through attribute list and attach all extents. */
+	ale =3D (struct attr_list_entry *)ni->attr_list;
+	while ((u8 *)ale < ni->attr_list + ni->attr_list_size) {
+		if (ni->mft_no !=3D MREF_LE(ale->mft_reference) &&
+				prev_attached !=3D MREF_LE(ale->mft_reference)) {
+			if (!ntfs_extent_inode_open(ni, ale->mft_reference)) {
+				ntfs_debug("Couldn't attach extent inode.\n");
+				return -1;
+			}
+			prev_attached =3D MREF_LE(ale->mft_reference);
+		}
+		ale =3D (struct attr_list_entry *)((u8 *)ale + le16_to_cpu(ale->length));
+	}
+	return 0;
+}
+
+/**
+ * ntfs_inode_add_attrlist - add attribute list to inode and fill it
+ * @ni: opened ntfs inode to which add attribute list
+ *
+ * Return 0 on success or error.
+ */
+int ntfs_inode_add_attrlist(struct ntfs_inode *ni)
+{
+	int err;
+	struct ntfs_attr_search_ctx *ctx;
+	u8 *al =3D NULL, *aln;
+	int al_len =3D 0;
+	struct attr_list_entry *ale =3D NULL;
+	struct mft_record *ni_mrec;
+	u32 attr_al_len;
+
+	if (!ni)
+		return -EINVAL;
+
+	ntfs_debug("inode %llu\n", (unsigned long long) ni->mft_no);
+
+	if (NInoAttrList(ni) || ni->nr_extents) {
+		ntfs_error(ni->vol->sb, "Inode already has attribute list");
+		return -EEXIST;
+	}
+
+	ni_mrec =3D map_mft_record(ni);
+	if (IS_ERR(ni_mrec))
+		return -EIO;
+
+	/* Form attribute list. */
+	ctx =3D ntfs_attr_get_search_ctx(ni, ni_mrec);
+	if (!ctx) {
+		err =3D -ENOMEM;
+		goto err_out;
+	}
+
+	/* Walk through all attributes. */
+	while (!(err =3D ntfs_attr_lookup(AT_UNUSED, NULL, 0, 0, 0, NULL, 0, ctx)=
)) {
+		int ale_size;
+
+		if (ctx->attr->type =3D=3D AT_ATTRIBUTE_LIST) {
+			err =3D -EIO;
+			ntfs_error(ni->vol->sb, "Attribute list already present");
+			goto put_err_out;
+		}
+
+		ale_size =3D (sizeof(struct attr_list_entry) + sizeof(__le16) *
+				ctx->attr->name_length + 7) & ~7;
+		al_len +=3D ale_size;
+
+		aln =3D ntfs_realloc_nofs(al, al_len, al_len-ale_size);
+		if (!aln) {
+			err =3D -ENOMEM;
+			ntfs_error(ni->vol->sb, "Failed to realloc %d bytes", al_len);
+			goto put_err_out;
+		}
+		ale =3D (struct attr_list_entry *)(aln + ((u8 *)ale - al));
+		al =3D aln;
+
+		memset(ale, 0, ale_size);
+
+		/* Add attribute to attribute list. */
+		ale->type =3D ctx->attr->type;
+		ale->length =3D cpu_to_le16((sizeof(struct attr_list_entry) +
+					sizeof(__le16) * ctx->attr->name_length + 7) & ~7);
+		ale->name_length =3D ctx->attr->name_length;
+		ale->name_offset =3D (u8 *)ale->name - (u8 *)ale;
+		if (ctx->attr->non_resident)
+			ale->lowest_vcn =3D
+				ctx->attr->data.non_resident.lowest_vcn;
+		else
+			ale->lowest_vcn =3D 0;
+		ale->mft_reference =3D MK_LE_MREF(ni->mft_no,
+				le16_to_cpu(ni_mrec->sequence_number));
+		ale->instance =3D ctx->attr->instance;
+		memcpy(ale->name, (u8 *)ctx->attr +
+				le16_to_cpu(ctx->attr->name_offset),
+				ctx->attr->name_length * sizeof(__le16));
+		ale =3D (struct attr_list_entry *)(al + al_len);
+	}
+
+	/* Check for real error occurred. */
+	if (err !=3D -ENOENT) {
+		ntfs_error(ni->vol->sb, "%s: Attribute lookup failed, inode %lld",
+				__func__, (long long)ni->mft_no);
+		goto put_err_out;
+	}
+
+	/* Set in-memory attribute list. */
+	ni->attr_list =3D al;
+	ni->attr_list_size =3D al_len;
+	NInoSetAttrList(ni);
+
+	attr_al_len =3D offsetof(struct attr_record, data.resident.reserved) + 1 +
+		((al_len + 7) & ~7);
+	/* Free space if there is not enough it for $ATTRIBUTE_LIST. */
+	if (le32_to_cpu(ni_mrec->bytes_allocated) -
+			le32_to_cpu(ni_mrec->bytes_in_use) < attr_al_len) {
+		if (ntfs_inode_free_space(ni, (int)attr_al_len)) {
+			/* Failed to free space. */
+			err =3D -ENOSPC;
+			ntfs_error(ni->vol->sb, "Failed to free space for attrlist");
+			goto rollback;
+		}
+	}
+
+	/* Add $ATTRIBUTE_LIST to mft record. */
+	err =3D ntfs_resident_attr_record_add(ni, AT_ATTRIBUTE_LIST, AT_UNNAMED, =
0,
+					    NULL, al_len, 0);
+	if (err < 0) {
+		ntfs_error(ni->vol->sb, "Couldn't add $ATTRIBUTE_LIST to MFT");
+		goto rollback;
+	}
+
+	err =3D ntfs_attrlist_update(ni);
+	if (err < 0)
+		goto remove_attrlist_record;
+
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(ni);
+	return 0;
+
+remove_attrlist_record:
+	/* Prevent ntfs_attr_recorm_rm from freeing attribute list. */
+	ni->attr_list =3D NULL;
+	NInoClearAttrList(ni);
+	/* Remove $ATTRIBUTE_LIST record. */
+	ntfs_attr_reinit_search_ctx(ctx);
+	if (!ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0,
+				CASE_SENSITIVE, 0, NULL, 0, ctx)) {
+		if (ntfs_attr_record_rm(ctx))
+			ntfs_error(ni->vol->sb, "Rollback failed to remove attrlist");
+	} else {
+		ntfs_error(ni->vol->sb, "Rollback failed to find attrlist");
+	}
+
+	/* Setup back in-memory runlist. */
+	ni->attr_list =3D al;
+	ni->attr_list_size =3D al_len;
+	NInoSetAttrList(ni);
+rollback:
+	/*
+	 * Scan attribute list for attributes that placed not in the base MFT
+	 * record and move them to it.
+	 */
+	ntfs_attr_reinit_search_ctx(ctx);
+	ale =3D (struct attr_list_entry *)al;
+	while ((u8 *)ale < al + al_len) {
+		if (MREF_LE(ale->mft_reference) !=3D ni->mft_no) {
+			if (!ntfs_attr_lookup(ale->type, ale->name,
+						ale->name_length,
+						CASE_SENSITIVE,
+						le64_to_cpu(ale->lowest_vcn),
+						NULL, 0, ctx)) {
+				if (ntfs_attr_record_move_to(ctx, ni))
+					ntfs_error(ni->vol->sb,
+							"Rollback failed to move attribute");
+			} else {
+				ntfs_error(ni->vol->sb, "Rollback failed to find attr");
+			}
+			ntfs_attr_reinit_search_ctx(ctx);
+		}
+		ale =3D (struct attr_list_entry *)((u8 *)ale + le16_to_cpu(ale->length));
+	}
+
+	/* Remove in-memory attribute list. */
+	ni->attr_list =3D NULL;
+	ni->attr_list_size =3D 0;
+	NInoClearAttrList(ni);
+	NInoClearAttrListDirty(ni);
+put_err_out:
+	ntfs_attr_put_search_ctx(ctx);
+err_out:
+	ntfs_free(al);
+	unmap_mft_record(ni);
+	return err;
+}
+
+/**
+ * ntfs_inode_close - close an ntfs inode and free all associated memory
+ * @ni:		ntfs inode to close
+ *
+ * Make sure the ntfs inode @ni is clean.
+ *
+ * If the ntfs inode @ni is a base inode, close all associated extent inod=
es,
+ * then deallocate all memory attached to it, and finally free the ntfs in=
ode
+ * structure itself.
+ *
+ * If it is an extent inode, we disconnect it from its base inode before we
+ * destroy it.
+ *
+ * It is OK to pass NULL to this function, it is just noop in this case.
+ *
+ * Return 0 on success or error.
+ */
+int ntfs_inode_close(struct ntfs_inode *ni)
+{
+	int err =3D -1;
+	struct ntfs_inode **tmp_nis;
+	struct ntfs_inode *base_ni;
+	s32 i;
+
+	if (!ni)
+		return 0;
+
+	ntfs_debug("Entering for inode %lld\n", (long long)ni->mft_no);
+
+	/* Is this a base inode with mapped extent inodes? */
+	/*
+	 * If the inode is an extent inode, disconnect it from the
+	 * base inode before destroying it.
+	 */
+	base_ni =3D ni->ext.base_ntfs_ino;
+	for (i =3D 0; i < base_ni->nr_extents; ++i) {
+		tmp_nis =3D base_ni->ext.extent_ntfs_inos;
+		if (tmp_nis[i] !=3D ni)
+			continue;
+		/* Found it. Disconnect. */
+		memmove(tmp_nis + i, tmp_nis + i + 1,
+				(base_ni->nr_extents - i - 1) *
+				sizeof(struct ntfs_inode *));
+		/* Buffer should be for multiple of four extents. */
+		if ((--base_ni->nr_extents) & 3) {
+			i =3D -1;
+			break;
+		}
+		/*
+		 * ElectricFence is unhappy with realloc(x,0) as free(x)
+		 * thus we explicitly separate these two cases.
+		 */
+		if (base_ni->nr_extents) {
+			/* Resize the memory buffer. */
+			tmp_nis =3D ntfs_realloc_nofs(tmp_nis, base_ni->nr_extents *
+					sizeof(struct ntfs_inode *), base_ni->nr_extents *
+					sizeof(struct ntfs_inode *));
+			/* Ignore errors, they don't really matter. */
+			if (tmp_nis)
+				base_ni->ext.extent_ntfs_inos =3D tmp_nis;
+		} else if (tmp_nis) {
+			ntfs_free(tmp_nis);
+			base_ni->ext.extent_ntfs_inos =3D NULL;
+		}
+		/* Allow for error checking. */
+		i =3D -1;
+		break;
+	}
+
+	if (NInoDirty(ni))
+		ntfs_error(ni->vol->sb, "Releasing dirty inode %lld!\n",
+				(long long)ni->mft_no);
+	if (NInoAttrList(ni) && ni->attr_list)
+		ntfs_free(ni->attr_list);
+	ntfs_destroy_ext_inode(ni);
+	err =3D 0;
+	ntfs_debug("\n");
+	return err;
+}
+
+void ntfs_destroy_ext_inode(struct ntfs_inode *ni)
+{
+	ntfs_debug("Entering.");
+	if (ni =3D=3D NULL)
+		return;
+
+	ntfs_attr_close(ni);
+
+	if (NInoDirty(ni))
+		ntfs_error(ni->vol->sb, "Releasing dirty ext inode %lld!\n",
+				(long long)ni->mft_no);
+	if (NInoAttrList(ni) && ni->attr_list)
+		ntfs_free(ni->attr_list);
+	kfree(ni->mrec);
+	kmem_cache_free(ntfs_inode_cache, ni);
+}
+
+static struct ntfs_inode *ntfs_inode_base(struct ntfs_inode *ni)
+{
+	if (ni->nr_extents =3D=3D -1)
+		return ni->ext.base_ntfs_ino;
+	return ni;
+}
+
+static int ntfs_attr_position(__le32 type, struct ntfs_attr_search_ctx *ct=
x)
+{
+	int err;
+
+	err =3D ntfs_attr_lookup(type, NULL, 0, CASE_SENSITIVE, 0, NULL,
+				0, ctx);
+	if (err) {
+		__le32 atype;
+
+		if (err !=3D -ENOENT)
+			return err;
+
+		atype =3D ctx->attr->type;
+		if (atype =3D=3D AT_END)
+			return -ENOSPC;
+
+		/*
+		 * if ntfs_external_attr_lookup return -ENOENT, ctx->al_entry
+		 * could point to an attribute in an extent mft record, but
+		 * ctx->attr and ctx->ntfs_ino always points to an attibute in
+		 * a base mft record.
+		 */
+		if (ctx->al_entry &&
+		    MREF_LE(ctx->al_entry->mft_reference) !=3D ctx->ntfs_ino->mft_no) {
+			ntfs_attr_reinit_search_ctx(ctx);
+			err =3D ntfs_attr_lookup(atype, NULL, 0, CASE_SENSITIVE, 0, NULL,
+					       0, ctx);
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
+
+/**
+ * ntfs_inode_free_space - free space in the MFT record of inode
+ * @ni:		ntfs inode in which MFT record free space
+ * @size:	amount of space needed to free
+ *
+ * Return 0 on success or error.
+ */
+int ntfs_inode_free_space(struct ntfs_inode *ni, int size)
+{
+	struct ntfs_attr_search_ctx *ctx;
+	int freed, err;
+	struct mft_record *ni_mrec;
+	struct super_block *sb;
+
+	if (!ni || size < 0)
+		return -EINVAL;
+	ntfs_debug("Entering for inode %lld, size %d\n",
+			(unsigned long long)ni->mft_no, size);
+
+	sb =3D ni->vol->sb;
+	ni_mrec =3D map_mft_record(ni);
+	if (IS_ERR(ni_mrec))
+		return -EIO;
+
+	freed =3D (le32_to_cpu(ni_mrec->bytes_allocated) -
+			le32_to_cpu(ni_mrec->bytes_in_use));
+
+	unmap_mft_record(ni);
+
+	if (size <=3D freed)
+		return 0;
+
+	ctx =3D ntfs_attr_get_search_ctx(ni, NULL);
+	if (!ctx) {
+		ntfs_error(sb, "%s, Failed to get search context", __func__);
+		return -ENOMEM;
+	}
+
+	/*
+	 * Chkdsk complain if $STANDARD_INFORMATION is not in the base MFT
+	 * record.
+	 *
+	 * Also we can't move $ATTRIBUTE_LIST from base MFT_RECORD, so position
+	 * search context on first attribute after $STANDARD_INFORMATION and
+	 * $ATTRIBUTE_LIST.
+	 *
+	 * Why we reposition instead of simply skip this attributes during
+	 * enumeration? Because in case we have got only in-memory attribute
+	 * list ntfs_attr_lookup will fail when it will try to find
+	 * $ATTRIBUTE_LIST.
+	 */
+	err =3D ntfs_attr_position(AT_FILE_NAME, ctx);
+	if (err)
+		goto put_err_out;
+
+	while (1) {
+		int record_size;
+
+		/*
+		 * Check whether attribute is from different MFT record. If so,
+		 * find next, because we don't need such.
+		 */
+		while (ctx->ntfs_ino->mft_no !=3D ni->mft_no) {
+retry:
+			err =3D ntfs_attr_lookup(AT_UNUSED, NULL, 0, CASE_SENSITIVE,
+						0, NULL, 0, ctx);
+			if (err) {
+				if (err !=3D -ENOENT)
+					ntfs_error(sb, "Attr lookup failed #2");
+				else if (ctx->attr->type =3D=3D AT_END)
+					err =3D -ENOSPC;
+				else
+					err =3D 0;
+
+				if (err)
+					goto put_err_out;
+			}
+		}
+
+		if (ntfs_inode_base(ctx->ntfs_ino)->mft_no =3D=3D FILE_MFT &&
+				ctx->attr->type =3D=3D AT_DATA)
+			goto retry;
+
+		if (ctx->attr->type =3D=3D AT_INDEX_ROOT)
+			goto retry;
+
+		record_size =3D le32_to_cpu(ctx->attr->length);
+
+		/* Move away attribute. */
+		err =3D ntfs_attr_record_move_away(ctx, 0);
+		if (err) {
+			ntfs_error(sb, "Failed to move out attribute #2");
+			break;
+		}
+		freed +=3D record_size;
+
+		/* Check whether we done. */
+		if (size <=3D freed) {
+			ntfs_attr_put_search_ctx(ctx);
+			return 0;
+		}
+
+		/*
+		 * Reposition to first attribute after $STANDARD_INFORMATION and
+		 * $ATTRIBUTE_LIST (see comments upwards).
+		 */
+		ntfs_attr_reinit_search_ctx(ctx);
+		err =3D ntfs_attr_position(AT_FILE_NAME, ctx);
+		if (err)
+			break;
+	}
+put_err_out:
+	ntfs_attr_put_search_ctx(ctx);
+	if (err =3D=3D -ENOSPC)
+		ntfs_debug("No attributes left that can be moved out.\n");
+	return err;
+}
+
+s64 ntfs_inode_attr_pread(struct inode *vi, s64 pos, s64 count, u8 *buf)
+{
+	struct address_space *mapping =3D vi->i_mapping;
+	struct folio *folio;
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	s64 isize;
+	u32 attr_len, total =3D 0, offset;
+	pgoff_t index;
+	int err =3D 0;
+
+	WARN_ON(!NInoAttr(ni));
+	if (!count)
+		return 0;
+
+	mutex_lock(&ni->mrec_lock);
+	isize =3D i_size_read(vi);
+	if (pos > isize) {
+		mutex_unlock(&ni->mrec_lock);
+		return -EINVAL;
+	}
+	if (pos + count > isize)
+		count =3D isize - pos;
+
+	if (!NInoNonResident(ni)) {
+		struct ntfs_attr_search_ctx *ctx;
+		u8 *attr;
+
+		ctx =3D ntfs_attr_get_search_ctx(ni->ext.base_ntfs_ino, NULL);
+		if (!ctx) {
+			ntfs_error(vi->i_sb, "Failed to get attr search ctx");
+			err =3D -ENOMEM;
+			mutex_unlock(&ni->mrec_lock);
+			goto out;
+		}
+
+		err =3D ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIV=
E,
+				       0, NULL, 0, ctx);
+		if (err) {
+			ntfs_error(vi->i_sb, "Failed to look up attr %#x", ni->type);
+			ntfs_attr_put_search_ctx(ctx);
+			mutex_unlock(&ni->mrec_lock);
+			goto out;
+		}
+
+		attr =3D (u8 *)ctx->attr + le16_to_cpu(ctx->attr->data.resident.value_of=
fset);
+		memcpy(buf, (u8 *)attr + pos, count);
+		ntfs_attr_put_search_ctx(ctx);
+		mutex_unlock(&ni->mrec_lock);
+		return count;
+	}
+	mutex_unlock(&ni->mrec_lock);
+
+	index =3D pos >> PAGE_SHIFT;
+	do {
+		/* Update @index and get the next folio. */
+		folio =3D ntfs_read_mapping_folio(mapping, index);
+		if (IS_ERR(folio))
+			break;
+
+		offset =3D offset_in_folio(folio, pos);
+		attr_len =3D min_t(size_t, (size_t)count, folio_size(folio) - offset);
+
+		folio_lock(folio);
+		memcpy_from_folio(buf, folio, offset, attr_len);
+		folio_unlock(folio);
+		folio_put(folio);
+
+		total +=3D attr_len;
+		buf +=3D attr_len;
+		pos +=3D attr_len;
+		count -=3D attr_len;
+		index++;
+	} while (count);
+out:
+	return err ? (s64)err : total;
+}
+
+static inline int ntfs_enlarge_attribute(struct inode *vi, s64 pos, s64 co=
unt,
+					 struct ntfs_attr_search_ctx *ctx)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	struct super_block *sb =3D vi->i_sb;
+	int ret;
+
+	if (pos + count <=3D ni->initialized_size)
+		return 0;
+
+	if (NInoEncrypted(ni) && NInoNonResident(ni))
+		return -EACCES;
+
+	if (NInoCompressed(ni))
+		return -EOPNOTSUPP;
+
+	if (pos + count > ni->data_size) {
+		if (ntfs_attr_truncate(ni, pos + count)) {
+			ntfs_debug("Failed to truncate attribute");
+			return -1;
+		}
+
+		ntfs_attr_reinit_search_ctx(ctx);
+		ret =3D ntfs_attr_lookup(ni->type,
+				       ni->name, ni->name_len, CASE_SENSITIVE,
+				       0, NULL, 0, ctx);
+		if (ret) {
+			ntfs_error(sb, "Failed to look up attr %#x", ni->type);
+			return ret;
+		}
+	}
+
+	if (!NInoNonResident(ni)) {
+		if (likely(i_size_read(vi) < ni->data_size))
+			i_size_write(vi, ni->data_size);
+		return 0;
+	}
+
+	if (pos + count > ni->initialized_size) {
+		ctx->attr->data.non_resident.initialized_size =3D cpu_to_le64(pos + coun=
t);
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		ni->initialized_size =3D pos + count;
+		if (i_size_read(vi) < ni->initialized_size)
+			i_size_write(vi, ni->initialized_size);
+	}
+	return 0;
+}
+
+static s64 __ntfs_inode_resident_attr_pwrite(struct inode *vi,
+					     s64 pos, s64 count, u8 *buf,
+					     struct ntfs_attr_search_ctx *ctx)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	struct folio *folio;
+	struct address_space *mapping =3D vi->i_mapping;
+	u8 *addr;
+	int err =3D 0;
+
+	WARN_ON(NInoNonResident(ni));
+	if (pos + count > PAGE_SIZE) {
+		ntfs_error(vi->i_sb, "Out of write into resident attr %#x", ni->type);
+		return -EINVAL;
+	}
+
+	/* Copy to mft record page */
+	addr =3D (u8 *)ctx->attr + le16_to_cpu(ctx->attr->data.resident.value_off=
set);
+	memcpy(addr + pos, buf, count);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+
+	/* Keep the first page clean and uptodate */
+	folio =3D __filemap_get_folio(mapping, 0, FGP_WRITEBEGIN | FGP_NOFS,
+				   mapping_gfp_mask(mapping));
+	if (IS_ERR(folio)) {
+		err =3D PTR_ERR(folio);
+		ntfs_error(vi->i_sb, "Failed to read a page 0 for attr %#x: %d",
+			   ni->type, err);
+		goto out;
+	}
+	if (!folio_test_uptodate(folio)) {
+		u32 len =3D le32_to_cpu(ctx->attr->data.resident.value_length);
+
+		memcpy_to_folio(folio, 0, addr, len);
+		folio_zero_segment(folio, offset_in_folio(folio, len),
+				   folio_size(folio) - len);
+	} else {
+		memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, count);
+	}
+	folio_mark_uptodate(folio);
+	folio_unlock(folio);
+	folio_put(folio);
+out:
+	return err ? err : count;
+}
+
+static s64 __ntfs_inode_non_resident_attr_pwrite(struct inode *vi,
+						 s64 pos, s64 count, u8 *buf,
+						 struct ntfs_attr_search_ctx *ctx,
+						 bool sync)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	struct address_space *mapping =3D vi->i_mapping;
+	struct folio *folio;
+	pgoff_t index;
+	unsigned long offset, length;
+	size_t attr_len;
+	s64 ret =3D 0, written =3D 0;
+
+	WARN_ON(!NInoNonResident(ni));
+
+	index =3D pos >> PAGE_SHIFT;
+	while (count) {
+		folio =3D ntfs_read_mapping_folio(mapping, index);
+		if (IS_ERR(folio)) {
+			ret =3D PTR_ERR(folio);
+			ntfs_error(vi->i_sb, "Failed to read a page %lu for attr %#x: %ld",
+				   index, ni->type, PTR_ERR(folio));
+			break;
+		}
+
+		folio_lock(folio);
+		offset =3D offset_in_folio(folio, pos);
+		attr_len =3D min_t(size_t, (size_t)count, folio_size(folio) - offset);
+
+		memcpy_to_folio(folio, offset, buf, attr_len);
+
+		if (sync) {
+			struct ntfs_volume *vol =3D ni->vol;
+			s64 lcn, lcn_count;
+			unsigned int lcn_folio_off =3D 0;
+			struct bio *bio;
+			u64 rl_length =3D 0;
+			s64 vcn;
+			struct runlist_element *rl;
+
+			lcn_count =3D max_t(s64, 1, attr_len >> vol->cluster_size_bits);
+			vcn =3D (s64)folio->index << PAGE_SHIFT >> vol->cluster_size_bits;
+
+			do {
+				down_write(&ni->runlist.lock);
+				rl =3D ntfs_attr_vcn_to_rl(ni, vcn, &lcn);
+				if (IS_ERR(rl)) {
+					ret =3D PTR_ERR(rl);
+					up_write(&ni->runlist.lock);
+					goto err_unlock_folio;
+				}
+
+				rl_length =3D rl->length - (vcn - rl->vcn);
+				if (rl_length < lcn_count) {
+					lcn_count -=3D rl_length;
+				} else {
+					rl_length =3D lcn_count;
+					lcn_count =3D 0;
+				}
+				up_write(&ni->runlist.lock);
+
+				if (vol->cluster_size_bits > PAGE_SHIFT) {
+					lcn_folio_off =3D folio->index << PAGE_SHIFT;
+					lcn_folio_off &=3D vol->cluster_size_mask;
+				}
+
+				bio =3D ntfs_setup_bio(vol, REQ_OP_WRITE, lcn,
+						lcn_folio_off);
+				if (!bio) {
+					ret =3D -ENOMEM;
+					goto err_unlock_folio;
+				}
+
+				length =3D min_t(unsigned long,
+					       rl_length << vol->cluster_size_bits,
+					       folio_size(folio));
+				if (!bio_add_folio(bio, folio, length, offset)) {
+					ret =3D -EIO;
+					bio_put(bio);
+					goto err_unlock_folio;
+				}
+
+				submit_bio_wait(bio);
+				bio_put(bio);
+				vcn +=3D rl_length;
+				offset +=3D length;
+			} while (lcn_count !=3D 0);
+
+			folio_mark_uptodate(folio);
+		} else
+			folio_mark_dirty(folio);
+err_unlock_folio:
+		folio_unlock(folio);
+		folio_put(folio);
+
+		if (ret)
+			break;
+
+		written +=3D attr_len;
+		buf +=3D attr_len;
+		pos +=3D attr_len;
+		count -=3D attr_len;
+		index++;
+
+		cond_resched();
+	}
+
+	return ret ? ret : written;
+}
+
+s64 ntfs_inode_attr_pwrite(struct inode *vi, s64 pos, s64 count, u8 *buf, =
bool sync)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	struct ntfs_attr_search_ctx *ctx;
+	s64 ret;
+
+	WARN_ON(!NInoAttr(ni));
+
+	ctx =3D ntfs_attr_get_search_ctx(ni->ext.base_ntfs_ino, NULL);
+	if (!ctx) {
+		ntfs_error(vi->i_sb, "Failed to get attr search ctx");
+		return -ENOMEM;
+	}
+
+	ret =3D ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
+			       0, NULL, 0, ctx);
+	if (ret) {
+		ntfs_attr_put_search_ctx(ctx);
+		ntfs_error(vi->i_sb, "Failed to look up attr %#x", ni->type);
+		return ret;
+	}
+
+	mutex_lock(&ni->mrec_lock);
+	ret =3D ntfs_enlarge_attribute(vi, pos, count, ctx);
+	mutex_unlock(&ni->mrec_lock);
+	if (ret)
+		goto out;
+
+	if (NInoNonResident(ni))
+		ret =3D __ntfs_inode_non_resident_attr_pwrite(vi, pos, count, buf, ctx, =
sync);
+	else
+		ret =3D __ntfs_inode_resident_attr_pwrite(vi, pos, count, buf, ctx);
+out:
+	ntfs_attr_put_search_ctx(ctx);
+	return ret;
+}
diff --git a/fs/ntfsplus/mft.c b/fs/ntfsplus/mft.c
new file mode 100644
index 000000000000..c390e7fb98a0
--- /dev/null
+++ b/fs/ntfsplus/mft.c
@@ -0,0 +1,2698 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/**
+ * NTFS kernel mft record operations. Part of the Linux-NTFS project.
+ * Part of this file is based on code from the NTFS-3G project.
+ *
+ * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include <linux/bio.h>
+
+#include "aops.h"
+#include "bitmap.h"
+#include "lcnalloc.h"
+#include "misc.h"
+#include "mft.h"
+#include "ntfs.h"
+
+/*
+ * ntfs_mft_record_check - Check the consistency of an MFT record
+ *
+ * Make sure its general fields are safe, then examine all its
+ * attributes and apply generic checks to them.
+ *
+ * Returns 0 if the checks are successful. If not, return -EIO.
+ */
+int ntfs_mft_record_check(const struct ntfs_volume *vol, struct mft_record=
 *m,
+		unsigned long mft_no)
+{
+	struct attr_record *a;
+	struct super_block *sb =3D vol->sb;
+
+	if (!ntfs_is_file_record(m->magic)) {
+		ntfs_error(sb, "Record %llu has no FILE magic (0x%x)\n",
+				(unsigned long long)mft_no, le32_to_cpu(*(__le32 *)m));
+		goto err_out;
+	}
+
+	if ((m->usa_ofs & 0x1) ||
+	    (vol->mft_record_size >> NTFS_BLOCK_SIZE_BITS) + 1 !=3D le16_to_cpu(m=
->usa_count) ||
+	    le16_to_cpu(m->usa_ofs) + le16_to_cpu(m->usa_count) * 2 > vol->mft_re=
cord_size) {
+		ntfs_error(sb, "Record %llu has corrupt fix-up values fields\n",
+				(unsigned long long)mft_no);
+		goto err_out;
+	}
+
+	if (le32_to_cpu(m->bytes_allocated) !=3D vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu has corrupt allocation size (%u <> %u)\n",
+				(unsigned long long)mft_no,
+				vol->mft_record_size,
+				le32_to_cpu(m->bytes_allocated));
+		goto err_out;
+	}
+
+	if (le32_to_cpu(m->bytes_in_use) > vol->mft_record_size) {
+		ntfs_error(sb, "Record %llu has corrupt in-use size (%u > %u)\n",
+				(unsigned long long)mft_no,
+				le32_to_cpu(m->bytes_in_use),
+				vol->mft_record_size);
+		goto err_out;
+	}
+
+	if (le16_to_cpu(m->attrs_offset) & 7) {
+		ntfs_error(sb, "Attributes badly aligned in record %llu\n",
+				(unsigned long long)mft_no);
+		goto err_out;
+	}
+
+	a =3D (struct attr_record *)((char *)m + le16_to_cpu(m->attrs_offset));
+	if ((char *)a < (char *)m || (char *)a > (char *)m + vol->mft_record_size=
) {
+		ntfs_error(sb, "Record %llu is corrupt\n",
+				(unsigned long long)mft_no);
+		goto err_out;
+	}
+
+	return 0;
+
+err_out:
+	return -EIO;
+}
+
+/**
+ * map_mft_record_page - map the page in which a specific mft record resid=
es
+ * @ni:		ntfs inode whose mft record page to map
+ *
+ * This maps the page in which the mft record of the ntfs inode @ni is sit=
uated
+ * and returns a pointer to the mft record within the mapped page.
+ *
+ * Return value needs to be checked with IS_ERR() and if that is true PTR_=
ERR()
+ * contains the negative error code returned.
+ */
+static inline struct mft_record *map_mft_record_folio(struct ntfs_inode *n=
i)
+{
+	loff_t i_size;
+	struct ntfs_volume *vol =3D ni->vol;
+	struct inode *mft_vi =3D vol->mft_ino;
+	struct folio *folio;
+	unsigned long index, end_index;
+	unsigned int ofs;
+
+	WARN_ON(ni->folio);
+	/*
+	 * The index into the page cache and the offset within the page cache
+	 * page of the wanted mft record.
+	 */
+	index =3D (u64)ni->mft_no << vol->mft_record_size_bits >>
+			PAGE_SHIFT;
+	ofs =3D (ni->mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+
+	i_size =3D i_size_read(mft_vi);
+	/* The maximum valid index into the page cache for $MFT's data. */
+	end_index =3D i_size >> PAGE_SHIFT;
+
+	/* If the wanted index is out of bounds the mft record doesn't exist. */
+	if (unlikely(index >=3D end_index)) {
+		if (index > end_index || (i_size & ~PAGE_MASK) < ofs +
+				vol->mft_record_size) {
+			folio =3D ERR_PTR(-ENOENT);
+			ntfs_error(vol->sb,
+				"Attempt to read mft record 0x%lx, which is beyond the end of the mft.=
 This is probably a bug in the ntfs driver.",
+				ni->mft_no);
+			goto err_out;
+		}
+	}
+
+	/* Read, map, and pin the folio. */
+	folio =3D ntfs_read_mapping_folio(mft_vi->i_mapping, index);
+	if (!IS_ERR(folio)) {
+		u8 *addr;
+
+		ni->mrec =3D kmalloc(vol->mft_record_size, GFP_NOFS);
+		if (!ni->mrec) {
+			ntfs_unmap_folio(folio, NULL);
+			folio =3D ERR_PTR(-ENOMEM);
+			goto err_out;
+		}
+
+		addr =3D kmap_local_folio(folio, 0);
+		memcpy(ni->mrec, addr + ofs, vol->mft_record_size);
+		post_read_mst_fixup((struct ntfs_record *)ni->mrec, vol->mft_record_size=
);
+
+		/* Catch multi sector transfer fixup errors. */
+		if (!ntfs_mft_record_check(vol, (struct mft_record *)ni->mrec, ni->mft_n=
o)) {
+			kunmap_local(addr);
+			ni->folio =3D folio;
+			ni->folio_ofs =3D ofs;
+			return ni->mrec;
+		}
+		ntfs_unmap_folio(folio, addr);
+		kfree(ni->mrec);
+		ni->mrec =3D NULL;
+		folio =3D ERR_PTR(-EIO);
+		NVolSetErrors(vol);
+	}
+err_out:
+	ni->folio =3D NULL;
+	ni->folio_ofs =3D 0;
+	return (void *)folio;
+}
+
+/**
+ * map_mft_record - map, pin and lock an mft record
+ * @ni:		ntfs inode whose MFT record to map
+ *
+ * First, take the mrec_lock mutex.  We might now be sleeping, while waiti=
ng
+ * for the mutex if it was already locked by someone else.
+ *
+ * The page of the record is mapped using map_mft_record_folio() before be=
ing
+ * returned to the caller.
+ *
+ * This in turn uses ntfs_read_mapping_folio() to get the page containing =
the wanted mft
+ * record (it in turn calls read_cache_page() which reads it in from disk =
if
+ * necessary, increments the use count on the page so that it cannot disap=
pear
+ * under us and returns a reference to the page cache page).
+ *
+ * If read_cache_page() invokes ntfs_readpage() to load the page from disk=
, it
+ * sets PG_locked and clears PG_uptodate on the page. Once I/O has complet=
ed
+ * and the post-read mst fixups on each mft record in the page have been
+ * performed, the page gets PG_uptodate set and PG_locked cleared (this is=
 done
+ * in our asynchronous I/O completion handler end_buffer_read_mft_async()).
+ * ntfs_read_mapping_folio() waits for PG_locked to become clear and check=
s if
+ * PG_uptodate is set and returns an error code if not. This provides
+ * sufficient protection against races when reading/using the page.
+ *
+ * However there is the write mapping to think about. Doing the above desc=
ribed
+ * checking here will be fine, because when initiating the write we will s=
et
+ * PG_locked and clear PG_uptodate making sure nobody is touching the page
+ * contents. Doing the locking this way means that the commit to disk code=
 in
+ * the page cache code paths is automatically sufficiently locked with us =
as
+ * we will not touch a page that has been locked or is not uptodate. The o=
nly
+ * locking problem then is them locking the page while we are accessing it.
+ *
+ * So that code will end up having to own the mrec_lock of all mft
+ * records/inodes present in the page before I/O can proceed. In that case=
 we
+ * wouldn't need to bother with PG_locked and PG_uptodate as nobody will be
+ * accessing anything without owning the mrec_lock mutex.  But we do need =
to
+ * use them because of the read_cache_page() invocation and the code becom=
es so
+ * much simpler this way that it is well worth it.
+ *
+ * The mft record is now ours and we return a pointer to it. You need to c=
heck
+ * the returned pointer with IS_ERR() and if that is true, PTR_ERR() will =
return
+ * the error code.
+ *
+ * NOTE: Caller is responsible for setting the mft record dirty before cal=
ling
+ * unmap_mft_record(). This is obviously only necessary if the caller real=
ly
+ * modified the mft record...
+ * Q: Do we want to recycle one of the VFS inode state bits instead?
+ * A: No, the inode ones mean we want to change the mft record, not we wan=
t to
+ * write it out.
+ */
+struct mft_record *map_mft_record(struct ntfs_inode *ni)
+{
+	struct mft_record *m;
+
+	if (!ni)
+		return ERR_PTR(-EINVAL);
+
+	ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
+
+	/* Make sure the ntfs inode doesn't go away. */
+	atomic_inc(&ni->count);
+
+	if (ni->folio)
+		return (struct mft_record *)ni->mrec;
+
+	m =3D map_mft_record_folio(ni);
+	if (!IS_ERR(m))
+		return m;
+
+	atomic_dec(&ni->count);
+	ntfs_error(ni->vol->sb, "Failed with error code %lu.", -PTR_ERR(m));
+	return m;
+}
+
+/**
+ * unmap_mft_record - release a mapped mft record
+ * @ni:		ntfs inode whose MFT record to unmap
+ *
+ * We release the page mapping and the mrec_lock mutex which unmaps the mft
+ * record and releases it for others to get hold of. We also release the n=
tfs
+ * inode by decrementing the ntfs inode reference count.
+ *
+ * NOTE: If caller has modified the mft record, it is imperative to set th=
e mft
+ * record dirty BEFORE calling unmap_mft_record().
+ */
+void unmap_mft_record(struct ntfs_inode *ni)
+{
+	struct folio *folio;
+
+	if (!ni)
+		return;
+
+	ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
+
+	folio =3D ni->folio;
+	if (atomic_dec_return(&ni->count) > 1)
+		return;
+	WARN_ON(!folio);
+}
+
+/**
+ * map_extent_mft_record - load an extent inode and attach it to its base
+ * @base_ni:	base ntfs inode
+ * @mref:	mft reference of the extent inode to load
+ * @ntfs_ino:	on successful return, pointer to the struct ntfs_inode struc=
ture
+ *
+ * Load the extent mft record @mref and attach it to its base inode @base_=
ni.
+ * Return the mapped extent mft record if IS_ERR(result) is false.  Otherw=
ise
+ * PTR_ERR(result) gives the negative error code.
+ *
+ * On successful return, @ntfs_ino contains a pointer to the ntfs_inode
+ * structure of the mapped extent inode.
+ */
+struct mft_record *map_extent_mft_record(struct ntfs_inode *base_ni, u64 m=
ref,
+		struct ntfs_inode **ntfs_ino)
+{
+	struct mft_record *m;
+	struct ntfs_inode *ni =3D NULL;
+	struct ntfs_inode **extent_nis =3D NULL;
+	int i;
+	unsigned long mft_no =3D MREF(mref);
+	u16 seq_no =3D MSEQNO(mref);
+	bool destroy_ni =3D false;
+
+	ntfs_debug("Mapping extent mft record 0x%lx (base mft record 0x%lx).",
+			mft_no, base_ni->mft_no);
+	/* Make sure the base ntfs inode doesn't go away. */
+	atomic_inc(&base_ni->count);
+	/*
+	 * Check if this extent inode has already been added to the base inode,
+	 * in which case just return it. If not found, add it to the base
+	 * inode before returning it.
+	 */
+	mutex_lock(&base_ni->extent_lock);
+	if (base_ni->nr_extents > 0) {
+		extent_nis =3D base_ni->ext.extent_ntfs_inos;
+		for (i =3D 0; i < base_ni->nr_extents; i++) {
+			if (mft_no !=3D extent_nis[i]->mft_no)
+				continue;
+			ni =3D extent_nis[i];
+			/* Make sure the ntfs inode doesn't go away. */
+			atomic_inc(&ni->count);
+			break;
+		}
+	}
+	if (likely(ni !=3D NULL)) {
+		mutex_unlock(&base_ni->extent_lock);
+		atomic_dec(&base_ni->count);
+		/* We found the record; just have to map and return it. */
+		m =3D map_mft_record(ni);
+		/* map_mft_record() has incremented this on success. */
+		atomic_dec(&ni->count);
+		if (!IS_ERR(m)) {
+			/* Verify the sequence number. */
+			if (likely(le16_to_cpu(m->sequence_number) =3D=3D seq_no)) {
+				ntfs_debug("Done 1.");
+				*ntfs_ino =3D ni;
+				return m;
+			}
+			unmap_mft_record(ni);
+			ntfs_error(base_ni->vol->sb,
+					"Found stale extent mft reference! Corrupt filesystem. Run chkdsk.");
+			return ERR_PTR(-EIO);
+		}
+map_err_out:
+		ntfs_error(base_ni->vol->sb,
+				"Failed to map extent mft record, error code %ld.",
+				-PTR_ERR(m));
+		return m;
+	}
+	/* Record wasn't there. Get a new ntfs inode and initialize it. */
+	ni =3D ntfs_new_extent_inode(base_ni->vol->sb, mft_no);
+	if (unlikely(!ni)) {
+		mutex_unlock(&base_ni->extent_lock);
+		atomic_dec(&base_ni->count);
+		return ERR_PTR(-ENOMEM);
+	}
+	ni->vol =3D base_ni->vol;
+	ni->seq_no =3D seq_no;
+	ni->nr_extents =3D -1;
+	ni->ext.base_ntfs_ino =3D base_ni;
+	/* Now map the record. */
+	m =3D map_mft_record(ni);
+	if (IS_ERR(m)) {
+		mutex_unlock(&base_ni->extent_lock);
+		atomic_dec(&base_ni->count);
+		ntfs_clear_extent_inode(ni);
+		goto map_err_out;
+	}
+	/* Verify the sequence number if it is present. */
+	if (seq_no && (le16_to_cpu(m->sequence_number) !=3D seq_no)) {
+		ntfs_error(base_ni->vol->sb,
+				"Found stale extent mft reference! Corrupt filesystem. Run chkdsk.");
+		destroy_ni =3D true;
+		m =3D ERR_PTR(-EIO);
+		goto unm_err_out;
+	}
+	/* Attach extent inode to base inode, reallocating memory if needed. */
+	if (!(base_ni->nr_extents & 3)) {
+		struct ntfs_inode **tmp;
+		int new_size =3D (base_ni->nr_extents + 4) * sizeof(struct ntfs_inode *);
+
+		tmp =3D ntfs_malloc_nofs(new_size);
+		if (unlikely(!tmp)) {
+			ntfs_error(base_ni->vol->sb, "Failed to allocate internal buffer.");
+			destroy_ni =3D true;
+			m =3D ERR_PTR(-ENOMEM);
+			goto unm_err_out;
+		}
+		if (base_ni->nr_extents) {
+			WARN_ON(!base_ni->ext.extent_ntfs_inos);
+			memcpy(tmp, base_ni->ext.extent_ntfs_inos, new_size -
+					4 * sizeof(struct ntfs_inode *));
+			ntfs_free(base_ni->ext.extent_ntfs_inos);
+		}
+		base_ni->ext.extent_ntfs_inos =3D tmp;
+	}
+	base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] =3D ni;
+	mutex_unlock(&base_ni->extent_lock);
+	atomic_dec(&base_ni->count);
+	ntfs_debug("Done 2.");
+	*ntfs_ino =3D ni;
+	return m;
+unm_err_out:
+	unmap_mft_record(ni);
+	mutex_unlock(&base_ni->extent_lock);
+	atomic_dec(&base_ni->count);
+	/*
+	 * If the extent inode was not attached to the base inode we need to
+	 * release it or we will leak memory.
+	 */
+	if (destroy_ni)
+		ntfs_clear_extent_inode(ni);
+	return m;
+}
+
+/**
+ * __mark_mft_record_dirty - set the mft record and the page containing it=
 dirty
+ * @ni:		ntfs inode describing the mapped mft record
+ *
+ * Internal function.  Users should call mark_mft_record_dirty() instead.
+ *
+ * Set the mapped (extent) mft record of the (base or extent) ntfs inode @=
ni,
+ * as well as the page containing the mft record, dirty.  Also, mark the b=
ase
+ * vfs inode dirty.  This ensures that any changes to the mft record are
+ * written out to disk.
+ *
+ * NOTE:  We only set I_DIRTY_DATASYNC (and not I_DIRTY_PAGES)
+ * on the base vfs inode, because even though file data may have been modi=
fied,
+ * it is dirty in the inode meta data rather than the data page cache of t=
he
+ * inode, and thus there are no data pages that need writing out.  Therefo=
re, a
+ * full mark_inode_dirty() is overkill.  A mark_inode_dirty_sync(), on the
+ * other hand, is not sufficient, because ->write_inode needs to be called=
 even
+ * in case of fdatasync. This needs to happen or the file data would not
+ * necessarily hit the device synchronously, even though the vfs inode has=
 the
+ * O_SYNC flag set.  Also, I_DIRTY_DATASYNC simply "feels" better than just
+ * I_DIRTY_SYNC, since the file data has not actually hit the block device=
 yet,
+ * which is not what I_DIRTY_SYNC on its own would suggest.
+ */
+void __mark_mft_record_dirty(struct ntfs_inode *ni)
+{
+	struct ntfs_inode *base_ni;
+
+	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+	WARN_ON(NInoAttr(ni));
+	/* Determine the base vfs inode and mark it dirty, too. */
+	if (likely(ni->nr_extents >=3D 0))
+		base_ni =3D ni;
+	else
+		base_ni =3D ni->ext.base_ntfs_ino;
+	__mark_inode_dirty(VFS_I(base_ni), I_DIRTY_DATASYNC);
+}
+
+/**
+ * ntfs_sync_mft_mirror - synchronize an mft record to the mft mirror
+ * @vol:	ntfs volume on which the mft record to synchronize resides
+ * @mft_no:	mft record number of mft record to synchronize
+ * @m:		mapped, mst protected (extent) mft record to synchronize
+ *
+ * Write the mapped, mst protected (extent) mft record @m with mft record
+ * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol.
+ *
+ * On success return 0.  On error return -errno and set the volume errors =
flag
+ * in the ntfs volume @vol.
+ *
+ * NOTE:  We always perform synchronous i/o and ignore the @sync parameter.
+ */
+int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const unsigned long mft_=
no,
+		struct mft_record *m)
+{
+	u8 *kmirr =3D NULL;
+	struct folio *folio;
+	unsigned int folio_ofs, lcn_folio_off =3D 0;
+	int err =3D 0;
+	struct bio *bio;
+
+	ntfs_debug("Entering for inode 0x%lx.", mft_no);
+
+	if (unlikely(!vol->mftmirr_ino)) {
+		/* This could happen during umount... */
+		err =3D -EIO;
+		goto err_out;
+	}
+	/* Get the page containing the mirror copy of the mft record @m. */
+	folio =3D ntfs_read_mapping_folio(vol->mftmirr_ino->i_mapping, mft_no >>
+			(PAGE_SHIFT - vol->mft_record_size_bits));
+	if (IS_ERR(folio)) {
+		ntfs_error(vol->sb, "Failed to map mft mirror page.");
+		err =3D PTR_ERR(folio);
+		goto err_out;
+	}
+
+	folio_lock(folio);
+	folio_clear_uptodate(folio);
+	/* Offset of the mft mirror record inside the page. */
+	folio_ofs =3D (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+	/* The address in the page of the mirror copy of the mft record @m. */
+	kmirr =3D kmap_local_folio(folio, 0) + folio_ofs;
+	/* Copy the mst protected mft record to the mirror. */
+	memcpy(kmirr, m, vol->mft_record_size);
+
+	if (vol->cluster_size_bits > PAGE_SHIFT) {
+		lcn_folio_off =3D folio->index << PAGE_SHIFT;
+		lcn_folio_off &=3D vol->cluster_size_mask;
+	}
+
+	bio =3D ntfs_setup_bio(vol, REQ_OP_WRITE, vol->mftmirr_lcn,
+			     lcn_folio_off + folio_ofs);
+	if (!bio) {
+		err =3D -ENOMEM;
+		goto unlock_folio;
+	}
+
+	if (!bio_add_folio(bio, folio, vol->mft_record_size, folio_ofs)) {
+		err =3D -EIO;
+		bio_put(bio);
+		goto unlock_folio;
+	}
+
+	submit_bio_wait(bio);
+	bio_put(bio);
+	/* Current state: all buffers are clean, unlocked, and uptodate. */
+	flush_dcache_folio(folio);
+	folio_mark_uptodate(folio);
+
+unlock_folio:
+	folio_unlock(folio);
+	ntfs_unmap_folio(folio, kmirr);
+	if (likely(!err)) {
+		ntfs_debug("Done.");
+	} else {
+		ntfs_error(vol->sb, "I/O error while writing mft mirror record 0x%lx!", =
mft_no);
+err_out:
+		ntfs_error(vol->sb,
+			"Failed to synchronize $MFTMirr (error code %i).  Volume will be left m=
arked dirty on umount.  Run chkdsk on the partition after umounting to corr=
ect this.",
+			err);
+		NVolSetErrors(vol);
+	}
+	return err;
+}
+
+/**
+ * write_mft_record_nolock - write out a mapped (extent) mft record
+ * @ni:		ntfs inode describing the mapped (extent) mft record
+ * @m:		mapped (extent) mft record to write
+ * @sync:	if true, wait for i/o completion
+ *
+ * Write the mapped (extent) mft record @m described by the (regular or ex=
tent)
+ * ntfs inode @ni to backing store.  If the mft record @m has a counterpar=
t in
+ * the mft mirror, that is also updated.
+ *
+ * We only write the mft record if the ntfs inode @ni is dirty and the fir=
st
+ * buffer belonging to its mft record is dirty, too.  We ignore the dirty =
state
+ * of subsequent buffers because we could have raced with
+ * fs/ntfs/aops.c::mark_ntfs_record_dirty().
+ *
+ * On success, clean the mft record and return 0.  On error, leave the mft
+ * record dirty and return -errno.
+ *
+ * NOTE:  We always perform synchronous i/o and ignore the @sync parameter.
+ * However, if the mft record has a counterpart in the mft mirror and @syn=
c is
+ * true, we write the mft record, wait for i/o completion, and only then w=
rite
+ * the mft mirror copy.  This ensures that if the system crashes either th=
e mft
+ * or the mft mirror will contain a self-consistent mft record @m.  If @sy=
nc is
+ * false on the other hand, we start i/o on both and then wait for complet=
ion
+ * on them.  This provides a speedup but no longer guarantees that you wil=
l end
+ * up with a self-consistent mft record in the case of a crash but if you =
asked
+ * for asynchronous writing you probably do not care about that anyway.
+ */
+int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, i=
nt sync)
+{
+	struct ntfs_volume *vol =3D ni->vol;
+	struct folio *folio =3D ni->folio;
+	int err =3D 0, i =3D 0;
+	u8 *kaddr;
+	struct mft_record *fixup_m;
+	struct bio *bio;
+	unsigned int offset =3D 0, folio_size;
+
+	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+
+	WARN_ON(NInoAttr(ni));
+	WARN_ON(!folio_test_locked(folio));
+
+	/*
+	 * If the struct ntfs_inode is clean no need to do anything.  If it is di=
rty,
+	 * mark it as clean now so that it can be redirtied later on if needed.
+	 * There is no danger of races since the caller is holding the locks
+	 * for the mft record @m and the page it is in.
+	 */
+	if (!NInoTestClearDirty(ni))
+		goto done;
+
+	if (ni->mft_lcn[0] =3D=3D LCN_RL_NOT_MAPPED) {
+		s64 vcn;
+		struct runlist_element *rl;
+
+		vcn =3D (s64)ni->mft_no << vol->mft_record_size_bits >> vol->cluster_siz=
e_bits;
+
+		down_read(&NTFS_I(vol->mft_ino)->runlist.lock);
+		rl =3D NTFS_I(vol->mft_ino)->runlist.rl;
+
+		/* Seek to element containing target vcn. */
+		while (rl->length && rl[1].vcn <=3D vcn)
+			rl++;
+		ni->mft_lcn[0] =3D ntfs_rl_vcn_to_lcn(rl, vcn);
+		ni->mft_lcn_count++;
+
+		if (vol->cluster_size < vol->mft_record_size &&
+		    (rl->length - (vcn - rl->vcn)) <=3D 1) {
+			rl++;
+			ni->mft_lcn[1] =3D ntfs_rl_vcn_to_lcn(rl, vcn + 1);
+			ni->mft_lcn_count++;
+		}
+		up_read(&NTFS_I(vol->mft_ino)->runlist.lock);
+	}
+
+	kaddr =3D kmap_local_folio(folio, 0);
+	fixup_m =3D (struct mft_record *)(kaddr + ni->folio_ofs);
+	memcpy(fixup_m, m, vol->mft_record_size);
+
+	/* Apply the mst protection fixups. */
+	err =3D pre_write_mst_fixup((struct ntfs_record *)fixup_m, vol->mft_recor=
d_size);
+	if (err) {
+		ntfs_error(vol->sb, "Failed to apply mst fixups!");
+		goto err_out;
+	}
+
+	folio_size =3D vol->mft_record_size / ni->mft_lcn_count;
+	while (i < ni->mft_lcn_count) {
+		unsigned int clu_off;
+
+		clu_off =3D (unsigned int)((s64)ni->mft_no * vol->mft_record_size + offs=
et) &
+			vol->cluster_size_mask;
+
+		flush_dcache_folio(folio);
+
+		bio =3D ntfs_setup_bio(vol, REQ_OP_WRITE, ni->mft_lcn[i], clu_off);
+		if (!bio) {
+			err =3D -ENOMEM;
+			goto err_out;
+		}
+
+		if (!bio_add_folio(bio, folio, folio_size,
+				   ni->folio_ofs + offset)) {
+			err =3D -EIO;
+			goto put_bio_out;
+		}
+
+		/* Synchronize the mft mirror now if not @sync. */
+		if (!sync && ni->mft_no < vol->mftmirr_size)
+			ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
+
+		submit_bio_wait(bio);
+		bio_put(bio);
+		offset +=3D vol->cluster_size;
+		i++;
+	}
+
+	/* If @sync, now synchronize the mft mirror. */
+	if (sync && ni->mft_no < vol->mftmirr_size)
+		ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
+	kunmap_local(kaddr);
+	if (unlikely(err)) {
+		/* I/O error during writing.  This is really bad! */
+		ntfs_error(vol->sb,
+			"I/O error while writing mft record 0x%lx!  Marking base inode as bad. =
 You should unmount the volume and run chkdsk.",
+			ni->mft_no);
+		goto err_out;
+	}
+done:
+	ntfs_debug("Done.");
+	return 0;
+put_bio_out:
+	bio_put(bio);
+err_out:
+	/*
+	 * Current state: all buffers are clean, unlocked, and uptodate.
+	 * The caller should mark the base inode as bad so that no more i/o
+	 * happens.  ->clear_inode() will still be invoked so all extent inodes
+	 * and other allocated memory will be freed.
+	 */
+	if (err =3D=3D -ENOMEM) {
+		ntfs_error(vol->sb,
+			"Not enough memory to write mft record. Redirtying so the write is retr=
ied later.");
+		mark_mft_record_dirty(ni);
+		err =3D 0;
+	} else
+		NVolSetErrors(vol);
+	return err;
+}
+
+static int ntfs_test_inode_wb(struct inode *vi, unsigned long ino, void *d=
ata)
+{
+	struct ntfs_attr *na =3D (struct ntfs_attr *)data;
+
+	if (!ntfs_test_inode(vi, na))
+		return 0;
+
+	/*
+	 * Without this, ntfs_write_mst_block() could call iput_final()
+	 * , and ntfs_evict_big_inode() could try to unlink this inode
+	 * and the contex could be blocked infinitly in map_mft_record().
+	 */
+	if (NInoBeingDeleted(NTFS_I(vi))) {
+		na->state =3D NI_BeingDeleted;
+		return -1;
+	}
+
+	/*
+	 * This condition can prevent ntfs_write_mst_block()
+	 * from applying/undo fixups while ntfs_create() being
+	 * called
+	 */
+	spin_lock(&vi->i_lock);
+	if (vi->i_state & I_CREATING) {
+		spin_unlock(&vi->i_lock);
+		na->state =3D NI_BeingCreated;
+		return -1;
+	}
+	spin_unlock(&vi->i_lock);
+
+	return igrab(vi) ? 1 : -1;
+}
+
+/**
+ * ntfs_may_write_mft_record - check if an mft record may be written out
+ * @vol:	[IN]  ntfs volume on which the mft record to check resides
+ * @mft_no:	[IN]  mft record number of the mft record to check
+ * @m:		[IN]  mapped mft record to check
+ * @locked_ni:	[OUT] caller has to unlock this ntfs inode if one is return=
ed
+ *
+ * Check if the mapped (base or extent) mft record @m with mft record numb=
er
+ * @mft_no belonging to the ntfs volume @vol may be written out.  If neces=
sary
+ * and possible the ntfs inode of the mft record is locked and the base vfs
+ * inode is pinned.  The locked ntfs inode is then returned in @locked_ni.=
  The
+ * caller is responsible for unlocking the ntfs inode and unpinning the ba=
se
+ * vfs inode.
+ *
+ * Return 'true' if the mft record may be written out and 'false' if not.
+ *
+ * The caller has locked the page and cleared the uptodate flag on it which
+ * means that we can safely write out any dirty mft records that do not ha=
ve
+ * their inodes in icache as determined by ilookup5() as anyone
+ * opening/creating such an inode would block when attempting to map the m=
ft
+ * record in read_cache_page() until we are finished with the write out.
+ *
+ * Here is a description of the tests we perform:
+ *
+ * If the inode is found in icache we know the mft record must be a base m=
ft
+ * record.  If it is dirty, we do not write it and return 'false' as the v=
fs
+ * inode write paths will result in the access times being updated which w=
ould
+ * cause the base mft record to be redirtied and written out again.  (We k=
now
+ * the access time update will modify the base mft record because Windows
+ * chkdsk complains if the standard information attribute is not in the ba=
se
+ * mft record.)
+ *
+ * If the inode is in icache and not dirty, we attempt to lock the mft rec=
ord
+ * and if we find the lock was already taken, it is not safe to write the =
mft
+ * record and we return 'false'.
+ *
+ * If we manage to obtain the lock we have exclusive access to the mft rec=
ord,
+ * which also allows us safe writeout of the mft record.  We then set
+ * @locked_ni to the locked ntfs inode and return 'true'.
+ *
+ * Note we cannot just lock the mft record and sleep while waiting for the=
 lock
+ * because this would deadlock due to lock reversal (normally the mft reco=
rd is
+ * locked before the page is locked but we already have the page locked he=
re
+ * when we try to lock the mft record).
+ *
+ * If the inode is not in icache we need to perform further checks.
+ *
+ * If the mft record is not a FILE record or it is a base mft record, we c=
an
+ * safely write it and return 'true'.
+ *
+ * We now know the mft record is an extent mft record.  We check if the in=
ode
+ * corresponding to its base mft record is in icache and obtain a referenc=
e to
+ * it if it is.  If it is not, we can safely write it and return 'true'.
+ *
+ * We now have the base inode for the extent mft record.  We check if it h=
as an
+ * ntfs inode for the extent mft record attached and if not it is safe to =
write
+ * the extent mft record and we return 'true'.
+ *
+ * The ntfs inode for the extent mft record is attached to the base inode =
so we
+ * attempt to lock the extent mft record and if we find the lock was alrea=
dy
+ * taken, it is not safe to write the extent mft record and we return 'fal=
se'.
+ *
+ * If we manage to obtain the lock we have exclusive access to the extent =
mft
+ * record, which also allows us safe writeout of the extent mft record.  We
+ * set the ntfs inode of the extent mft record clean and then set @locked_=
ni to
+ * the now locked ntfs inode and return 'true'.
+ *
+ * Note, the reason for actually writing dirty mft records here and not ju=
st
+ * relying on the vfs inode dirty code paths is that we can have mft recor=
ds
+ * modified without them ever having actual inodes in memory.  Also we can=
 have
+ * dirty mft records with clean ntfs inodes in memory.  None of the descri=
bed
+ * cases would result in the dirty mft records being written out if we only
+ * relied on the vfs inode dirty code paths.  And these cases can really o=
ccur
+ * during allocation of new mft records and in particular when the
+ * initialized_size of the $MFT/$DATA attribute is extended and the new sp=
ace
+ * is initialized using ntfs_mft_record_format().  The clean inode can then
+ * appear if the mft record is reused for a new inode before it got written
+ * out.
+ */
+bool ntfs_may_write_mft_record(struct ntfs_volume *vol, const unsigned lon=
g mft_no,
+		const struct mft_record *m, struct ntfs_inode **locked_ni)
+{
+	struct super_block *sb =3D vol->sb;
+	struct inode *mft_vi =3D vol->mft_ino;
+	struct inode *vi;
+	struct ntfs_inode *ni, *eni, **extent_nis;
+	int i;
+	struct ntfs_attr na =3D {0};
+
+	ntfs_debug("Entering for inode 0x%lx.", mft_no);
+	/*
+	 * Normally we do not return a locked inode so set @locked_ni to NULL.
+	 */
+	*locked_ni =3D NULL;
+	/*
+	 * Check if the inode corresponding to this mft record is in the VFS
+	 * inode cache and obtain a reference to it if it is.
+	 */
+	ntfs_debug("Looking for inode 0x%lx in icache.", mft_no);
+	na.mft_no =3D mft_no;
+	na.type =3D AT_UNUSED;
+	/*
+	 * Optimize inode 0, i.e. $MFT itself, since we have it in memory and
+	 * we get here for it rather often.
+	 */
+	if (!mft_no) {
+		/* Balance the below iput(). */
+		vi =3D igrab(mft_vi);
+		WARN_ON(vi !=3D mft_vi);
+	} else {
+		/*
+		 * Have to use find_inode_nowait() since ilookup5_nowait()
+		 * waits for inode with I_FREEING, which causes ntfs to deadlock
+		 * when inodes are unlinked concurrently
+		 */
+		vi =3D find_inode_nowait(sb, mft_no, ntfs_test_inode_wb, &na);
+		if (na.state =3D=3D NI_BeingDeleted || na.state =3D=3D NI_BeingCreated)
+			return false;
+	}
+	if (vi) {
+		ntfs_debug("Base inode 0x%lx is in icache.", mft_no);
+		/* The inode is in icache. */
+		ni =3D NTFS_I(vi);
+		/* Take a reference to the ntfs inode. */
+		atomic_inc(&ni->count);
+		/* If the inode is dirty, do not write this record. */
+		if (NInoDirty(ni)) {
+			ntfs_debug("Inode 0x%lx is dirty, do not write it.",
+					mft_no);
+			atomic_dec(&ni->count);
+			iput(vi);
+			return false;
+		}
+		ntfs_debug("Inode 0x%lx is not dirty.", mft_no);
+		/* The inode is not dirty, try to take the mft record lock. */
+		if (unlikely(!mutex_trylock(&ni->mrec_lock))) {
+			ntfs_debug("Mft record 0x%lx is already locked, do not write it.", mft_=
no);
+			atomic_dec(&ni->count);
+			iput(vi);
+			return false;
+		}
+		ntfs_debug("Managed to lock mft record 0x%lx, write it.",
+				mft_no);
+		/*
+		 * The write has to occur while we hold the mft record lock so
+		 * return the locked ntfs inode.
+		 */
+		*locked_ni =3D ni;
+		return true;
+	}
+	ntfs_debug("Inode 0x%lx is not in icache.", mft_no);
+	/* The inode is not in icache. */
+	/* Write the record if it is not a mft record (type "FILE"). */
+	if (!ntfs_is_mft_record(m->magic)) {
+		ntfs_debug("Mft record 0x%lx is not a FILE record, write it.",
+				mft_no);
+		return true;
+	}
+	/* Write the mft record if it is a base inode. */
+	if (!m->base_mft_record) {
+		ntfs_debug("Mft record 0x%lx is a base record, write it.",
+				mft_no);
+		return true;
+	}
+	/*
+	 * This is an extent mft record.  Check if the inode corresponding to
+	 * its base mft record is in icache and obtain a reference to it if it
+	 * is.
+	 */
+	na.mft_no =3D MREF_LE(m->base_mft_record);
+	na.state =3D 0;
+	ntfs_debug("Mft record 0x%lx is an extent record.  Looking for base inode=
 0x%lx in icache.",
+			mft_no, na.mft_no);
+	if (!na.mft_no) {
+		/* Balance the below iput(). */
+		vi =3D igrab(mft_vi);
+		WARN_ON(vi !=3D mft_vi);
+	} else {
+		vi =3D find_inode_nowait(sb, mft_no, ntfs_test_inode_wb, &na);
+		if (na.state =3D=3D NI_BeingDeleted || na.state =3D=3D NI_BeingCreated)
+			return false;
+	}
+
+	if (!vi)
+		return false;
+	ntfs_debug("Base inode 0x%lx is in icache.", na.mft_no);
+	/*
+	 * The base inode is in icache.  Check if it has the extent inode
+	 * corresponding to this extent mft record attached.
+	 */
+	ni =3D NTFS_I(vi);
+	mutex_lock(&ni->extent_lock);
+	if (ni->nr_extents <=3D 0) {
+		/*
+		 * The base inode has no attached extent inodes, write this
+		 * extent mft record.
+		 */
+		mutex_unlock(&ni->extent_lock);
+		iput(vi);
+		ntfs_debug("Base inode 0x%lx has no attached extent inodes, write the ex=
tent record.",
+				na.mft_no);
+		return true;
+	}
+	/* Iterate over the attached extent inodes. */
+	extent_nis =3D ni->ext.extent_ntfs_inos;
+	for (eni =3D NULL, i =3D 0; i < ni->nr_extents; ++i) {
+		if (mft_no =3D=3D extent_nis[i]->mft_no) {
+			/*
+			 * Found the extent inode corresponding to this extent
+			 * mft record.
+			 */
+			eni =3D extent_nis[i];
+			break;
+		}
+	}
+	/*
+	 * If the extent inode was not attached to the base inode, write this
+	 * extent mft record.
+	 */
+	if (!eni) {
+		mutex_unlock(&ni->extent_lock);
+		iput(vi);
+		ntfs_debug("Extent inode 0x%lx is not attached to its base inode 0x%lx, =
write the extent record.",
+				mft_no, na.mft_no);
+		return true;
+	}
+	ntfs_debug("Extent inode 0x%lx is attached to its base inode 0x%lx.",
+			mft_no, na.mft_no);
+	/* Take a reference to the extent ntfs inode. */
+	atomic_inc(&eni->count);
+	mutex_unlock(&ni->extent_lock);
+
+	/* if extent inode is dirty, write_inode will write it */
+	if (NInoDirty(eni)) {
+		atomic_dec(&eni->count);
+		iput(vi);
+		return false;
+	}
+
+	/*
+	 * Found the extent inode coresponding to this extent mft record.
+	 * Try to take the mft record lock.
+	 */
+	if (unlikely(!mutex_trylock(&eni->mrec_lock))) {
+		atomic_dec(&eni->count);
+		iput(vi);
+		ntfs_debug("Extent mft record 0x%lx is already locked, do not write it.",
+				mft_no);
+		return false;
+	}
+	ntfs_debug("Managed to lock extent mft record 0x%lx, write it.",
+			mft_no);
+	/*
+	 * The write has to occur while we hold the mft record lock so return
+	 * the locked extent ntfs inode.
+	 */
+	*locked_ni =3D eni;
+	return true;
+}
+
+static const char *es =3D "  Leaving inconsistent metadata.  Unmount and r=
un chkdsk.";
+
+#define RESERVED_MFT_RECORDS	64
+
+/**
+ * ntfs_mft_bitmap_find_and_alloc_free_rec_nolock - see name
+ * @vol:	volume on which to search for a free mft record
+ * @base_ni:	open base inode if allocating an extent mft record or NULL
+ *
+ * Search for a free mft record in the mft bitmap attribute on the ntfs vo=
lume
+ * @vol.
+ *
+ * If @base_ni is NULL start the search at the default allocator position.
+ *
+ * If @base_ni is not NULL start the search at the mft record after the ba=
se
+ * mft record @base_ni.
+ *
+ * Return the free mft record on success and -errno on error.  An error co=
de of
+ * -ENOSPC means that there are no free mft records in the currently
+ * initialized mft bitmap.
+ *
+ * Locking: Caller must hold vol->mftbmp_lock for writing.
+ */
+static int ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(struct ntfs_volu=
me *vol,
+		struct ntfs_inode *base_ni)
+{
+	s64 pass_end, ll, data_pos, pass_start, ofs, bit;
+	unsigned long flags;
+	struct address_space *mftbmp_mapping;
+	u8 *buf =3D NULL, *byte;
+	struct folio *folio;
+	unsigned int folio_ofs, size;
+	u8 pass, b;
+
+	ntfs_debug("Searching for free mft record in the currently initialized mf=
t bitmap.");
+	mftbmp_mapping =3D vol->mftbmp_ino->i_mapping;
+	/*
+	 * Set the end of the pass making sure we do not overflow the mft
+	 * bitmap.
+	 */
+	read_lock_irqsave(&NTFS_I(vol->mft_ino)->size_lock, flags);
+	pass_end =3D NTFS_I(vol->mft_ino)->allocated_size >>
+			vol->mft_record_size_bits;
+	read_unlock_irqrestore(&NTFS_I(vol->mft_ino)->size_lock, flags);
+	read_lock_irqsave(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
+	ll =3D NTFS_I(vol->mftbmp_ino)->initialized_size << 3;
+	read_unlock_irqrestore(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
+	if (pass_end > ll)
+		pass_end =3D ll;
+	pass =3D 1;
+	if (!base_ni)
+		data_pos =3D vol->mft_data_pos;
+	else
+		data_pos =3D base_ni->mft_no + 1;
+	if (data_pos < RESERVED_MFT_RECORDS)
+		data_pos =3D RESERVED_MFT_RECORDS;
+	if (data_pos >=3D pass_end) {
+		data_pos =3D RESERVED_MFT_RECORDS;
+		pass =3D 2;
+		/* This happens on a freshly formatted volume. */
+		if (data_pos >=3D pass_end)
+			return -ENOSPC;
+	}
+
+	if (base_ni && base_ni->mft_no =3D=3D FILE_MFT) {
+		data_pos =3D 0;
+		pass =3D 2;
+	}
+
+	pass_start =3D data_pos;
+	ntfs_debug("Starting bitmap search: pass %u, pass_start 0x%llx, pass_end =
0x%llx, data_pos 0x%llx.",
+			pass, pass_start, pass_end, data_pos);
+	/* Loop until a free mft record is found. */
+	for (; pass <=3D 2;) {
+		/* Cap size to pass_end. */
+		ofs =3D data_pos >> 3;
+		folio_ofs =3D ofs & ~PAGE_MASK;
+		size =3D PAGE_SIZE - folio_ofs;
+		ll =3D ((pass_end + 7) >> 3) - ofs;
+		if (size > ll)
+			size =3D ll;
+		size <<=3D 3;
+		/*
+		 * If we are still within the active pass, search the next page
+		 * for a zero bit.
+		 */
+		if (size) {
+			folio =3D ntfs_read_mapping_folio(mftbmp_mapping,
+					ofs >> PAGE_SHIFT);
+			if (IS_ERR(folio)) {
+				ntfs_error(vol->sb, "Failed to read mft bitmap, aborting.");
+				return PTR_ERR(folio);
+			}
+			folio_lock(folio);
+			buf =3D (u8 *)kmap_local_folio(folio, 0) + folio_ofs;
+			bit =3D data_pos & 7;
+			data_pos &=3D ~7ull;
+			ntfs_debug("Before inner for loop: size 0x%x, data_pos 0x%llx, bit 0x%l=
lx",
+					size, data_pos, bit);
+			for (; bit < size && data_pos + bit < pass_end;
+					bit &=3D ~7ull, bit +=3D 8) {
+				/*
+				 * If we're extending $MFT and running out of the first
+				 * mft record (base record) then give up searching since
+				 * no guarantee that the found record will be accessible.
+				 */
+				if (base_ni && base_ni->mft_no =3D=3D FILE_MFT && bit > 400) {
+					folio_unlock(folio);
+					ntfs_unmap_folio(folio, buf);
+					return -ENOSPC;
+				}
+
+				byte =3D buf + (bit >> 3);
+				if (*byte =3D=3D 0xff)
+					continue;
+				b =3D ffz((unsigned long)*byte);
+				if (b < 8 && b >=3D (bit & 7)) {
+					ll =3D data_pos + (bit & ~7ull) + b;
+					if (unlikely(ll > (1ll << 32))) {
+						folio_unlock(folio);
+						ntfs_unmap_folio(folio, buf);
+						return -ENOSPC;
+					}
+					*byte |=3D 1 << b;
+					flush_dcache_folio(folio);
+					folio_mark_dirty(folio);
+					folio_unlock(folio);
+					ntfs_unmap_folio(folio, buf);
+					ntfs_debug("Done.  (Found and allocated mft record 0x%llx.)",
+							ll);
+					return ll;
+				}
+			}
+			ntfs_debug("After inner for loop: size 0x%x, data_pos 0x%llx, bit 0x%ll=
x",
+					size, data_pos, bit);
+			data_pos +=3D size;
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, buf);
+			/*
+			 * If the end of the pass has not been reached yet,
+			 * continue searching the mft bitmap for a zero bit.
+			 */
+			if (data_pos < pass_end)
+				continue;
+		}
+		/* Do the next pass. */
+		if (++pass =3D=3D 2) {
+			/*
+			 * Starting the second pass, in which we scan the first
+			 * part of the zone which we omitted earlier.
+			 */
+			pass_end =3D pass_start;
+			data_pos =3D pass_start =3D RESERVED_MFT_RECORDS;
+			ntfs_debug("pass %i, pass_start 0x%llx, pass_end 0x%llx.",
+					pass, pass_start, pass_end);
+			if (data_pos >=3D pass_end)
+				break;
+		}
+	}
+	/* No free mft records in currently initialized mft bitmap. */
+	ntfs_debug("Done.  (No free mft records left in currently initialized mft=
 bitmap.)");
+	return -ENOSPC;
+}
+
+static int ntfs_mft_attr_extend(struct ntfs_inode *ni)
+{
+	int ret =3D 0;
+	struct ntfs_inode *base_ni;
+
+	if (NInoAttr(ni))
+		base_ni =3D ni->ext.base_ntfs_ino;
+	else
+		base_ni =3D ni;
+
+	if (!NInoAttrList(base_ni)) {
+		ret =3D ntfs_inode_add_attrlist(base_ni);
+		if (ret) {
+			pr_err("Can not add attrlist\n");
+			goto out;
+		} else {
+			ret =3D -EAGAIN;
+			goto out;
+		}
+	}
+
+	ret =3D ntfs_attr_update_mapping_pairs(ni, 0);
+	if (ret)
+		pr_err("MP update failed\n");
+
+out:
+	return ret;
+}
+
+/**
+ * ntfs_mft_bitmap_extend_allocation_nolock - extend mft bitmap by a clust=
er
+ * @vol:	volume on which to extend the mft bitmap attribute
+ *
+ * Extend the mft bitmap attribute on the ntfs volume @vol by one cluster.
+ *
+ * Note: Only changes allocated_size, i.e. does not touch initialized_size=
 or
+ * data_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - Caller must hold vol->mftbmp_lock for writing.
+ *	    - This function takes NTFS_I(vol->mftbmp_ino)->runlist.lock for
+ *	      writing and releases it before returning.
+ *	    - This function takes vol->lcnbmp_lock for writing and releases it
+ *	      before returning.
+ */
+static int ntfs_mft_bitmap_extend_allocation_nolock(struct ntfs_volume *vo=
l)
+{
+	s64 lcn;
+	s64 ll;
+	unsigned long flags;
+	struct folio *folio;
+	struct ntfs_inode *mft_ni, *mftbmp_ni;
+	struct runlist_element *rl, *rl2 =3D NULL;
+	struct ntfs_attr_search_ctx *ctx =3D NULL;
+	struct mft_record *mrec;
+	struct attr_record *a =3D NULL;
+	int ret, mp_size;
+	u32 old_alen =3D 0;
+	u8 *b, tb;
+	struct {
+		u8 added_cluster:1;
+		u8 added_run:1;
+		u8 mp_rebuilt:1;
+		u8 mp_extended:1;
+	} status =3D { 0, 0, 0, 0 };
+	size_t new_rl_count;
+
+	ntfs_debug("Extending mft bitmap allocation.");
+	mft_ni =3D NTFS_I(vol->mft_ino);
+	mftbmp_ni =3D NTFS_I(vol->mftbmp_ino);
+	/*
+	 * Determine the last lcn of the mft bitmap.  The allocated size of the
+	 * mft bitmap cannot be zero so we are ok to do this.
+	 */
+	down_write(&mftbmp_ni->runlist.lock);
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	ll =3D mftbmp_ni->allocated_size;
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	rl =3D ntfs_attr_find_vcn_nolock(mftbmp_ni,
+			(ll - 1) >> vol->cluster_size_bits, NULL);
+	if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
+		up_write(&mftbmp_ni->runlist.lock);
+		ntfs_error(vol->sb,
+			"Failed to determine last allocated cluster of mft bitmap attribute.");
+		if (!IS_ERR(rl))
+			ret =3D -EIO;
+		else
+			ret =3D PTR_ERR(rl);
+		return ret;
+	}
+	lcn =3D rl->lcn + rl->length;
+	ntfs_debug("Last lcn of mft bitmap attribute is 0x%llx.",
+			(long long)lcn);
+	/*
+	 * Attempt to get the cluster following the last allocated cluster by
+	 * hand as it may be in the MFT zone so the allocator would not give it
+	 * to us.
+	 */
+	ll =3D lcn >> 3;
+	folio =3D ntfs_read_mapping_folio(vol->lcnbmp_ino->i_mapping,
+			ll >> PAGE_SHIFT);
+	if (IS_ERR(folio)) {
+		up_write(&mftbmp_ni->runlist.lock);
+		ntfs_error(vol->sb, "Failed to read from lcn bitmap.");
+		return PTR_ERR(folio);
+	}
+
+	down_write(&vol->lcnbmp_lock);
+	folio_lock(folio);
+	b =3D (u8 *)kmap_local_folio(folio, 0) + (ll & ~PAGE_MASK);
+	tb =3D 1 << (lcn & 7ull);
+	if (*b !=3D 0xff && !(*b & tb)) {
+		/* Next cluster is free, allocate it. */
+		*b |=3D tb;
+		flush_dcache_folio(folio);
+		folio_mark_dirty(folio);
+		folio_unlock(folio);
+		ntfs_unmap_folio(folio, b);
+		up_write(&vol->lcnbmp_lock);
+		/* Update the mft bitmap runlist. */
+		rl->length++;
+		rl[1].vcn++;
+		status.added_cluster =3D 1;
+		ntfs_debug("Appending one cluster to mft bitmap.");
+	} else {
+		folio_unlock(folio);
+		ntfs_unmap_folio(folio, b);
+		up_write(&vol->lcnbmp_lock);
+		/* Allocate a cluster from the DATA_ZONE. */
+		rl2 =3D ntfs_cluster_alloc(vol, rl[1].vcn, 1, lcn, DATA_ZONE,
+				true, false, false);
+		if (IS_ERR(rl2)) {
+			up_write(&mftbmp_ni->runlist.lock);
+			ntfs_error(vol->sb,
+					"Failed to allocate a cluster for the mft bitmap.");
+			return PTR_ERR(rl2);
+		}
+		rl =3D ntfs_runlists_merge(&mftbmp_ni->runlist, rl2, 0, &new_rl_count);
+		if (IS_ERR(rl)) {
+			up_write(&mftbmp_ni->runlist.lock);
+			ntfs_error(vol->sb, "Failed to merge runlists for mft bitmap.");
+			if (ntfs_cluster_free_from_rl(vol, rl2)) {
+				ntfs_error(vol->sb, "Failed to deallocate allocated cluster.%s",
+						es);
+				NVolSetErrors(vol);
+			}
+			ntfs_free(rl2);
+			return PTR_ERR(rl);
+		}
+		mftbmp_ni->runlist.rl =3D rl;
+		mftbmp_ni->runlist.count =3D new_rl_count;
+		status.added_run =3D 1;
+		ntfs_debug("Adding one run to mft bitmap.");
+		/* Find the last run in the new runlist. */
+		for (; rl[1].length; rl++)
+			;
+	}
+	/*
+	 * Update the attribute record as well.  Note: @rl is the last
+	 * (non-terminator) runlist element of mft bitmap.
+	 */
+	mrec =3D map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		ret =3D PTR_ERR(mrec);
+		goto undo_alloc;
+	}
+	ctx =3D ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		ret =3D -ENOMEM;
+		goto undo_alloc;
+	}
+	ret =3D ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
+			0, ctx);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb,
+			"Failed to find last attribute extent of mft bitmap attribute.");
+		if (ret =3D=3D -ENOENT)
+			ret =3D -EIO;
+		goto undo_alloc;
+	}
+	a =3D ctx->attr;
+	ll =3D le64_to_cpu(a->data.non_resident.lowest_vcn);
+	/* Search back for the previous last allocated cluster of mft bitmap. */
+	for (rl2 =3D rl; rl2 > mftbmp_ni->runlist.rl; rl2--) {
+		if (ll >=3D rl2->vcn)
+			break;
+	}
+	WARN_ON(ll < rl2->vcn);
+	WARN_ON(ll >=3D rl2->vcn + rl2->length);
+	/* Get the size for the new mapping pairs array for this extent. */
+	mp_size =3D ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1, -1);
+	if (unlikely(mp_size <=3D 0)) {
+		ntfs_error(vol->sb,
+			"Get size for mapping pairs failed for mft bitmap attribute extent.");
+		ret =3D mp_size;
+		if (!ret)
+			ret =3D -EIO;
+		goto undo_alloc;
+	}
+	/* Expand the attribute record if necessary. */
+	old_alen =3D le32_to_cpu(a->length);
+	ret =3D ntfs_attr_record_resize(ctx->mrec, a, mp_size +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
+	if (unlikely(ret)) {
+		ret =3D ntfs_mft_attr_extend(mftbmp_ni);
+		if (!ret)
+			goto extended_ok;
+		status.mp_extended =3D 1;
+		goto undo_alloc;
+	}
+	status.mp_rebuilt =3D 1;
+	/* Generate the mapping pairs array directly into the attr record. */
+	ret =3D ntfs_mapping_pairs_build(vol, (u8 *)a +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+			mp_size, rl2, ll, -1, NULL, NULL, NULL);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb,
+			"Failed to build mapping pairs array for mft bitmap attribute.");
+		goto undo_alloc;
+	}
+	/* Update the highest_vcn. */
+	a->data.non_resident.highest_vcn =3D cpu_to_le64(rl[1].vcn - 1);
+	/*
+	 * We now have extended the mft bitmap allocated_size by one cluster.
+	 * Reflect this in the struct ntfs_inode structure and the attribute reco=
rd.
+	 */
+	if (a->data.non_resident.lowest_vcn) {
+		/*
+		 * We are not in the first attribute extent, switch to it, but
+		 * first ensure the changes will make it to disk later.
+		 */
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		ntfs_attr_reinit_search_ctx(ctx);
+		ret =3D ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+				mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL,
+				0, ctx);
+		if (unlikely(ret)) {
+			ntfs_error(vol->sb,
+				"Failed to find first attribute extent of mft bitmap attribute.");
+			goto restore_undo_alloc;
+		}
+		a =3D ctx->attr;
+	}
+
+extended_ok:
+	write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	mftbmp_ni->allocated_size +=3D vol->cluster_size;
+	a->data.non_resident.allocated_size =3D
+			cpu_to_le64(mftbmp_ni->allocated_size);
+	write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	/* Ensure the changes make it to disk. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	up_write(&mftbmp_ni->runlist.lock);
+	ntfs_debug("Done.");
+	return 0;
+
+restore_undo_alloc:
+	ntfs_attr_reinit_search_ctx(ctx);
+	if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
+			0, ctx)) {
+		ntfs_error(vol->sb,
+			"Failed to find last attribute extent of mft bitmap attribute.%s", es);
+		write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+		mftbmp_ni->allocated_size +=3D vol->cluster_size;
+		write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(mft_ni);
+		up_write(&mftbmp_ni->runlist.lock);
+		/*
+		 * The only thing that is now wrong is ->allocated_size of the
+		 * base attribute extent which chkdsk should be able to fix.
+		 */
+		NVolSetErrors(vol);
+		return ret;
+	}
+	a =3D ctx->attr;
+	a->data.non_resident.highest_vcn =3D cpu_to_le64(rl[1].vcn - 2);
+undo_alloc:
+	if (status.added_cluster) {
+		/* Truncate the last run in the runlist by one cluster. */
+		rl->length--;
+		rl[1].vcn--;
+	} else if (status.added_run) {
+		lcn =3D rl->lcn;
+		/* Remove the last run from the runlist. */
+		rl->lcn =3D rl[1].lcn;
+		rl->length =3D 0;
+		mftbmp_ni->runlist.count--;
+	}
+	/* Deallocate the cluster. */
+	down_write(&vol->lcnbmp_lock);
+	if (ntfs_bitmap_clear_bit(vol->lcnbmp_ino, lcn)) {
+		ntfs_error(vol->sb, "Failed to free allocated cluster.%s", es);
+		NVolSetErrors(vol);
+	} else
+		ntfs_inc_free_clusters(vol, 1);
+	up_write(&vol->lcnbmp_lock);
+	if (status.mp_rebuilt) {
+		if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset),
+				old_alen - le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset),
+				rl2, ll, -1, NULL, NULL, NULL)) {
+			ntfs_error(vol->sb, "Failed to restore mapping pairs array.%s", es);
+			NVolSetErrors(vol);
+		}
+		if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
+			ntfs_error(vol->sb, "Failed to restore attribute record.%s", es);
+			NVolSetErrors(vol);
+		}
+		mark_mft_record_dirty(ctx->ntfs_ino);
+	} else if (status.mp_extended && ntfs_attr_update_mapping_pairs(mftbmp_ni=
, 0)) {
+		ntfs_error(vol->sb, "Failed to restore mapping pairs.%s", es);
+		NVolSetErrors(vol);
+	}
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (!IS_ERR(mrec))
+		unmap_mft_record(mft_ni);
+	up_write(&mftbmp_ni->runlist.lock);
+	return ret;
+}
+
+/**
+ * ntfs_mft_bitmap_extend_initialized_nolock - extend mftbmp initialized d=
ata
+ * @vol:	volume on which to extend the mft bitmap attribute
+ *
+ * Extend the initialized portion of the mft bitmap attribute on the ntfs
+ * volume @vol by 8 bytes.
+ *
+ * Note:  Only changes initialized_size and data_size, i.e. requires that
+ * allocated_size is big enough to fit the new initialized_size.
+ *
+ * Return 0 on success and -error on error.
+ *
+ * Locking: Caller must hold vol->mftbmp_lock for writing.
+ */
+static int ntfs_mft_bitmap_extend_initialized_nolock(struct ntfs_volume *v=
ol)
+{
+	s64 old_data_size, old_initialized_size;
+	unsigned long flags;
+	struct inode *mftbmp_vi;
+	struct ntfs_inode *mft_ni, *mftbmp_ni;
+	struct ntfs_attr_search_ctx *ctx;
+	struct mft_record *mrec;
+	struct attr_record *a;
+	int ret;
+
+	ntfs_debug("Extending mft bitmap initiailized (and data) size.");
+	mft_ni =3D NTFS_I(vol->mft_ino);
+	mftbmp_vi =3D vol->mftbmp_ino;
+	mftbmp_ni =3D NTFS_I(mftbmp_vi);
+	/* Get the attribute record. */
+	mrec =3D map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		return PTR_ERR(mrec);
+	}
+	ctx =3D ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		ret =3D -ENOMEM;
+		goto unm_err_out;
+	}
+	ret =3D ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb,
+			"Failed to find first attribute extent of mft bitmap attribute.");
+		if (ret =3D=3D -ENOENT)
+			ret =3D -EIO;
+		goto put_err_out;
+	}
+	a =3D ctx->attr;
+	write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	old_data_size =3D i_size_read(mftbmp_vi);
+	old_initialized_size =3D mftbmp_ni->initialized_size;
+	/*
+	 * We can simply update the initialized_size before filling the space
+	 * with zeroes because the caller is holding the mft bitmap lock for
+	 * writing which ensures that no one else is trying to access the data.
+	 */
+	mftbmp_ni->initialized_size +=3D 8;
+	a->data.non_resident.initialized_size =3D
+			cpu_to_le64(mftbmp_ni->initialized_size);
+	if (mftbmp_ni->initialized_size > old_data_size) {
+		i_size_write(mftbmp_vi, mftbmp_ni->initialized_size);
+		a->data.non_resident.data_size =3D
+				cpu_to_le64(mftbmp_ni->initialized_size);
+	}
+	write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	/* Ensure the changes make it to disk. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	/* Initialize the mft bitmap attribute value with zeroes. */
+	ret =3D ntfs_attr_set(mftbmp_ni, old_initialized_size, 8, 0);
+	if (likely(!ret)) {
+		ntfs_debug("Done.  (Wrote eight initialized bytes to mft bitmap.");
+		ntfs_inc_free_mft_records(vol, 8 * 8);
+		return 0;
+	}
+	ntfs_error(vol->sb, "Failed to write to mft bitmap.");
+	/* Try to recover from the error. */
+	mrec =3D map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.%s", es);
+		NVolSetErrors(vol);
+		return ret;
+	}
+	ctx =3D ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.%s", es);
+		NVolSetErrors(vol);
+		goto unm_err_out;
+	}
+	if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx)) {
+		ntfs_error(vol->sb,
+			"Failed to find first attribute extent of mft bitmap attribute.%s", es);
+		NVolSetErrors(vol);
+put_err_out:
+		ntfs_attr_put_search_ctx(ctx);
+unm_err_out:
+		unmap_mft_record(mft_ni);
+		goto err_out;
+	}
+	a =3D ctx->attr;
+	write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	mftbmp_ni->initialized_size =3D old_initialized_size;
+	a->data.non_resident.initialized_size =3D
+			cpu_to_le64(old_initialized_size);
+	if (i_size_read(mftbmp_vi) !=3D old_data_size) {
+		i_size_write(mftbmp_vi, old_data_size);
+		a->data.non_resident.data_size =3D cpu_to_le64(old_data_size);
+	}
+	write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+#ifdef DEBUG
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	ntfs_debug("Restored status of mftbmp: allocated_size 0x%llx, data_size 0=
x%llx, initialized_size 0x%llx.",
+			mftbmp_ni->allocated_size, i_size_read(mftbmp_vi),
+			mftbmp_ni->initialized_size);
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+#endif /* DEBUG */
+err_out:
+	return ret;
+}
+
+/**
+ * ntfs_mft_data_extend_allocation_nolock - extend mft data attribute
+ * @vol:	volume on which to extend the mft data attribute
+ *
+ * Extend the mft data attribute on the ntfs volume @vol by 16 mft records
+ * worth of clusters or if not enough space for this by one mft record wor=
th
+ * of clusters.
+ *
+ * Note:  Only changes allocated_size, i.e. does not touch initialized_siz=
e or
+ * data_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - Caller must hold vol->mftbmp_lock for writing.
+ *	    - This function takes NTFS_I(vol->mft_ino)->runlist.lock for
+ *	      writing and releases it before returning.
+ *	    - This function calls functions which take vol->lcnbmp_lock for
+ *	      writing and release it before returning.
+ */
+static int ntfs_mft_data_extend_allocation_nolock(struct ntfs_volume *vol)
+{
+	s64 lcn;
+	s64 old_last_vcn;
+	s64 min_nr, nr, ll;
+	unsigned long flags;
+	struct ntfs_inode *mft_ni;
+	struct runlist_element *rl, *rl2;
+	struct ntfs_attr_search_ctx *ctx =3D NULL;
+	struct mft_record *mrec;
+	struct attr_record *a =3D NULL;
+	int ret, mp_size;
+	u32 old_alen =3D 0;
+	bool mp_rebuilt =3D false, mp_extended =3D false;
+	size_t new_rl_count;
+
+	ntfs_debug("Extending mft data allocation.");
+	mft_ni =3D NTFS_I(vol->mft_ino);
+	/*
+	 * Determine the preferred allocation location, i.e. the last lcn of
+	 * the mft data attribute.  The allocated size of the mft data
+	 * attribute cannot be zero so we are ok to do this.
+	 */
+	down_write(&mft_ni->runlist.lock);
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	ll =3D mft_ni->allocated_size;
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	rl =3D ntfs_attr_find_vcn_nolock(mft_ni,
+			(ll - 1) >> vol->cluster_size_bits, NULL);
+	if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
+		up_write(&mft_ni->runlist.lock);
+		ntfs_error(vol->sb,
+			"Failed to determine last allocated cluster of mft data attribute.");
+		if (!IS_ERR(rl))
+			ret =3D -EIO;
+		else
+			ret =3D PTR_ERR(rl);
+		return ret;
+	}
+	lcn =3D rl->lcn + rl->length;
+	ntfs_debug("Last lcn of mft data attribute is 0x%llx.", lcn);
+	/* Minimum allocation is one mft record worth of clusters. */
+	min_nr =3D vol->mft_record_size >> vol->cluster_size_bits;
+	if (!min_nr)
+		min_nr =3D 1;
+	/* Want to allocate 16 mft records worth of clusters. */
+	nr =3D vol->mft_record_size << 4 >> vol->cluster_size_bits;
+	if (!nr)
+		nr =3D min_nr;
+	/* Ensure we do not go above 2^32-1 mft records. */
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	ll =3D mft_ni->allocated_size;
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
+			vol->mft_record_size_bits >=3D (1ll << 32))) {
+		nr =3D min_nr;
+		if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
+				vol->mft_record_size_bits >=3D (1ll << 32))) {
+			ntfs_warning(vol->sb,
+				"Cannot allocate mft record because the maximum number of inodes (2^32=
) has already been reached.");
+			up_write(&mft_ni->runlist.lock);
+			return -ENOSPC;
+		}
+	}
+	ntfs_debug("Trying mft data allocation with %s cluster count %lli.",
+			nr > min_nr ? "default" : "minimal", (long long)nr);
+	old_last_vcn =3D rl[1].vcn;
+	/*
+	 * We can release the mft_ni runlist lock, Because this function is
+	 * the only one that expends $MFT data attribute and is called with
+	 * mft_ni->mrec_lock.
+	 * This is required for the lock order, vol->lcnbmp_lock =3D>
+	 * mft_ni->runlist.lock.
+	 */
+	up_write(&mft_ni->runlist.lock);
+
+	do {
+		rl2 =3D ntfs_cluster_alloc(vol, old_last_vcn, nr, lcn, MFT_ZONE,
+				true, false, false);
+		if (!IS_ERR(rl2))
+			break;
+		if (PTR_ERR(rl2) !=3D -ENOSPC || nr =3D=3D min_nr) {
+			ntfs_error(vol->sb,
+				"Failed to allocate the minimal number of clusters (%lli) for the mft =
data attribute.",
+				nr);
+			return PTR_ERR(rl2);
+		}
+		/*
+		 * There is not enough space to do the allocation, but there
+		 * might be enough space to do a minimal allocation so try that
+		 * before failing.
+		 */
+		nr =3D min_nr;
+		ntfs_debug("Retrying mft data allocation with minimal cluster count %lli=
.", nr);
+	} while (1);
+
+	down_write(&mft_ni->runlist.lock);
+	rl =3D ntfs_runlists_merge(&mft_ni->runlist, rl2, 0, &new_rl_count);
+	if (IS_ERR(rl)) {
+		up_write(&mft_ni->runlist.lock);
+		ntfs_error(vol->sb, "Failed to merge runlists for mft data attribute.");
+		if (ntfs_cluster_free_from_rl(vol, rl2)) {
+			ntfs_error(vol->sb,
+				"Failed to deallocate clusters from the mft data attribute.%s", es);
+			NVolSetErrors(vol);
+		}
+		ntfs_free(rl2);
+		return PTR_ERR(rl);
+	}
+	mft_ni->runlist.rl =3D rl;
+	mft_ni->runlist.count =3D new_rl_count;
+	ntfs_debug("Allocated %lli clusters.", (long long)nr);
+	/* Find the last run in the new runlist. */
+	for (; rl[1].length; rl++)
+		;
+	up_write(&mft_ni->runlist.lock);
+
+	/* Update the attribute record as well. */
+	mrec =3D map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		ret =3D PTR_ERR(mrec);
+		down_write(&mft_ni->runlist.lock);
+		goto undo_alloc;
+	}
+	ctx =3D ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		ret =3D -ENOMEM;
+		goto undo_alloc;
+	}
+	ret =3D ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+			CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb, "Failed to find last attribute extent of mft data at=
tribute.");
+		if (ret =3D=3D -ENOENT)
+			ret =3D -EIO;
+		goto undo_alloc;
+	}
+	a =3D ctx->attr;
+	ll =3D le64_to_cpu(a->data.non_resident.lowest_vcn);
+
+	down_write(&mft_ni->runlist.lock);
+	/* Search back for the previous last allocated cluster of mft bitmap. */
+	for (rl2 =3D rl; rl2 > mft_ni->runlist.rl; rl2--) {
+		if (ll >=3D rl2->vcn)
+			break;
+	}
+	WARN_ON(ll < rl2->vcn);
+	WARN_ON(ll >=3D rl2->vcn + rl2->length);
+	/* Get the size for the new mapping pairs array for this extent. */
+	mp_size =3D ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1, -1);
+	if (unlikely(mp_size <=3D 0)) {
+		ntfs_error(vol->sb,
+			"Get size for mapping pairs failed for mft data attribute extent.");
+		ret =3D mp_size;
+		if (!ret)
+			ret =3D -EIO;
+		up_write(&mft_ni->runlist.lock);
+		goto undo_alloc;
+	}
+	up_write(&mft_ni->runlist.lock);
+
+	/* Expand the attribute record if necessary. */
+	old_alen =3D le32_to_cpu(a->length);
+	ret =3D ntfs_attr_record_resize(ctx->mrec, a, mp_size +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
+	if (unlikely(ret)) {
+		ret =3D ntfs_mft_attr_extend(mft_ni);
+		if (!ret)
+			goto extended_ok;
+		mp_extended =3D true;
+		goto undo_alloc;
+	}
+	mp_rebuilt =3D true;
+	/* Generate the mapping pairs array directly into the attr record. */
+	ret =3D ntfs_mapping_pairs_build(vol, (u8 *)a +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+			mp_size, rl2, ll, -1, NULL, NULL, NULL);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb, "Failed to build mapping pairs array of mft data att=
ribute.");
+		goto undo_alloc;
+	}
+	/* Update the highest_vcn. */
+	a->data.non_resident.highest_vcn =3D cpu_to_le64(rl[1].vcn - 1);
+	/*
+	 * We now have extended the mft data allocated_size by nr clusters.
+	 * Reflect this in the struct ntfs_inode structure and the attribute reco=
rd.
+	 * @rl is the last (non-terminator) runlist element of mft data
+	 * attribute.
+	 */
+	if (a->data.non_resident.lowest_vcn) {
+		/*
+		 * We are not in the first attribute extent, switch to it, but
+		 * first ensure the changes will make it to disk later.
+		 */
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		ntfs_attr_reinit_search_ctx(ctx);
+		ret =3D ntfs_attr_lookup(mft_ni->type, mft_ni->name,
+				mft_ni->name_len, CASE_SENSITIVE, 0, NULL, 0,
+				ctx);
+		if (unlikely(ret)) {
+			ntfs_error(vol->sb,
+				"Failed to find first attribute extent of mft data attribute.");
+			goto restore_undo_alloc;
+		}
+		a =3D ctx->attr;
+	}
+
+extended_ok:
+	write_lock_irqsave(&mft_ni->size_lock, flags);
+	mft_ni->allocated_size +=3D nr << vol->cluster_size_bits;
+	a->data.non_resident.allocated_size =3D
+			cpu_to_le64(mft_ni->allocated_size);
+	write_unlock_irqrestore(&mft_ni->size_lock, flags);
+	/* Ensure the changes make it to disk. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	ntfs_debug("Done.");
+	return 0;
+restore_undo_alloc:
+	ntfs_attr_reinit_search_ctx(ctx);
+	if (ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+			CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx)) {
+		ntfs_error(vol->sb,
+			"Failed to find last attribute extent of mft data attribute.%s", es);
+		write_lock_irqsave(&mft_ni->size_lock, flags);
+		mft_ni->allocated_size +=3D nr << vol->cluster_size_bits;
+		write_unlock_irqrestore(&mft_ni->size_lock, flags);
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(mft_ni);
+		up_write(&mft_ni->runlist.lock);
+		/*
+		 * The only thing that is now wrong is ->allocated_size of the
+		 * base attribute extent which chkdsk should be able to fix.
+		 */
+		NVolSetErrors(vol);
+		return ret;
+	}
+	ctx->attr->data.non_resident.highest_vcn =3D
+			cpu_to_le64(old_last_vcn - 1);
+undo_alloc:
+	if (ntfs_cluster_free(mft_ni, old_last_vcn, -1, ctx) < 0) {
+		ntfs_error(vol->sb, "Failed to free clusters from mft data attribute.%s"=
, es);
+		NVolSetErrors(vol);
+	}
+
+	if (ntfs_rl_truncate_nolock(vol, &mft_ni->runlist, old_last_vcn)) {
+		ntfs_error(vol->sb, "Failed to truncate mft data attribute runlist.%s", =
es);
+		NVolSetErrors(vol);
+	}
+	if (mp_extended && ntfs_attr_update_mapping_pairs(mft_ni, 0)) {
+		ntfs_error(vol->sb, "Failed to restore mapping pairs.%s",
+			   es);
+		NVolSetErrors(vol);
+	}
+	if (ctx) {
+		a =3D ctx->attr;
+		if (mp_rebuilt && !IS_ERR(ctx->mrec)) {
+			if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset),
+				old_alen - le16_to_cpu(
+					a->data.non_resident.mapping_pairs_offset),
+				rl2, ll, -1, NULL, NULL, NULL)) {
+				ntfs_error(vol->sb, "Failed to restore mapping pairs array.%s", es);
+				NVolSetErrors(vol);
+			}
+			if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
+				ntfs_error(vol->sb, "Failed to restore attribute record.%s", es);
+				NVolSetErrors(vol);
+			}
+			mark_mft_record_dirty(ctx->ntfs_ino);
+		} else if (IS_ERR(ctx->mrec)) {
+			ntfs_error(vol->sb, "Failed to restore attribute search context.%s", es=
);
+			NVolSetErrors(vol);
+		}
+		ntfs_attr_put_search_ctx(ctx);
+	}
+	if (!IS_ERR(mrec))
+		unmap_mft_record(mft_ni);
+	return ret;
+}
+
+/**
+ * ntfs_mft_record_layout - layout an mft record into a memory buffer
+ * @vol:	volume to which the mft record will belong
+ * @mft_no:	mft reference specifying the mft record number
+ * @m:		destination buffer of size >=3D @vol->mft_record_size bytes
+ *
+ * Layout an empty, unused mft record with the mft record number @mft_no i=
nto
+ * the buffer @m.  The volume @vol is needed because the mft record struct=
ure
+ * was modified in NTFS 3.1 so we need to know which volume version this m=
ft
+ * record will be used on.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_mft_record_layout(const struct ntfs_volume *vol, const s64=
 mft_no,
+		struct mft_record *m)
+{
+	struct attr_record *a;
+
+	ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
+	if (mft_no >=3D (1ll << 32)) {
+		ntfs_error(vol->sb, "Mft record number 0x%llx exceeds maximum of 2^32.",
+				(long long)mft_no);
+		return -ERANGE;
+	}
+	/* Start by clearing the whole mft record to gives us a clean slate. */
+	memset(m, 0, vol->mft_record_size);
+	/* Aligned to 2-byte boundary. */
+	if (vol->major_ver < 3 || (vol->major_ver =3D=3D 3 && !vol->minor_ver))
+		m->usa_ofs =3D cpu_to_le16((sizeof(struct mft_record_old) + 1) & ~1);
+	else {
+		m->usa_ofs =3D cpu_to_le16((sizeof(struct mft_record) + 1) & ~1);
+		/*
+		 * Set the NTFS 3.1+ specific fields while we know that the
+		 * volume version is 3.1+.
+		 */
+		m->reserved =3D 0;
+		m->mft_record_number =3D cpu_to_le32((u32)mft_no);
+	}
+	m->magic =3D magic_FILE;
+	if (vol->mft_record_size >=3D NTFS_BLOCK_SIZE)
+		m->usa_count =3D cpu_to_le16(vol->mft_record_size /
+				NTFS_BLOCK_SIZE + 1);
+	else {
+		m->usa_count =3D cpu_to_le16(1);
+		ntfs_warning(vol->sb,
+			"Sector size is bigger than mft record size.  Setting usa_count to 1.  =
If chkdsk reports this as corruption");
+	}
+	/* Set the update sequence number to 1. */
+	*(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs)) =3D cpu_to_le16(1);
+	m->lsn =3D 0;
+	m->sequence_number =3D cpu_to_le16(1);
+	m->link_count =3D 0;
+	/*
+	 * Place the attributes straight after the update sequence array,
+	 * aligned to 8-byte boundary.
+	 */
+	m->attrs_offset =3D cpu_to_le16((le16_to_cpu(m->usa_ofs) +
+			(le16_to_cpu(m->usa_count) << 1) + 7) & ~7);
+	m->flags =3D 0;
+	/*
+	 * Using attrs_offset plus eight bytes (for the termination attribute).
+	 * attrs_offset is already aligned to 8-byte boundary, so no need to
+	 * align again.
+	 */
+	m->bytes_in_use =3D cpu_to_le32(le16_to_cpu(m->attrs_offset) + 8);
+	m->bytes_allocated =3D cpu_to_le32(vol->mft_record_size);
+	m->base_mft_record =3D 0;
+	m->next_attr_instance =3D 0;
+	/* Add the termination attribute. */
+	a =3D (struct attr_record *)((u8 *)m + le16_to_cpu(m->attrs_offset));
+	a->type =3D AT_END;
+	a->length =3D 0;
+	ntfs_debug("Done.");
+	return 0;
+}
+
+/**
+ * ntfs_mft_record_format - format an mft record on an ntfs volume
+ * @vol:	volume on which to format the mft record
+ * @mft_no:	mft record number to format
+ *
+ * Format the mft record @mft_no in $MFT/$DATA, i.e. lay out an empty, unu=
sed
+ * mft record into the appropriate place of the mft data attribute.  This =
is
+ * used when extending the mft data attribute.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_mft_record_format(const struct ntfs_volume *vol, const s64=
 mft_no)
+{
+	loff_t i_size;
+	struct inode *mft_vi =3D vol->mft_ino;
+	struct folio *folio;
+	struct mft_record *m;
+	pgoff_t index, end_index;
+	unsigned int ofs;
+	int err;
+
+	ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
+	/*
+	 * The index into the page cache and the offset within the page cache
+	 * page of the wanted mft record.
+	 */
+	index =3D mft_no << vol->mft_record_size_bits >> PAGE_SHIFT;
+	ofs =3D (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+	/* The maximum valid index into the page cache for $MFT's data. */
+	i_size =3D i_size_read(mft_vi);
+	end_index =3D i_size >> PAGE_SHIFT;
+	if (unlikely(index >=3D end_index)) {
+		if (unlikely(index > end_index ||
+			     ofs + vol->mft_record_size > (i_size & ~PAGE_MASK))) {
+			ntfs_error(vol->sb, "Tried to format non-existing mft record 0x%llx.",
+					(long long)mft_no);
+			return -ENOENT;
+		}
+	}
+
+	/* Read, map, and pin the folio containing the mft record. */
+	folio =3D ntfs_read_mapping_folio(mft_vi->i_mapping, index);
+	if (IS_ERR(folio)) {
+		ntfs_error(vol->sb, "Failed to map page containing mft record to format =
0x%llx.",
+				(long long)mft_no);
+		return PTR_ERR(folio);
+	}
+	folio_lock(folio);
+	folio_clear_uptodate(folio);
+	m =3D (struct mft_record *)((u8 *)kmap_local_folio(folio, 0) + ofs);
+	err =3D ntfs_mft_record_layout(vol, mft_no, m);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to layout mft record 0x%llx.",
+				(long long)mft_no);
+		folio_mark_uptodate(folio);
+		folio_unlock(folio);
+		ntfs_unmap_folio(folio, m);
+		return err;
+	}
+	pre_write_mst_fixup((struct ntfs_record *)m, vol->mft_record_size);
+	flush_dcache_folio(folio);
+	folio_mark_uptodate(folio);
+	/*
+	 * Make sure the mft record is written out to disk.  We could use
+	 * ilookup5() to check if an inode is in icache and so on but this is
+	 * unnecessary as ntfs_writepage() will write the dirty record anyway.
+	 */
+	mark_ntfs_record_dirty(folio);
+	folio_unlock(folio);
+	ntfs_unmap_folio(folio, m);
+	ntfs_debug("Done.");
+	return 0;
+}
+
+/**
+ * ntfs_mft_record_alloc - allocate an mft record on an ntfs volume
+ * @vol:	[IN]  volume on which to allocate the mft record
+ * @mode:	[IN]  mode if want a file or directory, i.e. base inode or 0
+ * @base_ni:	[IN]  open base inode if allocating an extent mft record or N=
ULL
+ * @ni_mrec:	[OUT] on successful return this is the mapped mft record
+ *
+ * Allocate an mft record in $MFT/$DATA of an open ntfs volume @vol.
+ *
+ * If @base_ni is NULL make the mft record a base mft record, i.e. a file =
or
+ * direvctory inode, and allocate it at the default allocator position.  In
+ * this case @mode is the file mode as given to us by the caller.  We in
+ * particular use @mode to distinguish whether a file or a directory is be=
ing
+ * created (S_IFDIR(mode) and S_IFREG(mode), respectively).
+ *
+ * If @base_ni is not NULL make the allocated mft record an extent record,
+ * allocate it starting at the mft record after the base mft record and at=
tach
+ * the allocated and opened ntfs inode to the base inode @base_ni.  In this
+ * case @mode must be 0 as it is meaningless for extent inodes.
+ *
+ * You need to check the return value with IS_ERR().  If false, the functi=
on
+ * was successful and the return value is the now opened ntfs inode of the
+ * allocated mft record.  *@mrec is then set to the allocated, mapped, pin=
ned,
+ * and locked mft record.  If IS_ERR() is true, the function failed and the
+ * error code is obtained from PTR_ERR(return value).  *@mrec is undefined=
 in
+ * this case.
+ *
+ * Allocation strategy:
+ *
+ * To find a free mft record, we scan the mft bitmap for a zero bit.  To
+ * optimize this we start scanning at the place specified by @base_ni or if
+ * @base_ni is NULL we start where we last stopped and we perform wrap aro=
und
+ * when we reach the end.  Note, we do not try to allocate mft records bel=
ow
+ * number 64 because numbers 0 to 15 are the defined system files anyway a=
nd 16
+ * to 64 are special in that they are used for storing extension mft recor=
ds
+ * for the $DATA attribute of $MFT.  This is required to avoid the possibi=
lity
+ * of creating a runlist with a circular dependency which once written to =
disk
+ * can never be read in again.  Windows will only use records 16 to 24 for
+ * normal files if the volume is completely out of space.  We never use th=
em
+ * which means that when the volume is really out of space we cannot creat=
e any
+ * more files while Windows can still create up to 8 small files.  We can =
start
+ * doing this at some later time, it does not matter much for now.
+ *
+ * When scanning the mft bitmap, we only search up to the last allocated m=
ft
+ * record.  If there are no free records left in the range 64 to number of
+ * allocated mft records, then we extend the $MFT/$DATA attribute in order=
 to
+ * create free mft records.  We extend the allocated size of $MFT/$DATA by=
 16
+ * records at a time or one cluster, if cluster size is above 16kiB.  If t=
here
+ * is not sufficient space to do this, we try to extend by a single mft re=
cord
+ * or one cluster, if cluster size is above the mft record size.
+ *
+ * No matter how many mft records we allocate, we initialize only the first
+ * allocated mft record, incrementing mft data size and initialized size
+ * accordingly, open an struct ntfs_inode for it and return it to the call=
er, unless
+ * there are less than 64 mft records, in which case we allocate and initi=
alize
+ * mft records until we reach record 64 which we consider as the first fre=
e mft
+ * record for use by normal files.
+ *
+ * If during any stage we overflow the initialized data in the mft bitmap,=
 we
+ * extend the initialized size (and data size) by 8 bytes, allocating anot=
her
+ * cluster if required.  The bitmap data size has to be at least equal to =
the
+ * number of mft records in the mft, but it can be bigger, in which case t=
he
+ * superfluous bits are padded with zeroes.
+ *
+ * Thus, when we return successfully (IS_ERR() is false), we will have:
+ *	- initialized / extended the mft bitmap if necessary,
+ *	- initialized / extended the mft data if necessary,
+ *	- set the bit corresponding to the mft record being allocated in the
+ *	  mft bitmap,
+ *	- opened an struct ntfs_inode for the allocated mft record, and we will=
 have
+ *	- returned the struct ntfs_inode as well as the allocated mapped, pinne=
d, and
+ *	  locked mft record.
+ *
+ * On error, the volume will be left in a consistent state and no record w=
ill
+ * be allocated.  If rolling back a partial operation fails, we may leave =
some
+ * inconsistent metadata in which case we set NVolErrors() so the volume is
+ * left dirty when unmounted.
+ *
+ * Note, this function cannot make use of most of the normal functions, li=
ke
+ * for example for attribute resizing, etc, because when the run list over=
flows
+ * the base mft record and an attribute list is used, it is very important=
 that
+ * the extension mft records used to store the $DATA attribute of $MFT can=
 be
+ * reached without having to read the information contained inside them, as
+ * this would make it impossible to find them in the first place after the
+ * volume is unmounted.  $MFT/$BITMAP probably does not need to follow this
+ * rule because the bitmap is not essential for finding the mft records, b=
ut on
+ * the other hand, handling the bitmap in this special way would make life
+ * easier because otherwise there might be circular invocations of functio=
ns
+ * when reading the bitmap.
+ */
+int ntfs_mft_record_alloc(struct ntfs_volume *vol, const int mode,
+			  struct ntfs_inode **ni, struct ntfs_inode *base_ni,
+			  struct mft_record **ni_mrec)
+{
+	s64 ll, bit, old_data_initialized, old_data_size;
+	unsigned long flags;
+	struct folio *folio;
+	struct ntfs_inode *mft_ni, *mftbmp_ni;
+	struct ntfs_attr_search_ctx *ctx;
+	struct mft_record *m =3D NULL;
+	struct attr_record *a;
+	pgoff_t index;
+	unsigned int ofs;
+	int err;
+	__le16 seq_no, usn;
+	bool record_formatted =3D false;
+	unsigned int memalloc_flags;
+
+	if (base_ni && *ni)
+		return -EINVAL;
+
+	/* @mode and @base_ni are mutually exclusive. */
+	if (mode && base_ni)
+		return -EINVAL;
+
+	if (base_ni)
+		ntfs_debug("Entering (allocating an extent mft record for base mft recor=
d 0x%llx).",
+				(long long)base_ni->mft_no);
+	else
+		ntfs_debug("Entering (allocating a base mft record).");
+
+	memalloc_flags =3D memalloc_nofs_save();
+
+	mft_ni =3D NTFS_I(vol->mft_ino);
+	if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+		mutex_lock(&mft_ni->mrec_lock);
+	mftbmp_ni =3D NTFS_I(vol->mftbmp_ino);
+search_free_rec:
+	if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+		down_write(&vol->mftbmp_lock);
+	bit =3D ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(vol, base_ni);
+	if (bit >=3D 0) {
+		ntfs_debug("Found and allocated free record (#1), bit 0x%llx.",
+				(long long)bit);
+		goto have_alloc_rec;
+	}
+	if (bit !=3D -ENOSPC) {
+		if (!base_ni || base_ni->mft_no !=3D FILE_MFT) {
+			up_write(&vol->mftbmp_lock);
+			mutex_unlock(&mft_ni->mrec_lock);
+		}
+		memalloc_nofs_restore(memalloc_flags);
+		return bit;
+	}
+
+	if (base_ni && base_ni->mft_no =3D=3D FILE_MFT) {
+		memalloc_nofs_restore(memalloc_flags);
+		return bit;
+	}
+
+	/*
+	 * No free mft records left.  If the mft bitmap already covers more
+	 * than the currently used mft records, the next records are all free,
+	 * so we can simply allocate the first unused mft record.
+	 * Note: We also have to make sure that the mft bitmap at least covers
+	 * the first 24 mft records as they are special and whilst they may not
+	 * be in use, we do not allocate from them.
+	 */
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	ll =3D mft_ni->initialized_size >> vol->mft_record_size_bits;
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	old_data_initialized =3D mftbmp_ni->initialized_size;
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	if (old_data_initialized << 3 > ll &&
+	    old_data_initialized > RESERVED_MFT_RECORDS / 8) {
+		bit =3D ll;
+		if (bit < RESERVED_MFT_RECORDS)
+			bit =3D RESERVED_MFT_RECORDS;
+		if (unlikely(bit >=3D (1ll << 32)))
+			goto max_err_out;
+		ntfs_debug("Found free record (#2), bit 0x%llx.",
+				(long long)bit);
+		goto found_free_rec;
+	}
+	/*
+	 * The mft bitmap needs to be expanded until it covers the first unused
+	 * mft record that we can allocate.
+	 * Note: The smallest mft record we allocate is mft record 24.
+	 */
+	bit =3D old_data_initialized << 3;
+	if (unlikely(bit >=3D (1ll << 32)))
+		goto max_err_out;
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	old_data_size =3D mftbmp_ni->allocated_size;
+	ntfs_debug("Status of mftbmp before extension: allocated_size 0x%llx, dat=
a_size 0x%llx, initialized_size 0x%llx.",
+			old_data_size, i_size_read(vol->mftbmp_ino),
+			old_data_initialized);
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+	if (old_data_initialized + 8 > old_data_size) {
+		/* Need to extend bitmap by one more cluster. */
+		ntfs_debug("mftbmp: initialized_size + 8 > allocated_size.");
+		err =3D ntfs_mft_bitmap_extend_allocation_nolock(vol);
+		if (err =3D=3D -EAGAIN)
+			err =3D ntfs_mft_bitmap_extend_allocation_nolock(vol);
+
+		if (unlikely(err)) {
+			if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+				up_write(&vol->mftbmp_lock);
+			goto err_out;
+		}
+#ifdef DEBUG
+		read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+		ntfs_debug("Status of mftbmp after allocation extension: allocated_size =
0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+				mftbmp_ni->allocated_size,
+				i_size_read(vol->mftbmp_ino),
+				mftbmp_ni->initialized_size);
+		read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+#endif /* DEBUG */
+	}
+	/*
+	 * We now have sufficient allocated space, extend the initialized_size
+	 * as well as the data_size if necessary and fill the new space with
+	 * zeroes.
+	 */
+	err =3D ntfs_mft_bitmap_extend_initialized_nolock(vol);
+	if (unlikely(err)) {
+		if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+			up_write(&vol->mftbmp_lock);
+		goto err_out;
+	}
+#ifdef DEBUG
+	read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+	ntfs_debug("Status of mftbmp after initialized extension: allocated_size =
0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+			mftbmp_ni->allocated_size,
+			i_size_read(vol->mftbmp_ino),
+			mftbmp_ni->initialized_size);
+	read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+#endif /* DEBUG */
+	ntfs_debug("Found free record (#3), bit 0x%llx.", (long long)bit);
+found_free_rec:
+	/* @bit is the found free mft record, allocate it in the mft bitmap. */
+	ntfs_debug("At found_free_rec.");
+	err =3D ntfs_bitmap_set_bit(vol->mftbmp_ino, bit);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to allocate bit in mft bitmap.");
+		if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+			up_write(&vol->mftbmp_lock);
+		goto err_out;
+	}
+	ntfs_debug("Set bit 0x%llx in mft bitmap.", (long long)bit);
+have_alloc_rec:
+	/*
+	 * The mft bitmap is now uptodate.  Deal with mft data attribute now.
+	 * Note, we keep hold of the mft bitmap lock for writing until all
+	 * modifications to the mft data attribute are complete, too, as they
+	 * will impact decisions for mft bitmap and mft record allocation done
+	 * by a parallel allocation and if the lock is not maintained a
+	 * parallel allocation could allocate the same mft record as this one.
+	 */
+	ll =3D (bit + 1) << vol->mft_record_size_bits;
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	old_data_initialized =3D mft_ni->initialized_size;
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	if (ll <=3D old_data_initialized) {
+		ntfs_debug("Allocated mft record already initialized.");
+		goto mft_rec_already_initialized;
+	}
+	ntfs_debug("Initializing allocated mft record.");
+	/*
+	 * The mft record is outside the initialized data.  Extend the mft data
+	 * attribute until it covers the allocated record.  The loop is only
+	 * actually traversed more than once when a freshly formatted volume is
+	 * first written to so it optimizes away nicely in the common case.
+	 */
+	if (!base_ni || base_ni->mft_no !=3D FILE_MFT) {
+		read_lock_irqsave(&mft_ni->size_lock, flags);
+		ntfs_debug("Status of mft data before extension: allocated_size 0x%llx, =
data_size 0x%llx, initialized_size 0x%llx.",
+				mft_ni->allocated_size, i_size_read(vol->mft_ino),
+				mft_ni->initialized_size);
+		while (ll > mft_ni->allocated_size) {
+			read_unlock_irqrestore(&mft_ni->size_lock, flags);
+			err =3D ntfs_mft_data_extend_allocation_nolock(vol);
+			if (err =3D=3D -EAGAIN)
+				err =3D ntfs_mft_data_extend_allocation_nolock(vol);
+
+			if (unlikely(err)) {
+				ntfs_error(vol->sb, "Failed to extend mft data allocation.");
+				goto undo_mftbmp_alloc_nolock;
+			}
+			read_lock_irqsave(&mft_ni->size_lock, flags);
+			ntfs_debug("Status of mft data after allocation extension: allocated_si=
ze 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+					mft_ni->allocated_size, i_size_read(vol->mft_ino),
+					mft_ni->initialized_size);
+		}
+		read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	} else if (ll > mft_ni->allocated_size) {
+		err =3D -ENOSPC;
+		goto undo_mftbmp_alloc_nolock;
+	}
+	/*
+	 * Extend mft data initialized size (and data size of course) to reach
+	 * the allocated mft record, formatting the mft records allong the way.
+	 * Note: We only modify the struct ntfs_inode structure as that is all th=
at is
+	 * needed by ntfs_mft_record_format().  We will update the attribute
+	 * record itself in one fell swoop later on.
+	 */
+	write_lock_irqsave(&mft_ni->size_lock, flags);
+	old_data_initialized =3D mft_ni->initialized_size;
+	old_data_size =3D vol->mft_ino->i_size;
+	while (ll > mft_ni->initialized_size) {
+		s64 new_initialized_size, mft_no;
+
+		new_initialized_size =3D mft_ni->initialized_size +
+				vol->mft_record_size;
+		mft_no =3D mft_ni->initialized_size >> vol->mft_record_size_bits;
+		if (new_initialized_size > i_size_read(vol->mft_ino))
+			i_size_write(vol->mft_ino, new_initialized_size);
+		write_unlock_irqrestore(&mft_ni->size_lock, flags);
+		ntfs_debug("Initializing mft record 0x%llx.",
+				(long long)mft_no);
+		err =3D ntfs_mft_record_format(vol, mft_no);
+		if (unlikely(err)) {
+			ntfs_error(vol->sb, "Failed to format mft record.");
+			goto undo_data_init;
+		}
+		write_lock_irqsave(&mft_ni->size_lock, flags);
+		mft_ni->initialized_size =3D new_initialized_size;
+	}
+	write_unlock_irqrestore(&mft_ni->size_lock, flags);
+	record_formatted =3D true;
+	/* Update the mft data attribute record to reflect the new sizes. */
+	m =3D map_mft_record(mft_ni);
+	if (IS_ERR(m)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		err =3D PTR_ERR(m);
+		goto undo_data_init;
+	}
+	ctx =3D ntfs_attr_get_search_ctx(mft_ni, m);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		err =3D -ENOMEM;
+		unmap_mft_record(mft_ni);
+		goto undo_data_init;
+	}
+	err =3D ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to find first attribute extent of mft data a=
ttribute.");
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(mft_ni);
+		goto undo_data_init;
+	}
+	a =3D ctx->attr;
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	a->data.non_resident.initialized_size =3D
+			cpu_to_le64(mft_ni->initialized_size);
+	a->data.non_resident.data_size =3D
+			cpu_to_le64(i_size_read(vol->mft_ino));
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+	/* Ensure the changes make it to disk. */
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	read_lock_irqsave(&mft_ni->size_lock, flags);
+	ntfs_debug("Status of mft data after mft record initialization: allocated=
_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+			mft_ni->allocated_size,	i_size_read(vol->mft_ino),
+			mft_ni->initialized_size);
+	WARN_ON(i_size_read(vol->mft_ino) > mft_ni->allocated_size);
+	WARN_ON(mft_ni->initialized_size > i_size_read(vol->mft_ino));
+	read_unlock_irqrestore(&mft_ni->size_lock, flags);
+mft_rec_already_initialized:
+	/*
+	 * We can finally drop the mft bitmap lock as the mft data attribute
+	 * has been fully updated.  The only disparity left is that the
+	 * allocated mft record still needs to be marked as in use to match the
+	 * set bit in the mft bitmap but this is actually not a problem since
+	 * this mft record is not referenced from anywhere yet and the fact
+	 * that it is allocated in the mft bitmap means that no-one will try to
+	 * allocate it either.
+	 */
+	if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+		up_write(&vol->mftbmp_lock);
+	/*
+	 * We now have allocated and initialized the mft record.  Calculate the
+	 * index of and the offset within the page cache page the record is in.
+	 */
+	index =3D bit << vol->mft_record_size_bits >> PAGE_SHIFT;
+	ofs =3D (bit << vol->mft_record_size_bits) & ~PAGE_MASK;
+	/* Read, map, and pin the folio containing the mft record. */
+	folio =3D ntfs_read_mapping_folio(vol->mft_ino->i_mapping, index);
+	if (IS_ERR(folio)) {
+		ntfs_error(vol->sb, "Failed to map page containing allocated mft record =
0x%llx.",
+				bit);
+		err =3D PTR_ERR(folio);
+		goto undo_mftbmp_alloc;
+	}
+	folio_lock(folio);
+	folio_clear_uptodate(folio);
+	m =3D (struct mft_record *)((u8 *)kmap_local_folio(folio, 0) + ofs);
+	/* If we just formatted the mft record no need to do it again. */
+	if (!record_formatted) {
+		/* Sanity check that the mft record is really not in use. */
+		if (ntfs_is_file_record(m->magic) &&
+				(m->flags & MFT_RECORD_IN_USE)) {
+			ntfs_warning(vol->sb,
+				"Mft record 0x%llx was marked free in mft bitmap but is marked used it=
self. Unmount and run chkdsk.",
+				bit);
+			folio_mark_uptodate(folio);
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, m);
+			NVolSetErrors(vol);
+			goto search_free_rec;
+		}
+		/*
+		 * We need to (re-)format the mft record, preserving the
+		 * sequence number if it is not zero as well as the update
+		 * sequence number if it is not zero or -1 (0xffff).  This
+		 * means we do not need to care whether or not something went
+		 * wrong with the previous mft record.
+		 */
+		seq_no =3D m->sequence_number;
+		usn =3D *(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs));
+		err =3D ntfs_mft_record_layout(vol, bit, m);
+		if (unlikely(err)) {
+			ntfs_error(vol->sb, "Failed to layout allocated mft record 0x%llx.",
+					bit);
+			folio_mark_uptodate(folio);
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, m);
+			goto undo_mftbmp_alloc;
+		}
+		if (seq_no)
+			m->sequence_number =3D seq_no;
+		if (usn && le16_to_cpu(usn) !=3D 0xffff)
+			*(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs)) =3D usn;
+		pre_write_mst_fixup((struct ntfs_record *)m, vol->mft_record_size);
+	}
+	/* Set the mft record itself in use. */
+	m->flags |=3D MFT_RECORD_IN_USE;
+	if (S_ISDIR(mode))
+		m->flags |=3D MFT_RECORD_IS_DIRECTORY;
+	flush_dcache_folio(folio);
+	folio_mark_uptodate(folio);
+	if (base_ni) {
+		struct mft_record *m_tmp;
+
+		/*
+		 * Setup the base mft record in the extent mft record.  This
+		 * completes initialization of the allocated extent mft record
+		 * and we can simply use it with map_extent_mft_record().
+		 */
+		m->base_mft_record =3D MK_LE_MREF(base_ni->mft_no,
+				base_ni->seq_no);
+		/*
+		 * Allocate an extent inode structure for the new mft record,
+		 * attach it to the base inode @base_ni and map, pin, and lock
+		 * its, i.e. the allocated, mft record.
+		 */
+		m_tmp =3D map_extent_mft_record(base_ni,
+					      MK_MREF(bit, le16_to_cpu(m->sequence_number)),
+					      ni);
+		if (IS_ERR(m_tmp)) {
+			ntfs_error(vol->sb, "Failed to map allocated extent mft record 0x%llx.",
+					bit);
+			err =3D PTR_ERR(m_tmp);
+			/* Set the mft record itself not in use. */
+			m->flags &=3D cpu_to_le16(
+					~le16_to_cpu(MFT_RECORD_IN_USE));
+			flush_dcache_folio(folio);
+			/* Make sure the mft record is written out to disk. */
+			mark_ntfs_record_dirty(folio);
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, m);
+			goto undo_mftbmp_alloc;
+		}
+
+		/*
+		 * Make sure the allocated mft record is written out to disk.
+		 * No need to set the inode dirty because the caller is going
+		 * to do that anyway after finishing with the new extent mft
+		 * record (e.g. at a minimum a new attribute will be added to
+		 * the mft record.
+		 */
+		mark_ntfs_record_dirty(folio);
+		folio_unlock(folio);
+		/*
+		 * Need to unmap the page since map_extent_mft_record() mapped
+		 * it as well so we have it mapped twice at the moment.
+		 */
+		ntfs_unmap_folio(folio, m);
+	} else {
+		/*
+		 * Manually map, pin, and lock the mft record as we already
+		 * have its page mapped and it is very easy to do.
+		 */
+		(*ni)->seq_no =3D le16_to_cpu(m->sequence_number);
+		/*
+		 * Make sure the allocated mft record is written out to disk.
+		 * NOTE: We do not set the ntfs inode dirty because this would
+		 * fail in ntfs_write_inode() because the inode does not have a
+		 * standard information attribute yet.  Also, there is no need
+		 * to set the inode dirty because the caller is going to do
+		 * that anyway after finishing with the new mft record (e.g. at
+		 * a minimum some new attributes will be added to the mft
+		 * record.
+		 */
+
+		(*ni)->mrec =3D kmalloc(vol->mft_record_size, GFP_NOFS);
+		if (!(*ni)->mrec) {
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, m);
+			goto undo_mftbmp_alloc;
+		}
+
+		memcpy((*ni)->mrec, m, vol->mft_record_size);
+		post_read_mst_fixup((struct ntfs_record *)(*ni)->mrec, vol->mft_record_s=
ize);
+		mark_ntfs_record_dirty(folio);
+		folio_unlock(folio);
+		(*ni)->folio =3D folio;
+		(*ni)->folio_ofs =3D ofs;
+		atomic_inc(&(*ni)->count);
+		/* Update the default mft allocation position. */
+		vol->mft_data_pos =3D bit + 1;
+	}
+	if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+		mutex_unlock(&NTFS_I(vol->mft_ino)->mrec_lock);
+	memalloc_nofs_restore(memalloc_flags);
+
+	/*
+	 * Return the opened, allocated inode of the allocated mft record as
+	 * well as the mapped, pinned, and locked mft record.
+	 */
+	ntfs_debug("Returning opened, allocated %sinode 0x%llx.",
+			base_ni ? "extent " : "", bit);
+	(*ni)->mft_no =3D bit;
+	if (ni_mrec)
+		*ni_mrec =3D (*ni)->mrec;
+	ntfs_dec_free_mft_records(vol, 1);
+	return 0;
+undo_data_init:
+	write_lock_irqsave(&mft_ni->size_lock, flags);
+	mft_ni->initialized_size =3D old_data_initialized;
+	i_size_write(vol->mft_ino, old_data_size);
+	write_unlock_irqrestore(&mft_ni->size_lock, flags);
+	goto undo_mftbmp_alloc_nolock;
+undo_mftbmp_alloc:
+	if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+		down_write(&vol->mftbmp_lock);
+undo_mftbmp_alloc_nolock:
+	if (ntfs_bitmap_clear_bit(vol->mftbmp_ino, bit)) {
+		ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es);
+		NVolSetErrors(vol);
+	}
+	if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+		up_write(&vol->mftbmp_lock);
+err_out:
+	if (!base_ni || base_ni->mft_no !=3D FILE_MFT)
+		mutex_unlock(&mft_ni->mrec_lock);
+	memalloc_nofs_restore(memalloc_flags);
+	return err;
+max_err_out:
+	ntfs_warning(vol->sb,
+		"Cannot allocate mft record because the maximum number of inodes (2^32) =
has already been reached.");
+	if (!base_ni || base_ni->mft_no !=3D FILE_MFT) {
+		up_write(&vol->mftbmp_lock);
+		mutex_unlock(&NTFS_I(vol->mft_ino)->mrec_lock);
+	}
+	memalloc_nofs_restore(memalloc_flags);
+	return -ENOSPC;
+}
+
+/**
+ * ntfs_mft_record_free - free an mft record on an ntfs volume
+ * @vol:	volume on which to free the mft record
+ * @ni:		open ntfs inode of the mft record to free
+ *
+ * Free the mft record of the open inode @ni on the mounted ntfs volume @v=
ol.
+ * Note that this function calls ntfs_inode_close() internally and hence y=
ou
+ * cannot use the pointer @ni any more after this function returns success.
+ *
+ * On success return 0 and on error return -1 with errno set to the error =
code.
+ */
+int ntfs_mft_record_free(struct ntfs_volume *vol, struct ntfs_inode *ni)
+{
+	u64 mft_no;
+	int err;
+	u16 seq_no;
+	__le16 old_seq_no;
+	struct mft_record *ni_mrec;
+	unsigned int memalloc_flags;
+	struct ntfs_inode *base_ni;
+
+	ntfs_debug("Entering for inode 0x%llx.\n", (long long)ni->mft_no);
+
+	if (!vol || !ni)
+		return -EINVAL;
+
+	ni_mrec =3D map_mft_record(ni);
+	if (IS_ERR(ni_mrec))
+		return -EIO;
+
+	/* Cache the mft reference for later. */
+	mft_no =3D ni->mft_no;
+
+	/* Mark the mft record as not in use. */
+	ni_mrec->flags &=3D ~MFT_RECORD_IN_USE;
+
+	/* Increment the sequence number, skipping zero, if it is not zero. */
+	old_seq_no =3D ni_mrec->sequence_number;
+	seq_no =3D le16_to_cpu(old_seq_no);
+	if (seq_no =3D=3D 0xffff)
+		seq_no =3D 1;
+	else if (seq_no)
+		seq_no++;
+	ni_mrec->sequence_number =3D cpu_to_le16(seq_no);
+
+	/*
+	 * Set the ntfs inode dirty and write it out.  We do not need to worry
+	 * about the base inode here since whatever caused the extent mft
+	 * record to be freed is guaranteed to do it already.
+	 */
+	NInoSetDirty(ni);
+	err =3D write_mft_record(ni, ni_mrec, 0);
+	if (err)
+		goto sync_rollback;
+
+	if (likely(ni->nr_extents >=3D 0))
+		base_ni =3D ni;
+	else
+		base_ni =3D ni->ext.base_ntfs_ino;
+
+	/* Clear the bit in the $MFT/$BITMAP corresponding to this record. */
+	memalloc_flags =3D memalloc_nofs_save();
+	if (base_ni->mft_no !=3D FILE_MFT)
+		down_write(&vol->mftbmp_lock);
+	err =3D ntfs_bitmap_clear_bit(vol->mftbmp_ino, mft_no);
+	if (base_ni->mft_no !=3D FILE_MFT)
+		up_write(&vol->mftbmp_lock);
+	memalloc_nofs_restore(memalloc_flags);
+	if (err)
+		goto bitmap_rollback;
+
+	unmap_mft_record(ni);
+	ntfs_inc_free_mft_records(vol, 1);
+	return 0;
+
+	/* Rollback what we did... */
+bitmap_rollback:
+	memalloc_flags =3D memalloc_nofs_save();
+	if (base_ni->mft_no !=3D FILE_MFT)
+		down_write(&vol->mftbmp_lock);
+	if (ntfs_bitmap_set_bit(vol->mftbmp_ino, mft_no))
+		ntfs_error(vol->sb, "ntfs_bitmap_set_bit failed in bitmap_rollback\n");
+	if (base_ni->mft_no !=3D FILE_MFT)
+		up_write(&vol->mftbmp_lock);
+	memalloc_nofs_restore(memalloc_flags);
+sync_rollback:
+	ntfs_error(vol->sb,
+		"Eeek! Rollback failed in %s. Leaving inconsistent metadata!\n", __func_=
_);
+	ni_mrec->flags |=3D MFT_RECORD_IN_USE;
+	ni_mrec->sequence_number =3D old_seq_no;
+	NInoSetDirty(ni);
+	write_mft_record(ni, ni_mrec, 0);
+	unmap_mft_record(ni);
+	return err;
+}
diff --git a/fs/ntfsplus/mst.c b/fs/ntfsplus/mst.c
new file mode 100644
index 000000000000..e88f52831cb8
--- /dev/null
+++ b/fs/ntfsplus/mst.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS multi sector transfer protection handling code.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ */
+
+#include <linux/ratelimit.h>
+
+#include "ntfs.h"
+
+/**
+ * post_read_mst_fixup - deprotect multi sector transfer protected data
+ * @b:		pointer to the data to deprotect
+ * @size:	size in bytes of @b
+ *
+ * Perform the necessary post read multi sector transfer fixup and detect =
the
+ * presence of incomplete multi sector transfers. - In that case, overwrit=
e the
+ * magic of the ntfs record header being processed with "BAAD" (in memory =
only!)
+ * and abort processing.
+ *
+ * Return 0 on success and -EINVAL on error ("BAAD" magic will be present).
+ *
+ * NOTE: We consider the absence / invalidity of an update sequence array =
to
+ * mean that the structure is not protected at all and hence doesn't need =
to
+ * be fixed up. Thus, we return success and not failure in this case. This=
 is
+ * in contrast to pre_write_mst_fixup(), see below.
+ */
+int post_read_mst_fixup(struct ntfs_record *b, const u32 size)
+{
+	u16 usa_ofs, usa_count, usn;
+	u16 *usa_pos, *data_pos;
+
+	/* Setup the variables. */
+	usa_ofs =3D le16_to_cpu(b->usa_ofs);
+	/* Decrement usa_count to get number of fixups. */
+	usa_count =3D le16_to_cpu(b->usa_count) - 1;
+	/* Size and alignment checks. */
+	if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1	||
+	    usa_ofs + (usa_count * 2) > size ||
+	    (size >> NTFS_BLOCK_SIZE_BITS) !=3D usa_count)
+		return 0;
+	/* Position of usn in update sequence array. */
+	usa_pos =3D (u16 *)b + usa_ofs/sizeof(u16);
+	/*
+	 * The update sequence number which has to be equal to each of the
+	 * u16 values before they are fixed up. Note no need to care for
+	 * endianness since we are comparing and moving data for on disk
+	 * structures which means the data is consistent. - If it is
+	 * consistenty the wrong endianness it doesn't make any difference.
+	 */
+	usn =3D *usa_pos;
+	/*
+	 * Position in protected data of first u16 that needs fixing up.
+	 */
+	data_pos =3D (u16 *)b + NTFS_BLOCK_SIZE / sizeof(u16) - 1;
+	/*
+	 * Check for incomplete multi sector transfer(s).
+	 */
+	while (usa_count--) {
+		if (*data_pos !=3D usn) {
+			struct mft_record *m =3D (struct mft_record *)b;
+
+			pr_err_ratelimited("ntfs: Incomplete multi sector transfer detected! (R=
ecord magic : 0x%x, mft number : 0x%x, base mft number : 0x%lx, mft in use =
: %d, data : 0x%x, usn 0x%x)\n",
+					le32_to_cpu(m->magic), le32_to_cpu(m->mft_record_number),
+					MREF_LE(m->base_mft_record), m->flags & MFT_RECORD_IN_USE,
+					*data_pos, usn);
+			/*
+			 * Incomplete multi sector transfer detected! )-:
+			 * Set the magic to "BAAD" and return failure.
+			 * Note that magic_BAAD is already converted to le32.
+			 */
+			b->magic =3D magic_BAAD;
+			return -EINVAL;
+		}
+		data_pos +=3D NTFS_BLOCK_SIZE / sizeof(u16);
+	}
+	/* Re-setup the variables. */
+	usa_count =3D le16_to_cpu(b->usa_count) - 1;
+	data_pos =3D (u16 *)b + NTFS_BLOCK_SIZE / sizeof(u16) - 1;
+	/* Fixup all sectors. */
+	while (usa_count--) {
+		/*
+		 * Increment position in usa and restore original data from
+		 * the usa into the data buffer.
+		 */
+		*data_pos =3D *(++usa_pos);
+		/* Increment position in data as well. */
+		data_pos +=3D NTFS_BLOCK_SIZE/sizeof(u16);
+	}
+	return 0;
+}
+
+/**
+ * pre_write_mst_fixup - apply multi sector transfer protection
+ * @b:		pointer to the data to protect
+ * @size:	size in bytes of @b
+ *
+ * Perform the necessary pre write multi sector transfer fixup on the data
+ * pointer to by @b of @size.
+ *
+ * Return 0 if fixup applied (success) or -EINVAL if no fixup was performed
+ * (assumed not needed). This is in contrast to post_read_mst_fixup() abov=
e.
+ *
+ * NOTE: We consider the absence / invalidity of an update sequence array =
to
+ * mean that the structure is not subject to protection and hence doesn't =
need
+ * to be fixed up. This means that you have to create a valid update seque=
nce
+ * array header in the ntfs record before calling this function, otherwise=
 it
+ * will fail (the header needs to contain the position of the update seque=
nce
+ * array together with the number of elements in the array). You also need=
 to
+ * initialise the update sequence number before calling this function
+ * otherwise a random word will be used (whatever was in the record at that
+ * position at that time).
+ */
+int pre_write_mst_fixup(struct ntfs_record *b, const u32 size)
+{
+	__le16 *usa_pos, *data_pos;
+	u16 usa_ofs, usa_count, usn;
+	__le16 le_usn;
+
+	/* Sanity check + only fixup if it makes sense. */
+	if (!b || ntfs_is_baad_record(b->magic) ||
+	    ntfs_is_hole_record(b->magic))
+		return -EINVAL;
+	/* Setup the variables. */
+	usa_ofs =3D le16_to_cpu(b->usa_ofs);
+	/* Decrement usa_count to get number of fixups. */
+	usa_count =3D le16_to_cpu(b->usa_count) - 1;
+	/* Size and alignment checks. */
+	if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1	||
+	    usa_ofs + (usa_count * 2) > size ||
+	    (size >> NTFS_BLOCK_SIZE_BITS) !=3D usa_count)
+		return -EINVAL;
+	/* Position of usn in update sequence array. */
+	usa_pos =3D (__le16 *)((u8 *)b + usa_ofs);
+	/*
+	 * Cyclically increment the update sequence number
+	 * (skipping 0 and -1, i.e. 0xffff).
+	 */
+	usn =3D le16_to_cpup(usa_pos) + 1;
+	if (usn =3D=3D 0xffff || !usn)
+		usn =3D 1;
+	le_usn =3D cpu_to_le16(usn);
+	*usa_pos =3D le_usn;
+	/* Position in data of first u16 that needs fixing up. */
+	data_pos =3D (__le16 *)b + NTFS_BLOCK_SIZE/sizeof(__le16) - 1;
+	/* Fixup all sectors. */
+	while (usa_count--) {
+		/*
+		 * Increment the position in the usa and save the
+		 * original data from the data buffer into the usa.
+		 */
+		*(++usa_pos) =3D *data_pos;
+		/* Apply fixup to data. */
+		*data_pos =3D le_usn;
+		/* Increment position in data as well. */
+		data_pos +=3D NTFS_BLOCK_SIZE / sizeof(__le16);
+	}
+	return 0;
+}
+
+/**
+ * post_write_mst_fixup - fast deprotect multi sector transfer protected d=
ata
+ * @b:		pointer to the data to deprotect
+ *
+ * Perform the necessary post write multi sector transfer fixup, not check=
ing
+ * for any errors, because we assume we have just used pre_write_mst_fixup=
(),
+ * thus the data will be fine or we would never have gotten here.
+ */
+void post_write_mst_fixup(struct ntfs_record *b)
+{
+	__le16 *usa_pos, *data_pos;
+
+	u16 usa_ofs =3D le16_to_cpu(b->usa_ofs);
+	u16 usa_count =3D le16_to_cpu(b->usa_count) - 1;
+
+	/* Position of usn in update sequence array. */
+	usa_pos =3D (__le16 *)b + usa_ofs/sizeof(__le16);
+
+	/* Position in protected data of first u16 that needs fixing up. */
+	data_pos =3D (__le16 *)b + NTFS_BLOCK_SIZE/sizeof(__le16) - 1;
+
+	/* Fixup all sectors. */
+	while (usa_count--) {
+		/*
+		 * Increment position in usa and restore original data from
+		 * the usa into the data buffer.
+		 */
+		*data_pos =3D *(++usa_pos);
+
+		/* Increment position in data as well. */
+		data_pos +=3D NTFS_BLOCK_SIZE/sizeof(__le16);
+	}
+}
diff --git a/fs/ntfsplus/namei.c b/fs/ntfsplus/namei.c
new file mode 100644
index 000000000000..911f9139c3a2
--- /dev/null
+++ b/fs/ntfsplus/namei.c
@@ -0,0 +1,1677 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel directory inode operations.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2006 Anton Altaparmakov
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include <linux/exportfs.h>
+#include <linux/iversion.h>
+
+#include "ntfs.h"
+#include "misc.h"
+#include "index.h"
+#include "reparse.h"
+#include "ea.h"
+
+static const __le16 aux_name_le[3] =3D {
+	cpu_to_le16('A'), cpu_to_le16('U'), cpu_to_le16('X')
+};
+
+static const __le16 con_name_le[3] =3D {
+	cpu_to_le16('C'), cpu_to_le16('O'), cpu_to_le16('N')
+};
+
+static const __le16 com_name_le[3] =3D {
+	cpu_to_le16('C'), cpu_to_le16('O'), cpu_to_le16('M')
+};
+
+static const __le16 lpt_name_le[3] =3D {
+	cpu_to_le16('L'), cpu_to_le16('P'), cpu_to_le16('T')
+};
+
+static const __le16 nul_name_le[3] =3D {
+	cpu_to_le16('N'), cpu_to_le16('U'), cpu_to_le16('L')
+};
+
+static const __le16 prn_name_le[3] =3D {
+	cpu_to_le16('P'), cpu_to_le16('R'), cpu_to_le16('N')
+};
+
+static inline int ntfs_check_bad_char(const unsigned short *wc,
+		unsigned int wc_len)
+{
+	int i;
+
+	for (i =3D 0; i < wc_len; i++) {
+		if ((wc[i] < 0x0020) ||
+		    (wc[i] =3D=3D 0x0022) || (wc[i] =3D=3D 0x002A) || (wc[i] =3D=3D 0x00=
2F) ||
+		    (wc[i] =3D=3D 0x003A) || (wc[i] =3D=3D 0x003C) || (wc[i] =3D=3D 0x00=
3E) ||
+		    (wc[i] =3D=3D 0x003F) || (wc[i] =3D=3D 0x005C) || (wc[i] =3D=3D 0x00=
7C))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ntfs_check_bad_windows_name(struct ntfs_volume *vol,
+				       const unsigned short *wc,
+				       unsigned int wc_len)
+{
+	if (ntfs_check_bad_char(wc, wc_len))
+		return -EINVAL;
+
+	if (!NVolCheckWindowsNames(vol))
+		return 0;
+
+	/* Check for trailing space or dot. */
+	if (wc_len > 0 &&
+	    (wc[wc_len - 1] =3D=3D cpu_to_le16(' ') ||
+	    wc[wc_len - 1] =3D=3D cpu_to_le16('.')))
+		return -EINVAL;
+
+	if (wc_len =3D=3D 3 || (wc_len > 3 && wc[3] =3D=3D cpu_to_le16('.'))) {
+		__le16 *upcase =3D vol->upcase;
+		u32 size =3D vol->upcase_len;
+
+		if (ntfs_are_names_equal(wc, 3, aux_name_le, 3, IGNORE_CASE, upcase, siz=
e) ||
+		    ntfs_are_names_equal(wc, 3, con_name_le, 3, IGNORE_CASE, upcase, siz=
e) ||
+		    ntfs_are_names_equal(wc, 3, nul_name_le, 3, IGNORE_CASE, upcase, siz=
e) ||
+		    ntfs_are_names_equal(wc, 3, prn_name_le, 3, IGNORE_CASE, upcase, siz=
e))
+			return -EINVAL;
+	}
+
+	if (wc_len =3D=3D 4 || (wc_len > 4 && wc[4] =3D=3D cpu_to_le16('.'))) {
+		__le16 *upcase =3D vol->upcase;
+		u32 size =3D vol->upcase_len, port;
+
+		if (ntfs_are_names_equal(wc, 3, com_name_le, 3, IGNORE_CASE, upcase, siz=
e) ||
+		    ntfs_are_names_equal(wc, 3, lpt_name_le, 3, IGNORE_CASE, upcase, siz=
e)) {
+			port =3D le16_to_cpu(wc[3]);
+			if (port >=3D '1' && port <=3D '9')
+				return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/**
+ * ntfs_lookup - find the inode represented by a dentry in a directory ino=
de
+ * @dir_ino:	directory inode in which to look for the inode
+ * @dent:	dentry representing the inode to look for
+ * @flags:	lookup flags
+ *
+ * In short, ntfs_lookup() looks for the inode represented by the dentry @=
dent
+ * in the directory inode @dir_ino and if found attaches the inode to the
+ * dentry @dent.
+ *
+ * In more detail, the dentry @dent specifies which inode to look for by
+ * supplying the name of the inode in @dent->d_name.name. ntfs_lookup()
+ * converts the name to Unicode and walks the contents of the directory in=
ode
+ * @dir_ino looking for the converted Unicode name. If the name is found i=
n the
+ * directory, the corresponding inode is loaded by calling ntfs_iget() on =
its
+ * inode number and the inode is associated with the dentry @dent via a ca=
ll to
+ * d_splice_alias().
+ *
+ * If the name is not found in the directory, a NULL inode is inserted int=
o the
+ * dentry @dent via a call to d_add(). The dentry is then termed a negative
+ * dentry.
+ *
+ * Only if an actual error occurs, do we return an error via ERR_PTR().
+ *
+ * In order to handle the case insensitivity issues of NTFS with regards t=
o the
+ * dcache and the dcache requiring only one dentry per directory, we deal =
with
+ * dentry aliases that only differ in case in ->ntfs_lookup() while mainta=
ining
+ * a case sensitive dcache. This means that we get the full benefit of dca=
che
+ * speed when the file/directory is looked up with the same case as return=
ed by
+ * ->ntfs_readdir() but that a lookup for any other case (or for the short=
 file
+ * name) will not find anything in dcache and will enter ->ntfs_lookup()
+ * instead, where we search the directory for a fully matching file name
+ * (including case) and if that is not found, we search for a file name th=
at
+ * matches with different case and if that has non-POSIX semantics we retu=
rn
+ * that. We actually do only one search (case sensitive) and keep tabs on
+ * whether we have found a case insensitive match in the process.
+ *
+ * To simplify matters for us, we do not treat the short vs long filenames=
 as
+ * two hard links but instead if the lookup matches a short filename, we
+ * return the dentry for the corresponding long filename instead.
+ *
+ * There are three cases we need to distinguish here:
+ *
+ * 1) @dent perfectly matches (i.e. including case) a directory entry with=
 a
+ *    file name in the WIN32 or POSIX namespaces. In this case
+ *    ntfs_lookup_inode_by_name() will return with name set to NULL and we
+ *    just d_splice_alias() @dent.
+ * 2) @dent matches (not including case) a directory entry with a file nam=
e in
+ *    the WIN32 namespace. In this case ntfs_lookup_inode_by_name() will r=
eturn
+ *    with name set to point to a kmalloc()ed ntfs_name structure containi=
ng
+ *    the properly cased little endian Unicode name. We convert the name t=
o the
+ *    current NLS code page, search if a dentry with this name already exi=
sts
+ *    and if so return that instead of @dent.  At this point things are
+ *    complicated by the possibility of 'disconnected' dentries due to NFS
+ *    which we deal with appropriately (see the code comments).  The VFS w=
ill
+ *    then destroy the old @dent and use the one we returned.  If a dentry=
 is
+ *    not found, we allocate a new one, d_splice_alias() it, and return it=
 as
+ *    above.
+ * 3) @dent matches either perfectly or not (i.e. we don't care about case=
) a
+ *    directory entry with a file name in the DOS namespace. In this case
+ *    ntfs_lookup_inode_by_name() will return with name set to point to a
+ *    kmalloc()ed ntfs_name structure containing the mft reference (cpu en=
dian)
+ *    of the inode. We use the mft reference to read the inode and to find=
 the
+ *    file name in the WIN32 namespace corresponding to the matched short =
file
+ *    name. We then convert the name to the current NLS code page, and pro=
ceed
+ *    searching for a dentry with this name, etc, as in case 2), above.
+ *
+ * Locking: Caller must hold i_mutex on the directory.
+ */
+static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *de=
nt,
+		unsigned int flags)
+{
+	struct ntfs_volume *vol =3D NTFS_SB(dir_ino->i_sb);
+	struct inode *dent_inode;
+	__le16 *uname;
+	struct ntfs_name *name =3D NULL;
+	u64 mref;
+	unsigned long dent_ino;
+	int uname_len;
+
+	ntfs_debug("Looking up %pd in directory inode 0x%lx.",
+			dent, dir_ino->i_ino);
+	/* Convert the name of the dentry to Unicode. */
+	uname_len =3D ntfs_nlstoucs(vol, dent->d_name.name, dent->d_name.len,
+				  &uname, NTFS_MAX_NAME_LEN);
+	if (uname_len < 0) {
+		if (uname_len !=3D -ENAMETOOLONG)
+			ntfs_debug("Failed to convert name to Unicode.");
+		return ERR_PTR(uname_len);
+	}
+	mutex_lock(&NTFS_I(dir_ino)->mrec_lock);
+	mref =3D ntfs_lookup_inode_by_name(NTFS_I(dir_ino), uname, uname_len,
+			&name);
+	mutex_unlock(&NTFS_I(dir_ino)->mrec_lock);
+	kmem_cache_free(ntfs_name_cache, uname);
+	if (!IS_ERR_MREF(mref)) {
+		dent_ino =3D MREF(mref);
+		ntfs_debug("Found inode 0x%lx. Calling ntfs_iget.", dent_ino);
+		dent_inode =3D ntfs_iget(vol->sb, dent_ino);
+		if (!IS_ERR(dent_inode)) {
+			/* Consistency check. */
+			if (MSEQNO(mref) =3D=3D NTFS_I(dent_inode)->seq_no ||
+			    dent_ino =3D=3D FILE_MFT) {
+				/* Perfect WIN32/POSIX match. -- Case 1. */
+				if (!name) {
+					ntfs_debug("Done.  (Case 1.)");
+					return d_splice_alias(dent_inode, dent);
+				}
+				/*
+				 * We are too indented.  Handle imperfect
+				 * matches and short file names further below.
+				 */
+				goto handle_name;
+			}
+			ntfs_error(vol->sb,
+				"Found stale reference to inode 0x%lx (reference sequence number =3D 0=
x%x, inode sequence number =3D 0x%x), returning -EIO. Run chkdsk.",
+				dent_ino, MSEQNO(mref),
+				NTFS_I(dent_inode)->seq_no);
+			iput(dent_inode);
+			dent_inode =3D ERR_PTR(-EIO);
+		} else
+			ntfs_error(vol->sb, "ntfs_iget(0x%lx) failed with error code %li.",
+					dent_ino, PTR_ERR(dent_inode));
+		kfree(name);
+		/* Return the error code. */
+		return ERR_CAST(dent_inode);
+	}
+	kfree(name);
+	/* It is guaranteed that @name is no longer allocated at this point. */
+	if (MREF_ERR(mref) =3D=3D -ENOENT) {
+		ntfs_debug("Entry was not found, adding negative dentry.");
+		/* The dcache will handle negative entries. */
+		d_add(dent, NULL);
+		ntfs_debug("Done.");
+		return NULL;
+	}
+	ntfs_error(vol->sb, "ntfs_lookup_ino_by_name() failed with error code %i.=
",
+			-MREF_ERR(mref));
+	return ERR_PTR(MREF_ERR(mref));
+handle_name:
+	{
+		struct mft_record *m;
+		struct ntfs_attr_search_ctx *ctx;
+		struct ntfs_inode *ni =3D NTFS_I(dent_inode);
+		int err;
+		struct qstr nls_name;
+
+		nls_name.name =3D NULL;
+		if (name->type !=3D FILE_NAME_DOS) {			/* Case 2. */
+			ntfs_debug("Case 2.");
+			nls_name.len =3D (unsigned int)ntfs_ucstonls(vol,
+					(__le16 *)&name->name, name->len,
+					(unsigned char **)&nls_name.name, 0);
+			kfree(name);
+		} else /* if (name->type =3D=3D FILE_NAME_DOS) */ {		/* Case 3. */
+			struct file_name_attr *fn;
+
+			ntfs_debug("Case 3.");
+			kfree(name);
+
+			/* Find the WIN32 name corresponding to the matched DOS name. */
+			ni =3D NTFS_I(dent_inode);
+			m =3D map_mft_record(ni);
+			if (IS_ERR(m)) {
+				err =3D PTR_ERR(m);
+				m =3D NULL;
+				ctx =3D NULL;
+				goto err_out;
+			}
+			ctx =3D ntfs_attr_get_search_ctx(ni, m);
+			if (unlikely(!ctx)) {
+				err =3D -ENOMEM;
+				goto err_out;
+			}
+			do {
+				struct attr_record *a;
+				u32 val_len;
+
+				err =3D ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0,
+						NULL, 0, ctx);
+				if (unlikely(err)) {
+					ntfs_error(vol->sb,
+						"Inode corrupt: No WIN32 namespace counterpart to DOS file name. Run=
 chkdsk.");
+					if (err =3D=3D -ENOENT)
+						err =3D -EIO;
+					goto err_out;
+				}
+				/* Consistency checks. */
+				a =3D ctx->attr;
+				if (a->non_resident || a->flags)
+					goto eio_err_out;
+				val_len =3D le32_to_cpu(a->data.resident.value_length);
+				if (le16_to_cpu(a->data.resident.value_offset) +
+						val_len > le32_to_cpu(a->length))
+					goto eio_err_out;
+				fn =3D (struct file_name_attr *)((u8 *)ctx->attr + le16_to_cpu(
+							ctx->attr->data.resident.value_offset));
+				if ((u32)(fn->file_name_length * sizeof(__le16) +
+							sizeof(struct file_name_attr)) > val_len)
+					goto eio_err_out;
+			} while (fn->file_name_type !=3D FILE_NAME_WIN32);
+
+			/* Convert the found WIN32 name to current NLS code page. */
+			nls_name.len =3D (unsigned int)ntfs_ucstonls(vol,
+					(__le16 *)&fn->file_name, fn->file_name_length,
+					(unsigned char **)&nls_name.name, 0);
+
+			ntfs_attr_put_search_ctx(ctx);
+			unmap_mft_record(ni);
+		}
+		m =3D NULL;
+		ctx =3D NULL;
+
+		/* Check if a conversion error occurred. */
+		if ((int)nls_name.len < 0) {
+			err =3D (int)nls_name.len;
+			goto err_out;
+		}
+		nls_name.hash =3D full_name_hash(dent, nls_name.name, nls_name.len);
+
+		dent =3D d_add_ci(dent, dent_inode, &nls_name);
+		kfree(nls_name.name);
+		return dent;
+
+eio_err_out:
+		ntfs_error(vol->sb, "Illegal file name attribute. Run chkdsk.");
+		err =3D -EIO;
+err_out:
+		if (ctx)
+			ntfs_attr_put_search_ctx(ctx);
+		if (m)
+			unmap_mft_record(ni);
+		iput(dent_inode);
+		ntfs_error(vol->sb, "Failed, returning error code %i.", err);
+		return ERR_PTR(err);
+	}
+}
+
+static int ntfs_sd_add_everyone(struct ntfs_inode *ni)
+{
+	struct security_descriptor_relative *sd;
+	struct ntfs_acl *acl;
+	struct ntfs_ace *ace;
+	struct ntfs_sid *sid;
+	int ret, sd_len;
+
+	/* Create SECURITY_DESCRIPTOR attribute (everyone has full access). */
+	/*
+	 * Calculate security descriptor length. We have 2 sub-authorities in
+	 * owner and group SIDs, So add 8 bytes to every SID.
+	 */
+	sd_len =3D sizeof(struct security_descriptor_relative) + 2 *
+		(sizeof(struct ntfs_sid) + 8) + sizeof(struct ntfs_acl) +
+		sizeof(struct ntfs_ace) + 4;
+	sd =3D ntfs_malloc_nofs(sd_len);
+	if (!sd)
+		return -1;
+
+	sd->revision =3D 1;
+	sd->control =3D SE_DACL_PRESENT | SE_SELF_RELATIVE;
+
+	sid =3D (struct ntfs_sid *)((u8 *)sd + sizeof(struct security_descriptor_=
relative));
+	sid->revision =3D 1;
+	sid->sub_authority_count =3D 2;
+	sid->sub_authority[0] =3D cpu_to_le32(SECURITY_BUILTIN_DOMAIN_RID);
+	sid->sub_authority[1] =3D cpu_to_le32(DOMAIN_ALIAS_RID_ADMINS);
+	sid->identifier_authority.value[5] =3D 5;
+	sd->owner =3D cpu_to_le32((u8 *)sid - (u8 *)sd);
+
+	sid =3D (struct ntfs_sid *)((u8 *)sid + sizeof(struct ntfs_sid) + 8);
+	sid->revision =3D 1;
+	sid->sub_authority_count =3D 2;
+	sid->sub_authority[0] =3D cpu_to_le32(SECURITY_BUILTIN_DOMAIN_RID);
+	sid->sub_authority[1] =3D cpu_to_le32(DOMAIN_ALIAS_RID_ADMINS);
+	sid->identifier_authority.value[5] =3D 5;
+	sd->group =3D cpu_to_le32((u8 *)sid - (u8 *)sd);
+
+	acl =3D (struct ntfs_acl *)((u8 *)sid + sizeof(struct ntfs_sid) + 8);
+	acl->revision =3D 2;
+	acl->size =3D cpu_to_le16(sizeof(struct ntfs_acl) + sizeof(struct ntfs_ac=
e) + 4);
+	acl->ace_count =3D cpu_to_le16(1);
+	sd->dacl =3D cpu_to_le32((u8 *)acl - (u8 *)sd);
+
+	ace =3D (struct ntfs_ace *)((u8 *)acl + sizeof(struct ntfs_acl));
+	ace->type =3D ACCESS_ALLOWED_ACE_TYPE;
+	ace->flags =3D OBJECT_INHERIT_ACE | CONTAINER_INHERIT_ACE;
+	ace->size =3D cpu_to_le16(sizeof(struct ntfs_ace) + 4);
+	ace->mask =3D cpu_to_le32(0x1f01ff);
+	ace->sid.revision =3D 1;
+	ace->sid.sub_authority_count =3D 1;
+	ace->sid.sub_authority[0] =3D 0;
+	ace->sid.identifier_authority.value[5] =3D 1;
+
+	ret =3D ntfs_attr_add(ni, AT_SECURITY_DESCRIPTOR, AT_UNNAMED, 0, (u8 *)sd,
+			sd_len);
+	if (ret)
+		ntfs_error(ni->vol->sb, "Failed to add SECURITY_DESCRIPTOR\n");
+
+	ntfs_free(sd);
+	return ret;
+}
+
+static struct ntfs_inode *__ntfs_create(struct mnt_idmap *idmap, struct in=
ode *dir,
+		__le16 *name, u8 name_len, mode_t mode, dev_t dev,
+		__le16 *target, int target_len)
+{
+	struct ntfs_inode *dir_ni =3D NTFS_I(dir);
+	struct ntfs_volume *vol =3D dir_ni->vol;
+	struct ntfs_inode *ni;
+	bool rollback_data =3D false, rollback_sd =3D false, rollback_reparse =3D=
 false;
+	struct file_name_attr *fn =3D NULL;
+	struct standard_information *si =3D NULL;
+	int err =3D 0, fn_len, si_len;
+	struct inode *vi;
+	struct mft_record *ni_mrec, *dni_mrec;
+	struct super_block *sb =3D dir_ni->vol->sb;
+	__le64 parent_mft_ref;
+	u64 child_mft_ref;
+	__le16 ea_size;
+
+	vi =3D new_inode(vol->sb);
+	if (!vi)
+		return ERR_PTR(-ENOMEM);
+
+	ntfs_init_big_inode(vi);
+	ni =3D NTFS_I(vi);
+	ni->vol =3D dir_ni->vol;
+	ni->name_len =3D 0;
+	ni->name =3D NULL;
+
+	/*
+	 * Set the appropriate mode, attribute type, and name.  For
+	 * directories, also setup the index values to the defaults.
+	 */
+	if (S_ISDIR(mode)) {
+		mode &=3D ~vol->dmask;
+
+		NInoSetMstProtected(ni);
+		ni->itype.index.block_size =3D 4096;
+		ni->itype.index.block_size_bits =3D ntfs_ffs(4096) - 1;
+		ni->itype.index.collation_rule =3D COLLATION_FILE_NAME;
+		if (vol->cluster_size <=3D ni->itype.index.block_size) {
+			ni->itype.index.vcn_size =3D vol->cluster_size;
+			ni->itype.index.vcn_size_bits =3D
+				vol->cluster_size_bits;
+		} else {
+			ni->itype.index.vcn_size =3D vol->sector_size;
+			ni->itype.index.vcn_size_bits =3D
+				vol->sector_size_bits;
+		}
+	} else {
+		mode &=3D ~vol->fmask;
+	}
+
+	if (IS_RDONLY(vi))
+		mode &=3D ~0222;
+
+	inode_init_owner(idmap, vi, dir, mode);
+
+	if (uid_valid(vol->uid))
+		vi->i_uid =3D vol->uid;
+
+	if (gid_valid(vol->gid))
+		vi->i_gid =3D vol->gid;
+
+	/*
+	 * Set the file size to 0, the ntfs inode sizes are set to 0 by
+	 * the call to ntfs_init_big_inode() below.
+	 */
+	vi->i_size =3D 0;
+	vi->i_blocks =3D 0;
+
+	inode_inc_iversion(vi);
+
+	simple_inode_init_ts(vi);
+	ni->i_crtime =3D inode_get_ctime(vi);
+
+	inode_set_mtime_to_ts(dir, ni->i_crtime);
+	inode_set_ctime_to_ts(dir, ni->i_crtime);
+	mark_inode_dirty(dir);
+
+	err =3D ntfs_mft_record_alloc(dir_ni->vol, mode, &ni, NULL,
+				    &ni_mrec);
+	if (err) {
+		iput(vi);
+		return ERR_PTR(err);
+	}
+
+	/*
+	 * Prevent iget and writeback from finding this inode.
+	 * Caller must call d_instantiate_new instead of d_instantiate.
+	 */
+	spin_lock(&vi->i_lock);
+	vi->i_state =3D I_NEW | I_CREATING;
+	spin_unlock(&vi->i_lock);
+
+	/* Add the inode to the inode hash for the superblock. */
+	vi->i_ino =3D ni->mft_no;
+	inode_set_iversion(vi, 1);
+	insert_inode_hash(vi);
+
+	mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+	mutex_lock_nested(&dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+	if (NInoBeingDeleted(dir_ni)) {
+		err =3D -ENOENT;
+		goto err_out;
+	}
+
+	dni_mrec =3D map_mft_record(dir_ni);
+	if (IS_ERR(dni_mrec)) {
+		ntfs_error(dir_ni->vol->sb, "failed to map mft record for file %ld.\n",
+			   dir_ni->mft_no);
+		err =3D -EIO;
+		goto err_out;
+	}
+	parent_mft_ref =3D MK_LE_MREF(dir_ni->mft_no,
+				    le16_to_cpu(dni_mrec->sequence_number));
+	unmap_mft_record(dir_ni);
+
+	/*
+	 * Create STANDARD_INFORMATION attribute. Write STANDARD_INFORMATION
+	 * version 1.2, windows will upgrade it to version 3 if needed.
+	 */
+	si_len =3D offsetof(struct standard_information, file_attributes) +
+		sizeof(__le32) + 12;
+	si =3D ntfs_malloc_nofs(si_len);
+	if (!si) {
+		err =3D -ENOMEM;
+		goto err_out;
+	}
+
+	si->creation_time =3D si->last_data_change_time =3D utc2ntfs(ni->i_crtime=
);
+	si->last_mft_change_time =3D si->last_access_time =3D si->creation_time;
+
+	if (!S_ISREG(mode) && !S_ISDIR(mode))
+		si->file_attributes =3D FILE_ATTR_SYSTEM;
+
+	/* Add STANDARD_INFORMATION to inode. */
+	err =3D ntfs_attr_add(ni, AT_STANDARD_INFORMATION, AT_UNNAMED, 0, (u8 *)s=
i,
+			si_len);
+	if (err) {
+		ntfs_error(sb, "Failed to add STANDARD_INFORMATION attribute.\n");
+		goto err_out;
+	}
+
+	err =3D ntfs_sd_add_everyone(ni);
+	if (err)
+		goto err_out;
+	rollback_sd =3D true;
+
+	if (S_ISDIR(mode)) {
+		struct index_root *ir =3D NULL;
+		struct index_entry *ie;
+		int ir_len, index_len;
+
+		/* Create struct index_root attribute. */
+		index_len =3D sizeof(struct index_header) + sizeof(struct index_entry_he=
ader);
+		ir_len =3D offsetof(struct index_root, index) + index_len;
+		ir =3D ntfs_malloc_nofs(ir_len);
+		if (!ir) {
+			err =3D -ENOMEM;
+			goto err_out;
+		}
+		ir->type =3D AT_FILE_NAME;
+		ir->collation_rule =3D COLLATION_FILE_NAME;
+		ir->index_block_size =3D cpu_to_le32(ni->vol->index_record_size);
+		if (ni->vol->cluster_size <=3D ni->vol->index_record_size)
+			ir->clusters_per_index_block =3D
+				ni->vol->index_record_size >> ni->vol->cluster_size_bits;
+		else
+			ir->clusters_per_index_block =3D
+				ni->vol->index_record_size >> ni->vol->sector_size_bits;
+		ir->index.entries_offset =3D cpu_to_le32(sizeof(struct index_header));
+		ir->index.index_length =3D cpu_to_le32(index_len);
+		ir->index.allocated_size =3D cpu_to_le32(index_len);
+		ie =3D (struct index_entry *)((u8 *)ir + sizeof(struct index_root));
+		ie->length =3D cpu_to_le16(sizeof(struct index_entry_header));
+		ie->key_length =3D 0;
+		ie->flags =3D INDEX_ENTRY_END;
+
+		/* Add struct index_root attribute to inode. */
+		err =3D ntfs_attr_add(ni, AT_INDEX_ROOT, I30, 4, (u8 *)ir, ir_len);
+		if (err) {
+			ntfs_free(ir);
+			ntfs_error(vi->i_sb, "Failed to add struct index_root attribute.\n");
+			goto err_out;
+		}
+		ntfs_free(ir);
+		err =3D ntfs_attr_open(ni, AT_INDEX_ROOT, I30, 4);
+		if (err)
+			goto err_out;
+	} else {
+		/* Add DATA attribute to inode. */
+		err =3D ntfs_attr_add(ni, AT_DATA, AT_UNNAMED, 0, NULL, 0);
+		if (err) {
+			ntfs_error(dir_ni->vol->sb, "Failed to add DATA attribute.\n");
+			goto err_out;
+		}
+		rollback_data =3D true;
+
+		err =3D ntfs_attr_open(ni, AT_DATA, AT_UNNAMED, 0);
+		if (err)
+			goto err_out;
+
+		if (S_ISLNK(mode)) {
+			err =3D ntfs_reparse_set_wsl_symlink(ni, target, target_len);
+			if (!err)
+				rollback_reparse =3D true;
+		} else if (S_ISBLK(mode) || S_ISCHR(mode) || S_ISSOCK(mode) ||
+			   S_ISFIFO(mode)) {
+			si->file_attributes =3D FILE_ATTRIBUTE_RECALL_ON_OPEN;
+			ni->flags =3D FILE_ATTRIBUTE_RECALL_ON_OPEN;
+			err =3D ntfs_reparse_set_wsl_not_symlink(ni, mode);
+			if (!err)
+				rollback_reparse =3D true;
+		}
+		if (err)
+			goto err_out;
+	}
+
+	err =3D ntfs_ea_set_wsl_inode(vi, dev, &ea_size,
+			NTFS_EA_UID | NTFS_EA_GID | NTFS_EA_MODE);
+	if (err)
+		goto err_out;
+
+	/* Create FILE_NAME attribute. */
+	fn_len =3D sizeof(struct file_name_attr) + name_len * sizeof(__le16);
+	fn =3D ntfs_malloc_nofs(fn_len);
+	if (!fn) {
+		err =3D -ENOMEM;
+		goto err_out;
+	}
+
+	fn->file_attributes |=3D ni->flags;
+	fn->parent_directory =3D parent_mft_ref;
+	fn->file_name_length =3D name_len;
+	fn->file_name_type =3D FILE_NAME_POSIX;
+	fn->type.ea.packed_ea_size =3D ea_size;
+	if (S_ISDIR(mode)) {
+		fn->file_attributes =3D FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT;
+		fn->allocated_size =3D fn->data_size =3D 0;
+	} else {
+		fn->data_size =3D cpu_to_le64(ni->data_size);
+		fn->allocated_size =3D cpu_to_le64(ni->allocated_size);
+	}
+	if (!S_ISREG(mode) && !S_ISDIR(mode))
+		fn->file_attributes |=3D FILE_ATTR_SYSTEM;
+	if (NVolHideDotFiles(vol) && (name_len > 0 && name[0] =3D=3D '.'))
+		fn->file_attributes |=3D FILE_ATTR_HIDDEN;
+	fn->creation_time =3D fn->last_data_change_time =3D utc2ntfs(ni->i_crtime=
);
+	fn->last_mft_change_time =3D fn->last_access_time =3D fn->creation_time;
+	memcpy(fn->file_name, name, name_len * sizeof(__le16));
+
+	/* Add FILE_NAME attribute to inode. */
+	err =3D ntfs_attr_add(ni, AT_FILE_NAME, AT_UNNAMED, 0, (u8 *)fn, fn_len);
+	if (err) {
+		ntfs_error(sb, "Failed to add FILE_NAME attribute.\n");
+		goto err_out;
+	}
+
+	child_mft_ref =3D MK_MREF(ni->mft_no,
+				le16_to_cpu(ni_mrec->sequence_number));
+	/* Set hard links count and directory flag. */
+	ni_mrec->link_count =3D cpu_to_le16(1);
+	mark_mft_record_dirty(ni);
+
+	/* Add FILE_NAME attribute to index. */
+	err =3D ntfs_index_add_filename(dir_ni, fn, child_mft_ref);
+	if (err) {
+		ntfs_debug("Failed to add entry to the index");
+		goto err_out;
+	}
+
+	unmap_mft_record(ni);
+	mutex_unlock(&dir_ni->mrec_lock);
+	mutex_unlock(&ni->mrec_lock);
+
+	ni->flags =3D fn->file_attributes;
+	/* Set the sequence number. */
+	vi->i_generation =3D ni->seq_no;
+	set_nlink(vi, 1);
+	ntfs_set_vfs_operations(vi, mode, dev);
+
+#ifdef CONFIG_NTFSPLUS_FS_POSIX_ACL
+	if (!S_ISLNK(mode) && (sb->s_flags & SB_POSIXACL)) {
+		err =3D ntfsp_init_acl(idmap, vi, dir);
+		if (err)
+			goto err_out;
+	} else
+#endif
+	{
+		vi->i_flags |=3D S_NOSEC;
+	}
+
+	/* Done! */
+	ntfs_free(fn);
+	ntfs_free(si);
+	ntfs_debug("Done.\n");
+	return ni;
+
+err_out:
+	if (rollback_sd)
+		ntfs_attr_remove(ni, AT_SECURITY_DESCRIPTOR, AT_UNNAMED, 0);
+
+	if (rollback_data)
+		ntfs_attr_remove(ni, AT_DATA, AT_UNNAMED, 0);
+
+	if (rollback_reparse)
+		ntfs_delete_reparse_index(ni);
+	/*
+	 * Free extent MFT records (should not exist any with current
+	 * ntfs_create implementation, but for any case if something will be
+	 * changed in the future).
+	 */
+	while (ni->nr_extents !=3D 0) {
+		int err2;
+
+		err2 =3D ntfs_mft_record_free(ni->vol, *(ni->ext.extent_ntfs_inos));
+		if (err2)
+			ntfs_error(sb,
+				"Failed to free extent MFT record. Leaving inconsistent metadata.\n");
+		ntfs_inode_close(*(ni->ext.extent_ntfs_inos));
+	}
+	if (ntfs_mft_record_free(ni->vol, ni))
+		ntfs_error(sb,
+			"Failed to free MFT record. Leaving inconsistent metadata. Run chkdsk.\=
n");
+	unmap_mft_record(ni);
+	ntfs_free(fn);
+	ntfs_free(si);
+
+	mutex_unlock(&dir_ni->mrec_lock);
+	mutex_unlock(&ni->mrec_lock);
+
+	remove_inode_hash(vi);
+	discard_new_inode(vi);
+	return ERR_PTR(err);
+}
+
+static int ntfs_create(struct mnt_idmap *idmap, struct inode *dir,
+		struct dentry *dentry, umode_t mode, bool excl)
+{
+	struct ntfs_volume *vol =3D NTFS_SB(dir->i_sb);
+	struct ntfs_inode *ni;
+	__le16 *uname;
+	int uname_len, err;
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len,
+				  &uname, NTFS_MAX_NAME_LEN);
+	if (uname_len < 0) {
+		if (uname_len !=3D -ENAMETOOLONG)
+			ntfs_error(vol->sb, "Failed to convert name to unicode.");
+		return uname_len;
+	}
+
+	err =3D ntfs_check_bad_windows_name(vol, uname, uname_len);
+	if (err) {
+		kmem_cache_free(ntfs_name_cache, uname);
+		return err;
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	ni =3D __ntfs_create(idmap, dir, uname, uname_len, S_IFREG | mode, 0, NUL=
L, 0);
+	kmem_cache_free(ntfs_name_cache, uname);
+	if (IS_ERR(ni))
+		return PTR_ERR(ni);
+
+	d_instantiate_new(dentry, VFS_I(ni));
+
+	return 0;
+}
+
+static int ntfs_check_unlinkable_dir(struct ntfs_attr_search_ctx *ctx, str=
uct file_name_attr *fn)
+{
+	int link_count;
+	int ret;
+	struct ntfs_inode *ni =3D ctx->base_ntfs_ino ? ctx->base_ntfs_ino : ctx->=
ntfs_ino;
+	struct mft_record *ni_mrec =3D ctx->base_mrec ? ctx->base_mrec : ctx->mre=
c;
+
+	ret =3D ntfs_check_empty_dir(ni, ni_mrec);
+	if (!ret || ret !=3D -ENOTEMPTY)
+		return ret;
+
+	link_count =3D le16_to_cpu(ni_mrec->link_count);
+	/*
+	 * Directory is non-empty, so we can unlink only if there is more than
+	 * one "real" hard link, i.e. links aren't different DOS and WIN32 names
+	 */
+	if ((link_count =3D=3D 1) ||
+	    (link_count =3D=3D 2 && fn->file_name_type =3D=3D FILE_NAME_DOS)) {
+		ret =3D -ENOTEMPTY;
+		ntfs_debug("Non-empty directory without hard links\n");
+		goto no_hardlink;
+	}
+
+	ret =3D 0;
+no_hardlink:
+	return ret;
+}
+
+static int ntfs_test_inode_attr(struct inode *vi, void *data)
+{
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	unsigned long mft_no =3D (unsigned long)data;
+
+	if (ni->mft_no !=3D mft_no)
+		return 0;
+	if (NInoAttr(ni) || ni->nr_extents =3D=3D -1)
+		return 1;
+	else
+		return 0;
+}
+
+/**
+ * ntfs_delete - delete file or directory from ntfs volume
+ * @ni:         ntfs inode for object to delte
+ * @dir_ni:     ntfs inode for directory in which delete object
+ * @name:       unicode name of the object to delete
+ * @name_len:   length of the name in unicode characters
+ * @need_lock:  whether mrec lock is needed or not
+ *
+ * @ni is always closed after the call to this function (even if it failed=
),
+ * user does not need to call ntfs_inode_close himself.
+ */
+static int ntfs_delete(struct ntfs_inode *ni, struct ntfs_inode *dir_ni,
+		__le16 *name, u8 name_len, bool need_lock)
+{
+	struct ntfs_attr_search_ctx *actx =3D NULL;
+	struct file_name_attr *fn =3D NULL;
+	bool looking_for_dos_name =3D false, looking_for_win32_name =3D false;
+	bool case_sensitive_match =3D true;
+	int err =3D 0;
+	struct mft_record *ni_mrec;
+	struct super_block *sb;
+	bool link_count_zero =3D false;
+
+	ntfs_debug("Entering.\n");
+
+	if (need_lock =3D=3D true) {
+		mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+		mutex_lock_nested(&dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+	}
+
+	sb =3D dir_ni->vol->sb;
+
+	if (ni->nr_extents =3D=3D -1)
+		ni =3D ni->ext.base_ntfs_ino;
+	if (dir_ni->nr_extents =3D=3D -1)
+		dir_ni =3D dir_ni->ext.base_ntfs_ino;
+	/*
+	 * Search for FILE_NAME attribute with such name. If it's in POSIX or
+	 * WIN32_AND_DOS namespace, then simply remove it from index and inode.
+	 * If filename in DOS or in WIN32 namespace, then remove DOS name first,
+	 * only then remove WIN32 name.
+	 */
+	actx =3D ntfs_attr_get_search_ctx(ni, NULL);
+	if (!actx) {
+		ntfs_error(sb, "%s, Failed to get search context", __func__);
+		mutex_unlock(&dir_ni->mrec_lock);
+		mutex_unlock(&ni->mrec_lock);
+		return -ENOMEM;
+	}
+search:
+	while ((err =3D ntfs_attr_lookup(AT_FILE_NAME, AT_UNNAMED, 0, CASE_SENSIT=
IVE,
+				0, NULL, 0, actx)) =3D=3D 0) {
+#ifdef DEBUG
+		unsigned char *s;
+#endif
+		bool case_sensitive =3D IGNORE_CASE;
+
+		fn =3D (struct file_name_attr *)((u8 *)actx->attr +
+				le16_to_cpu(actx->attr->data.resident.value_offset));
+#ifdef DEBUG
+		s =3D ntfs_attr_name_get(ni->vol, fn->file_name, fn->file_name_length);
+		ntfs_debug("name: '%s'  type: %d  dos: %d  win32: %d case: %d\n",
+				s, fn->file_name_type,
+				looking_for_dos_name, looking_for_win32_name,
+				case_sensitive_match);
+		ntfs_attr_name_free(&s);
+#endif
+		if (looking_for_dos_name) {
+			if (fn->file_name_type =3D=3D FILE_NAME_DOS)
+				break;
+			continue;
+		}
+		if (looking_for_win32_name) {
+			if  (fn->file_name_type =3D=3D FILE_NAME_WIN32)
+				break;
+			continue;
+		}
+
+		/* Ignore hard links from other directories */
+		if (dir_ni->mft_no !=3D MREF_LE(fn->parent_directory)) {
+			ntfs_debug("MFT record numbers don't match (%lu !=3D %lu)\n",
+					dir_ni->mft_no,
+					MREF_LE(fn->parent_directory));
+			continue;
+		}
+
+		if (fn->file_name_type =3D=3D FILE_NAME_POSIX || case_sensitive_match)
+			case_sensitive =3D CASE_SENSITIVE;
+
+		if (ntfs_names_are_equal(fn->file_name, fn->file_name_length,
+					name, name_len, case_sensitive,
+					ni->vol->upcase, ni->vol->upcase_len)) {
+			if (fn->file_name_type =3D=3D FILE_NAME_WIN32) {
+				looking_for_dos_name =3D true;
+				ntfs_attr_reinit_search_ctx(actx);
+				continue;
+			}
+			if (fn->file_name_type =3D=3D FILE_NAME_DOS)
+				looking_for_dos_name =3D true;
+			break;
+		}
+	}
+	if (err) {
+		/*
+		 * If case sensitive search failed, then try once again
+		 * ignoring case.
+		 */
+		if (err =3D=3D -ENOENT && case_sensitive_match) {
+			case_sensitive_match =3D false;
+			ntfs_attr_reinit_search_ctx(actx);
+			goto search;
+		}
+		goto err_out;
+	}
+
+	err =3D ntfs_check_unlinkable_dir(actx, fn);
+	if (err)
+		goto err_out;
+
+	err =3D ntfs_index_remove(dir_ni, fn, le32_to_cpu(actx->attr->data.reside=
nt.value_length));
+	if (err)
+		goto err_out;
+
+	err =3D ntfs_attr_record_rm(actx);
+	if (err)
+		goto err_out;
+
+	ni_mrec =3D actx->base_mrec ? actx->base_mrec : actx->mrec;
+	ni_mrec->link_count =3D cpu_to_le16(le16_to_cpu(ni_mrec->link_count) - 1);
+	drop_nlink(VFS_I(ni));
+
+	mark_mft_record_dirty(ni);
+	if (looking_for_dos_name) {
+		looking_for_dos_name =3D false;
+		looking_for_win32_name =3D true;
+		ntfs_attr_reinit_search_ctx(actx);
+		goto search;
+	}
+
+	/*
+	 * If hard link count is not equal to zero then we are done. In other
+	 * case there are no reference to this inode left, so we should free all
+	 * non-resident attributes and mark all MFT record as not in use.
+	 */
+	if (ni_mrec->link_count =3D=3D 0) {
+		NInoSetBeingDeleted(ni);
+		ntfs_delete_reparse_index(ni);
+		link_count_zero =3D true;
+	}
+
+	ntfs_attr_put_search_ctx(actx);
+	if (need_lock =3D=3D true) {
+		mutex_unlock(&dir_ni->mrec_lock);
+		mutex_unlock(&ni->mrec_lock);
+	}
+
+	/*
+	 * If hard link count is not equal to zero then we are done. In other
+	 * case there are no reference to this inode left, so we should free all
+	 * non-resident attributes and mark all MFT record as not in use.
+	 */
+	if (link_count_zero =3D=3D true) {
+		struct inode *attr_vi;
+
+		while ((attr_vi =3D ilookup5(sb, ni->mft_no, ntfs_test_inode_attr,
+					   (void *)ni->mft_no)) !=3D NULL) {
+			clear_nlink(attr_vi);
+			iput(attr_vi);
+		}
+	}
+	ntfs_debug("Done.\n");
+	return 0;
+err_out:
+	ntfs_attr_put_search_ctx(actx);
+	mutex_unlock(&dir_ni->mrec_lock);
+	mutex_unlock(&ni->mrec_lock);
+	return err;
+}
+
+static int ntfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+	struct inode *vi =3D dentry->d_inode;
+	struct super_block *sb =3D dir->i_sb;
+	struct ntfs_volume *vol =3D NTFS_SB(sb);
+	int err =3D 0;
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	__le16 *uname =3D NULL;
+	int uname_len;
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len,
+				  &uname, NTFS_MAX_NAME_LEN);
+	if (uname_len < 0) {
+		if (uname_len !=3D -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to Unicode.");
+		return -ENOMEM;
+	}
+
+	err =3D ntfs_check_bad_windows_name(vol, uname, uname_len);
+	if (err) {
+		kmem_cache_free(ntfs_name_cache, uname);
+		return err;
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	err =3D ntfs_delete(ni, NTFS_I(dir), uname, uname_len, true);
+	if (err)
+		goto out;
+
+	inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
+	mark_inode_dirty(dir);
+	inode_set_ctime_to_ts(vi, inode_get_ctime(dir));
+	if (vi->i_nlink)
+		mark_inode_dirty(vi);
+out:
+	kmem_cache_free(ntfs_name_cache, uname);
+	return err;
+}
+
+static struct dentry *ntfs_mkdir(struct mnt_idmap *idmap, struct inode *di=
r,
+		struct dentry *dentry, umode_t mode)
+{
+	struct super_block *sb =3D dir->i_sb;
+	struct ntfs_volume *vol =3D NTFS_SB(sb);
+	int err =3D 0;
+	struct ntfs_inode *ni;
+	__le16 *uname;
+	int uname_len;
+
+	if (NVolShutdown(vol))
+		return ERR_PTR(-EIO);
+
+	uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len,
+				  &uname, NTFS_MAX_NAME_LEN);
+	if (uname_len < 0) {
+		if (uname_len !=3D -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to unicode.");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	err =3D ntfs_check_bad_windows_name(vol, uname, uname_len);
+	if (err) {
+		kmem_cache_free(ntfs_name_cache, uname);
+		return ERR_PTR(err);
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	ni =3D __ntfs_create(idmap, dir, uname, uname_len, S_IFDIR | mode, 0, NUL=
L, 0);
+	kmem_cache_free(ntfs_name_cache, uname);
+	if (IS_ERR(ni)) {
+		err =3D PTR_ERR(ni);
+		return ERR_PTR(err);
+	}
+
+	d_instantiate_new(dentry, VFS_I(ni));
+	return ERR_PTR(err);
+}
+
+static int ntfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	struct inode *vi =3D dentry->d_inode;
+	struct super_block *sb =3D dir->i_sb;
+	struct ntfs_volume *vol =3D NTFS_SB(sb);
+	int err =3D 0;
+	struct ntfs_inode *ni;
+	__le16 *uname =3D NULL;
+	int uname_len;
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	ni =3D NTFS_I(vi);
+	uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len,
+				  &uname, NTFS_MAX_NAME_LEN);
+	if (uname_len < 0) {
+		if (uname_len !=3D -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to unicode.");
+		return -ENOMEM;
+	}
+
+	err =3D ntfs_check_bad_windows_name(vol, uname, uname_len);
+	if (err) {
+		kmem_cache_free(ntfs_name_cache, uname);
+		return err;
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	err =3D ntfs_delete(ni, NTFS_I(dir), uname, uname_len, true);
+	if (err)
+		goto out;
+
+	inode_set_mtime_to_ts(vi, inode_set_atime_to_ts(vi, current_time(vi)));
+out:
+	kmem_cache_free(ntfs_name_cache, uname);
+	return err;
+}
+
+/**
+ * __ntfs_link - create hard link for file or directory
+ * @ni:		ntfs inode for object to create hard link
+ * @dir_ni:	ntfs inode for directory in which new link should be placed
+ * @name:	unicode name of the new link
+ * @name_len:	length of the name in unicode characters
+ *
+ * NOTE: At present we allow creating hardlinks to directories, we use them
+ * in a temporary state during rename. But it's defenitely bad idea to have
+ * hard links to directories as a result of operation.
+ */
+static int __ntfs_link(struct ntfs_inode *ni, struct ntfs_inode *dir_ni,
+		__le16 *name, u8 name_len)
+{
+	struct super_block *sb;
+	struct inode *vi =3D VFS_I(ni);
+	struct file_name_attr *fn =3D NULL;
+	int fn_len, err =3D 0;
+	struct mft_record *dir_mrec =3D NULL, *ni_mrec =3D NULL;
+
+	ntfs_debug("Entering.\n");
+
+	sb =3D dir_ni->vol->sb;
+	if (NInoBeingDeleted(dir_ni) || NInoBeingDeleted(ni))
+		return -ENOENT;
+
+	ni_mrec =3D map_mft_record(ni);
+	if (IS_ERR(ni_mrec)) {
+		err =3D -EIO;
+		goto err_out;
+	}
+
+	if (le16_to_cpu(ni_mrec->link_count) =3D=3D 0) {
+		err =3D -ENOENT;
+		goto err_out;
+	}
+
+	/* Create FILE_NAME attribute. */
+	fn_len =3D sizeof(struct file_name_attr) + name_len * sizeof(__le16);
+	fn =3D ntfs_malloc_nofs(fn_len);
+	if (!fn) {
+		err =3D -ENOMEM;
+		goto err_out;
+	}
+
+	dir_mrec =3D map_mft_record(dir_ni);
+	if (IS_ERR(dir_mrec)) {
+		err =3D -EIO;
+		goto err_out;
+	}
+
+	fn->parent_directory =3D MK_LE_MREF(dir_ni->mft_no,
+			le16_to_cpu(dir_mrec->sequence_number));
+	unmap_mft_record(dir_ni);
+	fn->file_name_length =3D name_len;
+	fn->file_name_type =3D FILE_NAME_POSIX;
+	fn->file_attributes =3D ni->flags;
+	if (ni_mrec->flags & MFT_RECORD_IS_DIRECTORY) {
+		fn->file_attributes |=3D FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT;
+		fn->allocated_size =3D fn->data_size =3D 0;
+	} else {
+		if (NInoSparse(ni) || NInoCompressed(ni))
+			fn->allocated_size =3D
+				cpu_to_le64(ni->itype.compressed.size);
+		else
+			fn->allocated_size =3D cpu_to_le64(ni->allocated_size);
+		fn->data_size =3D cpu_to_le64(ni->data_size);
+	}
+	if (NVolHideDotFiles(dir_ni->vol) && (name_len > 0 && name[0] =3D=3D '.'))
+		fn->file_attributes |=3D FILE_ATTR_HIDDEN;
+
+	fn->creation_time =3D utc2ntfs(ni->i_crtime);
+	fn->last_data_change_time =3D utc2ntfs(inode_get_mtime(vi));
+	fn->last_mft_change_time =3D utc2ntfs(inode_get_ctime(vi));
+	fn->last_access_time =3D utc2ntfs(inode_get_atime(vi));
+	memcpy(fn->file_name, name, name_len * sizeof(__le16));
+
+	/* Add FILE_NAME attribute to index. */
+	err =3D ntfs_index_add_filename(dir_ni, fn, MK_MREF(ni->mft_no,
+					le16_to_cpu(ni_mrec->sequence_number)));
+	if (err) {
+		ntfs_error(sb, "Failed to add filename to the index");
+		goto err_out;
+	}
+	/* Add FILE_NAME attribute to inode. */
+	err =3D ntfs_attr_add(ni, AT_FILE_NAME, AT_UNNAMED, 0, (u8 *)fn, fn_len);
+	if (err) {
+		ntfs_error(sb, "Failed to add FILE_NAME attribute.\n");
+		/* Try to remove just added attribute from index. */
+		if (ntfs_index_remove(dir_ni, fn, fn_len))
+			goto rollback_failed;
+		goto err_out;
+	}
+	/* Increment hard links count. */
+	ni_mrec->link_count =3D cpu_to_le16(le16_to_cpu(ni_mrec->link_count) + 1);
+	inc_nlink(VFS_I(ni));
+
+	/* Done! */
+	mark_mft_record_dirty(ni);
+	ntfs_free(fn);
+	unmap_mft_record(ni);
+
+	ntfs_debug("Done.\n");
+
+	return 0;
+rollback_failed:
+	ntfs_error(sb, "Rollback failed. Leaving inconsistent metadata.\n");
+err_out:
+	ntfs_free(fn);
+	if (!IS_ERR_OR_NULL(ni_mrec))
+		unmap_mft_record(ni);
+	return err;
+}
+
+static int ntfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
+		struct dentry *old_dentry, struct inode *new_dir,
+		struct dentry *new_dentry, unsigned int flags)
+{
+	struct inode *old_inode, *new_inode =3D NULL;
+	int err =3D 0;
+	int is_dir;
+	struct super_block *sb =3D old_dir->i_sb;
+	__le16 *uname_new =3D NULL;
+	__le16 *uname_old =3D NULL;
+	int new_name_len;
+	int old_name_len;
+	struct ntfs_volume *vol =3D NTFS_SB(sb);
+	struct ntfs_inode *old_ni, *new_ni =3D NULL;
+	struct ntfs_inode *old_dir_ni =3D NTFS_I(old_dir), *new_dir_ni =3D NTFS_I=
(new_dir);
+
+	if (NVolShutdown(old_dir_ni->vol))
+		return -EIO;
+
+	if (flags & (RENAME_EXCHANGE | RENAME_WHITEOUT))
+		return -EINVAL;
+
+	new_name_len =3D ntfs_nlstoucs(NTFS_I(new_dir)->vol, new_dentry->d_name.n=
ame,
+				     new_dentry->d_name.len, &uname_new,
+				     NTFS_MAX_NAME_LEN);
+	if (new_name_len < 0) {
+		if (new_name_len !=3D -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to unicode.");
+		return -ENOMEM;
+	}
+
+	err =3D ntfs_check_bad_windows_name(vol, uname_new, new_name_len);
+	if (err) {
+		kmem_cache_free(ntfs_name_cache, uname_new);
+		return err;
+	}
+
+	old_name_len =3D ntfs_nlstoucs(NTFS_I(old_dir)->vol, old_dentry->d_name.n=
ame,
+				     old_dentry->d_name.len, &uname_old,
+				     NTFS_MAX_NAME_LEN);
+	if (old_name_len < 0) {
+		kmem_cache_free(ntfs_name_cache, uname_new);
+		if (old_name_len !=3D -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to unicode.");
+		return -ENOMEM;
+	}
+
+	old_inode =3D old_dentry->d_inode;
+	new_inode =3D new_dentry->d_inode;
+	old_ni =3D NTFS_I(old_inode);
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	mutex_lock_nested(&old_ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+	mutex_lock_nested(&old_dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+
+	if (NInoBeingDeleted(old_ni) || NInoBeingDeleted(old_dir_ni)) {
+		err =3D -ENOENT;
+		goto unlock_old;
+	}
+
+	is_dir =3D S_ISDIR(old_inode->i_mode);
+
+	if (new_inode) {
+		new_ni =3D NTFS_I(new_inode);
+		mutex_lock_nested(&new_ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL_2);
+		if (old_dir !=3D new_dir) {
+			mutex_lock_nested(&new_dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT_2);
+			if (NInoBeingDeleted(new_dir_ni)) {
+				err =3D -ENOENT;
+				goto err_out;
+			}
+		}
+
+		if (NInoBeingDeleted(new_ni)) {
+			err =3D -ENOENT;
+			goto err_out;
+		}
+
+		if (is_dir) {
+			struct mft_record *ni_mrec;
+
+			ni_mrec =3D map_mft_record(NTFS_I(new_inode));
+			if (IS_ERR(ni_mrec)) {
+				err =3D -EIO;
+				goto err_out;
+			}
+			err =3D ntfs_check_empty_dir(NTFS_I(new_inode), ni_mrec);
+			unmap_mft_record(NTFS_I(new_inode));
+			if (err)
+				goto err_out;
+		}
+
+		err =3D ntfs_delete(new_ni, new_dir_ni, uname_new, new_name_len, false);
+		if (err)
+			goto err_out;
+	} else {
+		if (old_dir !=3D new_dir) {
+			mutex_lock_nested(&new_dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT_2);
+			if (NInoBeingDeleted(new_dir_ni)) {
+				err =3D -ENOENT;
+				goto err_out;
+			}
+		}
+	}
+
+	err =3D __ntfs_link(old_ni, new_dir_ni, uname_new, new_name_len);
+	if (err)
+		goto err_out;
+
+	err =3D ntfs_delete(old_ni, old_dir_ni, uname_old, old_name_len, false);
+	if (err) {
+		int err2;
+
+		ntfs_error(sb, "Failed to delete old ntfs inode(%ld) in old dir, err : %=
d\n",
+				old_ni->mft_no, err);
+		err2 =3D ntfs_delete(old_ni, new_dir_ni, uname_new, new_name_len, false);
+		if (err2)
+			ntfs_error(sb, "Failed to delete old ntfs inode in new dir, err : %d\n",
+					err2);
+		goto err_out;
+	}
+
+	simple_rename_timestamp(old_dir, old_dentry, new_dir, new_dentry);
+	mark_inode_dirty(old_inode);
+	mark_inode_dirty(old_dir);
+	if (old_dir !=3D new_dir)
+		mark_inode_dirty(new_dir);
+	if (new_inode)
+		mark_inode_dirty(old_inode);
+
+	inode_inc_iversion(new_dir);
+
+err_out:
+	if (old_dir !=3D new_dir)
+		mutex_unlock(&new_dir_ni->mrec_lock);
+	if (new_inode)
+		mutex_unlock(&new_ni->mrec_lock);
+
+unlock_old:
+	mutex_unlock(&old_dir_ni->mrec_lock);
+	mutex_unlock(&old_ni->mrec_lock);
+	if (uname_new)
+		kmem_cache_free(ntfs_name_cache, uname_new);
+	if (uname_old)
+		kmem_cache_free(ntfs_name_cache, uname_old);
+
+	return err;
+}
+
+static int ntfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
+		struct dentry *dentry, const char *symname)
+{
+	struct super_block *sb =3D dir->i_sb;
+	struct ntfs_volume *vol =3D NTFS_SB(sb);
+	struct inode *vi;
+	int err =3D 0;
+	struct ntfs_inode *ni;
+	__le16 *usrc;
+	__le16 *utarget;
+	int usrc_len;
+	int utarget_len;
+	int symlen =3D strlen(symname);
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	usrc_len =3D ntfs_nlstoucs(vol, dentry->d_name.name,
+				 dentry->d_name.len, &usrc, NTFS_MAX_NAME_LEN);
+	if (usrc_len < 0) {
+		if (usrc_len !=3D -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to Unicode.");
+		err =3D  -ENOMEM;
+		goto out;
+	}
+
+	err =3D ntfs_check_bad_windows_name(vol, usrc, usrc_len);
+	if (err) {
+		kmem_cache_free(ntfs_name_cache, usrc);
+		goto out;
+	}
+
+	utarget_len =3D ntfs_nlstoucs(vol, symname, symlen, &utarget,
+				    PATH_MAX);
+	if (utarget_len < 0) {
+		if (utarget_len !=3D -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert target name to Unicode.");
+		err =3D  -ENOMEM;
+		kmem_cache_free(ntfs_name_cache, usrc);
+		goto out;
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	ni =3D __ntfs_create(idmap, dir, usrc, usrc_len, S_IFLNK | 0777, 0,
+			utarget, utarget_len);
+	kmem_cache_free(ntfs_name_cache, usrc);
+	kvfree(utarget);
+	if (IS_ERR(ni)) {
+		err =3D PTR_ERR(ni);
+		goto out;
+	}
+
+	vi =3D VFS_I(ni);
+	vi->i_size =3D symlen;
+	d_instantiate_new(dentry, vi);
+out:
+	return err;
+}
+
+static int ntfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
+		struct dentry *dentry, umode_t mode, dev_t rdev)
+{
+	struct super_block *sb =3D dir->i_sb;
+	struct ntfs_volume *vol =3D NTFS_SB(sb);
+	int err =3D 0;
+	struct ntfs_inode *ni;
+	__le16 *uname =3D NULL;
+	int uname_len;
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name,
+			dentry->d_name.len, &uname, NTFS_MAX_NAME_LEN);
+	if (uname_len < 0) {
+		if (uname_len !=3D -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to Unicode.");
+		return -ENOMEM;
+	}
+
+	err =3D ntfs_check_bad_windows_name(vol, uname, uname_len);
+	if (err) {
+		kmem_cache_free(ntfs_name_cache, uname);
+		return err;
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	switch (mode & S_IFMT) {
+	case S_IFCHR:
+	case S_IFBLK:
+		ni =3D __ntfs_create(idmap, dir, uname, uname_len,
+				mode, rdev, NULL, 0);
+		break;
+	default:
+		ni =3D __ntfs_create(idmap, dir, uname, uname_len,
+				mode, 0, NULL, 0);
+	}
+
+	kmem_cache_free(ntfs_name_cache, uname);
+	if (IS_ERR(ni)) {
+		err =3D PTR_ERR(ni);
+		goto out;
+	}
+
+	d_instantiate_new(dentry, VFS_I(ni));
+out:
+	return err;
+}
+
+static int ntfs_link(struct dentry *old_dentry, struct inode *dir,
+		struct dentry *dentry)
+{
+	struct inode *vi =3D old_dentry->d_inode;
+	struct super_block *sb =3D vi->i_sb;
+	struct ntfs_volume *vol =3D NTFS_SB(sb);
+	__le16 *uname =3D NULL;
+	int uname_len;
+	int err;
+	struct ntfs_inode *ni =3D NTFS_I(vi), *dir_ni =3D NTFS_I(dir);
+
+	if (NVolShutdown(vol))
+		return -EIO;
+
+	uname_len =3D ntfs_nlstoucs(vol, dentry->d_name.name,
+			dentry->d_name.len, &uname, NTFS_MAX_NAME_LEN);
+	if (uname_len < 0) {
+		if (uname_len !=3D -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to unicode.");
+		err =3D -ENOMEM;
+		goto out;
+	}
+
+	if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+		ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+	ihold(vi);
+	mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+	mutex_lock_nested(&dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+	err =3D __ntfs_link(NTFS_I(vi), NTFS_I(dir), uname, uname_len);
+	if (err) {
+		mutex_unlock(&dir_ni->mrec_lock);
+		mutex_unlock(&ni->mrec_lock);
+		iput(vi);
+		pr_err("failed to create link, err =3D %d\n", err);
+		goto out;
+	}
+
+	inode_inc_iversion(dir);
+	simple_inode_init_ts(dir);
+
+	inode_inc_iversion(vi);
+	simple_inode_init_ts(vi);
+
+	/* timestamp is already written, so mark_inode_dirty() is unneeded. */
+	d_instantiate(dentry, vi);
+	mutex_unlock(&dir_ni->mrec_lock);
+	mutex_unlock(&ni->mrec_lock);
+
+out:
+	ntfs_free(uname);
+	return err;
+}
+
+/**
+ * Inode operations for directories.
+ */
+const struct inode_operations ntfs_dir_inode_ops =3D {
+	.lookup		=3D ntfs_lookup,	/* VFS: Lookup directory. */
+	.create		=3D ntfs_create,
+	.unlink		=3D ntfs_unlink,
+	.mkdir		=3D ntfs_mkdir,
+	.rmdir		=3D ntfs_rmdir,
+	.rename		=3D ntfs_rename,
+	.get_acl	=3D ntfsp_get_acl,
+	.set_acl	=3D ntfsp_set_acl,
+	.listxattr	=3D ntfsp_listxattr,
+	.setattr	=3D ntfsp_setattr,
+	.getattr	=3D ntfsp_getattr,
+	.symlink	=3D ntfs_symlink,
+	.mknod		=3D ntfs_mknod,
+	.link		=3D ntfs_link,
+};
+
+/**
+ * ntfs_get_parent - find the dentry of the parent of a given directory de=
ntry
+ * @child_dent:		dentry of the directory whose parent directory to find
+ *
+ * Find the dentry for the parent directory of the directory specified by =
the
+ * dentry @child_dent.  This function is called from
+ * fs/exportfs/expfs.c::find_exported_dentry() which in turn is called fro=
m the
+ * default ->decode_fh() which is export_decode_fh() in the same file.
+ *
+ * Note: ntfs_get_parent() is called with @d_inode(child_dent)->i_mutex do=
wn.
+ *
+ * Return the dentry of the parent directory on success or the error code =
on
+ * error (IS_ERR() is true).
+ */
+static struct dentry *ntfs_get_parent(struct dentry *child_dent)
+{
+	struct inode *vi =3D d_inode(child_dent);
+	struct ntfs_inode *ni =3D NTFS_I(vi);
+	struct mft_record *mrec;
+	struct ntfs_attr_search_ctx *ctx;
+	struct attr_record *attr;
+	struct file_name_attr *fn;
+	unsigned long parent_ino;
+	int err;
+
+	ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
+	/* Get the mft record of the inode belonging to the child dentry. */
+	mrec =3D map_mft_record(ni);
+	if (IS_ERR(mrec))
+		return ERR_CAST(mrec);
+	/* Find the first file name attribute in the mft record. */
+	ctx =3D ntfs_attr_get_search_ctx(ni, mrec);
+	if (unlikely(!ctx)) {
+		unmap_mft_record(ni);
+		return ERR_PTR(-ENOMEM);
+	}
+try_next:
+	err =3D ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, CASE_SENSITIVE, 0, NULL,
+			0, ctx);
+	if (unlikely(err)) {
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(ni);
+		if (err =3D=3D -ENOENT)
+			ntfs_error(vi->i_sb,
+				   "Inode 0x%lx does not have a file name attribute.  Run chkdsk.",
+				   vi->i_ino);
+		return ERR_PTR(err);
+	}
+	attr =3D ctx->attr;
+	if (unlikely(attr->non_resident))
+		goto try_next;
+	fn =3D (struct file_name_attr *)((u8 *)attr +
+			le16_to_cpu(attr->data.resident.value_offset));
+	if (unlikely((u8 *)fn + le32_to_cpu(attr->data.resident.value_length) >
+	    (u8 *)attr + le32_to_cpu(attr->length)))
+		goto try_next;
+	/* Get the inode number of the parent directory. */
+	parent_ino =3D MREF_LE(fn->parent_directory);
+	/* Release the search context and the mft record of the child. */
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(ni);
+
+	return d_obtain_alias(ntfs_iget(vi->i_sb, parent_ino));
+}
+
+static struct inode *ntfs_nfs_get_inode(struct super_block *sb,
+		u64 ino, u32 generation)
+{
+	struct inode *inode;
+
+	inode =3D ntfs_iget(sb, ino);
+	if (!IS_ERR(inode)) {
+		if (inode->i_generation !=3D generation) {
+			iput(inode);
+			inode =3D ERR_PTR(-ESTALE);
+		}
+	}
+
+	return inode;
+}
+
+static struct dentry *ntfs_fh_to_dentry(struct super_block *sb, struct fid=
 *fid,
+		int fh_len, int fh_type)
+{
+	return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+				    ntfs_nfs_get_inode);
+}
+
+static struct dentry *ntfs_fh_to_parent(struct super_block *sb, struct fid=
 *fid,
+		int fh_len, int fh_type)
+{
+	return generic_fh_to_parent(sb, fid, fh_len, fh_type,
+				    ntfs_nfs_get_inode);
+}
+
+/**
+ * Export operations allowing NFS exporting of mounted NTFS partitions.
+ */
+const struct export_operations ntfs_export_ops =3D {
+	.encode_fh =3D generic_encode_ino32_fh,
+	.get_parent	=3D ntfs_get_parent,	/* Find the parent of a given directory.=
 */
+	.fh_to_dentry	=3D ntfs_fh_to_dentry,
+	.fh_to_parent	=3D ntfs_fh_to_parent,
+};
--=20
2.25.1