From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Viacheslav Dubeyko
Subject: [PATCH v2 38/79] ssdfs: introduce PEB mapping table
Date: Sun, 15 Mar 2026 19:17:42 -0700
Message-ID: <20260316021800.1694650-16-slava@dubeyko.com>
In-Reply-To: <20260316021800.1694650-1-slava@dubeyko.com>
References: <20260316021800.1694650-1-slava@dubeyko.com>

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

The SSDFS file system is based on the concept of a logical segment, which is an aggregation of Logical Erase Blocks (LEBs). Initially, a LEB has no association with any particular "Physical" Erase Block (PEB). This means a segment could have associations for only some of its LEBs or, in the case of a clean segment, no association with any PEB at all. SSDFS therefore needs a special metadata structure (the PEB mapping table) that is capable of associating any LEB with any PEB.
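The optional nature of the LEB-to-PEB association can be pictured with a toy sketch. This is not the SSDFS implementation or its on-disk layout; the type and names below are hypothetical, and only the idea that a clean segment starts with every LEB unmapped (here modeled with a UINT64_MAX sentinel) comes from the text above:

```c
#include <assert.h>
#include <stdint.h>

#define PEB_UNMAPPED UINT64_MAX	/* sentinel: LEB has no associated PEB */

/* Hypothetical in-memory view of one segment's LEBs. */
struct segment_view {
	uint64_t lebs[4];	/* PEB ID per LEB, or PEB_UNMAPPED */
};

/* A clean segment starts with no LEB/PEB associations at all. */
static void init_clean_segment(struct segment_view *seg)
{
	for (int i = 0; i < 4; i++)
		seg->lebs[i] = PEB_UNMAPPED;
}

/* Check whether any LEB of the segment is associated with a PEB. */
static int segment_has_mapping(const struct segment_view *seg)
{
	for (int i = 0; i < 4; i++) {
		if (seg->lebs[i] != PEB_UNMAPPED)
			return 1;
	}
	return 0;
}
```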
The PEB mapping table is a crucial metadata structure that serves several goals: (1) mapping LEBs to PEBs, (2) implementing the logical extent concept, (3) implementing PEB migration, (4) implementing the delayed erase operation by a specialized thread.

The PEB mapping table describes the state of all PEBs on a particular SSDFS volume. These descriptors are split into several fragments that are distributed among the PEBs of specialized segments. Every fragment of the PEB mapping table represents a log's payload in a specialized segment. The payload's content is split into: (1) a LEB table, and (2) a PEB table.

The LEB table starts with a header and contains an array of records ordered by LEB ID, so the LEB ID plays the role of an index into the array of records. The responsibility of the LEB table is thus to define an index into the PEB table. Every LEB table record defines two indexes. The first index (physical index) associates the LEB ID with some PEB ID. The second index (relation index) can define a PEB ID that plays the role of the destination PEB during migration from an exhausted PEB into a new one. The PEB table likewise starts with a header and contains an array of PEB state records ordered by PEB ID. The most important fields of a PEB state record are: (1) erase cycles, (2) PEB type, (3) PEB state.

PEB type describes the possible types of data that a PEB could contain: (1) user data, (2) leaf b-tree node, (3) hybrid b-tree node, (4) index b-tree node, (5) snapshot, (6) superblock, (7) segment bitmap, (8) PEB mapping table.
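The two-level lookup described above (LEB ID indexes the LEB table, whose record indexes the PEB table) can be sketched as follows. The record layouts are deliberately simplified and hypothetical; the real structures live in fs/ssdfs/peb_mapping_table.h and the on-disk format headers:

```c
#include <assert.h>
#include <stdint.h>

#define NO_INDEX 0xFFFFU	/* relation index sentinel: no migration */

/* Hypothetical LEB table record: the LEB ID is the array index. */
struct leb_rec {
	uint16_t physical_index;	/* index into the PEB table */
	uint16_t relation_index;	/* migration destination, or NO_INDEX */
};

/* Hypothetical PEB table record (PEB state record). */
struct peb_rec {
	uint64_t peb_id;
	uint32_t erase_cycles;
	uint8_t type;
	uint8_t state;
};

/* Convert a LEB ID into the PEB ID currently mapped to it. */
static uint64_t leb2peb(const struct leb_rec *leb_tbl,
			const struct peb_rec *peb_tbl,
			uint64_t leb_id)
{
	return peb_tbl[leb_tbl[leb_id].physical_index].peb_id;
}

/* Destination PEB of an in-flight migration, or UINT64_MAX if none. */
static uint64_t leb2dst_peb(const struct leb_rec *leb_tbl,
			    const struct peb_rec *peb_tbl,
			    uint64_t leb_id)
{
	uint16_t rel = leb_tbl[leb_id].relation_index;

	return rel == NO_INDEX ? UINT64_MAX : peb_tbl[rel].peb_id;
}
```

The physical index always resolves to the current source PEB; the relation index is populated only while a migration association exists.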
PEB state describes the possible states of a PEB during its lifecycle: (1) clean state means the PEB contains only free NAND flash pages ready for write operations; (2) using state means the PEB could contain valid, invalid, and free pages; (3) used state means the PEB contains only valid pages; (4) pre-dirty state means the PEB contains both valid and invalid pages; (5) dirty state means the PEB contains only invalid pages; (6) migrating state means the PEB is under migration; (7) pre-erase state means the PEB has been added into the queue of PEBs awaiting the erase operation; (8) recovering state means the PEB will be left untouched for some amount of time with the goal of recovering its ability to fulfill the erase operation; (9) bad state means the PEB is unable to be used for storing data. Generally speaking, the responsibility of the PEB state is to track the passage of PEBs through the various phases of their lifetime so that the PEB pool of the file system's volume can be managed efficiently.

The PEB mapping table is represented by a sequence of fragments distributed among several segments. Every map or unmap operation marks a fragment as dirty. A flush operation checks the dirty state of all fragments and flushes the dirty fragments to the volume by creating log(s) in the PEB(s) dedicated to storing the mapping table's content. The flush operation is executed in several steps: (1) prepare migration, (2) flush dirty fragments, (3) commit logs. The prepare-migration step is requested before the mapping table flush with the goal of checking whether any migration needs to be finished or started, because starting/finishing migration requires modification of the mapping table, while the flush operation itself must complete without any further modifications of the mapping table. The flush-dirty-fragments step implies searching for dirty fragments and preparing update requests for the PEB flush thread(s).
Finally, the commit-log step is requested because a metadata flush operation must finish by storing the new metadata state persistently.

The logical extent represents a fundamental concept of the SSDFS file system. Any piece of data or metadata on the volume is identified by: (1) segment ID, (2) logical block ID, and (3) length. As a result, any logical block is always located in the same segment, because a segment is a logical portion of the volume that always stays at the same position. However, the logical block's content has to be stored in some erase block. The PEB mapping table implements the mapping of Logical Erase Blocks (LEBs) into PEBs, because any segment is a container for one or several LEBs. Moreover, the mapping table supports the migration scheme implementation. The migration scheme guarantees that a logical block will always be located in the same segment, even in the case of update requests.

The PEB mapping table implements two fundamental methods: (1) convert LEB to PEB; (2) map LEB to PEB. The conversion operation is required when we need to identify which particular PEB contains the data of a LEB in a particular segment. The mapping operation is required when a clean segment has been allocated, because the LEB(s) of a clean segment need to be associated with PEB(s) that can store logs with user data or metadata.

The migration scheme is the fundamental technique of GC overhead management in the SSDFS file system. The key responsibility of the migration scheme is to guarantee the presence of data in the same segment across any update operations. Generally speaking, the migration scheme's model is implemented by associating an exhausted PEB with a clean one. The goal of this association of two PEBs is to migrate data gradually, by means of ordinary update operations to the initial (exhausted) PEB.
As a result, the old, exhausted PEB becomes fully invalidated after the data migration completes, and the erase operation can then convert it back into the clean state. Moreover, the destination PEB of the association replaces the initial PEB at its index in the segment and, finally, becomes the only PEB for that position. This technique implements the logical extent concept with the goal of decreasing write amplification and managing GC overhead, because the logical extent concept removes the need to update the metadata that tracks the position of user data on the volume. Generally speaking, the migration scheme can decrease GC activity significantly by excluding the need to update metadata and by letting data self-migrate between PEBs, driven by regular update operations.

The mapping table supports two principal operations: (1) add migration PEB, (2) exclude migration PEB. Adding a migration PEB is required when migration starts: it associates an exhausted PEB with a clean one. Excluding a migration PEB is executed when migration finishes: it removes the completely invalidated PEB from the association and requests a TRIM/erase of this PEB.

The PEB mapping table has a dedicated thread. This thread tracks the presence of dirty PEB(s) in the mapping table and executes TRIM/erase operations for dirty PEBs in the background. However, if the number of dirty PEBs is big enough, then erase operation(s) can be executed in the context of the thread that marks a PEB as dirty.
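The add/exclude migration flow above can be modeled with a toy page counter. This is purely illustrative under assumed names; the real logic involves block bitmaps, PEB containers, and the mapping table thread:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of an exhausted source PEB during migration. */
struct migrating_peb {
	uint32_t valid;		/* valid pages still in the source PEB */
	uint32_t invalid;	/* pages invalidated by migration */
	int has_destination;	/* clean PEB associated for migration */
};

/* "Add migration PEB": associate a clean destination PEB. */
static void add_migration_peb(struct migrating_peb *src)
{
	src->has_destination = 1;
}

/* A regular update operation moves one logical block into the
 * destination PEB, invalidating its copy in the source PEB. */
static void update_block(struct migrating_peb *src)
{
	if (src->has_destination && src->valid > 0) {
		src->valid--;
		src->invalid++;
	}
}

/* "Exclude migration PEB" is legal only once the source PEB holds
 * invalid pages exclusively; it can then be TRIMmed/erased. */
static int can_exclude_migration_peb(const struct migrating_peb *src)
{
	return src->has_destination && src->valid == 0;
}
```

The point of the model: migration makes progress only as a side effect of regular updates, so no separate GC pass has to copy the data out of the exhausted PEB.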
Signed-off-by: Viacheslav Dubeyko
---
 fs/ssdfs/peb_mapping_table.h | 784 +++++++++++++++++++++++++++++++++++
 1 file changed, 784 insertions(+)
 create mode 100644 fs/ssdfs/peb_mapping_table.h

diff --git a/fs/ssdfs/peb_mapping_table.h b/fs/ssdfs/peb_mapping_table.h
new file mode 100644
index 000000000000..aa2332ec8341
--- /dev/null
+++ b/fs/ssdfs/peb_mapping_table.h
@@ -0,0 +1,784 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_mapping_table.h - PEB mapping table declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_PEB_MAPPING_TABLE_H
+#define _SSDFS_PEB_MAPPING_TABLE_H
+
+#include "request_queue.h"
+
+#define SSDFS_MAPTBL_FIRST_PROTECTED_INDEX	0
+#define SSDFS_MAPTBL_PROTECTION_STEP		50
+#define SSDFS_MAPTBL_PROTECTION_RANGE		3
+
+#define SSDFS_PRE_ERASE_PEB_THRESHOLD_PCT	(3)
+#define SSDFS_UNUSED_LEB_THRESHOLD_PCT		(1)
+
+/*
+ * struct ssdfs_maptbl_flush_pair - segment/request pair
+ * @si: pointer on segment object
+ * @req: request object
+ */
+struct ssdfs_maptbl_flush_pair {
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_request req;
+};
+
+/*
+ * struct ssdfs_maptbl_fragment_desc - fragment descriptor
+ * @lock: fragment lock
+ * @state: fragment state
+ * @fragment_id: fragment's ID in the whole sequence
+ * @fragment_folios: count of memory folios in fragment
+ * @start_leb: start LEB of fragment
+ * @lebs_count: count of LEB descriptors in the whole fragment
+ * @lebs_per_page: count of LEB descriptors in memory folio
+ * @lebtbl_pages: count of memory folios used for LEBs description
+ * @pebs_per_page: count of PEB descriptors in memory folio
+ * @stripe_pages: count of memory folios in one stripe
+ * @mapped_lebs: mapped LEBs count in the fragment
+ * @migrating_lebs: migrating LEBs count in the fragment
+ * @reserved_pebs: count of reserved PEBs in fragment
+ * @pre_erase_pebs: count of PEBs in pre-erase state per fragment
+ * @recovering_pebs: count of recovering PEBs per fragment
+ * @array: fragment's memory folios
+ * @init_end: wait of init ending
+ * @flush_pair1: main array of segment/request flush pairs
+ * @flush_pair2: backup array of segment/request flush pairs
+ * @flush_req_count: number of flush requests in the array
+ * @flush_seq_size: flush requests' array capacity
+ * @frag_kobj: fragment's kobject
+ * @frag_kobj_unregister: completion of kobject unregistering
+ */
+struct ssdfs_maptbl_fragment_desc {
+	struct rw_semaphore lock;
+	atomic_t state;
+
+	u32 fragment_id;
+	u32 fragment_folios;
+
+	u64 start_leb;
+	u32 lebs_count;
+
+	u16 lebs_per_page;
+	u16 lebtbl_pages;
+
+	u16 pebs_per_page;
+	u16 stripe_pages;
+
+	u32 mapped_lebs;
+	u32 migrating_lebs;
+	u32 reserved_pebs;
+	u32 pre_erase_pebs;
+	u32 recovering_pebs;
+
+	struct ssdfs_folio_array array;
+	struct completion init_end;
+
+	struct ssdfs_maptbl_flush_pair *flush_pair1;
+	struct ssdfs_maptbl_flush_pair *flush_pair2;
+	u32 flush_req_count;
+	u32 flush_seq_size;
+
+	/* /sys/fs///maptbl/fragments/fragment */
+	struct kobject frag_kobj;
+	struct completion frag_kobj_unregister;
+};
+
+/* Fragment's state */
+enum {
+	SSDFS_MAPTBL_FRAG_CREATED	= 0,
+	SSDFS_MAPTBL_FRAG_INIT_FAILED	= 1,
+	SSDFS_MAPTBL_FRAG_INITIALIZED	= 2,
+	SSDFS_MAPTBL_FRAG_DIRTY		= 3,
+	SSDFS_MAPTBL_FRAG_TOWRITE	= 4,
+	SSDFS_MAPTBL_FRAG_STATE_MAX	= 5,
+};
+
+/*
+ * struct ssdfs_maptbl_area - mapping table area
+ * @portion_id: sequential ID of mapping table fragment
+ * @folios: array of memory folio pointers
+ * @folios_capacity: capacity of array
+ * @folios_count: count of folios in array
+ */
+struct ssdfs_maptbl_area {
+	u16 portion_id;
+	struct folio **folios;
+	size_t folios_capacity;
+	size_t folios_count;
+};
+
+/*
+ * struct ssdfs_peb_mapping_table - mapping table object
+ * @tbl_lock: mapping table lock
+ * @fragments_count: count of fragments
+ * @fragments_per_seg: count of fragments in segment
+ * @fragments_per_peb: count of fragments in PEB
+ * @fragment_bytes: count of bytes in one fragment
+ * @fragment_folios: count of memory folios in one fragment
+ * @flags: mapping table flags
+ * @lebs_count: count of LEBs described by mapping table
+ * @pebs_count: count of PEBs described by mapping table
+ * @lebs_per_fragment: count of LEB descriptors in fragment
+ * @pebs_per_fragment: count of PEB descriptors in fragment
+ * @pebs_per_stripe: count of PEB descriptors in stripe
+ * @stripes_per_fragment: count of stripes in fragment
+ * @extents: metadata extents that describe mapping table location
+ * @segs: array of pointers on segment objects
+ * @segs_count: count of segment objects used for mapping table
+ * @state: mapping table's state
+ * @erase_op_state: state of erase operation
+ * @min_pre_erase_pebs: minimum number of PEBs in pre-erase state
+ * @total_pre_erase_pebs: total number of PEBs in pre-erase state
+ * @max_erase_ops: upper bound of erase operations for one iteration
+ * @erase_ops_end_wq: wait queue of threads waiting for end of erase operation
+ * @last_peb_recover_cno: checkpoint number of the last PEB recovering
+ * @bmap_lock: dirty bitmap's lock
+ * @dirty_bmap: bitmap of dirty fragments
+ * @desc_array: array of fragment descriptors
+ * @wait_queue: wait queue of mapping table's thread
+ * @flush_end: wait of flush ending
+ * @thread: descriptor of mapping table's thread
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_peb_mapping_table {
+	struct rw_semaphore tbl_lock;
+	u32 fragments_count;
+	u16 fragments_per_seg;
+	u16 fragments_per_peb;
+	u32 fragment_bytes;
+	u32 fragment_folios;
+	atomic_t flags;
+	u64 lebs_count;
+	u64 pebs_count;
+	u16 lebs_per_fragment;
+	u16 pebs_per_fragment;
+	u16 pebs_per_stripe;
+	u16 stripes_per_fragment;
+	struct ssdfs_meta_area_extent extents[MAPTBL_LIMIT1][MAPTBL_LIMIT2];
+	struct ssdfs_segment_info **segs[SSDFS_MAPTBL_SEG_COPY_MAX];
+	u16 segs_count;
+
+	atomic_t state;
+
+	atomic_t erase_op_state;
+	atomic_t min_pre_erase_pebs;
+	atomic_t total_pre_erase_pebs;
+	atomic_t max_erase_ops;
+	wait_queue_head_t erase_ops_end_wq;
+
+	atomic64_t last_peb_recover_cno;
+
+	struct mutex bmap_lock;
+	unsigned long *dirty_bmap;
+	struct ssdfs_maptbl_fragment_desc *desc_array;
+
+	wait_queue_head_t wait_queue;
+	struct completion flush_end;
+	struct ssdfs_thread_info thread;
+	struct ssdfs_fs_info *fsi;
+};
+
+/* PEB mapping table's state */
+enum {
+	SSDFS_MAPTBL_CREATED			= 0,
+	SSDFS_MAPTBL_GOING_TO_BE_DESTROY	= 1,
+	SSDFS_MAPTBL_STATE_MAX			= 2,
+};
+
+/*
+ * struct ssdfs_maptbl_peb_descriptor - PEB descriptor
+ * @peb_id: PEB identification number
+ * @shared_peb_index: index of external shared destination PEB
+ * @erase_cycles: P/E cycles
+ * @type: PEB type
+ * @state: PEB state
+ * @flags: PEB flags
+ * @consistency: PEB state consistency type
+ */
+struct ssdfs_maptbl_peb_descriptor {
+	u64 peb_id;
+	u8 shared_peb_index;
+	u32 erase_cycles;
+	u8 type;
+	u8 state;
+	u8 flags;
+	u8 consistency;
+};
+
+/*
+ * struct ssdfs_maptbl_peb_relation - PEBs association
+ * @pebs: array of PEB descriptors
+ */
+struct ssdfs_maptbl_peb_relation {
+	struct ssdfs_maptbl_peb_descriptor pebs[SSDFS_MAPTBL_RELATION_MAX];
+};
+
+/*
+ * Erase operation state
+ */
+enum {
+	SSDFS_MAPTBL_NO_ERASE,
+	SSDFS_MAPTBL_ERASE_IN_PROGRESS
+};
+
+/* Stage of recovering try */
+enum {
+	SSDFS_CHECK_RECOVERABILITY,
+	SSDFS_MAKE_RECOVERING,
+	SSDFS_RECOVER_STAGE_MAX
+};
+
+/* Possible states of erase operation */
+enum {
+	SSDFS_ERASE_RESULT_UNKNOWN,
+	SSDFS_ERASE_DONE,
+	SSDFS_ERASE_SB_PEB_DONE,
+	SSDFS_IGNORE_ERASE,
+	SSDFS_ERASE_FAILURE,
+	SSDFS_BAD_BLOCK_DETECTED,
+	SSDFS_ERASE_RESULT_MAX
+};
+
+/*
+ * struct ssdfs_erase_result - PEB's erase operation result
+ * @fragment_index: index of mapping table's fragment
+ * @peb_index: PEB's index in fragment
+ * @peb_id: PEB ID number
+ * @state: state of erase operation
+ */
+struct ssdfs_erase_result {
+	u32 fragment_index;
+	u16 peb_index;
+	u64 peb_id;
+	int state;
+};
+
+/*
+ * struct ssdfs_erase_result_array - array of erase operation results
+ * @ptr: pointer on memory buffer
+ * @capacity: maximal number of erase operation results in array
+ * @size: count of erase operation results in array
+ */
+struct ssdfs_erase_result_array {
+	struct ssdfs_erase_result *ptr;
+	u32 capacity;
+	u32 size;
+};
+
+#define SSDFS_ERASE_RESULTS_PER_FRAGMENT	(10)
+
+/*
+ * Inline functions
+ */
+
+/*
+ * SSDFS_ERASE_RESULT_INIT() - init erase result
+ * @fragment_index: index of mapping table's fragment
+ * @peb_index: PEB's index in fragment
+ * @peb_id: PEB ID number
+ * @state: state of erase operation
+ * @result: erase operation result [out]
+ *
+ * This method initializes the erase operation result.
+ */
+static inline
+void SSDFS_ERASE_RESULT_INIT(u32 fragment_index, u16 peb_index,
+			     u64 peb_id, int state,
+			     struct ssdfs_erase_result *result)
+{
+	result->fragment_index = fragment_index;
+	result->peb_index = peb_index;
+	result->peb_id = peb_id;
+	result->state = state;
+}
+
+/*
+ * DEFINE_PEB_INDEX_IN_FRAGMENT() - define PEB index in the whole fragment
+ * @fdesc: fragment descriptor
+ * @folio_index: folio index in the fragment
+ * @item_index: item index in the memory folio
+ */
+static inline
+u16 DEFINE_PEB_INDEX_IN_FRAGMENT(struct ssdfs_maptbl_fragment_desc *fdesc,
+				 pgoff_t folio_index,
+				 u16 item_index)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc);
+	BUG_ON(folio_index < fdesc->lebtbl_pages);
+
+	SSDFS_DBG("fdesc %p, folio_index %lu, item_index %u\n",
+		  fdesc, folio_index, item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio_index -= fdesc->lebtbl_pages;
+	folio_index *= fdesc->pebs_per_page;
+	folio_index += item_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(folio_index >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u16)folio_index;
+}
+
+/*
+ * GET_PEB_ID() - define PEB ID for the index
+ * @kaddr: pointer on memory folio's content
+ * @item_index: item index inside of the folio
+ *
+ * This method tries to convert @item_index into
+ * PEB ID value.
+ *
+ * RETURN:
+ * [success] - PEB ID
+ * [failure] - U64_MAX
+ */
+static inline
+u64 GET_PEB_ID(void *kaddr, u16 item_index)
+{
+	struct ssdfs_peb_table_fragment_header *hdr;
+	u64 start_peb;
+	u16 pebs_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!kaddr);
+
+	SSDFS_DBG("kaddr %p, item_index %u\n",
+		  kaddr, item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr = (struct ssdfs_peb_table_fragment_header *)kaddr;
+
+	if (le16_to_cpu(hdr->magic) != SSDFS_PEB_TABLE_MAGIC) {
+		SSDFS_ERR("corrupted folio\n");
+		return U64_MAX;
+	}
+
+	start_peb = le64_to_cpu(hdr->start_peb);
+	pebs_count = le16_to_cpu(hdr->pebs_count);
+
+	if (item_index >= pebs_count) {
+		SSDFS_ERR("item_index %u >= pebs_count %u\n",
+			  item_index, pebs_count);
+		return U64_MAX;
+	}
+
+	return start_peb + item_index;
+}
+
+/*
+ * PEBTBL_FOLIO_INDEX() - define PEB table folio index
+ * @fdesc: fragment descriptor
+ * @peb_index: index of PEB in the fragment
+ */
+static inline
+pgoff_t PEBTBL_FOLIO_INDEX(struct ssdfs_maptbl_fragment_desc *fdesc,
+			   u16 peb_index)
+{
+	pgoff_t folio_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc);
+
+	SSDFS_DBG("fdesc %p, peb_index %u\n",
+		  fdesc, peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio_index = fdesc->lebtbl_pages;
+	folio_index += peb_index / fdesc->pebs_per_page;
+	return folio_index;
+}
+
+/*
+ * GET_PEB_DESCRIPTOR() - retrieve PEB descriptor
+ * @kaddr: pointer on memory folio's content
+ * @item_index: item index inside of the folio
+ *
+ * This method tries to return the pointer on
+ * PEB descriptor for @item_index.
+ *
+ * RETURN:
+ * [success] - pointer on PEB descriptor
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static inline
+struct ssdfs_peb_descriptor *GET_PEB_DESCRIPTOR(void *kaddr, u16 item_index)
+{
+	struct ssdfs_peb_table_fragment_header *hdr;
+	u16 pebs_count;
+	u32 peb_desc_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!kaddr);
+
+	SSDFS_DBG("kaddr %p, item_index %u\n",
+		  kaddr, item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr = (struct ssdfs_peb_table_fragment_header *)kaddr;
+
+	if (le16_to_cpu(hdr->magic) != SSDFS_PEB_TABLE_MAGIC) {
+		SSDFS_ERR("corrupted folio\n");
+		return ERR_PTR(-ERANGE);
+	}
+
+	pebs_count = le16_to_cpu(hdr->pebs_count);
+
+	if (item_index >= pebs_count) {
+		SSDFS_ERR("item_index %u >= pebs_count %u\n",
+			  item_index, pebs_count);
+		return ERR_PTR(-ERANGE);
+	}
+
+	peb_desc_off = SSDFS_PEBTBL_FRAGMENT_HDR_SIZE;
+	peb_desc_off += item_index * sizeof(struct ssdfs_peb_descriptor);
+
+	if (peb_desc_off >= PAGE_SIZE) {
+		SSDFS_ERR("invalid offset %u\n", peb_desc_off);
+		return ERR_PTR(-ERANGE);
+	}
+
+	return (struct ssdfs_peb_descriptor *)((u8 *)kaddr + peb_desc_off);
+}
+
+/*
+ * SEG2PEB_TYPE() - convert segment into PEB type
+ */
+static inline
+int SEG2PEB_TYPE(int seg_type)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_type %d\n", seg_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (seg_type) {
+	case SSDFS_USER_DATA_SEG_TYPE:
+		return SSDFS_MAPTBL_DATA_PEB_TYPE;
+
+	case SSDFS_LEAF_NODE_SEG_TYPE:
+		return SSDFS_MAPTBL_LNODE_PEB_TYPE;
+
+	case SSDFS_HYBRID_NODE_SEG_TYPE:
+		return SSDFS_MAPTBL_HNODE_PEB_TYPE;
+
+	case SSDFS_INDEX_NODE_SEG_TYPE:
+		return SSDFS_MAPTBL_IDXNODE_PEB_TYPE;
+
+	case SSDFS_INITIAL_SNAPSHOT_SEG_TYPE:
+		return SSDFS_MAPTBL_INIT_SNAP_PEB_TYPE;
+
+	case SSDFS_SB_SEG_TYPE:
+		return SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+
+	case SSDFS_SEGBMAP_SEG_TYPE:
+		return SSDFS_MAPTBL_SEGBMAP_PEB_TYPE;
+
+	case SSDFS_MAPTBL_SEG_TYPE:
+		return SSDFS_MAPTBL_MAPTBL_PEB_TYPE;
+	}
+
+	return SSDFS_MAPTBL_PEB_TYPE_MAX;
+}
+
+/*
+ * PEB2SEG_TYPE() - convert PEB into segment type
+ */
+static inline
+int PEB2SEG_TYPE(int peb_type)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("peb_type %d\n", peb_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (peb_type) {
+	case SSDFS_MAPTBL_DATA_PEB_TYPE:
+		return SSDFS_USER_DATA_SEG_TYPE;
+
+	case SSDFS_MAPTBL_LNODE_PEB_TYPE:
+		return SSDFS_LEAF_NODE_SEG_TYPE;
+
+	case SSDFS_MAPTBL_HNODE_PEB_TYPE:
+		return SSDFS_HYBRID_NODE_SEG_TYPE;
+
+	case SSDFS_MAPTBL_IDXNODE_PEB_TYPE:
+		return SSDFS_INDEX_NODE_SEG_TYPE;
+
+	case SSDFS_MAPTBL_INIT_SNAP_PEB_TYPE:
+		return SSDFS_INITIAL_SNAPSHOT_SEG_TYPE;
+
+	case SSDFS_MAPTBL_SBSEG_PEB_TYPE:
+		return SSDFS_SB_SEG_TYPE;
+
+	case SSDFS_MAPTBL_SEGBMAP_PEB_TYPE:
+		return SSDFS_SEGBMAP_SEG_TYPE;
+
+	case SSDFS_MAPTBL_MAPTBL_PEB_TYPE:
+		return SSDFS_MAPTBL_SEG_TYPE;
+	}
+
+	return SSDFS_UNKNOWN_SEG_TYPE;
+}
+
+static inline
+bool is_ssdfs_maptbl_under_flush(struct ssdfs_fs_info *fsi)
+{
+	return atomic_read(&fsi->maptbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH;
+}
+
+static inline
+bool is_ssdfs_maptbl_start_migration(struct ssdfs_fs_info *fsi)
+{
+	return atomic_read(&fsi->maptbl->flags) & SSDFS_MAPTBL_START_MIGRATION;
+}
+
+/*
+ * is_peb_protected() - check that PEB is protected
+ * @found_item: PEB index in the fragment
+ */
+static inline
+bool is_peb_protected(unsigned long found_item)
+{
+	unsigned long remainder;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("found_item %lu\n", found_item);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	remainder = found_item % SSDFS_MAPTBL_PROTECTION_STEP;
+	return remainder == 0;
+}
+
+static inline
+bool is_ssdfs_maptbl_going_to_be_destroyed(struct ssdfs_peb_mapping_table *tbl)
+{
+	return atomic_read(&tbl->state) == SSDFS_MAPTBL_GOING_TO_BE_DESTROY;
+}
+
+static inline
+void set_maptbl_going_to_be_destroyed(struct ssdfs_fs_info *fsi)
+{
+	atomic_set(&fsi->maptbl->state, SSDFS_MAPTBL_GOING_TO_BE_DESTROY);
+}
+
+static inline
+void ssdfs_account_updated_user_data_pages(struct ssdfs_fs_info *fsi,
+					   u32 count)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	u64 updated = 0;
+
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, count %u\n",
+		  fsi, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&fsi->volume_state_lock);
+	fsi->updated_user_data_pages += count;
+#ifdef CONFIG_SSDFS_DEBUG
+	updated = fsi->updated_user_data_pages;
+#endif /* CONFIG_SSDFS_DEBUG */
+	spin_unlock(&fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("updated %llu\n", updated);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+int ssdfs_maptbl_change_peb_state(struct ssdfs_fs_info *fsi,
+				  u64 leb_id, u8 peb_type,
+				  int peb_state,
+				  struct completion **end);
+
+/*
+ * ssdfs_maptbl_wait_and_change_peb_state() - wait and change PEB state
+ * @fsi: file system info object
+ * @leb_id: LEB ID number
+ * @peb_type: type of the PEB
+ * @peb_state: new state of the PEB
+ */
+static inline
+int ssdfs_maptbl_wait_and_change_peb_state(struct ssdfs_fs_info *fsi,
+					   u64 leb_id, u8 peb_type,
+					   int peb_state)
+{
+	struct completion *end;
+	int number_of_tries = 0;
+	int err = 0;
+
+	err = ssdfs_maptbl_change_peb_state(fsi, leb_id,
+					    peb_type, peb_state,
+					    &end);
+	if (err == -EAGAIN) {
+wait_completion_end:
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("waiting failed: "
+				  "err %d\n", err);
+			return err;
+		}
+
+		err = ssdfs_maptbl_change_peb_state(fsi,
+						    leb_id,
+						    peb_type,
+						    peb_state,
+						    &end);
+		if (err == -EAGAIN && is_ssdfs_maptbl_under_flush(fsi)) {
+			if (number_of_tries < SSDFS_MAX_NUMBER_OF_TRIES) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("mapping table is flushing: "
+					  "leb_id %llu, peb_type %#x, "
+					  "new_state %#x, number_of_tries %d\n",
+					  leb_id, peb_type, peb_state,
+					  number_of_tries);
+#endif /* CONFIG_SSDFS_DEBUG */
+				number_of_tries++;
+				goto wait_completion_end;
+			}
+		}
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change the PEB state: "
+			  "leb_id %llu, peb_type %#x, "
+			  "new_state %#x, err %d\n",
+			  leb_id, peb_type,
+			  peb_state, err);
+	}
+
+	return err;
+}
+
+/*
+ * PEB mapping table's API
+ */
+int ssdfs_maptbl_create(struct ssdfs_fs_info *fsi);
+void ssdfs_maptbl_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_maptbl_fragment_init(struct ssdfs_peb_container *pebc,
+			       struct ssdfs_maptbl_area *area);
+int ssdfs_maptbl_flush(struct ssdfs_peb_mapping_table *tbl);
+int ssdfs_maptbl_resize(struct ssdfs_peb_mapping_table *tbl,
+			u64 new_pebs_count);
+
+int ssdfs_maptbl_convert_leb2peb(struct ssdfs_fs_info *fsi,
+				 u64 leb_id, u8 peb_type,
+				 struct ssdfs_maptbl_peb_relation *pebr,
+				 struct completion **end);
+int ssdfs_maptbl_map_leb2peb(struct ssdfs_fs_info *fsi,
+			     u64 leb_id, u8 peb_type,
+			     struct ssdfs_maptbl_peb_relation *pebr,
+			     struct completion **end);
+int ssdfs_maptbl_recommend_search_range(struct ssdfs_fs_info *fsi,
+					u64 *start_leb,
+					u64 *end_leb,
+					struct completion **end);
+int ssdfs_maptbl_change_peb_state(struct ssdfs_fs_info *fsi,
+				  u64 leb_id, u8 peb_type,
+				  int peb_state,
+				  struct completion **end);
+int ssdfs_maptbl_prepare_pre_erase_state(struct ssdfs_fs_info *fsi,
+					 u64 leb_id, u8 peb_type,
+					 struct completion **end);
+int ssdfs_maptbl_set_pre_erased_snapshot_peb(struct ssdfs_fs_info *fsi,
+					     u64 peb_id,
+					     struct completion **end);
+int ssdfs_maptbl_add_migration_peb(struct ssdfs_fs_info *fsi,
+				   u64 leb_id, u8 peb_type,
+				   struct ssdfs_maptbl_peb_relation *pebr,
+				   struct completion **end);
+int ssdfs_maptbl_exclude_migration_peb(struct ssdfs_fs_info *fsi,
+				       u64 leb_id, u8 peb_type,
+				       u64 peb_create_time,
+				       u64 last_log_time,
+				       struct completion **end);
+int ssdfs_maptbl_set_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+				       u64 leb_id, u8 peb_type,
+				       u64 dst_leb_id, u16 dst_peb_index,
+				       struct completion **end);
+int ssdfs_maptbl_break_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					 u64 leb_id, u8 peb_type,
+					 u64 dst_leb_id, int dst_peb_refs,
+					 struct completion **end);
+int ssdfs_maptbl_set_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					   u64 leb_id, u8 peb_type,
+					   struct completion **end);
+int ssdfs_maptbl_break_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					     u64 leb_id, u8 peb_type,
+					     struct completion **end);
+
+int ssdfs_reserve_free_pages(struct ssdfs_fs_info *fsi,
+			     u32 count, int type);
+
+/*
+ * It makes sense to have a special thread for the whole mapping table.
+ * The goal of the thread will be clearing of dirty PEBs,
+ * tracking P/E cycles, excluding bad PEBs and recovering PEBs
+ * in the background. Knowledge about PEBs will be hidden by
+ * mapping table. All other subsystems will operate by LEBs.
+ */
+
+/*
+ * PEB mapping table's internal API
+ */
+int ssdfs_maptbl_start_thread(struct ssdfs_peb_mapping_table *tbl);
+int ssdfs_maptbl_stop_thread(struct ssdfs_peb_mapping_table *tbl);
+
+int ssdfs_maptbl_define_fragment_info(struct ssdfs_fs_info *fsi,
+				      u64 leb_id,
+				      u16 *pebs_per_fragment,
+				      u16 *pebs_per_stripe,
+				      u16 *stripes_per_fragment);
+struct ssdfs_maptbl_fragment_desc *
+ssdfs_maptbl_get_fragment_descriptor(struct ssdfs_peb_mapping_table *tbl,
+				     u64 leb_id);
+void ssdfs_maptbl_set_fragment_dirty(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     u64 leb_id, u8 peb_type);
+int ssdfs_maptbl_solve_inconsistency(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     u64 leb_id,
+				     struct ssdfs_maptbl_peb_relation *pebr);
+int ssdfs_maptbl_solve_pre_deleted_state(struct ssdfs_peb_mapping_table *tbl,
+					 struct ssdfs_maptbl_fragment_desc *fdesc,
+					 u64 leb_id,
+					 struct ssdfs_maptbl_peb_relation *pebr);
+void ssdfs_maptbl_move_fragment_folios(struct ssdfs_segment_request *req,
+				       struct ssdfs_maptbl_area *area,
+				       u16 folios_count);
+int ssdfs_maptbl_erase_peb(struct ssdfs_fs_info *fsi,
+			   struct ssdfs_erase_result *result);
+int ssdfs_maptbl_correct_dirty_peb(struct ssdfs_peb_mapping_table *tbl,
+				   struct ssdfs_maptbl_fragment_desc *fdesc,
+				   struct ssdfs_erase_result *result);
+int __ssdfs_maptbl_correct_peb_state(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     struct ssdfs_peb_table_fragment_header *hdr,
+				     struct ssdfs_erase_result *res);
+int ssdfs_maptbl_erase_reserved_peb_now(struct ssdfs_fs_info *fsi,
+					u64 leb_id, u8 peb_type,
+					struct completion **end);
+int ssdfs_maptbl_erase_dirty_pebs_now(struct ssdfs_peb_mapping_table *tbl);
+
+void ssdfs_debug_maptbl_object(struct ssdfs_peb_mapping_table *tbl);
+
+#endif /* _SSDFS_PEB_MAPPING_TABLE_H */
-- 
2.34.1