From nobody Sun Feb 8 05:58:57 2026
From: Kent Overstreet
To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Kent Overstreet, djwong@kernel.org, bfoster@redhat.com
Subject: [PATCH 01/21] bcachefs: KEY_TYPE_accounting
Date: Sat, 24 Feb 2024 21:38:03 -0500
Message-ID: <20240225023826.2413565-2-kent.overstreet@linux.dev>
In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev>
References: <20240225023826.2413565-1-kent.overstreet@linux.dev>

New key type for the disk space accounting rewrite.

 - Holds a variable sized array of u64s (may be more than one for
   accounting e.g. compressed and uncompressed size, or buckets and
   sectors for a given data type)

 - Updates are deltas, not new versions of the key: this means updates to
   accounting can happen via the btree write buffer, which we'll be
   teaching to accumulate deltas.

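The delta semantics described above can be sketched in plain C as follows. This is a minimal illustration, not code from the series: `struct sketch_accounting`, `SKETCH_MAX_COUNTERS` and `sketch_accumulate()` are hypothetical names, and the struct is a simplified stand-in, not the on-disk layout. It assumes that folding one update into another means adding the counters pairwise and keeping the newer version number, which mirrors what bch2_accounting_accumulate() in this patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an accounting key; NOT the bcachefs on-disk format. */
#define SKETCH_MAX_COUNTERS 3

struct sketch_accounting {
	uint64_t version;                /* total ordering of updates in time */
	unsigned nr;                     /* number of counters in use */
	int64_t  d[SKETCH_MAX_COUNTERS]; /* counters, or deltas to counters */
};

/* Fold the delta @src into @dst: add counters pairwise, keep the newer version. */
static void sketch_accumulate(struct sketch_accounting *dst,
			      const struct sketch_accounting *src)
{
	/* Both sides must describe the same set of counters. */
	assert(dst->nr == src->nr);

	for (unsigned i = 0; i < dst->nr; i++)
		dst->d[i] += src->d[i];

	if (dst->version < src->version)
		dst->version = src->version;
}
```

For instance, folding a delta of {-30, +4} into counters {100, 8} leaves {70, 12}. Because addition is commutative, the sums do not depend on the order in which deltas are folded, which is presumably what makes batching them in a write buffer workable.
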
Signed-off-by: Kent Overstreet --- fs/bcachefs/Makefile | 3 +- fs/bcachefs/bcachefs.h | 1 + fs/bcachefs/bcachefs_format.h | 80 +++------------ fs/bcachefs/bkey_methods.c | 1 + fs/bcachefs/disk_accounting.c | 70 ++++++++++++++ fs/bcachefs/disk_accounting.h | 52 ++++++++++ fs/bcachefs/disk_accounting_format.h | 139 +++++++++++++++++++++++++++ fs/bcachefs/replicas_format.h | 21 ++++ fs/bcachefs/sb-downgrade.c | 12 ++- fs/bcachefs/sb-errors_types.h | 3 +- 10 files changed, 311 insertions(+), 71 deletions(-) create mode 100644 fs/bcachefs/disk_accounting.c create mode 100644 fs/bcachefs/disk_accounting.h create mode 100644 fs/bcachefs/disk_accounting_format.h create mode 100644 fs/bcachefs/replicas_format.h diff --git a/fs/bcachefs/Makefile b/fs/bcachefs/Makefile index f42f6d256945..94b2edb4155f 100644 --- a/fs/bcachefs/Makefile +++ b/fs/bcachefs/Makefile @@ -27,10 +27,11 @@ bcachefs-y :=3D \ checksum.o \ clock.o \ compress.o \ + data_update.o \ debug.o \ dirent.o \ + disk_accounting.o \ disk_groups.o \ - data_update.o \ ec.o \ errcode.o \ error.o \ diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h index 0bee9dab6068..62812fc1cad0 100644 --- a/fs/bcachefs/bcachefs.h +++ b/fs/bcachefs/bcachefs.h @@ -509,6 +509,7 @@ enum gc_phase { GC_PHASE_BTREE_logged_ops, GC_PHASE_BTREE_rebalance_work, GC_PHASE_BTREE_subvolume_children, + GC_PHASE_BTREE_accounting, =20 GC_PHASE_PENDING_DELETE, }; diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h index bff8750ac0d7..313ca7dc370d 100644 --- a/fs/bcachefs/bcachefs_format.h +++ b/fs/bcachefs/bcachefs_format.h @@ -416,7 +416,8 @@ static inline void bkey_init(struct bkey *k) x(bucket_gens, 30) \ x(snapshot_tree, 31) \ x(logged_op_truncate, 32) \ - x(logged_op_finsert, 33) + x(logged_op_finsert, 33) \ + x(accounting, 34) =20 enum bch_bkey_type { #define x(name, nr) KEY_TYPE_##name =3D nr, @@ -501,17 +502,19 @@ struct bch_sb_field { x(downgrade, 14) =20 #include "alloc_background_format.h" +#include 
"dirent_format.h" +#include "disk_accounting_format.h" #include "extents_format.h" -#include "reflink_format.h" #include "ec_format.h" #include "inode_format.h" -#include "dirent_format.h" -#include "xattr_format.h" -#include "quota_format.h" #include "logged_ops_format.h" +#include "quota_format.h" +#include "reflink_format.h" +#include "replicas_format.h" +#include "sb-counters_format.h" #include "snapshot_format.h" #include "subvolume_format.h" -#include "sb-counters_format.h" +#include "xattr_format.h" =20 enum bch_sb_field_type { #define x(f, nr) BCH_SB_FIELD_##f =3D nr, @@ -680,69 +683,11 @@ LE64_BITMASK(BCH_KDF_SCRYPT_P, struct bch_sb_field_cr= ypt, kdf_flags, 32, 48); =20 /* BCH_SB_FIELD_replicas: */ =20 -#define BCH_DATA_TYPES() \ - x(free, 0) \ - x(sb, 1) \ - x(journal, 2) \ - x(btree, 3) \ - x(user, 4) \ - x(cached, 5) \ - x(parity, 6) \ - x(stripe, 7) \ - x(need_gc_gens, 8) \ - x(need_discard, 9) - -enum bch_data_type { -#define x(t, n) BCH_DATA_##t, - BCH_DATA_TYPES() -#undef x - BCH_DATA_NR -}; - -static inline bool data_type_is_empty(enum bch_data_type type) -{ - switch (type) { - case BCH_DATA_free: - case BCH_DATA_need_gc_gens: - case BCH_DATA_need_discard: - return true; - default: - return false; - } -} - -static inline bool data_type_is_hidden(enum bch_data_type type) -{ - switch (type) { - case BCH_DATA_sb: - case BCH_DATA_journal: - return true; - default: - return false; - } -} - -struct bch_replicas_entry_v0 { - __u8 data_type; - __u8 nr_devs; - __u8 devs[]; -} __packed; - struct bch_sb_field_replicas_v0 { struct bch_sb_field field; struct bch_replicas_entry_v0 entries[]; } __packed __aligned(8); =20 -struct bch_replicas_entry_v1 { - __u8 data_type; - __u8 nr_devs; - __u8 nr_required; - __u8 devs[]; -} __packed; - -#define replicas_entry_bytes(_i) \ - (offsetof(typeof(*(_i)), devs) + (_i)->nr_devs) - struct bch_sb_field_replicas { struct bch_sb_field field; struct bch_replicas_entry_v1 entries[]; @@ -875,7 +820,8 @@ struct 
bch_sb_field_downgrade { x(rebalance_work, BCH_VERSION(1, 3)) \ x(member_seq, BCH_VERSION(1, 4)) \ x(subvolume_fs_parent, BCH_VERSION(1, 5)) \ - x(btree_subvolume_children, BCH_VERSION(1, 6)) + x(btree_subvolume_children, BCH_VERSION(1, 6)) \ + x(disk_accounting_v2, BCH_VERSION(1, 7)) =20 enum bcachefs_metadata_version { bcachefs_metadata_version_min =3D 9, @@ -1525,7 +1471,9 @@ enum btree_id_flags { x(rebalance_work, 18, BTREE_ID_SNAPSHOT_FIELD, \ BIT_ULL(KEY_TYPE_set)|BIT_ULL(KEY_TYPE_cookie)) \ x(subvolume_children, 19, 0, \ - BIT_ULL(KEY_TYPE_set)) + BIT_ULL(KEY_TYPE_set)) \ + x(accounting, 20, BTREE_ID_SNAPSHOT_FIELD, \ + BIT_ULL(KEY_TYPE_accounting)) \ =20 enum btree_id { #define x(name, nr, ...) BTREE_ID_##name =3D nr, diff --git a/fs/bcachefs/bkey_methods.c b/fs/bcachefs/bkey_methods.c index 5e52684764eb..da25bdd1e8a6 100644 --- a/fs/bcachefs/bkey_methods.c +++ b/fs/bcachefs/bkey_methods.c @@ -7,6 +7,7 @@ #include "btree_types.h" #include "alloc_background.h" #include "dirent.h" +#include "disk_accounting.h" #include "ec.h" #include "error.h" #include "extents.h" diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c new file mode 100644 index 000000000000..209f59e87b34 --- /dev/null +++ b/fs/bcachefs/disk_accounting.c @@ -0,0 +1,70 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "bcachefs.h" +#include "btree_update.h" +#include "buckets.h" +#include "disk_accounting.h" +#include "replicas.h" + +static const char * const disk_accounting_type_strs[] =3D { +#define x(t, n, ...) 
[n] =3D #t, + BCH_DISK_ACCOUNTING_TYPES() +#undef x + NULL +}; + +int bch2_accounting_invalid(struct bch_fs *c, struct bkey_s_c k, + enum bkey_invalid_flags flags, + struct printbuf *err) +{ + return 0; +} + +void bch2_accounting_key_to_text(struct printbuf *out, struct disk_account= ing_key *k) +{ + if (k->type >=3D BCH_DISK_ACCOUNTING_TYPE_NR) { + prt_printf(out, "unknown type %u", k->type); + return; + } + + prt_str(out, disk_accounting_type_strs[k->type]); + prt_str(out, " "); + + switch (k->type) { + case BCH_DISK_ACCOUNTING_nr_inodes: + break; + case BCH_DISK_ACCOUNTING_persistent_reserved: + prt_printf(out, "replicas=3D%u", k->persistent_reserved.nr_replicas); + break; + case BCH_DISK_ACCOUNTING_replicas: + bch2_replicas_entry_to_text(out, &k->replicas); + break; + case BCH_DISK_ACCOUNTING_dev_data_type: + prt_printf(out, "dev=3D%u data_type=3D", k->dev_data_type.dev); + bch2_prt_data_type(out, k->dev_data_type.data_type); + break; + case BCH_DISK_ACCOUNTING_dev_stripe_buckets: + prt_printf(out, "dev=3D%u", k->dev_stripe_buckets.dev); + break; + } +} + +void bch2_accounting_to_text(struct printbuf *out, struct bch_fs *c, struc= t bkey_s_c k) +{ + struct bkey_s_c_accounting acc =3D bkey_s_c_to_accounting(k); + struct disk_accounting_key acc_k; + bpos_to_disk_accounting_key(&acc_k, k.k->p); + + bch2_accounting_key_to_text(out, &acc_k); + + for (unsigned i =3D 0; i < bch2_accounting_counters(k.k); i++) + prt_printf(out, " %lli", acc.v->d[i]); +} + +void bch2_accounting_swab(struct bkey_s k) +{ + for (u64 *p =3D (u64 *) k.v; + p < (u64 *) bkey_val_end(k); + p++) + *p =3D swab64(*p); +} diff --git a/fs/bcachefs/disk_accounting.h b/fs/bcachefs/disk_accounting.h new file mode 100644 index 000000000000..e15299665859 --- /dev/null +++ b/fs/bcachefs/disk_accounting.h @@ -0,0 +1,52 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _BCACHEFS_DISK_ACCOUNTING_H +#define _BCACHEFS_DISK_ACCOUNTING_H + +static inline unsigned bch2_accounting_counters(const struct bkey *k) 
+{
+	return bkey_val_u64s(k) - offsetof(struct bch_accounting, d) / sizeof(u64);
+}
+
+static inline void bch2_accounting_accumulate(struct bkey_i_accounting *dst,
+					      struct bkey_s_c_accounting src)
+{
+	EBUG_ON(dst->k.u64s != src.k->u64s);
+
+	for (unsigned i = 0; i < bch2_accounting_counters(&dst->k); i++)
+		dst->v.d[i] += src.v->d[i];
+	if (bversion_cmp(dst->k.version, src.k->version) < 0)
+		dst->k.version = src.k->version;
+}
+
+static inline void bpos_to_disk_accounting_key(struct disk_accounting_key *acc, struct bpos p)
+{
+	acc->_pad = p;
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+	bch2_bpos_swab(&acc->_pad);
+#endif
+}
+
+static inline struct bpos disk_accounting_key_to_bpos(struct disk_accounting_key *k)
+{
+	struct bpos ret = k->_pad;
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+	bch2_bpos_swab(&ret);
+#endif
+	return ret;
+}
+
+int bch2_accounting_invalid(struct bch_fs *, struct bkey_s_c,
+			    enum bkey_invalid_flags, struct printbuf *);
+void bch2_accounting_key_to_text(struct printbuf *, struct disk_accounting_key *);
+void bch2_accounting_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
+void bch2_accounting_swab(struct bkey_s);
+
+#define bch2_bkey_ops_accounting ((struct bkey_ops) {	\
+	.key_invalid	= bch2_accounting_invalid,	\
+	.val_to_text	= bch2_accounting_to_text,	\
+	.swab		= bch2_accounting_swab,		\
+	.min_val_size	= 8,				\
+})
+
+#endif /* _BCACHEFS_DISK_ACCOUNTING_H */
diff --git a/fs/bcachefs/disk_accounting_format.h b/fs/bcachefs/disk_accounting_format.h
new file mode 100644
index 000000000000..e06a42f0d578
--- /dev/null
+++ b/fs/bcachefs/disk_accounting_format.h
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BCACHEFS_DISK_ACCOUNTING_FORMAT_H
+#define _BCACHEFS_DISK_ACCOUNTING_FORMAT_H
+
+#include "replicas_format.h"
+
+/*
+ * Disk accounting - KEY_TYPE_accounting - on disk format:
+ *
+ * Here, the key has considerably more structure than a typical key (bpos); an
+ * accounting key is 'struct disk_accounting_key', which is a union of bpos.
+ *
+ * This is a type-tagged union of all our various subtypes; a disk accounting
+ * key can be device counters, replicas counters, et cetera - it's extensible.
+ *
+ * The value is a list of u64s or s64s; the number of counters is specific to a
+ * given accounting type.
+ *
+ * Unlike with other key types, updates are _deltas_, and the deltas are not
+ * resolved until the update to the underlying btree, done by btree write buffer
+ * flush or journal replay.
+ *
+ * Journal replay in particular requires special handling. The journal tracks a
+ * range of entries which may possibly have not yet been applied to the btree
+ * yet - it does not know definitively whether individual entries are dirty and
+ * still need to be applied.
+ *
+ * To handle this, we use the version field of struct bkey, and give every
+ * accounting update a unique version number - a total ordering in time; the
+ * version number is derived from the key's position in the journal. Then
+ * journal replay can compare the version number of the key from the journal
+ * with the version number of the key in the btree to determine if a key needs
+ * to be replayed.
+ *
+ * For this to work, we must maintain this strict time ordering of updates as
+ * they are flushed to the btree, both via write buffer flush and via journal
+ * replay. This has complications for the write buffer code while journal replay
+ * is still in progress; the write buffer cannot flush any accounting keys to
+ * the btree until journal replay has finished replaying its accounting keys, or
+ * the (newer) version number of the keys from the write buffer will cause
+ * updates from journal replay to be lost.
+ */ + +struct bch_accounting { + struct bch_val v; + __u64 d[]; +}; + +#define BCH_ACCOUNTING_MAX_COUNTERS 3 + +#define BCH_DATA_TYPES() \ + x(free, 0) \ + x(sb, 1) \ + x(journal, 2) \ + x(btree, 3) \ + x(user, 4) \ + x(cached, 5) \ + x(parity, 6) \ + x(stripe, 7) \ + x(need_gc_gens, 8) \ + x(need_discard, 9) + +enum bch_data_type { +#define x(t, n) BCH_DATA_##t, + BCH_DATA_TYPES() +#undef x + BCH_DATA_NR +}; + +static inline bool data_type_is_empty(enum bch_data_type type) +{ + switch (type) { + case BCH_DATA_free: + case BCH_DATA_need_gc_gens: + case BCH_DATA_need_discard: + return true; + default: + return false; + } +} + +static inline bool data_type_is_hidden(enum bch_data_type type) +{ + switch (type) { + case BCH_DATA_sb: + case BCH_DATA_journal: + return true; + default: + return false; + } +} + +#define BCH_DISK_ACCOUNTING_TYPES() \ + x(nr_inodes, 0) \ + x(persistent_reserved, 1) \ + x(replicas, 2) \ + x(dev_data_type, 3) \ + x(dev_stripe_buckets, 4) + +enum disk_accounting_type { +#define x(f, nr) BCH_DISK_ACCOUNTING_##f =3D nr, + BCH_DISK_ACCOUNTING_TYPES() +#undef x + BCH_DISK_ACCOUNTING_TYPE_NR, +}; + +struct bch_nr_inodes { +}; + +struct bch_persistent_reserved { + __u8 nr_replicas; +}; + +struct bch_dev_data_type { + __u8 dev; + __u8 data_type; +}; + +struct bch_dev_stripe_buckets { + __u8 dev; +}; + +struct disk_accounting_key { + union { + struct { + __u8 type; + union { + struct bch_nr_inodes nr_inodes; + struct bch_persistent_reserved persistent_reserved; + struct bch_replicas_entry_v1 replicas; + struct bch_dev_data_type dev_data_type; + struct bch_dev_stripe_buckets dev_stripe_buckets; + }; + }; + struct bpos _pad; + }; +}; + +#endif /* _BCACHEFS_DISK_ACCOUNTING_FORMAT_H */ diff --git a/fs/bcachefs/replicas_format.h b/fs/bcachefs/replicas_format.h new file mode 100644 index 000000000000..ed94f8c636b3 --- /dev/null +++ b/fs/bcachefs/replicas_format.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _BCACHEFS_REPLICAS_FORMAT_H 
+#define _BCACHEFS_REPLICAS_FORMAT_H + +struct bch_replicas_entry_v0 { + __u8 data_type; + __u8 nr_devs; + __u8 devs[]; +} __packed; + +struct bch_replicas_entry_v1 { + __u8 data_type; + __u8 nr_devs; + __u8 nr_required; + __u8 devs[]; +} __packed; + +#define replicas_entry_bytes(_i) \ + (offsetof(typeof(*(_i)), devs) + (_i)->nr_devs) + +#endif /* _BCACHEFS_REPLICAS_FORMAT_H */ diff --git a/fs/bcachefs/sb-downgrade.c b/fs/bcachefs/sb-downgrade.c index 3337419faeff..33db8d7ca8c4 100644 --- a/fs/bcachefs/sb-downgrade.c +++ b/fs/bcachefs/sb-downgrade.c @@ -52,9 +52,15 @@ BCH_FSCK_ERR_subvol_fs_path_parent_wrong) \ x(btree_subvolume_children, \ BIT_ULL(BCH_RECOVERY_PASS_check_subvols), \ - BCH_FSCK_ERR_subvol_children_not_set) + BCH_FSCK_ERR_subvol_children_not_set) \ + x(disk_accounting_v2, \ + BIT_ULL(BCH_RECOVERY_PASS_check_allocations), \ + BCH_FSCK_ERR_accounting_mismatch) =20 -#define DOWNGRADE_TABLE() +#define DOWNGRADE_TABLE() \ + x(disk_accounting_v2, \ + BIT_ULL(BCH_RECOVERY_PASS_check_alloc_info), \ + BCH_FSCK_ERR_dev_usage_buckets_wrong) =20 struct upgrade_downgrade_entry { u64 recovery_passes; @@ -108,7 +114,7 @@ void bch2_sb_set_upgrade(struct bch_fs *c, } } =20 -#define x(ver, passes, ...) static const u16 downgrade_ver_##errors[] =3D = { __VA_ARGS__ }; +#define x(ver, passes, ...) 
static const u16 downgrade_##ver##_errors[] = { __VA_ARGS__ };
 DOWNGRADE_TABLE()
 #undef x
 
diff --git a/fs/bcachefs/sb-errors_types.h b/fs/bcachefs/sb-errors_types.h
index 0df4b0e7071a..383e13711001 100644
--- a/fs/bcachefs/sb-errors_types.h
+++ b/fs/bcachefs/sb-errors_types.h
@@ -264,7 +264,8 @@
 	x(subvol_children_not_set,				256)	\
 	x(subvol_children_bad,					257)	\
 	x(subvol_loop,						258)	\
-	x(subvol_unreachable,					259)
+	x(subvol_unreachable,					259)	\
+	x(accounting_mismatch,					260)
 
 enum bch_sb_error_id {
 #define x(t, n) BCH_FSCK_ERR_##t = n,
-- 
2.43.0

From nobody Sun Feb 8 05:58:57 2026
From: Kent Overstreet
To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Kent Overstreet, djwong@kernel.org, bfoster@redhat.com
Subject: [PATCH 02/21] bcachefs: Accumulate accounting keys in journal replay
Date: Sat, 24 Feb 2024 21:38:04 -0500
Message-ID: <20240225023826.2413565-3-kent.overstreet@linux.dev>
In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev>
References: <20240225023826.2413565-1-kent.overstreet@linux.dev>

Until accounting keys hit the btree, they are deltas, not new versions of
the existing key; this means we have to teach journal replay to
accumulate them.

Additionally, the journal doesn't track precisely which entries have been
flushed to the btree; it only tracks a range of entries that may possibly
still need to be flushed.
That means we need to compare accounting keys against the version in the btree and only flush updates that are newer. There's another wrinkle with the write buffer: if the write buffer starts flushing accounting keys before journal replay has finished flushing accounting keys, journal replay will see the version number from the new updates and updates from the journal will be lost. To avoid this, journal replay has to flush accounting keys first, and we'll be adding a flag so that write buffer flush knows to hold accounting keys until then. Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_journal_iter.c | 23 +++------- fs/bcachefs/btree_journal_iter.h | 15 +++++++ fs/bcachefs/btree_trans_commit.c | 9 +++- fs/bcachefs/btree_update.h | 14 +++++- fs/bcachefs/recovery.c | 76 +++++++++++++++++++++++++++++++- 5 files changed, 117 insertions(+), 20 deletions(-) diff --git a/fs/bcachefs/btree_journal_iter.c b/fs/bcachefs/btree_journal_i= ter.c index 207dd32e2ecc..164a316d8995 100644 --- a/fs/bcachefs/btree_journal_iter.c +++ b/fs/bcachefs/btree_journal_iter.c @@ -16,21 +16,6 @@ * operations for the regular btree iter code to use: */ =20 -static int __journal_key_cmp(enum btree_id l_btree_id, - unsigned l_level, - struct bpos l_pos, - const struct journal_key *r) -{ - return (cmp_int(l_btree_id, r->btree_id) ?: - cmp_int(l_level, r->level) ?: - bpos_cmp(l_pos, r->k->k.p)); -} - -static int journal_key_cmp(const struct journal_key *l, const struct journ= al_key *r) -{ - return __journal_key_cmp(l->btree_id, l->level, l->k->k.p, r); -} - static inline size_t idx_to_pos(struct journal_keys *keys, size_t idx) { size_t gap_size =3D keys->size - keys->nr; @@ -492,7 +477,13 @@ static void __journal_keys_sort(struct journal_keys *k= eys) struct journal_key *dst =3D keys->data; =20 darray_for_each(*keys, src) { - if (src + 1 < &darray_top(*keys) && + /* + * We don't accumulate accounting keys here because we have to + * compare each individual accounting key against the version 
in + * the btree during replay: + */ + if (src->k->k.type !=3D KEY_TYPE_accounting && + src + 1 < &darray_top(*keys) && !journal_key_cmp(src, src + 1)) continue; =20 diff --git a/fs/bcachefs/btree_journal_iter.h b/fs/bcachefs/btree_journal_i= ter.h index c9d19da3ea04..8f3d9a3f1969 100644 --- a/fs/bcachefs/btree_journal_iter.h +++ b/fs/bcachefs/btree_journal_iter.h @@ -26,6 +26,21 @@ struct btree_and_journal_iter { bool prefetch; }; =20 +static inline int __journal_key_cmp(enum btree_id l_btree_id, + unsigned l_level, + struct bpos l_pos, + const struct journal_key *r) +{ + return (cmp_int(l_btree_id, r->btree_id) ?: + cmp_int(l_level, r->level) ?: + bpos_cmp(l_pos, r->k->k.p)); +} + +static inline int journal_key_cmp(const struct journal_key *l, const struc= t journal_key *r) +{ + return __journal_key_cmp(l->btree_id, l->level, l->k->k.p, r); +} + struct bkey_i *bch2_journal_keys_peek_upto(struct bch_fs *, enum btree_id, unsigned, struct bpos, struct bpos, size_t *); struct bkey_i *bch2_journal_keys_peek_slot(struct bch_fs *, enum btree_id, diff --git a/fs/bcachefs/btree_trans_commit.c b/fs/bcachefs/btree_trans_com= mit.c index 30d69a6d133e..60f6255367b9 100644 --- a/fs/bcachefs/btree_trans_commit.c +++ b/fs/bcachefs/btree_trans_commit.c @@ -760,8 +760,15 @@ bch2_trans_commit_write_locked(struct btree_trans *tra= ns, unsigned flags, =20 static noinline void bch2_drop_overwrites_from_journal(struct btree_trans = *trans) { + /* + * Accounting keys aren't deduped in the journal: we have to compare + * each individual update against what's in the btree to see if it has + * been applied yet, and accounting updates also don't overwrite, + * they're deltas that accumulate. 
+ */ trans_for_each_update(trans, i) - bch2_journal_key_overwritten(trans->c, i->btree_id, i->level, i->k->k.p); + if (i->k->k.type !=3D KEY_TYPE_accounting) + bch2_journal_key_overwritten(trans->c, i->btree_id, i->level, i->k->k.p= ); } =20 static noinline int bch2_trans_commit_bkey_invalid(struct btree_trans *tra= ns, diff --git a/fs/bcachefs/btree_update.h b/fs/bcachefs/btree_update.h index cc7c53e83f89..21f887fe857c 100644 --- a/fs/bcachefs/btree_update.h +++ b/fs/bcachefs/btree_update.h @@ -128,7 +128,19 @@ static inline int __must_check bch2_trans_update_buffe= red(struct btree_trans *tr enum btree_id btree, struct bkey_i *k) { - if (unlikely(trans->journal_replay_not_finished)) + /* + * Most updates skip the btree write buffer until journal replay is + * finished because synchronization with journal replay relies on having + * a btree node locked - if we're overwriting a key in the journal that + * journal replay hasn't yet replayed, we have to mark it as + * overwritten. + * + * But accounting updates don't overwrite, they're deltas, and they have + * to be flushed to the btree strictly in order for journal replay to be + * able to tell which updates need to be applied: + */ + if (k->k.type !=3D KEY_TYPE_accounting && + unlikely(trans->journal_replay_not_finished)) return bch2_btree_insert_clone_trans(trans, btree, k); =20 struct jset_entry *e =3D bch2_trans_jset_entry_alloc(trans, jset_u64s(k->= k.u64s)); diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 96e7a1ec7091..6829d80bd181 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -11,6 +11,7 @@ #include "btree_io.h" #include "buckets.h" #include "dirent.h" +#include "disk_accounting.h" #include "ec.h" #include "errcode.h" #include "error.h" @@ -87,6 +88,56 @@ static void replay_now_at(struct journal *j, u64 seq) bch2_journal_pin_put(j, j->replay_journal_seq++); } =20 +static int bch2_journal_replay_accounting_key(struct btree_trans *trans, + struct journal_key *k) +{ + 
struct journal_keys *keys =3D &trans->c->journal_keys; + + struct btree_iter iter; + bch2_trans_node_iter_init(trans, &iter, k->btree_id, k->k->k.p, + BTREE_MAX_DEPTH, k->level, + BTREE_ITER_INTENT); + int ret =3D bch2_btree_iter_traverse(&iter); + if (ret) + goto out; + + struct bkey u; + struct bkey_s_c old =3D bch2_btree_path_peek_slot(btree_iter_path(trans, = &iter), &u); + + if (bversion_cmp(old.k->version, k->k->k.version) >=3D 0) { + ret =3D 0; + goto out; + } + + if (k + 1 < &darray_top(*keys) && + !journal_key_cmp(k, k + 1)) { + BUG_ON(bversion_cmp(k[0].k->k.version, k[1].k->k.version) > 0); + + bch2_accounting_accumulate(bkey_i_to_accounting(k[1].k), + bkey_i_to_s_c_accounting(k[0].k)); + ret =3D 0; + goto out; + } + + struct bkey_i *new =3D k->k; + if (old.k->type =3D=3D KEY_TYPE_accounting) { + new =3D bch2_bkey_make_mut_noupdate(trans, bkey_i_to_s_c(k->k)); + ret =3D PTR_ERR_OR_ZERO(new); + if (ret) + goto out; + + bch2_accounting_accumulate(bkey_i_to_accounting(new), + bkey_s_c_to_accounting(old)); + } + + trans->journal_res.seq =3D k->journal_seq; + + ret =3D bch2_trans_update(trans, &iter, new, BTREE_TRIGGER_NORUN); +out: + bch2_trans_iter_exit(trans, &iter); + return ret; +} + static int bch2_journal_replay_key(struct btree_trans *trans, struct journal_key *k) { @@ -159,12 +210,33 @@ static int bch2_journal_replay(struct bch_fs *c) =20 BUG_ON(!atomic_read(&keys->ref)); =20 + /* + * Replay accounting keys first: we can't allow the write buffer to + * flush accounting keys until we're done + */ + darray_for_each(*keys, k) { + if (!(k->k->k.type =3D=3D KEY_TYPE_accounting && !k->allocated)) + continue; + + cond_resched(); + + ret =3D commit_do(trans, NULL, NULL, + BCH_TRANS_COMMIT_no_enospc| + BCH_TRANS_COMMIT_no_journal_res, + bch2_journal_replay_accounting_key(trans, k)); + if (bch2_fs_fatal_err_on(ret, c, "error replaying accounting; %s", bch2_= err_str(ret))) + goto err; + } + /* * First, attempt to replay keys in sorted order. 
This is more
 	 * efficient - better locality of btree access - but some might fail if
 	 * that would cause a journal deadlock.
 	 */
 	darray_for_each(*keys, k) {
+		if (k->k->k.type == KEY_TYPE_accounting && !k->allocated)
+			continue;
+
 		cond_resched();
 
 		/* Skip fastpath if we're low on space in the journal */
@@ -174,7 +246,7 @@ static int bch2_journal_replay(struct bch_fs *c)
 				BCH_TRANS_COMMIT_journal_reclaim|
 				(!k->allocated ? BCH_TRANS_COMMIT_no_journal_res : 0),
 			     bch2_journal_replay_key(trans, k));
-		BUG_ON(!ret && !k->overwritten);
+		BUG_ON(!ret && !k->overwritten && k->k->k.type != KEY_TYPE_accounting);
 		if (ret) {
 			ret = darray_push(&keys_sorted, k);
 			if (ret)
@@ -208,7 +280,7 @@ static int bch2_journal_replay(struct bch_fs *c)
 		if (ret)
 			goto err;
 
-		BUG_ON(!k->overwritten);
+		BUG_ON(k->btree_id != BTREE_ID_accounting && !k->overwritten);
 	}
 
 	/*
-- 
2.43.0

From nobody Sun Feb 8 05:58:57 2026
From: Kent Overstreet
To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Kent Overstreet, djwong@kernel.org, bfoster@redhat.com
Subject: [PATCH 03/21] bcachefs: btree write buffer knows how to accumulate bch_accounting keys
Date: Sat, 24 Feb 2024 21:38:05 -0500
Message-ID: <20240225023826.2413565-4-kent.overstreet@linux.dev>
In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev>
References: <20240225023826.2413565-1-kent.overstreet@linux.dev>

Teach the btree write buffer how to accumulate accounting keys - instead of having the
newer key overwrite the older key as we do with other updates, we need to add them together. Also, add a flag so that write buffer flush knows when journal replay is finished flushing accounting, and teach it to hold accounting keys until that flag is set. Signed-off-by: Kent Overstreet --- fs/bcachefs/bcachefs.h | 1 + fs/bcachefs/btree_write_buffer.c | 66 +++++++++++++++++++++++++++----- fs/bcachefs/recovery.c | 3 ++ 3 files changed, 61 insertions(+), 9 deletions(-) diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h index 62812fc1cad0..9a24989c9a6a 100644 --- a/fs/bcachefs/bcachefs.h +++ b/fs/bcachefs/bcachefs.h @@ -616,6 +616,7 @@ struct bch_dev { =20 #define BCH_FS_FLAGS() \ x(started) \ + x(accounting_replay_done) \ x(may_go_rw) \ x(rw) \ x(was_rw) \ diff --git a/fs/bcachefs/btree_write_buffer.c b/fs/bcachefs/btree_write_buf= fer.c index b77e7b382b66..002a0762fc85 100644 --- a/fs/bcachefs/btree_write_buffer.c +++ b/fs/bcachefs/btree_write_buffer.c @@ -5,6 +5,7 @@ #include "btree_update.h" #include "btree_update_interior.h" #include "btree_write_buffer.h" +#include "disk_accounting.h" #include "error.h" #include "journal.h" #include "journal_io.h" @@ -123,7 +124,9 @@ static noinline int wb_flush_one_slowpath(struct btree_= trans *trans, =20 static inline int wb_flush_one(struct btree_trans *trans, struct btree_ite= r *iter, struct btree_write_buffered_key *wb, - bool *write_locked, size_t *fast) + bool *write_locked, + bool *accounting_accumulated, + size_t *fast) { struct btree_path *path; int ret; @@ -136,6 +139,16 @@ static inline int wb_flush_one(struct btree_trans *tra= ns, struct btree_iter *ite if (ret) return ret; =20 + if (!*accounting_accumulated && wb->k.k.type =3D=3D KEY_TYPE_accounting) { + struct bkey u; + struct bkey_s_c k =3D bch2_btree_path_peek_slot_exact(btree_iter_path(tr= ans, iter), &u); + + if (k.k->type =3D=3D KEY_TYPE_accounting) + bch2_accounting_accumulate(bkey_i_to_accounting(&wb->k), + bkey_s_c_to_accounting(k)); + } + 
*accounting_accumulated =3D true; + /* * We can't clone a path that has write locks: unshare it now, before * set_pos and traverse(): @@ -248,8 +261,9 @@ static int bch2_btree_write_buffer_flush_locked(struct = btree_trans *trans) struct journal *j =3D &c->journal; struct btree_write_buffer *wb =3D &c->btree_write_buffer; struct btree_iter iter =3D { NULL }; - size_t skipped =3D 0, fast =3D 0, slowpath =3D 0; + size_t overwritten =3D 0, fast =3D 0, slowpath =3D 0, could_not_insert = =3D 0; bool write_locked =3D false; + bool accounting_replay_done =3D test_bit(BCH_FS_accounting_replay_done, &= c->flags); int ret =3D 0; =20 bch2_trans_unlock(trans); @@ -284,17 +298,29 @@ static int bch2_btree_write_buffer_flush_locked(struc= t btree_trans *trans) =20 darray_for_each(wb->sorted, i) { struct btree_write_buffered_key *k =3D &wb->flushing.keys.data[i->idx]; + bool accounting_accumulated =3D false; =20 for (struct wb_key_ref *n =3D i + 1; n < min(i + 4, &darray_top(wb->sort= ed)); n++) prefetch(&wb->flushing.keys.data[n->idx]); =20 BUG_ON(!k->journal_seq); =20 + if (!accounting_replay_done && + k->k.k.type =3D=3D KEY_TYPE_accounting) { + slowpath++; + continue; + } + if (i + 1 < &darray_top(wb->sorted) && wb_key_eq(i, i + 1)) { struct btree_write_buffered_key *n =3D &wb->flushing.keys.data[i[1].idx= ]; =20 - skipped++; + if (k->k.k.type =3D=3D KEY_TYPE_accounting && + n->k.k.type =3D=3D KEY_TYPE_accounting) + bch2_accounting_accumulate(bkey_i_to_accounting(&n->k), + bkey_i_to_s_c_accounting(&k->k)); + + overwritten++; n->journal_seq =3D min_t(u64, n->journal_seq, k->journal_seq); k->journal_seq =3D 0; continue; @@ -325,7 +351,8 @@ static int bch2_btree_write_buffer_flush_locked(struct = btree_trans *trans) break; } =20 - ret =3D wb_flush_one(trans, &iter, k, &write_locked, &fast); + ret =3D wb_flush_one(trans, &iter, k, &write_locked, + &accounting_accumulated, &fast); if (!write_locked) bch2_trans_begin(trans); } while (bch2_err_matches(ret, 
BCH_ERR_transaction_restart)); @@ -361,8 +388,15 @@ static int bch2_btree_write_buffer_flush_locked(struct= btree_trans *trans) if (!i->journal_seq) continue; =20 - bch2_journal_pin_update(j, i->journal_seq, &wb->flushing.pin, - bch2_btree_write_buffer_journal_flush); + if (!accounting_replay_done && + i->k.k.type =3D=3D KEY_TYPE_accounting) { + could_not_insert++; + continue; + } + + if (!could_not_insert) + bch2_journal_pin_update(j, i->journal_seq, &wb->flushing.pin, + bch2_btree_write_buffer_journal_flush); =20 bch2_trans_begin(trans); =20 @@ -375,13 +409,27 @@ static int bch2_btree_write_buffer_flush_locked(struc= t btree_trans *trans) btree_write_buffered_insert(trans, i)); if (ret) goto err; + + i->journal_seq =3D 0; + } + + if (could_not_insert) { + struct btree_write_buffered_key *dst =3D wb->flushing.keys.data; + + darray_for_each(wb->flushing.keys, i) + if (i->journal_seq) + *dst++ =3D *i; + wb->flushing.keys.nr =3D dst - wb->flushing.keys.data; } } err: + if (ret || !could_not_insert) { + bch2_journal_pin_drop(j, &wb->flushing.pin); + wb->flushing.keys.nr =3D 0; + } + bch2_fs_fatal_err_on(ret, c, "%s: insert error %s", __func__, bch2_err_st= r(ret)); - trace_write_buffer_flush(trans, wb->flushing.keys.nr, skipped, fast, 0); - bch2_journal_pin_drop(j, &wb->flushing.pin); - wb->flushing.keys.nr =3D 0; + trace_write_buffer_flush(trans, wb->flushing.keys.nr, overwritten, fast, = 0); return ret; } =20 diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 6829d80bd181..b8289af66c8e 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -228,6 +228,8 @@ static int bch2_journal_replay(struct bch_fs *c) goto err; } =20 + set_bit(BCH_FS_accounting_replay_done, &c->flags); + /* * First, attempt to replay keys in sorted order. 
This is more * efficient - better locality of btree access - but some might fail if @@ -1204,6 +1206,7 @@ int bch2_fs_initialize(struct bch_fs *c) * set up the journal.pin FIFO and journal.cur pointer: */ bch2_fs_journal_start(&c->journal, 1); + set_bit(BCH_FS_accounting_replay_done, &c->flags); bch2_journal_set_replay_done(&c->journal); =20 ret =3D bch2_fs_read_write_early(c); --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-177.mta1.migadu.com (out-177.mta1.migadu.com [95.215.58.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF82C6110 for ; Sun, 25 Feb 2024 02:38:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828724; cv=none; b=f33UeqjZtwse7s3J1oG0Ff5VEHRNH+O/OhdgbAcPcbGMMyBb4p7AimDjSqY1MB3Yrpr0grdCdWNx6GjZ24sGOCX14BZjdO2zYlc69MtXTuG4lKLMszHm6LhL4vVqdT0JzOfZqStd/M916bxcPnI4S5IoZ0pbNW7pVQn2DRnjZ8I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828724; c=relaxed/simple; bh=Uiispgyd16fXObybQtFcG61HaR8E3O/9zhg2YHGpe6M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=W9eT8vDFfiNkS0Y5J3k9eOMqOadO9gxpBRVH7QqvNnlEwBDSq5XZdL5yLS4GVYOMQyzYWGsKgnqOv27BmvzO04cFWCcJIj7QytKXxj4GzEkTbyLq1FOK+AWe9tzD0TlIPCbNHxajp4xzfj8M/bCK3e6GYi4zlETaXEYQLy8BISQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=t7fnIL2w; arc=none smtp.client-ip=95.215.58.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass 
(1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="t7fnIL2w" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828718; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4Tn76/W0CbcMhsnTJQ9k3lY0axZy5xliCVw9DtTqxy8=; b=t7fnIL2wcLeCy/VOq8jcCQf5gADKop8vRerdJn6/OLbp0bN6Bu+GF8RVrqLEey2W/ad3Sw B6Sdo2r/2qdSHPWsDgDJq1VRRjbEv2yWwk1ri6Wxh4Q2tiSGvnT662Lemw2MAX/52gPLK1 M1UgpiPx3yVEpu6obP8RunSyd/tK5kA= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 04/21] bcachefs: Disk space accounting rewrite Date: Sat, 24 Feb 2024 21:38:06 -0500 Message-ID: <20240225023826.2413565-5-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Main part of the disk accounting rewrite. This is a wholesale rewrite of the existing disk space accounting, which relies on percpu counters that are sharded by journal buffer, and rolled up and added to each journal write. With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters. 
Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required. An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key. Signed-off-by: Kent Overstreet --- fs/bcachefs/alloc_background.c | 68 +++++- fs/bcachefs/bcachefs.h | 6 +- fs/bcachefs/bcachefs_format.h | 1 - fs/bcachefs/bcachefs_ioctl.h | 7 +- fs/bcachefs/btree_gc.c | 3 +- fs/bcachefs/btree_iter.c | 9 - fs/bcachefs/btree_trans_commit.c | 62 ++++-- fs/bcachefs/btree_types.h | 1 - fs/bcachefs/btree_update.h | 8 - fs/bcachefs/buckets.c | 289 +++++--------------------- fs/bcachefs/buckets.h | 33 +-- fs/bcachefs/disk_accounting.c | 308 ++++++++++++++++++++++++++++ fs/bcachefs/disk_accounting.h | 126 ++++++++++++ fs/bcachefs/disk_accounting_types.h | 20 ++ fs/bcachefs/ec.c | 24 ++- fs/bcachefs/inode.c | 9 +- fs/bcachefs/recovery.c | 12 +- fs/bcachefs/recovery_types.h | 1 + fs/bcachefs/replicas.c | 42 ++-- fs/bcachefs/replicas.h | 11 +- fs/bcachefs/replicas_types.h | 16 -- fs/bcachefs/sb-errors_types.h | 3 +- fs/bcachefs/super.c | 49 +++-- 23 files changed, 704 insertions(+), 404 deletions(-) create mode 100644 fs/bcachefs/disk_accounting_types.h diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c index ccd6cbfd470e..d8ad5bb28a7f 100644 --- a/fs/bcachefs/alloc_background.c +++ b/fs/bcachefs/alloc_background.c @@ -14,6 +14,7 @@ #include "buckets_waiting_for_journal.h" #include "clock.h" #include "debug.h" +#include "disk_accounting.h" #include "ec.h" #include "error.h" #include "lru.h" @@ -813,8 +814,60 @@ int bch2_trigger_alloc(struct btree_trans *trans, =20 if ((flags & BTREE_TRIGGER_BUCKET_INVALIDATE) && old_a->cached_sectors) 
{ - ret =3D bch2_update_cached_sectors_list(trans, new.k->p.inode, - -((s64) old_a->cached_sectors)); + ret =3D bch2_mod_dev_cached_sectors(trans, new.k->p.inode, + -((s64) old_a->cached_sectors)); + if (ret) + return ret; + } + + + if (old_a->data_type !=3D new_a->data_type || + old_a->dirty_sectors !=3D new_a->dirty_sectors) { + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_dev_data_type, + .dev_data_type.dev =3D new.k->p.inode, + .dev_data_type.data_type =3D new_a->data_type, + }; + s64 d[3]; + + if (old_a->data_type =3D=3D new_a->data_type) { + d[0] =3D 0; + d[1] =3D (s64) new_a->dirty_sectors - (s64) old_a->dirty_sectors; + d[2] =3D bucket_sectors_fragmented(ca, *new_a) - + bucket_sectors_fragmented(ca, *old_a); + + ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3); + if (ret) + return ret; + } else { + d[0] =3D 1; + d[1] =3D new_a->dirty_sectors; + d[2] =3D bucket_sectors_fragmented(ca, *new_a); + + ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3); + if (ret) + return ret; + + acc.dev_data_type.data_type =3D old_a->data_type; + d[0] =3D -1; + d[1] =3D -(s64) old_a->dirty_sectors; + d[2] =3D -bucket_sectors_fragmented(ca, *old_a); + + ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3); + if (ret) + return ret; + } + } + + if (!!old_a->stripe !=3D !!new_a->stripe) { + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_dev_stripe_buckets, + .dev_stripe_buckets.dev =3D new.k->p.inode, + }; + u64 d[1]; + + d[0] =3D (s64) !!new_a->stripe - (s64) !!old_a->stripe; + ret =3D bch2_disk_accounting_mod(trans, &acc, d, 1); if (ret) return ret; } @@ -857,12 +910,11 @@ int bch2_trigger_alloc(struct btree_trans *trans, } } =20 - percpu_down_read(&c->mark_lock); - if (new_a->gen !=3D old_a->gen) + if (new_a->gen !=3D old_a->gen) { + percpu_down_read(&c->mark_lock); *bucket_gen(ca, new.k->p.offset) =3D new_a->gen; - - bch2_dev_usage_update(c, ca, old_a, new_a, journal_seq, false); - percpu_up_read(&c->mark_lock); + 
percpu_up_read(&c->mark_lock); + } =20 #define eval_state(_a, expr) ({ const struct bch_alloc_v4 *a =3D _a; expr= ; }) #define statechange(expr) !eval_state(old_a, expr) && eval_state(new_a, e= xpr) @@ -906,6 +958,8 @@ int bch2_trigger_alloc(struct btree_trans *trans, =20 bucket_unlock(g); percpu_up_read(&c->mark_lock); + + bch2_dev_usage_update(c, ca, old_a, new_a); } =20 return 0; diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h index 9a24989c9a6a..18c00051a8f6 100644 --- a/fs/bcachefs/bcachefs.h +++ b/fs/bcachefs/bcachefs.h @@ -207,6 +207,7 @@ #include =20 #include "bcachefs_format.h" +#include "disk_accounting_types.h" #include "errcode.h" #include "fifo.h" #include "nocow_locking_types.h" @@ -695,8 +696,6 @@ struct btree_trans_buf { struct btree_trans *trans; }; =20 -#define REPLICAS_DELTA_LIST_MAX (1U << 16) - #define BCACHEFS_ROOT_SUBVOL_INUM \ ((subvol_inum) { BCACHEFS_ROOT_SUBVOL, BCACHEFS_ROOT_INO }) =20 @@ -763,10 +762,11 @@ struct bch_fs { =20 struct bch_dev __rcu *devs[BCH_SB_MEMBERS_MAX]; =20 + struct bch_accounting_mem accounting; + struct bch_replicas_cpu replicas; struct bch_replicas_cpu replicas_gc; struct mutex replicas_gc_lock; - mempool_t replicas_delta_pool; =20 struct journal_entry_res btree_root_journal_res; struct journal_entry_res replicas_journal_res; diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h index 313ca7dc370d..6edd3fd63bfa 100644 --- a/fs/bcachefs/bcachefs_format.h +++ b/fs/bcachefs/bcachefs_format.h @@ -1271,7 +1271,6 @@ static inline bool jset_entry_is_key(struct jset_entr= y *e) switch (e->type) { case BCH_JSET_ENTRY_btree_keys: case BCH_JSET_ENTRY_btree_root: - case BCH_JSET_ENTRY_overwrite: case BCH_JSET_ENTRY_write_buffer_keys: return true; } diff --git a/fs/bcachefs/bcachefs_ioctl.h b/fs/bcachefs/bcachefs_ioctl.h index 4b8fba754b1c..0b82a4dd099f 100644 --- a/fs/bcachefs/bcachefs_ioctl.h +++ b/fs/bcachefs/bcachefs_ioctl.h @@ -251,10 +251,15 @@ struct bch_replicas_usage { struct 
bch_replicas_entry_v1 r; } __packed; =20 +static inline unsigned replicas_usage_bytes(struct bch_replicas_usage *u) +{ + return offsetof(struct bch_replicas_usage, r) + replicas_entry_bytes(&u->= r); +} + static inline struct bch_replicas_usage * replicas_usage_next(struct bch_replicas_usage *u) { - return (void *) u + replicas_entry_bytes(&u->r) + 8; + return (void *) u + replicas_usage_bytes(u); } =20 /* diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c index 6c52f116098f..2dfa7ca95fc0 100644 --- a/fs/bcachefs/btree_gc.c +++ b/fs/bcachefs/btree_gc.c @@ -827,7 +827,8 @@ static int bch2_gc_mark_key(struct btree_trans *trans, = enum btree_id btree_id, if (ret) goto err; =20 - if (fsck_err_on(k->k->version.lo > atomic64_read(&c->key_version), c, + if (fsck_err_on(btree_id !=3D BTREE_ID_accounting && + k->k->version.lo > atomic64_read(&c->key_version), c, bkey_version_in_future, "key version number higher than recorded: %llu > %llu", k->k->version.lo, diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c index 2357af3e6757..ef7cb7174c8b 100644 --- a/fs/bcachefs/btree_iter.c +++ b/fs/bcachefs/btree_iter.c @@ -3072,15 +3072,6 @@ void bch2_trans_put(struct btree_trans *trans) srcu_read_unlock(&c->btree_trans_barrier, trans->srcu_idx); } =20 - if (trans->fs_usage_deltas) { - if (trans->fs_usage_deltas->size + sizeof(trans->fs_usage_deltas) =3D=3D - REPLICAS_DELTA_LIST_MAX) - mempool_free(trans->fs_usage_deltas, - &c->replicas_delta_pool); - else - kfree(trans->fs_usage_deltas); - } - if (unlikely(trans->journal_replay_not_finished)) bch2_journal_keys_put(c); =20 diff --git a/fs/bcachefs/btree_trans_commit.c b/fs/bcachefs/btree_trans_com= mit.c index 60f6255367b9..b005e20039bb 100644 --- a/fs/bcachefs/btree_trans_commit.c +++ b/fs/bcachefs/btree_trans_commit.c @@ -9,6 +9,7 @@ #include "btree_update_interior.h" #include "btree_write_buffer.h" #include "buckets.h" +#include "disk_accounting.h" #include "errcode.h" #include "error.h" #include "journal.h" 
@@ -598,6 +599,14 @@ static noinline int bch2_trans_commit_run_gc_triggers(= struct btree_trans *trans) return 0; } =20 +static struct bversion journal_pos_to_bversion(struct journal_res *res, un= signed offset) +{ + return (struct bversion) { + .hi =3D res->seq >> 32, + .lo =3D (res->seq << 32) | (res->offset + offset), + }; +} + static inline int bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags, struct btree_insert_entry **stopped_at, @@ -606,7 +615,7 @@ bch2_trans_commit_write_locked(struct btree_trans *tran= s, unsigned flags, struct bch_fs *c =3D trans->c; struct btree_trans_commit_hook *h; unsigned u64s =3D 0; - int ret; + int ret =3D 0; =20 if (race_fault()) { trace_and_count(c, trans_restart_fault_inject, trans, trace_ip); @@ -668,21 +677,35 @@ bch2_trans_commit_write_locked(struct btree_trans *tr= ans, unsigned flags, i->k->k.version =3D MAX_VERSION; } =20 - if (trans->fs_usage_deltas && - bch2_trans_fs_usage_apply(trans, trans->fs_usage_deltas)) - return -BCH_ERR_btree_insert_need_mark_replicas; - - /* XXX: we only want to run this if deltas are nonzero */ - bch2_trans_account_disk_usage_change(trans); - h =3D trans->hooks; while (h) { ret =3D h->fn(trans, h); if (ret) - goto revert_fs_usage; + return ret; h =3D h->next; } =20 + percpu_down_read(&c->mark_lock); + struct jset_entry *entry =3D trans->journal_entries; + + for (entry =3D trans->journal_entries; + entry !=3D (void *) ((u64 *) trans->journal_entries + trans->journal= _entries_u64s); + entry =3D vstruct_next(entry)) + if (jset_entry_is_key(entry) && entry->start->k.type =3D=3D KEY_TYPE_acc= ounting) { + struct bkey_i_accounting *a =3D bkey_i_to_accounting(entry->start); + + a->k.version =3D journal_pos_to_bversion(&trans->journal_res, + (u64 *) entry - (u64 *) trans->journal_entries); + BUG_ON(bversion_zero(a->k.version)); + ret =3D bch2_accounting_mem_add(trans, accounting_i_to_s_c(a)); + if (ret) + goto revert_fs_usage; + } + percpu_up_read(&c->mark_lock); + + /* XXX: 
we only want to run this if deltas are nonzero */ + bch2_trans_account_disk_usage_change(trans); + trans_for_each_update(trans, i) if (BTREE_NODE_TYPE_HAS_ATOMIC_TRIGGERS & (1U << i->bkey_type)) { ret =3D run_one_mem_trigger(trans, i, BTREE_TRIGGER_ATOMIC|i->flags); @@ -751,10 +774,20 @@ bch2_trans_commit_write_locked(struct btree_trans *tr= ans, unsigned flags, =20 return 0; fatal_err: - bch2_fatal_error(c); + bch2_fs_fatal_error(c, "fatal error in transaction commit: %s", bch2_err_= str(ret)); + percpu_down_read(&c->mark_lock); revert_fs_usage: - if (trans->fs_usage_deltas) - bch2_trans_fs_usage_revert(trans, trans->fs_usage_deltas); + for (struct jset_entry *entry2 =3D trans->journal_entries; + entry2 !=3D entry; + entry2 =3D vstruct_next(entry2)) + if (jset_entry_is_key(entry2) && entry2->start->k.type =3D=3D KEY_TYPE_a= ccounting) { + struct bkey_s_accounting a =3D bkey_i_to_s_accounting(entry2->start); + + bch2_accounting_neg(a); + bch2_accounting_mem_add(trans, a.c); + bch2_accounting_neg(a); + } + percpu_up_read(&c->mark_lock); return ret; } =20 @@ -904,7 +937,7 @@ int bch2_trans_commit_error(struct btree_trans *trans, = unsigned flags, break; case -BCH_ERR_btree_insert_need_mark_replicas: ret =3D drop_locks_do(trans, - bch2_replicas_delta_list_mark(c, trans->fs_usage_deltas)); + bch2_accounting_update_sb(trans)); break; case -BCH_ERR_journal_res_get_blocked: /* @@ -996,8 +1029,6 @@ int __bch2_trans_commit(struct btree_trans *trans, uns= igned flags) !trans->journal_entries_u64s) goto out_reset; =20 - memset(&trans->fs_usage_delta, 0, sizeof(trans->fs_usage_delta)); - ret =3D bch2_trans_commit_run_triggers(trans); if (ret) goto out_reset; @@ -1093,6 +1124,7 @@ int __bch2_trans_commit(struct btree_trans *trans, un= signed flags) bch2_trans_verify_not_in_restart(trans); if (likely(!(flags & BCH_TRANS_COMMIT_no_journal_res))) memset(&trans->journal_res, 0, sizeof(trans->journal_res)); + memset(&trans->fs_usage_delta, 0, sizeof(trans->fs_usage_delta)); =20 ret 
=3D do_bch2_trans_commit(trans, flags, &errored_at, _RET_IP_); =20 diff --git a/fs/bcachefs/btree_types.h b/fs/bcachefs/btree_types.h index b2ebf143c3b7..2acca37eb831 100644 --- a/fs/bcachefs/btree_types.h +++ b/fs/bcachefs/btree_types.h @@ -441,7 +441,6 @@ struct btree_trans { =20 unsigned journal_u64s; unsigned extra_disk_res; /* XXX kill */ - struct replicas_delta_list *fs_usage_deltas; =20 /* Entries before this are zeroed out on every bch2_trans_get() call */ =20 diff --git a/fs/bcachefs/btree_update.h b/fs/bcachefs/btree_update.h index 21f887fe857c..6f8812f21444 100644 --- a/fs/bcachefs/btree_update.h +++ b/fs/bcachefs/btree_update.h @@ -213,14 +213,6 @@ static inline void bch2_trans_reset_updates(struct btr= ee_trans *trans) trans->journal_entries_u64s =3D 0; trans->hooks =3D NULL; trans->extra_disk_res =3D 0; - - if (trans->fs_usage_deltas) { - trans->fs_usage_deltas->used =3D 0; - memset((void *) trans->fs_usage_deltas + - offsetof(struct replicas_delta_list, memset_start), 0, - (void *) &trans->fs_usage_deltas->memset_end - - (void *) &trans->fs_usage_deltas->memset_start); - } } =20 static inline struct bkey_i *__bch2_bkey_make_mut_noupdate(struct btree_tr= ans *trans, struct bkey_s_c k, diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c index c2f46b267b3a..fb915c1b7844 100644 --- a/fs/bcachefs/buckets.c +++ b/fs/bcachefs/buckets.c @@ -13,6 +13,7 @@ #include "btree_update.h" #include "buckets.h" #include "buckets_waiting_for_journal.h" +#include "disk_accounting.h" #include "ec.h" #include "error.h" #include "inode.h" @@ -25,24 +26,16 @@ =20 #include =20 -static inline void fs_usage_data_type_to_base(struct bch_fs_usage_base *fs= _usage, - enum bch_data_type data_type, - s64 sectors) +static inline struct bch_fs_usage *fs_usage_ptr(struct bch_fs *c, + unsigned journal_seq, + bool gc) { - switch (data_type) { - case BCH_DATA_btree: - fs_usage->btree +=3D sectors; - break; - case BCH_DATA_user: - case BCH_DATA_parity: - fs_usage->data +=3D sectors; 
- break; - case BCH_DATA_cached: - fs_usage->cached +=3D sectors; - break; - default: - break; - } + percpu_rwsem_assert_held(&c->mark_lock); + BUG_ON(!gc && !journal_seq); + + return this_cpu_ptr(gc + ? c->usage_gc + : c->usage[journal_seq & JOURNAL_BUF_MASK]); } =20 void bch2_fs_usage_initialize(struct bch_fs *c) @@ -67,24 +60,13 @@ void bch2_fs_usage_initialize(struct bch_fs *c) struct bch_dev_usage dev =3D bch2_dev_usage_read(ca); =20 usage->b.hidden +=3D (dev.d[BCH_DATA_sb].buckets + - dev.d[BCH_DATA_journal].buckets) * + dev.d[BCH_DATA_journal].buckets) * ca->mi.bucket_size; } =20 percpu_up_write(&c->mark_lock); } =20 -static inline struct bch_dev_usage *dev_usage_ptr(struct bch_dev *ca, - unsigned journal_seq, - bool gc) -{ - BUG_ON(!gc && !journal_seq); - - return this_cpu_ptr(gc - ? ca->usage_gc - : ca->usage[journal_seq & JOURNAL_BUF_MASK]); -} - void bch2_dev_usage_read_fast(struct bch_dev *ca, struct bch_dev_usage *us= age) { struct bch_fs *c =3D ca->fs; @@ -267,11 +249,6 @@ bch2_fs_usage_read_short(struct bch_fs *c) return ret; } =20 -void bch2_dev_usage_init(struct bch_dev *ca) -{ - ca->usage_base->d[BCH_DATA_free].buckets =3D ca->mi.nbuckets - ca->mi.fir= st_bucket; -} - void bch2_dev_usage_to_text(struct printbuf *out, struct bch_dev_usage *us= age) { prt_tab(out); @@ -298,21 +275,20 @@ void bch2_dev_usage_to_text(struct printbuf *out, str= uct bch_dev_usage *usage) =20 void bch2_dev_usage_update(struct bch_fs *c, struct bch_dev *ca, const struct bch_alloc_v4 *old, - const struct bch_alloc_v4 *new, - u64 journal_seq, bool gc) + const struct bch_alloc_v4 *new) { struct bch_fs_usage *fs_usage; struct bch_dev_usage *u; =20 preempt_disable(); - fs_usage =3D fs_usage_ptr(c, journal_seq, gc); + fs_usage =3D this_cpu_ptr(c->usage_gc); =20 if (data_type_is_hidden(old->data_type)) fs_usage->b.hidden -=3D ca->mi.bucket_size; if (data_type_is_hidden(new->data_type)) fs_usage->b.hidden +=3D ca->mi.bucket_size; =20 - u =3D dev_usage_ptr(ca, journal_seq, gc); + u 
=3D this_cpu_ptr(ca->usage_gc); =20 u->d[old->data_type].buckets--; u->d[new->data_type].buckets++; @@ -346,27 +322,11 @@ void bch2_dev_usage_update_m(struct bch_fs *c, struct= bch_dev *ca, struct bch_alloc_v4 old_a =3D bucket_m_to_alloc(*old); struct bch_alloc_v4 new_a =3D bucket_m_to_alloc(*new); =20 - bch2_dev_usage_update(c, ca, &old_a, &new_a, 0, true); -} - -static inline int __update_replicas(struct bch_fs *c, - struct bch_fs_usage *fs_usage, - struct bch_replicas_entry_v1 *r, - s64 sectors) -{ - int idx =3D bch2_replicas_entry_idx(c, r); - - if (idx < 0) - return -1; - - fs_usage_data_type_to_base(&fs_usage->b, r->data_type, sectors); - fs_usage->replicas[idx] +=3D sectors; - return 0; + bch2_dev_usage_update(c, ca, &old_a, &new_a); } =20 int bch2_update_replicas(struct bch_fs *c, struct bkey_s_c k, - struct bch_replicas_entry_v1 *r, s64 sectors, - unsigned journal_seq, bool gc) + struct bch_replicas_entry_v1 *r, s64 sectors) { struct bch_fs_usage *fs_usage; int idx, ret =3D 0; @@ -393,7 +353,7 @@ int bch2_update_replicas(struct bch_fs *c, struct bkey_= s_c k, } =20 preempt_disable(); - fs_usage =3D fs_usage_ptr(c, journal_seq, gc); + fs_usage =3D this_cpu_ptr(c->usage_gc); fs_usage_data_type_to_base(&fs_usage->b, r->data_type, sectors); fs_usage->replicas[idx] +=3D sectors; preempt_enable(); @@ -406,94 +366,13 @@ int bch2_update_replicas(struct bch_fs *c, struct bke= y_s_c k, =20 static inline int update_cached_sectors(struct bch_fs *c, struct bkey_s_c k, - unsigned dev, s64 sectors, - unsigned journal_seq, bool gc) + unsigned dev, s64 sectors) { struct bch_replicas_padded r; =20 bch2_replicas_entry_cached(&r.e, dev); =20 - return bch2_update_replicas(c, k, &r.e, sectors, journal_seq, gc); -} - -static int __replicas_deltas_realloc(struct btree_trans *trans, unsigned m= ore, - gfp_t gfp) -{ - struct replicas_delta_list *d =3D trans->fs_usage_deltas; - unsigned new_size =3D d ? 
(d->size + more) * 2 : 128; - unsigned alloc_size =3D sizeof(*d) + new_size; - - WARN_ON_ONCE(alloc_size > REPLICAS_DELTA_LIST_MAX); - - if (!d || d->used + more > d->size) { - d =3D krealloc(d, alloc_size, gfp|__GFP_ZERO); - - if (unlikely(!d)) { - if (alloc_size > REPLICAS_DELTA_LIST_MAX) - return -ENOMEM; - - d =3D mempool_alloc(&trans->c->replicas_delta_pool, gfp); - if (!d) - return -ENOMEM; - - memset(d, 0, REPLICAS_DELTA_LIST_MAX); - - if (trans->fs_usage_deltas) - memcpy(d, trans->fs_usage_deltas, - trans->fs_usage_deltas->size + sizeof(*d)); - - new_size =3D REPLICAS_DELTA_LIST_MAX - sizeof(*d); - kfree(trans->fs_usage_deltas); - } - - d->size =3D new_size; - trans->fs_usage_deltas =3D d; - } - - return 0; -} - -int bch2_replicas_deltas_realloc(struct btree_trans *trans, unsigned more) -{ - return allocate_dropping_locks_errcode(trans, - __replicas_deltas_realloc(trans, more, _gfp)); -} - -int bch2_update_replicas_list(struct btree_trans *trans, - struct bch_replicas_entry_v1 *r, - s64 sectors) -{ - struct replicas_delta_list *d; - struct replicas_delta *n; - unsigned b; - int ret; - - if (!sectors) - return 0; - - b =3D replicas_entry_bytes(r) + 8; - ret =3D bch2_replicas_deltas_realloc(trans, b); - if (ret) - return ret; - - d =3D trans->fs_usage_deltas; - n =3D (void *) d->d + d->used; - n->delta =3D sectors; - unsafe_memcpy((void *) n + offsetof(struct replicas_delta, r), - r, replicas_entry_bytes(r), - "flexible array member embedded in strcuct with padding"); - bch2_replicas_entry_sort(&n->r); - d->used +=3D b; - return 0; -} - -int bch2_update_cached_sectors_list(struct btree_trans *trans, unsigned de= v, s64 sectors) -{ - struct bch_replicas_padded r; - - bch2_replicas_entry_cached(&r.e, dev); - - return bch2_update_replicas_list(trans, &r.e, sectors); + return bch2_update_replicas(c, k, &r.e, sectors); } =20 int bch2_mark_metadata_bucket(struct bch_fs *c, struct bch_dev *ca, @@ -653,47 +532,6 @@ int bch2_check_bucket_ref(struct btree_trans *trans, 
goto out; } =20 -void bch2_trans_fs_usage_revert(struct btree_trans *trans, - struct replicas_delta_list *deltas) -{ - struct bch_fs *c =3D trans->c; - struct bch_fs_usage *dst; - struct replicas_delta *d, *top =3D (void *) deltas->d + deltas->used; - s64 added =3D 0; - unsigned i; - - percpu_down_read(&c->mark_lock); - preempt_disable(); - dst =3D fs_usage_ptr(c, trans->journal_res.seq, false); - - /* revert changes: */ - for (d =3D deltas->d; d !=3D top; d =3D replicas_delta_next(d)) { - switch (d->r.data_type) { - case BCH_DATA_btree: - case BCH_DATA_user: - case BCH_DATA_parity: - added +=3D d->delta; - } - BUG_ON(__update_replicas(c, dst, &d->r, -d->delta)); - } - - dst->b.nr_inodes -=3D deltas->nr_inodes; - - for (i =3D 0; i < BCH_REPLICAS_MAX; i++) { - added -=3D deltas->persistent_reserved[i]; - dst->b.reserved -=3D deltas->persistent_reserved[i]; - dst->persistent_reserved[i] -=3D deltas->persistent_reserved[i]; - } - - if (added > 0) { - trans->disk_res->sectors +=3D added; - this_cpu_add(*c->online_reserved, added); - } - - preempt_enable(); - percpu_up_read(&c->mark_lock); -} - void bch2_trans_account_disk_usage_change(struct btree_trans *trans) { struct bch_fs *c =3D trans->c; @@ -747,43 +585,6 @@ void bch2_trans_account_disk_usage_change(struct btree= _trans *trans) should_not_have_added, disk_res_sectors); } =20 -int bch2_trans_fs_usage_apply(struct btree_trans *trans, - struct replicas_delta_list *deltas) -{ - struct bch_fs *c =3D trans->c; - struct replicas_delta *d, *d2; - struct replicas_delta *top =3D (void *) deltas->d + deltas->used; - struct bch_fs_usage *dst; - unsigned i; - - percpu_down_read(&c->mark_lock); - preempt_disable(); - dst =3D fs_usage_ptr(c, trans->journal_res.seq, false); - - for (d =3D deltas->d; d !=3D top; d =3D replicas_delta_next(d)) - if (__update_replicas(c, dst, &d->r, d->delta)) - goto need_mark; - - dst->b.nr_inodes +=3D deltas->nr_inodes; - - for (i =3D 0; i < BCH_REPLICAS_MAX; i++) { - dst->b.reserved +=3D 
deltas->persistent_reserved[i]; - dst->persistent_reserved[i] +=3D deltas->persistent_reserved[i]; - } - - preempt_enable(); - percpu_up_read(&c->mark_lock); - return 0; -need_mark: - /* revert changes: */ - for (d2 =3D deltas->d; d2 !=3D d; d2 =3D replicas_delta_next(d2)) - BUG_ON(__update_replicas(c, dst, &d2->r, -d2->delta)); - - preempt_enable(); - percpu_up_read(&c->mark_lock); - return -1; -} - /* KEY_TYPE_extent: */ =20 static int __mark_pointer(struct btree_trans *trans, @@ -911,10 +712,12 @@ static int bch2_trigger_stripe_ptr(struct btree_trans= *trans, stripe_blockcount_get(&s->v, p.ec.block) + sectors); =20 - struct bch_replicas_padded r; - bch2_bkey_to_replicas(&r.e, bkey_i_to_s_c(&s->k_i)); - r.e.data_type =3D data_type; - ret =3D bch2_update_replicas_list(trans, &r.e, sectors); + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_replicas, + }; + bch2_bkey_to_replicas(&acc.replicas, bkey_i_to_s_c(&s->k_i)); + acc.replicas.data_type =3D data_type; + ret =3D bch2_disk_accounting_mod(trans, &acc, §ors, 1); err: bch2_trans_iter_exit(trans, &iter); return ret; @@ -951,7 +754,7 @@ static int bch2_trigger_stripe_ptr(struct btree_trans *= trans, mutex_unlock(&c->ec_stripes_heap_lock); =20 r.e.data_type =3D data_type; - bch2_update_replicas(c, k, &r.e, sectors, trans->journal_res.seq, true); + bch2_update_replicas(c, k, &r.e, sectors); } =20 return 0; @@ -966,16 +769,18 @@ static int __trigger_extent(struct btree_trans *trans, struct bkey_ptrs_c ptrs =3D bch2_bkey_ptrs_c(k); const union bch_extent_entry *entry; struct extent_ptr_decoded p; - struct bch_replicas_padded r; enum bch_data_type data_type =3D bkey_is_btree_ptr(k.k) ? 
BCH_DATA_btree : BCH_DATA_user; s64 dirty_sectors =3D 0; int ret =3D 0; =20 - r.e.data_type =3D data_type; - r.e.nr_devs =3D 0; - r.e.nr_required =3D 1; + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_replicas, + .replicas.data_type =3D data_type, + .replicas.nr_devs =3D 0, + .replicas.nr_required =3D 1, + }; =20 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { s64 disk_sectors; @@ -988,8 +793,8 @@ static int __trigger_extent(struct btree_trans *trans, if (p.ptr.cached) { if (!stale) { ret =3D !gc - ? bch2_update_cached_sectors_list(trans, p.ptr.dev, disk_sectors) - : update_cached_sectors(c, k, p.ptr.dev, disk_sectors, 0, true); + ? bch2_mod_dev_cached_sectors(trans, p.ptr.dev, disk_sectors) + : update_cached_sectors(c, k, p.ptr.dev, disk_sectors); bch2_fs_fatal_err_on(ret && gc, c, "%s(): no replicas entry while upda= ting cached sectors", __func__); if (ret) @@ -997,7 +802,7 @@ static int __trigger_extent(struct btree_trans *trans, } } else if (!p.has_ec) { dirty_sectors +=3D disk_sectors; - r.e.devs[r.e.nr_devs++] =3D p.ptr.dev; + acc.replicas.devs[acc.replicas.nr_devs++] =3D p.ptr.dev; } else { ret =3D bch2_trigger_stripe_ptr(trans, k, p, data_type, disk_sectors, f= lags); if (ret) @@ -1008,14 +813,14 @@ static int __trigger_extent(struct btree_trans *tran= s, * if so they're not required for mounting if we have an * erasure coded pointer in this extent: */ - r.e.nr_required =3D 0; + acc.replicas.nr_required =3D 0; } } =20 - if (r.e.nr_devs) { + if (acc.replicas.nr_devs) { ret =3D !gc - ? bch2_update_replicas_list(trans, &r.e, dirty_sectors) - : bch2_update_replicas(c, k, &r.e, dirty_sectors, 0, true); + ? 
bch2_disk_accounting_mod(trans, &acc, &dirty_sectors, 1) + : bch2_update_replicas(c, k, &acc.replicas, dirty_sectors); if (unlikely(ret && gc)) { struct printbuf buf =3D PRINTBUF; =20 @@ -1074,23 +879,23 @@ static int __trigger_reservation(struct btree_trans = *trans, { struct bch_fs *c =3D trans->c; unsigned replicas =3D bkey_s_c_to_reservation(k).v->nr_replicas; - s64 sectors =3D (s64) k.k->size * replicas; + s64 sectors =3D (s64) k.k->size; =20 if (flags & BTREE_TRIGGER_OVERWRITE) sectors =3D -sectors; =20 if (flags & BTREE_TRIGGER_TRANSACTIONAL) { - int ret =3D bch2_replicas_deltas_realloc(trans, 0); - if (ret) - return ret; - - struct replicas_delta_list *d =3D trans->fs_usage_deltas; - replicas =3D min(replicas, ARRAY_SIZE(d->persistent_reserved)); + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_persistent_reserved, + .persistent_reserved.nr_replicas =3D replicas, + }; =20 - d->persistent_reserved[replicas - 1] +=3D sectors; + return bch2_disk_accounting_mod(trans, &acc, &sectors, 1); } =20 if (flags & BTREE_TRIGGER_GC) { + sectors *=3D replicas; + percpu_down_read(&c->mark_lock); preempt_disable(); =20 diff --git a/fs/bcachefs/buckets.h b/fs/bcachefs/buckets.h index 6387e039f789..f9a1d24c997b 100644 --- a/fs/bcachefs/buckets.h +++ b/fs/bcachefs/buckets.h @@ -202,7 +202,6 @@ static inline struct bch_dev_usage bch2_dev_usage_read(= struct bch_dev *ca) return ret; } =20 -void bch2_dev_usage_init(struct bch_dev *); void bch2_dev_usage_to_text(struct printbuf *, struct bch_dev_usage *); =20 static inline u64 bch2_dev_buckets_reserved(struct bch_dev *ca, enum bch_w= atermark watermark) @@ -261,6 +260,13 @@ static inline u64 dev_buckets_available(struct bch_dev= *ca, return __dev_buckets_available(ca, bch2_dev_usage_read(ca), watermark); } =20 +static inline s64 bucket_sectors_fragmented(struct bch_dev *ca, struct bch= _alloc_v4 a) +{ + return a.dirty_sectors + ?
max(0, (int) ca->mi.bucket_size - (int) a.dirty_sectors) + : 0; +} + /* Filesystem usage: */ =20 static inline unsigned __fs_usage_u64s(unsigned nr_replicas) @@ -304,31 +310,11 @@ bch2_fs_usage_read_short(struct bch_fs *); =20 void bch2_dev_usage_update(struct bch_fs *, struct bch_dev *, const struct bch_alloc_v4 *, - const struct bch_alloc_v4 *, u64, bool); + const struct bch_alloc_v4 *); void bch2_dev_usage_update_m(struct bch_fs *, struct bch_dev *, struct bucket *, struct bucket *); - -/* key/bucket marking: */ - -static inline struct bch_fs_usage *fs_usage_ptr(struct bch_fs *c, - unsigned journal_seq, - bool gc) -{ - percpu_rwsem_assert_held(&c->mark_lock); - BUG_ON(!gc && !journal_seq); - - return this_cpu_ptr(gc - ? c->usage_gc - : c->usage[journal_seq & JOURNAL_BUF_MASK]); -} - int bch2_update_replicas(struct bch_fs *, struct bkey_s_c, - struct bch_replicas_entry_v1 *, s64, - unsigned, bool); -int bch2_update_replicas_list(struct btree_trans *, struct bch_replicas_entry_v1 *, s64); -int bch2_update_cached_sectors_list(struct btree_trans *, unsigned, s64); -int bch2_replicas_deltas_realloc(struct btree_trans *, unsigned); =20 void bch2_fs_usage_initialize(struct bch_fs *); =20 @@ -358,9 +344,6 @@ int bch2_trigger_reservation(struct btree_trans *, enum= btree_id, unsigned, =20 void bch2_trans_account_disk_usage_change(struct btree_trans *); =20 -void bch2_trans_fs_usage_revert(struct btree_trans *, struct replicas_delt= a_list *); -int bch2_trans_fs_usage_apply(struct btree_trans *, struct replicas_delta_= list *); - int bch2_trans_mark_metadata_bucket(struct btree_trans *, struct bch_dev *, size_t, enum bch_data_type, unsigned); int bch2_trans_mark_dev_sb(struct bch_fs *, struct bch_dev *); diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c index 209f59e87b34..327c586ac661 100644 --- a/fs/bcachefs/disk_accounting.c +++ b/fs/bcachefs/disk_accounting.c @@ -1,9 +1,13 @@ // SPDX-License-Identifier: GPL-2.0 =20 #include "bcachefs.h" 
+#include "bcachefs_ioctl.h" #include "btree_update.h" +#include "btree_write_buffer.h" #include "buckets.h" #include "disk_accounting.h" +#include "error.h" +#include "journal_io.h" #include "replicas.h" =20 static const char * const disk_accounting_type_strs[] =3D { @@ -13,6 +17,44 @@ static const char * const disk_accounting_type_strs[] = =3D { NULL }; =20 +int bch2_disk_accounting_mod(struct btree_trans *trans, + struct disk_accounting_key *k, + s64 *d, unsigned nr) +{ + /* Normalize: */ + switch (k->type) { + case BCH_DISK_ACCOUNTING_replicas: + bubble_sort(k->replicas.devs, k->replicas.nr_devs, u8_cmp); + break; + } + + BUG_ON(nr > BCH_ACCOUNTING_MAX_COUNTERS); + + struct { + __BKEY_PADDED(k, BCH_ACCOUNTING_MAX_COUNTERS); + } k_i; + struct bkey_i_accounting *acc =3D bkey_accounting_init(&k_i.k); + + acc->k.p =3D disk_accounting_key_to_bpos(k); + set_bkey_val_u64s(&acc->k, sizeof(struct bch_accounting) / sizeof(u64) + = nr); + + memcpy_u64s_small(acc->v.d, d, nr); + + return bch2_trans_update_buffered(trans, BTREE_ID_accounting, &acc->k_i); +} + +int bch2_mod_dev_cached_sectors(struct btree_trans *trans, + unsigned dev, s64 sectors) +{ + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_replicas, + }; + + bch2_replicas_entry_cached(&acc.replicas, dev); + + return bch2_disk_accounting_mod(trans, &acc, &sectors, 1); +} + int bch2_accounting_invalid(struct bch_fs *c, struct bkey_s_c k, enum bkey_invalid_flags flags, struct printbuf *err) @@ -68,3 +110,269 @@ void bch2_accounting_swab(struct bkey_s k) p++) *p =3D swab64(*p); } + +static inline bool accounting_to_replicas(struct bch_replicas_entry_v1 *r,= struct bpos p) +{ + struct disk_accounting_key acc_k; + bpos_to_disk_accounting_key(&acc_k, p); + + switch (acc_k.type) { + case BCH_DISK_ACCOUNTING_replicas: + memcpy(r, &acc_k.replicas, replicas_entry_bytes(&acc_k.replicas)); + return true; + default: + return false; + } +} + +static int bch2_accounting_update_sb_one(struct bch_fs *c, struct bpos
p) +{ + struct bch_replicas_padded r; + return accounting_to_replicas(&r.e, p) + ? bch2_mark_replicas(c, &r.e) + : 0; +} + +int bch2_accounting_update_sb(struct btree_trans *trans) +{ + for (struct jset_entry *i =3D trans->journal_entries; + i !=3D (void *) ((u64 *) trans->journal_entries + trans->journal_ent= ries_u64s); + i =3D vstruct_next(i)) + if (jset_entry_is_key(i) && i->start->k.type =3D=3D KEY_TYPE_accounting)= { + int ret =3D bch2_accounting_update_sb_one(trans->c, i->start->k.p); + if (ret) + return ret; + } + + return 0; +} + +static int __bch2_accounting_mem_add_slowpath(struct bch_fs *c, struct bke= y_s_c_accounting a) +{ + struct bch_replicas_padded r; + + if (accounting_to_replicas(&r.e, a.k->p) && + !bch2_replicas_marked_locked(c, &r.e)) + return -BCH_ERR_btree_insert_need_mark_replicas; + + struct bch_accounting_mem *acc =3D &c->accounting; + unsigned new_nr_counters =3D acc->nr_counters + bch2_accounting_counters(= a.k); + + u64 __percpu *new_counters =3D __alloc_percpu_gfp(new_nr_counters * sizeo= f(u64), + sizeof(u64), GFP_KERNEL); + if (!new_counters) + return -BCH_ERR_ENOMEM_disk_accounting; + + preempt_disable(); + memcpy(this_cpu_ptr(new_counters), + bch2_acc_percpu_u64s(acc->v, acc->nr_counters), + acc->nr_counters * sizeof(u64)); + preempt_enable(); + + struct accounting_pos_offset n =3D { + .pos =3D a.k->p, + .version =3D a.k->version, + .offset =3D acc->nr_counters, + .nr_counters =3D bch2_accounting_counters(a.k), + }; + if (darray_push(&acc->k, n)) { + free_percpu(new_counters); + return -BCH_ERR_ENOMEM_disk_accounting; + } + + eytzinger0_sort(acc->k.data, acc->k.nr, sizeof(acc->k.data[0]), accountin= g_pos_cmp, NULL); + + free_percpu(acc->v); + acc->v =3D new_counters; + acc->nr_counters =3D new_nr_counters; + + for (unsigned i =3D 0; i < n.nr_counters; i++) + this_cpu_add(acc->v[n.offset + i], a.v->d[i]); + return 0; +} + +int bch2_accounting_mem_add_slowpath(struct bch_fs *c, struct bkey_s_c_acc= ounting a) +{ + 
percpu_up_read(&c->mark_lock); + percpu_down_write(&c->mark_lock); + int ret =3D __bch2_accounting_mem_add_slowpath(c, a); + percpu_up_write(&c->mark_lock); + percpu_down_read(&c->mark_lock); + return ret; +} + +int bch2_fs_replicas_usage_read(struct bch_fs *c, darray_char *usage) +{ + struct bch_accounting_mem *acc =3D &c->accounting; + int ret =3D 0; + + darray_init(usage); + + percpu_down_read(&c->mark_lock); + darray_for_each(acc->k, i) { + struct { + struct bch_replicas_usage r; + u8 pad[BCH_BKEY_PTRS_MAX]; + } u; + + if (!accounting_to_replicas(&u.r.r, i->pos)) + continue; + + bch2_accounting_mem_read(c, i->pos, &u.r.sectors, 1); + + ret =3D darray_make_room(usage, replicas_usage_bytes(&u.r)); + if (ret) + break; + + memcpy(&darray_top(*usage), &u.r, replicas_usage_bytes(&u.r)); + usage->nr +=3D replicas_usage_bytes(&u.r); + } + percpu_up_read(&c->mark_lock); + + if (ret) + darray_exit(usage); + return ret; +} + +static bool accounting_key_is_zero(struct bkey_s_c_accounting a) +{ + + for (unsigned i =3D 0; i < bch2_accounting_counters(a.k); i++) + if (a.v->d[i]) + return false; + return true; +} + +static int accounting_read_key(struct bch_fs *c, struct bkey_s_c k) +{ + struct printbuf buf =3D PRINTBUF; + + if (k.k->type !=3D KEY_TYPE_accounting) + return 0; + + percpu_down_read(&c->mark_lock); + int ret =3D __bch2_accounting_mem_add(c, bkey_s_c_to_accounting(k)); + percpu_up_read(&c->mark_lock); + + if (accounting_key_is_zero(bkey_s_c_to_accounting(k)) && + ret =3D=3D -BCH_ERR_btree_insert_need_mark_replicas) + ret =3D 0; + + struct disk_accounting_key acc; + bpos_to_disk_accounting_key(&acc, k.k->p); + + if (fsck_err_on(ret =3D=3D -BCH_ERR_btree_insert_need_mark_replicas, + c, accounting_replicas_not_marked, + "accounting not marked in superblock replicas\n %s", + (bch2_accounting_key_to_text(&buf, &acc), + buf.buf))) + ret =3D bch2_accounting_update_sb_one(c, k.k->p); +fsck_err: + printbuf_exit(&buf); + return ret; +} + +int bch2_accounting_read(struct 
bch_fs *c) +{ + struct bch_accounting_mem *acc =3D &c->accounting; + + int ret =3D bch2_trans_run(c, + for_each_btree_key(trans, iter, + BTREE_ID_accounting, POS_MIN, + BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, ({ + struct bkey u; + struct bkey_s_c k =3D bch2_btree_path_peek_slot_exact(btree_iter_path(t= rans, &iter), &u); + accounting_read_key(c, k); + }))); + if (ret) + goto err; + + struct genradix_iter iter; + struct journal_replay *i, **_i; + + genradix_for_each(&c->journal_entries, iter, _i) { + i =3D *_i; + + if (!i || i->ignore) + continue; + + for_each_jset_key(k, entry, &i->j) + if (k->k.type =3D=3D KEY_TYPE_accounting) { + struct bkey_s_c_accounting a =3D bkey_i_to_s_c_accounting(k); + unsigned idx =3D eytzinger0_find(acc->k.data, acc->k.nr, + sizeof(acc->k.data[0]), + accounting_pos_cmp, &a.k->p); + if (idx < acc->k.nr && + bversion_cmp(acc->k.data[idx].version, a.k->version) >=3D 0) + continue; + + ret =3D accounting_read_key(c, bkey_i_to_s_c(k)); + if (ret) + goto err; + } + } + + percpu_down_read(&c->mark_lock); + preempt_disable(); + struct bch_fs_usage_base *usage =3D &c->usage_base->b; + + for (unsigned i =3D 0; i < acc->k.nr; i++) { + struct disk_accounting_key k; + bpos_to_disk_accounting_key(&k, acc->k.data[i].pos); + + u64 v[BCH_ACCOUNTING_MAX_COUNTERS]; + bch2_accounting_mem_read_counters(c, i, v, ARRAY_SIZE(v)); + + switch (k.type) { + case BCH_DISK_ACCOUNTING_persistent_reserved: + usage->reserved +=3D v[0] * k.persistent_reserved.nr_replicas; + break; + case BCH_DISK_ACCOUNTING_replicas: + fs_usage_data_type_to_base(usage, k.replicas.data_type, v[0]); + break; + } + } + preempt_enable(); + percpu_up_read(&c->mark_lock); +err: + bch_err_fn(c, ret); + return ret; +} + +int bch2_dev_usage_remove(struct bch_fs *c, unsigned dev) +{ + return bch2_trans_run(c, + bch2_btree_write_buffer_flush_sync(trans) ?: + for_each_btree_key_commit(trans, iter, BTREE_ID_accounting, POS_MIN, + BTREE_ITER_ALL_SNAPSHOTS, k, NULL, NULL, 0, ({ + struct 
disk_accounting_key acc; + bpos_to_disk_accounting_key(&acc, k.k->p); + + acc.type =3D=3D BCH_DISK_ACCOUNTING_dev_data_type && + acc.dev_data_type.dev =3D=3D dev + ? bch2_btree_bit_mod_buffered(trans, BTREE_ID_accounting, k.k->p, 0) + : 0; + })) ?: + bch2_btree_write_buffer_flush_sync(trans)); +} + +int bch2_dev_usage_init(struct bch_dev *ca) +{ + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_dev_data_type, + .dev_data_type.dev =3D ca->dev_idx, + .dev_data_type.data_type =3D BCH_DATA_free, + }; + u64 v[3] =3D { ca->mi.nbuckets - ca->mi.first_bucket, 0, 0 }; + + return bch2_trans_do(ca->fs, NULL, NULL, 0, + bch2_disk_accounting_mod(trans, &acc, v, ARRAY_SIZE(v))); +} + +void bch2_fs_accounting_exit(struct bch_fs *c) +{ + struct bch_accounting_mem *acc =3D &c->accounting; + + darray_exit(&acc->k); + free_percpu(acc->v); +} diff --git a/fs/bcachefs/disk_accounting.h b/fs/bcachefs/disk_accounting.h index e15299665859..5fd053a819df 100644 --- a/fs/bcachefs/disk_accounting.h +++ b/fs/bcachefs/disk_accounting.h @@ -2,11 +2,32 @@ #ifndef _BCACHEFS_DISK_ACCOUNTING_H #define _BCACHEFS_DISK_ACCOUNTING_H =20 +#include + +static inline void bch2_u64s_neg(u64 *v, unsigned nr) +{ + for (unsigned i =3D 0; i < nr; i++) + v[i] =3D -v[i]; +} + static inline unsigned bch2_accounting_counters(const struct bkey *k) { return bkey_val_u64s(k) - offsetof(struct bch_accounting, d) / sizeof(u64= ); } =20 +static inline void bch2_accounting_neg(struct bkey_s_accounting a) +{ + bch2_u64s_neg(a.v->d, bch2_accounting_counters(a.k)); +} + +static inline bool bch2_accounting_key_is_zero(struct bkey_s_c_accounting = a) +{ + for (unsigned i =3D 0; i < bch2_accounting_counters(a.k); i++) + if (a.v->d[i]) + return false; + return true; +} + static inline void bch2_accounting_accumulate(struct bkey_i_accounting *ds= t, struct bkey_s_c_accounting src) { @@ -18,6 +39,26 @@ static inline void bch2_accounting_accumulate(struct bke= y_i_accounting *dst, dst->k.version =3D 
src.k->version; } =20 +static inline void fs_usage_data_type_to_base(struct bch_fs_usage_base *fs= _usage, + enum bch_data_type data_type, + s64 sectors) +{ + switch (data_type) { + case BCH_DATA_btree: + fs_usage->btree +=3D sectors; + break; + case BCH_DATA_user: + case BCH_DATA_parity: + fs_usage->data +=3D sectors; + break; + case BCH_DATA_cached: + fs_usage->cached +=3D sectors; + break; + default: + break; + } +} + static inline void bpos_to_disk_accounting_key(struct disk_accounting_key = *acc, struct bpos p) { acc->_pad =3D p; @@ -36,6 +77,12 @@ static inline struct bpos disk_accounting_key_to_bpos(st= ruct disk_accounting_key return ret; } =20 +int bch2_disk_accounting_mod(struct btree_trans *, + struct disk_accounting_key *, + s64 *, unsigned); +int bch2_mod_dev_cached_sectors(struct btree_trans *trans, + unsigned dev, s64 sectors); + int bch2_accounting_invalid(struct bch_fs *, struct bkey_s_c, enum bkey_invalid_flags, struct printbuf *); void bch2_accounting_key_to_text(struct printbuf *, struct disk_accounting= _key *); @@ -49,4 +96,83 @@ void bch2_accounting_swab(struct bkey_s); .min_val_size =3D 8, \ }) =20 +int bch2_accounting_update_sb(struct btree_trans *); + +static inline int accounting_pos_cmp(const void *_l, const void *_r) +{ + const struct bpos *l =3D _l, *r =3D _r; + + return bpos_cmp(*l, *r); +} + +int bch2_accounting_mem_add_slowpath(struct bch_fs *, struct bkey_s_c_acco= unting); + +static inline int __bch2_accounting_mem_add(struct bch_fs *c, struct bkey_= s_c_accounting a) +{ + struct bch_accounting_mem *acc =3D &c->accounting; + unsigned idx =3D eytzinger0_find(acc->k.data, acc->k.nr, sizeof(acc->k.da= ta[0]), + accounting_pos_cmp, &a.k->p); + if (unlikely(idx >=3D acc->k.nr)) + return bch2_accounting_mem_add_slowpath(c, a); + + unsigned offset =3D acc->k.data[idx].offset; + + EBUG_ON(bch2_accounting_counters(a.k) !=3D acc->k.data[idx].nr_counters); + + for (unsigned i =3D 0; i < bch2_accounting_counters(a.k); i++) + 
this_cpu_add(acc->v[offset + i], a.v->d[i]); + return 0; +} + +static inline int bch2_accounting_mem_add(struct btree_trans *trans, struc= t bkey_s_c_accounting a) +{ + struct disk_accounting_key acc_k; + bpos_to_disk_accounting_key(&acc_k, a.k->p); + + switch (acc_k.type) { + case BCH_DISK_ACCOUNTING_persistent_reserved: + trans->fs_usage_delta.reserved +=3D acc_k.persistent_reserved.nr_replica= s * a.v->d[0]; + break; + case BCH_DISK_ACCOUNTING_replicas: + fs_usage_data_type_to_base(&trans->fs_usage_delta, acc_k.replicas.data_t= ype, a.v->d[0]); + break; + } + return __bch2_accounting_mem_add(trans->c, a); +} + +static inline void bch2_accounting_mem_read_counters(struct bch_fs *c, + unsigned idx, + u64 *v, unsigned nr) +{ + memset(v, 0, sizeof(*v) * nr); + + struct bch_accounting_mem *acc =3D &c->accounting; + if (unlikely(idx >=3D acc->k.nr)) + return; + + unsigned offset =3D acc->k.data[idx].offset; + nr =3D min_t(unsigned, nr, acc->k.data[idx].nr_counters); + + for (unsigned i =3D 0; i < nr; i++) + v[i] =3D percpu_u64_get(acc->v + offset + i); +} + +static inline void bch2_accounting_mem_read(struct bch_fs *c, struct bpos = p, + u64 *v, unsigned nr) +{ + struct bch_accounting_mem *acc =3D &c->accounting; + unsigned idx =3D eytzinger0_find(acc->k.data, acc->k.nr, sizeof(acc->k.da= ta[0]), + accounting_pos_cmp, &p); + + bch2_accounting_mem_read_counters(c, idx, v, nr); +} + +int bch2_fs_replicas_usage_read(struct bch_fs *, darray_char *); + +int bch2_accounting_read(struct bch_fs *); + +int bch2_dev_usage_remove(struct bch_fs *, unsigned); +int bch2_dev_usage_init(struct bch_dev *); +void bch2_fs_accounting_exit(struct bch_fs *); + #endif /* _BCACHEFS_DISK_ACCOUNTING_H */ diff --git a/fs/bcachefs/disk_accounting_types.h b/fs/bcachefs/disk_account= ing_types.h new file mode 100644 index 000000000000..8da5ac182b33 --- /dev/null +++ b/fs/bcachefs/disk_accounting_types.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef 
_BCACHEFS_DISK_ACCOUNTING_TYPES_H +#define _BCACHEFS_DISK_ACCOUNTING_TYPES_H + +#include + +struct accounting_pos_offset { + struct bpos pos; + struct bversion version; + u32 offset:24, + nr_counters:8; +}; + +struct bch_accounting_mem { + DARRAY(struct accounting_pos_offset) k; + u64 __percpu *v; + unsigned nr_counters; +}; + +#endif /* _BCACHEFS_DISK_ACCOUNTING_TYPES_H */ diff --git a/fs/bcachefs/ec.c b/fs/bcachefs/ec.c index b98e2c2b8bf0..38e5e882f4a4 100644 --- a/fs/bcachefs/ec.c +++ b/fs/bcachefs/ec.c @@ -13,6 +13,7 @@ #include "btree_write_buffer.h" #include "buckets.h" #include "checksum.h" +#include "disk_accounting.h" #include "disk_groups.h" #include "ec.h" #include "error.h" @@ -324,21 +325,25 @@ int bch2_trigger_stripe(struct btree_trans *trans, new_s->nr_redundant !=3D old_s->nr_redundant)); =20 if (new_s) { - s64 sectors =3D le16_to_cpu(new_s->sectors); + s64 sectors =3D (u64) le16_to_cpu(new_s->sectors) * new_s->nr_redundant; =20 - struct bch_replicas_padded r; - bch2_bkey_to_replicas(&r.e, new); - int ret =3D bch2_update_replicas_list(trans, &r.e, sectors * new_s->nr_= redundant); + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_replicas, + }; + bch2_bkey_to_replicas(&acc.replicas, new); + int ret =3D bch2_disk_accounting_mod(trans, &acc, &sectors, 1); if (ret) return ret; } =20 if (old_s) { - s64 sectors =3D -((s64) le16_to_cpu(old_s->sectors)); + s64 sectors =3D -((s64) le16_to_cpu(old_s->sectors)) * old_s->nr_redund= ant; =20 - struct bch_replicas_padded r; - bch2_bkey_to_replicas(&r.e, old); - int ret =3D bch2_update_replicas_list(trans, &r.e, sectors * old_s->nr_= redundant); + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_replicas, + }; + bch2_bkey_to_replicas(&acc.replicas, old); + int ret =3D bch2_disk_accounting_mod(trans, &acc, &sectors, 1); if (ret) return ret; } @@ -442,8 +447,7 @@ int bch2_trigger_stripe(struct btree_trans *trans, } =20 int ret =3D bch2_update_replicas(c, new, &m->r.e, - ((s64)
m->sectors * m->nr_redundant), - 0, true); + ((s64) m->sectors * m->nr_redundant)); if (ret) { struct printbuf buf =3D PRINTBUF; =20 diff --git a/fs/bcachefs/inode.c b/fs/bcachefs/inode.c index a3139bb66f77..3dfa9f77c739 100644 --- a/fs/bcachefs/inode.c +++ b/fs/bcachefs/inode.c @@ -8,6 +8,7 @@ #include "buckets.h" #include "compress.h" #include "dirent.h" +#include "disk_accounting.h" #include "error.h" #include "extents.h" #include "extent_update.h" @@ -610,11 +611,13 @@ int bch2_trigger_inode(struct btree_trans *trans, =20 if (flags & BTREE_TRIGGER_TRANSACTIONAL) { if (nr) { - int ret =3D bch2_replicas_deltas_realloc(trans, 0); + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_nr_inodes + }; + + int ret =3D bch2_disk_accounting_mod(trans, &acc, &nr, 1); if (ret) return ret; - - trans->fs_usage_deltas->nr_inodes +=3D nr; } =20 bool old_deleted =3D bkey_is_deleted_inode(old); diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index b8289af66c8e..140393256f32 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -1194,9 +1194,6 @@ int bch2_fs_initialize(struct bch_fs *c) for (unsigned i =3D 0; i < BTREE_ID_NR; i++) bch2_btree_root_alloc(c, i); =20 - for_each_member_device(c, ca) - bch2_dev_usage_init(ca); - ret =3D bch2_fs_journal_alloc(c); if (ret) goto err; @@ -1213,6 +1210,15 @@ int bch2_fs_initialize(struct bch_fs *c) if (ret) goto err; =20 + for_each_member_device(c, ca) { + ret =3D bch2_dev_usage_init(ca); + bch_err_msg(c, ret, "initializing device usage"); + if (ret) { + percpu_ref_put(&ca->ref); + goto err; + } + } + /* * Write out the superblock and journal buckets, now that we can do * btree updates diff --git a/fs/bcachefs/recovery_types.h b/fs/bcachefs/recovery_types.h index 1361e34d4e64..18582e2128ed 100644 --- a/fs/bcachefs/recovery_types.h +++ b/fs/bcachefs/recovery_types.h @@ -13,6 +13,7 @@ * must never change: */ #define BCH_RECOVERY_PASSES() \ + x(accounting_read, 37, PASS_ALWAYS) \ x(alloc_read, 0, 
PASS_ALWAYS) \ x(stripes_read, 1, PASS_ALWAYS) \ x(initialize_subvolumes, 2, 0) \ diff --git a/fs/bcachefs/replicas.c b/fs/bcachefs/replicas.c index 678b9c20e251..dde581a49e28 100644 --- a/fs/bcachefs/replicas.c +++ b/fs/bcachefs/replicas.c @@ -254,23 +254,25 @@ static bool __replicas_has_entry(struct bch_replicas_= cpu *r, return __replicas_entry_idx(r, search) >=3D 0; } =20 -bool bch2_replicas_marked(struct bch_fs *c, +bool bch2_replicas_marked_locked(struct bch_fs *c, struct bch_replicas_entry_v1 *search) { - bool marked; - - if (!search->nr_devs) - return true; - verify_replicas_entry(search); =20 + return !search->nr_devs || + (__replicas_has_entry(&c->replicas, search) && + (likely((!c->replicas_gc.entries)) || + __replicas_has_entry(&c->replicas_gc, search))); +} + +bool bch2_replicas_marked(struct bch_fs *c, + struct bch_replicas_entry_v1 *search) +{ percpu_down_read(&c->mark_lock); - marked =3D __replicas_has_entry(&c->replicas, search) && - (likely((!c->replicas_gc.entries)) || - __replicas_has_entry(&c->replicas_gc, search)); + bool ret =3D bch2_replicas_marked_locked(c, search); percpu_up_read(&c->mark_lock); =20 - return marked; + return ret; } =20 static void __replicas_table_update(struct bch_fs_usage *dst, @@ -468,20 +470,6 @@ int bch2_mark_replicas(struct bch_fs *c, struct bch_re= plicas_entry_v1 *r) ? 
0 : bch2_mark_replicas_slowpath(c, r); } =20 -/* replicas delta list: */ - -int bch2_replicas_delta_list_mark(struct bch_fs *c, - struct replicas_delta_list *r) -{ - struct replicas_delta *d =3D r->d; - struct replicas_delta *top =3D (void *) r->d + r->used; - int ret =3D 0; - - for (d =3D r->d; !ret && d !=3D top; d =3D replicas_delta_next(d)) - ret =3D bch2_mark_replicas(c, &d->r); - return ret; -} - /* * Old replicas_gc mechanism: only used for journal replicas entries now, = should * die at some point: @@ -1042,8 +1030,6 @@ void bch2_fs_replicas_exit(struct bch_fs *c) kfree(c->usage_base); kfree(c->replicas.entries); kfree(c->replicas_gc.entries); - - mempool_exit(&c->replicas_delta_pool); } =20 int bch2_fs_replicas_init(struct bch_fs *c) @@ -1052,7 +1038,5 @@ int bch2_fs_replicas_init(struct bch_fs *c) &c->replicas_journal_res, reserve_journal_replicas(c, &c->replicas)); =20 - return mempool_init_kmalloc_pool(&c->replicas_delta_pool, 1, - REPLICAS_DELTA_LIST_MAX) ?: - replicas_table_update(c, &c->replicas); + return replicas_table_update(c, &c->replicas); } diff --git a/fs/bcachefs/replicas.h b/fs/bcachefs/replicas.h index 983cce782ac2..f00c586f8cd9 100644 --- a/fs/bcachefs/replicas.h +++ b/fs/bcachefs/replicas.h @@ -26,18 +26,13 @@ int bch2_replicas_entry_idx(struct bch_fs *, void bch2_devlist_to_replicas(struct bch_replicas_entry_v1 *, enum bch_data_type, struct bch_devs_list); + +bool bch2_replicas_marked_locked(struct bch_fs *, + struct bch_replicas_entry_v1 *); bool bch2_replicas_marked(struct bch_fs *, struct bch_replicas_entry_v1 *); int bch2_mark_replicas(struct bch_fs *, struct bch_replicas_entry_v1 *); =20 -static inline struct replicas_delta * -replicas_delta_next(struct replicas_delta *d) -{ - return (void *) d + replicas_entry_bytes(&d->r) + 8; -} - -int bch2_replicas_delta_list_mark(struct bch_fs *, struct replicas_delta_l= ist *); - void bch2_bkey_to_replicas(struct bch_replicas_entry_v1 *, struct bkey_s_c= ); =20 static inline void 
bch2_replicas_entry_cached(struct bch_replicas_entry_v1= *e, diff --git a/fs/bcachefs/replicas_types.h b/fs/bcachefs/replicas_types.h index ac90d142c4e8..fed71c861fe7 100644 --- a/fs/bcachefs/replicas_types.h +++ b/fs/bcachefs/replicas_types.h @@ -8,20 +8,4 @@ struct bch_replicas_cpu { struct bch_replicas_entry_v1 *entries; }; =20 -struct replicas_delta { - s64 delta; - struct bch_replicas_entry_v1 r; -} __packed; - -struct replicas_delta_list { - unsigned size; - unsigned used; - - struct {} memset_start; - u64 nr_inodes; - u64 persistent_reserved[BCH_REPLICAS_MAX]; - struct {} memset_end; - struct replicas_delta d[]; -}; - #endif /* _BCACHEFS_REPLICAS_TYPES_H */ diff --git a/fs/bcachefs/sb-errors_types.h b/fs/bcachefs/sb-errors_types.h index 383e13711001..777a1adc38cf 100644 --- a/fs/bcachefs/sb-errors_types.h +++ b/fs/bcachefs/sb-errors_types.h @@ -265,7 +265,8 @@ x(subvol_children_bad, 257) \ x(subvol_loop, 258) \ x(subvol_unreachable, 259) \ - x(accounting_mismatch, 260) + x(accounting_mismatch, 260) \ + x(accounting_replicas_not_marked, 261) =20 enum bch_sb_error_id { #define x(t, n) BCH_FSCK_ERR_##t =3D n, diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c index a7f9de220d90..685d54d0ddbb 100644 --- a/fs/bcachefs/super.c +++ b/fs/bcachefs/super.c @@ -24,6 +24,7 @@ #include "clock.h" #include "compress.h" #include "debug.h" +#include "disk_accounting.h" #include "disk_groups.h" #include "ec.h" #include "errcode.h" @@ -535,6 +536,7 @@ static void __bch2_fs_free(struct bch_fs *c) time_stats_exit(&c->times[i]); =20 bch2_free_pending_node_rewrites(c); + bch2_fs_accounting_exit(c); bch2_fs_sb_errors_exit(c); bch2_fs_counters_exit(c); bch2_fs_snapshots_exit(c); @@ -1581,7 +1583,8 @@ static int bch2_dev_remove_alloc(struct bch_fs *c, st= ruct bch_dev *ca) bch2_btree_delete_range(c, BTREE_ID_alloc, start, end, BTREE_TRIGGER_NORUN, NULL) ?: bch2_btree_delete_range(c, BTREE_ID_bucket_gens, start, end, - BTREE_TRIGGER_NORUN, NULL); + BTREE_TRIGGER_NORUN, NULL) ?: + 
bch2_dev_usage_remove(c, ca->dev_idx); bch_err_msg(c, ret, "removing dev alloc info"); return ret; } @@ -1618,6 +1621,16 @@ int bch2_dev_remove(struct bch_fs *c, struct bch_dev= *ca, int flags) if (ret) goto err; =20 + /* + * We need to flush the entire journal to get rid of keys that reference + * the device being removed before removing the superblock entry + */ + bch2_journal_flush_all_pins(&c->journal); + + /* + * this is really just needed for the bch2_replicas_gc_(start|end) + * calls, and could be cleaned up: + */ ret =3D bch2_journal_flush_device_pins(&c->journal, ca->dev_idx); bch_err_msg(ca, ret, "bch2_journal_flush_device_pins()"); if (ret) @@ -1655,17 +1668,6 @@ int bch2_dev_remove(struct bch_fs *c, struct bch_dev= *ca, int flags) =20 bch2_dev_free(ca); =20 - /* - * At this point the device object has been removed in-core, but the - * on-disk journal might still refer to the device index via sb device - * usage entries. Recovery fails if it sees usage information for an - * invalid device. Flush journal pins to push the back of the journal - * past now invalid device index references before we update the - * superblock, but after the device object has been removed so any - * further journal writes elide usage info for the device. 
- */ - bch2_journal_flush_all_pins(&c->journal); - /* * Free this device's slot in the bch_member array - all pointers to * this device must be gone: @@ -1727,8 +1729,6 @@ int bch2_dev_add(struct bch_fs *c, const char *path) goto err; } =20 - bch2_dev_usage_init(ca); - ret =3D __bch2_dev_attach_bdev(ca, &sb); if (ret) goto err; @@ -1793,6 +1793,10 @@ int bch2_dev_add(struct bch_fs *c, const char *path) =20 bch2_dev_usage_journal_reserve(c); =20 + ret =3D bch2_dev_usage_init(ca); + if (ret) + goto err_late; + ret =3D bch2_trans_mark_dev_sb(c, ca); bch_err_msg(ca, ret, "marking new superblock"); if (ret) @@ -1956,15 +1960,18 @@ int bch2_dev_resize(struct bch_fs *c, struct bch_de= v *ca, u64 nbuckets) mutex_unlock(&c->sb_lock); =20 if (ca->mi.freespace_initialized) { - ret =3D bch2_dev_freespace_init(c, ca, old_nbuckets, nbuckets); + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_dev_data_type, + .dev_data_type.dev =3D ca->dev_idx, + .dev_data_type.data_type =3D BCH_DATA_free, + }; + u64 v[3] =3D { nbuckets - old_nbuckets, 0, 0 }; + + ret =3D bch2_dev_freespace_init(c, ca, old_nbuckets, nbuckets) ?: + bch2_trans_do(ca->fs, NULL, NULL, 0, + bch2_disk_accounting_mod(trans, &acc, v, ARRAY_SIZE(v))); if (ret) goto err; - - /* - * XXX: this is all wrong transactionally - we'll be able to do - * this correctly after the disk space accounting rewrite - */ - ca->usage_base->d[BCH_DATA_free].buckets +=3D nbuckets - old_nbuckets; } =20 bch2_recalc_capacity(c); --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-171.mta1.migadu.com (out-171.mta1.migadu.com [95.215.58.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 80FAE8C07 for ; Sun, 25 Feb 2024 02:38:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1708828722; cv=none; b=j/KEA38oBQjiVCt+OIXqGbkVroaKfFTo3ahmTKgrq3BNRtuw+qBJY0+41rfDUfwf5IAP0ygL5JmYaN9fR6PHzEhyHResE6+HDVXF9Jajvz89HfTz1FL2naecJ+eby2FnmM4beeJL6yBX+8lvQ0DTt1gJ23uGJfSm1vdgZrOhsGQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828722; c=relaxed/simple; bh=WhdKHqtzSdKzXSywa8/Kdpjd1qyDu5Tei06epJX7qUk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SbvhgwpQyS/hhx0M/Nyss3j5HswQZoi64SxpRsXwvx2YB9LHMBy/Jb3s+381NkMJJW7kz3Pmk+tFjDYweQjl1U5yaC7jl65TTH3rUnHcC11bDPRgxKFk9hAoRaV++ce+YMD/jLx3V17WYm2OnzGBKJEJorqwntjGpdGmLxZAugc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=dGAOx/GL; arc=none smtp.client-ip=95.215.58.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="dGAOx/GL" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828718; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DrOHx5wMYDRZo/YqdsvPnCBIdET0WVtHDJeAbRuqJI0=; b=dGAOx/GLUxtEdliB0UdEMTiotkvOejJNPfJ61o3fMSJjoewuwiT3emDwd6XodW30MavCZb jB3slYjO1Uy9H72d5KMb9yXufezoiKDtmz1+I1cAzdDELRlh7FrPEa/l37qal7/ruDAtS/ 9hYMIPbSuygjbJL+DJPEvhfCmIdEQ1o= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 05/21] bcachefs: dev_usage updated by new accounting Date: Sat, 24 Feb 2024 21:38:07 -0500 Message-ID: <20240225023826.2413565-6-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Reading disk accounting now requires an eytzinger lookup (see: bch2_accounting_mem_read()), but the per-device counters are used frequently enough that we'd like to still be able to read them with just a percpu sum, as in the old code. This patch special-cases the device counters: when we update in-memory accounting, we also update the old-style percpu counters if it's a device counter update.
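For readers less familiar with the percpu pattern described above, here is a minimal userspace sketch of the idea, not the kernel implementation. It assumes a fixed array of simulated CPU slots in place of the real percpu allocator; the names NR_CPUS_SIM, NR_DATA_TYPES, struct dev, dev_usage_add() and dev_usage_read_fast() are hypothetical and only mirror the shape of the code in this patch. Writers add a three-u64 delta (buckets, sectors, fragmented) to their own slot, and a reader zeroes the result and sums all slots, with no key lookup involved.

```c
#include <stdint.h>
#include <string.h>

#define NR_CPUS_SIM   4 /* hypothetical: number of simulated CPU slots */
#define NR_DATA_TYPES 3 /* hypothetical: stands in for BCH_DATA_NR */

struct dev_usage_type {
	uint64_t buckets;
	uint64_t sectors;
	uint64_t fragmented;
};

struct dev_usage {
	struct dev_usage_type d[NR_DATA_TYPES];
};

/* One private copy of the counters per simulated CPU. */
struct dev {
	struct dev_usage usage[NR_CPUS_SIM];
};

/*
 * Write path: apply a delta to the caller's own slot only.
 * A negative delta wraps the unsigned slot value, which is fine
 * because only the sum over all slots (mod 2^64) is meaningful.
 */
static void dev_usage_add(struct dev *ca, unsigned cpu, unsigned data_type,
			  const int64_t delta[3])
{
	struct dev_usage_type *t = &ca->usage[cpu].d[data_type];

	t->buckets    += (uint64_t) delta[0];
	t->sectors    += (uint64_t) delta[1];
	t->fragmented += (uint64_t) delta[2];
}

/* Read path: zero the output, then accumulate every slot into it. */
static void dev_usage_read_fast(const struct dev *ca, struct dev_usage *out)
{
	memset(out, 0, sizeof(*out));

	for (unsigned cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		for (unsigned i = 0; i < NR_DATA_TYPES; i++) {
			out->d[i].buckets    += ca->usage[cpu].d[i].buckets;
			out->d[i].sectors    += ca->usage[cpu].d[i].sectors;
			out->d[i].fragmented += ca->usage[cpu].d[i].fragmented;
		}
}
```

In this sketch a single slot can hold a value that looks nonsensical on its own (for example after a negative delta), so the counters should only ever be interpreted after summing, which is what the read helper does.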
Signed-off-by: Kent Overstreet --- fs/bcachefs/bcachefs.h | 3 +-- fs/bcachefs/btree_gc.c | 2 +- fs/bcachefs/buckets.c | 36 +++++------------------------------ fs/bcachefs/buckets_types.h | 2 +- fs/bcachefs/disk_accounting.c | 14 ++++++++++++++ fs/bcachefs/disk_accounting.h | 11 ++++++++++- fs/bcachefs/recovery.c | 14 -------------- fs/bcachefs/sb-clean.c | 17 ----------------- 8 files changed, 32 insertions(+), 67 deletions(-) diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h index 18c00051a8f6..91c40fde1925 100644 --- a/fs/bcachefs/bcachefs.h +++ b/fs/bcachefs/bcachefs.h @@ -576,8 +576,7 @@ struct bch_dev { unsigned long *buckets_nouse; struct rw_semaphore bucket_lock; =20 - struct bch_dev_usage *usage_base; - struct bch_dev_usage __percpu *usage[JOURNAL_BUF_NR]; + struct bch_dev_usage __percpu *usage; struct bch_dev_usage __percpu *usage_gc; =20 /* Allocator: */ diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c index 2dfa7ca95fc0..93826749356e 100644 --- a/fs/bcachefs/btree_gc.c +++ b/fs/bcachefs/btree_gc.c @@ -1233,7 +1233,7 @@ static int bch2_gc_done(struct bch_fs *c, bch2_fs_usage_acc_to_base(c, i); =20 __for_each_member_device(c, ca) { - struct bch_dev_usage *dst =3D ca->usage_base; + struct bch_dev_usage *dst =3D this_cpu_ptr(ca->usage); struct bch_dev_usage *src =3D (void *) bch2_acc_percpu_u64s((u64 __percpu *) ca->usage_gc, dev_usage_u64s()); diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c index fb915c1b7844..7540486ae266 100644 --- a/fs/bcachefs/buckets.c +++ b/fs/bcachefs/buckets.c @@ -69,15 +69,8 @@ void bch2_fs_usage_initialize(struct bch_fs *c) =20 void bch2_dev_usage_read_fast(struct bch_dev *ca, struct bch_dev_usage *us= age) { - struct bch_fs *c =3D ca->fs; - unsigned seq, i, u64s =3D dev_usage_u64s(); - - do { - seq =3D read_seqcount_begin(&c->usage_lock); - memcpy(usage, ca->usage_base, u64s * sizeof(u64)); - for (i =3D 0; i < ARRAY_SIZE(ca->usage); i++) - acc_u64s_percpu((u64 *) usage, (u64 __percpu *) 
ca->usage[i], u64s); - } while (read_seqcount_retry(&c->usage_lock, seq)); + memset(usage, 0, sizeof(*usage)); + acc_u64s_percpu((u64 *) usage, (u64 __percpu *) ca->usage, dev_usage_u64s= ()); } =20 u64 bch2_fs_usage_read_one(struct bch_fs *c, u64 *v) @@ -147,16 +140,6 @@ void bch2_fs_usage_acc_to_base(struct bch_fs *c, unsig= ned idx) (u64 __percpu *) c->usage[idx], u64s); percpu_memset(c->usage[idx], 0, u64s * sizeof(u64)); =20 - rcu_read_lock(); - for_each_member_device_rcu(c, ca, NULL) { - u64s =3D dev_usage_u64s(); - - acc_u64s_percpu((u64 *) ca->usage_base, - (u64 __percpu *) ca->usage[idx], u64s); - percpu_memset(ca->usage[idx], 0, u64s * sizeof(u64)); - } - rcu_read_unlock(); - write_seqcount_end(&c->usage_lock); preempt_enable(); } @@ -1214,23 +1197,14 @@ void bch2_dev_buckets_free(struct bch_dev *ca) { kvfree(ca->buckets_nouse); kvfree(rcu_dereference_protected(ca->bucket_gens, 1)); - - for (unsigned i =3D 0; i < ARRAY_SIZE(ca->usage); i++) - free_percpu(ca->usage[i]); - kfree(ca->usage_base); + free_percpu(ca->usage); } =20 int bch2_dev_buckets_alloc(struct bch_fs *c, struct bch_dev *ca) { - ca->usage_base =3D kzalloc(sizeof(struct bch_dev_usage), GFP_KERNEL); - if (!ca->usage_base) + ca->usage =3D alloc_percpu(struct bch_dev_usage); + if (!ca->usage) return -BCH_ERR_ENOMEM_usage_init; =20 - for (unsigned i =3D 0; i < ARRAY_SIZE(ca->usage); i++) { - ca->usage[i] =3D alloc_percpu(struct bch_dev_usage); - if (!ca->usage[i]) - return -BCH_ERR_ENOMEM_usage_init; - } - return bch2_dev_buckets_resize(c, ca, ca->mi.nbuckets); } diff --git a/fs/bcachefs/buckets_types.h b/fs/bcachefs/buckets_types.h index 6a31740222a7..baa7e0924390 100644 --- a/fs/bcachefs/buckets_types.h +++ b/fs/bcachefs/buckets_types.h @@ -33,7 +33,7 @@ struct bucket_gens { }; =20 struct bch_dev_usage { - struct { + struct bch_dev_usage_type { u64 buckets; u64 sectors; /* _compressed_ sectors: */ /* diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c index 
327c586ac661..e0114d8eb5a8 100644 --- a/fs/bcachefs/disk_accounting.c +++ b/fs/bcachefs/disk_accounting.c @@ -330,6 +330,20 @@ int bch2_accounting_read(struct bch_fs *c) case BCH_DISK_ACCOUNTING_replicas: fs_usage_data_type_to_base(usage, k.replicas.data_type, v[0]); break; + case BCH_DISK_ACCOUNTING_dev_data_type: + if (bch2_dev_exists2(c, k.dev_data_type.dev)) { + struct bch_dev *ca =3D bch_dev_bkey_exists(c, k.dev_data_type.dev); + struct bch_dev_usage_type __percpu *d =3D &ca->usage->d[k.dev_data_typ= e.data_type]; + + percpu_u64_set(&d->buckets, v[0]); + percpu_u64_set(&d->sectors, v[1]); + percpu_u64_set(&d->fragmented, v[2]); + + if (k.dev_data_type.data_type =3D=3D BCH_DATA_sb || + k.dev_data_type.data_type =3D=3D BCH_DATA_journal) + usage->hidden +=3D v[0] * ca->mi.bucket_size; + } + break; } } preempt_enable(); diff --git a/fs/bcachefs/disk_accounting.h b/fs/bcachefs/disk_accounting.h index 5fd053a819df..a8526bf43207 100644 --- a/fs/bcachefs/disk_accounting.h +++ b/fs/bcachefs/disk_accounting.h @@ -3,6 +3,7 @@ #define _BCACHEFS_DISK_ACCOUNTING_H =20 #include +#include "sb-members.h" =20 static inline void bch2_u64s_neg(u64 *v, unsigned nr) { @@ -126,6 +127,7 @@ static inline int __bch2_accounting_mem_add(struct bch_= fs *c, struct bkey_s_c_ac =20 static inline int bch2_accounting_mem_add(struct btree_trans *trans, struc= t bkey_s_c_accounting a) { + struct bch_fs *c =3D trans->c; struct disk_accounting_key acc_k; bpos_to_disk_accounting_key(&acc_k, a.k->p); =20 @@ -136,8 +138,15 @@ static inline int bch2_accounting_mem_add(struct btree= _trans *trans, struct bkey case BCH_DISK_ACCOUNTING_replicas: fs_usage_data_type_to_base(&trans->fs_usage_delta, acc_k.replicas.data_t= ype, a.v->d[0]); break; + case BCH_DISK_ACCOUNTING_dev_data_type: { + struct bch_dev *ca =3D bch_dev_bkey_exists(c, acc_k.dev_data_type.dev); + + this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].buckets, a.v->d= [0]); + 
this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].sectors, a.v->d= [1]); + this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].fragmented, a.v= ->d[2]); + } } - return __bch2_accounting_mem_add(trans->c, a); + return __bch2_accounting_mem_add(c, a); } =20 static inline void bch2_accounting_mem_read_counters(struct bch_fs *c, diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 140393256f32..5a0ab3920382 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -368,20 +368,6 @@ static int journal_replay_entry_early(struct bch_fs *c, le64_to_cpu(u->v)); break; } - case BCH_JSET_ENTRY_dev_usage: { - struct jset_entry_dev_usage *u =3D - container_of(entry, struct jset_entry_dev_usage, entry); - struct bch_dev *ca =3D bch_dev_bkey_exists(c, le32_to_cpu(u->dev)); - unsigned i, nr_types =3D jset_entry_dev_usage_nr_types(u); - - for (i =3D 0; i < min_t(unsigned, nr_types, BCH_DATA_NR); i++) { - ca->usage_base->d[i].buckets =3D le64_to_cpu(u->d[i].buckets); - ca->usage_base->d[i].sectors =3D le64_to_cpu(u->d[i].sectors); - ca->usage_base->d[i].fragmented =3D le64_to_cpu(u->d[i].fragmented); - } - - break; - } case BCH_JSET_ENTRY_blacklist: { struct jset_entry_blacklist *bl_entry =3D container_of(entry, struct jset_entry_blacklist, entry); diff --git a/fs/bcachefs/sb-clean.c b/fs/bcachefs/sb-clean.c index 5980ba2563fe..a7f2cc774492 100644 --- a/fs/bcachefs/sb-clean.c +++ b/fs/bcachefs/sb-clean.c @@ -228,23 +228,6 @@ void bch2_journal_super_entries_add_common(struct bch_= fs *c, "embedded variable length struct"); } =20 - for_each_member_device(c, ca) { - unsigned b =3D sizeof(struct jset_entry_dev_usage) + - sizeof(struct jset_entry_dev_usage_type) * BCH_DATA_NR; - struct jset_entry_dev_usage *u =3D - container_of(jset_entry_init(end, b), - struct jset_entry_dev_usage, entry); - - u->entry.type =3D BCH_JSET_ENTRY_dev_usage; - u->dev =3D cpu_to_le32(ca->dev_idx); - - for (unsigned i =3D 0; i < BCH_DATA_NR; i++) { - u->d[i].buckets =3D 
cpu_to_le64(ca->usage_base->d[i].buckets); - u->d[i].sectors =3D cpu_to_le64(ca->usage_base->d[i].sectors); - u->d[i].fragmented =3D cpu_to_le64(ca->usage_base->d[i].fragmented); - } - } - percpu_up_read(&c->mark_lock); =20 for (unsigned i =3D 0; i < 2; i++) { --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-174.mta1.migadu.com (out-174.mta1.migadu.com [95.215.58.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 79E6FA94D for ; Sun, 25 Feb 2024 02:38:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828723; cv=none; b=j1xCyMinpPAJAlF/D013XSifeQFrhKLIm07PK9/KcVucrEL0v0gvtPwYnyDe0V1zTDhFP91BsyUyKEhQI+hXmpWw9vrJUaw6z5AT/67t+JiIFabOU1WnDAliyaBYkoD7/8gCQNwyu/p3UPJ86bD4wzMnkLoPOQ38L+ilfREmilI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828723; c=relaxed/simple; bh=p7O3MH5Pv6ZYytgnV2uzMgTPCfEDxUNuHv6Bvd2aaxU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lfg3S5hYDoSZHJyK+kTNZTm6PR8Oa1tGglSfvWp/inlbYsjR7JxTsm922S0kcV8D1upAKcSVcjdejJ81ydHbbnSYczkolvOetsrBsAB6YinTI56W20LszvMzS6pz/NtHUbv8o+axsr0am2ag5TEKnwOdgX0WmRptjrilwUj0kMQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=d2sSfVLe; arc=none smtp.client-ip=95.215.58.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="d2sSfVLe" X-Report-Abuse: Please report any abuse 
attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828719; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LpqGNmxVz4nH/s+HZrDUNHReRjWltPhY2a8VCj1nHDg=; b=d2sSfVLeCn+Z852b+5f+xPu13H2zmwi9gFUfGFrtsmTCSf3LTon4K8IJSM6bwVmCNw2lG7 7Sa57u1Vi3FPOp77P/As+QRnYkx0/cljMXy1JBdts1G3Af1FVHyF97qPKePgv03wwEZxPy djVNmYb5U+qdpDcb2mbbph7W50k1YXc= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 06/21] bcachefs: Kill bch2_fs_usage_initialize() Date: Sat, 24 Feb 2024 21:38:08 -0500 Message-ID: <20240225023826.2413565-7-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Deleting code for the old disk accounting scheme. 
Signed-off-by: Kent Overstreet --- fs/bcachefs/buckets.c | 29 ----------------------------- fs/bcachefs/buckets.h | 2 -- fs/bcachefs/recovery.c | 2 -- 3 files changed, 33 deletions(-) diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c index 7540486ae266..054c4c8d9c1b 100644 --- a/fs/bcachefs/buckets.c +++ b/fs/bcachefs/buckets.c @@ -38,35 +38,6 @@ static inline struct bch_fs_usage *fs_usage_ptr(struct b= ch_fs *c, : c->usage[journal_seq & JOURNAL_BUF_MASK]); } =20 -void bch2_fs_usage_initialize(struct bch_fs *c) -{ - percpu_down_write(&c->mark_lock); - struct bch_fs_usage *usage =3D c->usage_base; - - for (unsigned i =3D 0; i < ARRAY_SIZE(c->usage); i++) - bch2_fs_usage_acc_to_base(c, i); - - for (unsigned i =3D 0; i < BCH_REPLICAS_MAX; i++) - usage->b.reserved +=3D usage->persistent_reserved[i]; - - for (unsigned i =3D 0; i < c->replicas.nr; i++) { - struct bch_replicas_entry_v1 *e =3D - cpu_replicas_entry(&c->replicas, i); - - fs_usage_data_type_to_base(&usage->b, e->data_type, usage->replicas[i]); - } - - for_each_member_device(c, ca) { - struct bch_dev_usage dev =3D bch2_dev_usage_read(ca); - - usage->b.hidden +=3D (dev.d[BCH_DATA_sb].buckets + - dev.d[BCH_DATA_journal].buckets) * - ca->mi.bucket_size; - } - - percpu_up_write(&c->mark_lock); -} - void bch2_dev_usage_read_fast(struct bch_dev *ca, struct bch_dev_usage *us= age) { memset(usage, 0, sizeof(*usage)); diff --git a/fs/bcachefs/buckets.h b/fs/bcachefs/buckets.h index f9a1d24c997b..4e14615c770e 100644 --- a/fs/bcachefs/buckets.h +++ b/fs/bcachefs/buckets.h @@ -316,8 +316,6 @@ void bch2_dev_usage_update_m(struct bch_fs *, struct bc= h_dev *, int bch2_update_replicas(struct bch_fs *, struct bkey_s_c, struct bch_replicas_entry_v1 *, s64); =20 -void bch2_fs_usage_initialize(struct bch_fs *); - int bch2_check_bucket_ref(struct btree_trans *, struct bkey_s_c, const struct bch_extent_ptr *, s64, enum bch_data_type, u8, u8, u32); diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 
5a0ab3920382..4936b18e5a58 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -426,8 +426,6 @@ static int journal_replay_early(struct bch_fs *c, } } =20 - bch2_fs_usage_initialize(c); - return 0; } =20 --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-176.mta1.migadu.com (out-176.mta1.migadu.com [95.215.58.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D472C2ED for ; Sun, 25 Feb 2024 02:38:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828724; cv=none; b=HjutlcGvtjMndjEZ69koehMyq90wBSddx/xGMsnFBAMQl9kuleJV2U47r2FRWVkZLHPqBesGF/cIQbd7WeYHY6OzGLv89M9ud72+NVrKWxGQ6j3rLDOYmgZUZQqatbdTTvMFoeOjTsRI4t7HLWVb3+lexKbcM9Gs5RjxsvI2vRE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828724; c=relaxed/simple; bh=6mSrCXsXi3Y+WGJ2l3uOvmTKcFFx9GJUElF7uHdTim4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=EakVbCqyHE9YLeWw2lR6JB9IqR528TVCtnqrtWWk+w8shWm54ctr6OLQxAp0j9/OFWsxU2kB7862geToq9obEKqD7v+1yplf+w0ZOuV20e9zmzuxUf7a9/CmBxXeDZYln0Fu1ilPIGjHIG7maD8F3o8jPtybbs9TN+tQivzjHSg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=cuC2Dm84; arc=none smtp.client-ip=95.215.58.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="cuC2Dm84" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these 
headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828720; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YboY8McwKZ8bbD5v2MLi4a2lKe9cbZ9uUy04W0G/mjo=; b=cuC2Dm84QZNf9uWwz0FZ7YV3pOrzRtNt9Hw0gOtM7PYFz1pDsv20+1XBSvVcWihPqreTth K90XUlAFGmpHfZZDqSr+M45J26yZ46f1vJJNdPY6gd2ts/hEd29xg2yTb9d32jq5lLDGYP +EzpRuwWfP3QR4318kESDtCr2PM6SJM= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 07/21] bcachefs: Convert bch2_ioctl_fs_usage() to new accounting Date: Sat, 24 Feb 2024 21:38:09 -0500 Message-ID: <20240225023826.2413565-8-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" This converts bch2_ioctl_fs_usage() to read from the new disk accounting, via bch2_fs_replicas_usage_read(). 
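As a rough illustration of the size check this conversion relies on, the following userspace sketch assumes a plain byte buffer in place of the darray and memcpy() in place of the copy to userspace. The names struct byte_buf and copy_entries_checked() are hypothetical. The caller states how many bytes it has room for; if the entries do not fit, the helper returns -ERANGE without copying, otherwise it copies and reports the number of bytes written.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the buffer of replicas entries. */
struct byte_buf {
	const char *data;
	size_t      nr;   /* number of valid bytes in data */
};

/*
 * Copy src into a caller-supplied buffer of dst_bytes.
 * Returns 0 and sets *written on success, or -ERANGE if the
 * destination is too small; nothing is copied in that case.
 */
static int copy_entries_checked(char *dst, size_t dst_bytes,
				const struct byte_buf *src, size_t *written)
{
	if (dst_bytes < src->nr)
		return -ERANGE;

	if (src->nr)
		memcpy(dst, src->data, src->nr);
	*written = src->nr;
	return 0;
}
```

Failing before any copy keeps the caller's buffer untouched on error, so a caller can retry with a larger buffer.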
Signed-off-by: Kent Overstreet --- fs/bcachefs/chardev.c | 68 ++++++++++++------------------------------- 1 file changed, 19 insertions(+), 49 deletions(-) diff --git a/fs/bcachefs/chardev.c b/fs/bcachefs/chardev.c index 992939152f01..13ea852be153 100644 --- a/fs/bcachefs/chardev.c +++ b/fs/bcachefs/chardev.c @@ -5,6 +5,7 @@ #include "bcachefs_ioctl.h" #include "buckets.h" #include "chardev.h" +#include "disk_accounting.h" #include "journal.h" #include "move.h" #include "recovery.h" @@ -500,11 +501,11 @@ static long bch2_ioctl_data(struct bch_fs *c, static long bch2_ioctl_fs_usage(struct bch_fs *c, struct bch_ioctl_fs_usage __user *user_arg) { - struct bch_ioctl_fs_usage *arg =3D NULL; - struct bch_replicas_usage *dst_e, *dst_end; - struct bch_fs_usage_online *src; - u32 replica_entries_bytes; + struct bch_ioctl_fs_usage arg; + struct bch_fs_usage_online *src =3D NULL; + darray_char replicas =3D {}; unsigned i; + u32 replica_entries_bytes; int ret =3D 0; =20 if (!test_bit(BCH_FS_started, &c->flags)) @@ -513,9 +514,16 @@ static long bch2_ioctl_fs_usage(struct bch_fs *c, if (get_user(replica_entries_bytes, &user_arg->replica_entries_bytes)) return -EFAULT; =20 - arg =3D kzalloc(size_add(sizeof(*arg), replica_entries_bytes), GFP_KERNEL= ); - if (!arg) - return -ENOMEM; + ret =3D bch2_fs_replicas_usage_read(c, &replicas) ?: + (replica_entries_bytes < replicas.nr ? 
-ERANGE : 0) ?: + copy_to_user_errcode(&user_arg->replicas, replicas.data, replicas.nr); + if (ret) + goto err; + + arg.capacity =3D c->capacity; + arg.used =3D bch2_fs_sectors_used(c, src); + arg.online_reserved =3D src->online_reserved; + arg.replica_entries_bytes =3D replicas.nr; =20 src =3D bch2_fs_usage_read(c); if (!src) { @@ -523,52 +531,14 @@ static long bch2_ioctl_fs_usage(struct bch_fs *c, goto err; } =20 - arg->capacity =3D c->capacity; - arg->used =3D bch2_fs_sectors_used(c, src); - arg->online_reserved =3D src->online_reserved; - for (i =3D 0; i < BCH_REPLICAS_MAX; i++) - arg->persistent_reserved[i] =3D src->u.persistent_reserved[i]; - - dst_e =3D arg->replicas; - dst_end =3D (void *) arg->replicas + replica_entries_bytes; - - for (i =3D 0; i < c->replicas.nr; i++) { - struct bch_replicas_entry_v1 *src_e =3D - cpu_replicas_entry(&c->replicas, i); - - /* check that we have enough space for one replicas entry */ - if (dst_e + 1 > dst_end) { - ret =3D -ERANGE; - break; - } - - dst_e->sectors =3D src->u.replicas[i]; - dst_e->r =3D *src_e; - - /* recheck after setting nr_devs: */ - if (replicas_usage_next(dst_e) > dst_end) { - ret =3D -ERANGE; - break; - } - - memcpy(dst_e->r.devs, src_e->devs, src_e->nr_devs); - - dst_e =3D replicas_usage_next(dst_e); - } - - arg->replica_entries_bytes =3D (void *) dst_e - (void *) arg->replicas; - + arg.persistent_reserved[i] =3D src->u.persistent_reserved[i]; percpu_up_read(&c->mark_lock); - kfree(src); - - if (ret) - goto err; =20 - ret =3D copy_to_user_errcode(user_arg, arg, - sizeof(*arg) + arg->replica_entries_bytes); + ret =3D copy_to_user_errcode(user_arg, &arg, sizeof(arg)); err: - kfree(arg); + darray_exit(&replicas); + kfree(src); return ret; } =20 --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-177.mta1.migadu.com (out-177.mta1.migadu.com [95.215.58.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org 
(Postfix) with ESMTPS id 742E9DDC4 for ; Sun, 25 Feb 2024 02:38:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828727; cv=none; b=ntpLDOOePepweODPFlWVv21t9aBxIFRKKKlS0De53hHJ5DU8vnjW/mfeJz8nCKSV/CT9FXB97IVuVVNHube2zDiteJxBkqEZRS4Ac5xb94SipWAODveIZNvwmaC9sEnBYrWRnEq4+ZSUicijmZ+x+3A2xRMmTwrj12GWtu7PThw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828727; c=relaxed/simple; bh=11AZzASOqgnR9hZpIfthFRQECtC3R7F7mXZu7eJ2AyY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mE5UrUnhjFZMx0doW7T8nLs9AkBmdIqYIvvr4AjlPoOByLoSG49PB8QyM7naLzSfp2GYfphUYGbKVblFuSZ9PgTqbcUCDZmdJBZoj9/jClQgHtOcocV/JtkhHWKtCWt0sbJRm4nvCOKM1Umu6jqxpZpry4vhML7Orm76n7rnqrE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=meWmYrgS; arc=none smtp.client-ip=95.215.58.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="meWmYrgS" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828721; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2Qqw98+K10fXhVd6VHbNJh2AoZpuXWUlc5V3sQ9k6Jk=; b=meWmYrgSRX7MdEz5MWTgRZSo+xAxpbBg6J3OUwA3jREQgddOHCPVIsX/otgbLgqfn95uUo jqYPtMcBepMrDoWz1yKqjc+kxvFRRz9npzVPvCsdjU8SXxnDWkCX9bO+tC+c3HFhnKcFAl UkJ3UDpXsIZsaHCp0REIjBFJIy2VL+c= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 08/21] bcachefs: kill bch2_fs_usage_read() Date: Sat, 24 Feb 2024 21:38:10 -0500 Message-ID: <20240225023826.2413565-9-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" With bch2_ioctl_fs_usage() converted to the new accounting, this is now dead code.
Signed-off-by: Kent Overstreet --- fs/bcachefs/bcachefs.h | 4 ---- fs/bcachefs/buckets.c | 34 ---------------------------------- fs/bcachefs/buckets.h | 2 -- fs/bcachefs/chardev.c | 25 ++++++++++++------------- fs/bcachefs/replicas.c | 7 ------- fs/bcachefs/super.c | 2 -- 6 files changed, 12 insertions(+), 62 deletions(-) diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h index 91c40fde1925..5824cf57defd 100644 --- a/fs/bcachefs/bcachefs.h +++ b/fs/bcachefs/bcachefs.h @@ -912,10 +912,6 @@ struct bch_fs { struct bch_fs_usage __percpu *usage_gc; u64 __percpu *online_reserved; =20 - /* single element mempool: */ - struct mutex usage_scratch_lock; - struct bch_fs_usage_online *usage_scratch; - struct io_clock io_clock[2]; =20 /* JOURNAL SEQ BLACKLIST */ diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c index 054c4c8d9c1b..24b53f449313 100644 --- a/fs/bcachefs/buckets.c +++ b/fs/bcachefs/buckets.c @@ -64,40 +64,6 @@ u64 bch2_fs_usage_read_one(struct bch_fs *c, u64 *v) return ret; } =20 -struct bch_fs_usage_online *bch2_fs_usage_read(struct bch_fs *c) -{ - struct bch_fs_usage_online *ret; - unsigned nr_replicas =3D READ_ONCE(c->replicas.nr); - unsigned seq, i; -retry: - ret =3D kmalloc(__fs_usage_online_u64s(nr_replicas) * sizeof(u64), GFP_KE= RNEL); - if (unlikely(!ret)) - return NULL; - - percpu_down_read(&c->mark_lock); - - if (nr_replicas !=3D c->replicas.nr) { - nr_replicas =3D c->replicas.nr; - percpu_up_read(&c->mark_lock); - kfree(ret); - goto retry; - } - - ret->online_reserved =3D percpu_u64_get(c->online_reserved); - - do { - seq =3D read_seqcount_begin(&c->usage_lock); - unsafe_memcpy(&ret->u, c->usage_base, - __fs_usage_u64s(nr_replicas) * sizeof(u64), - "embedded variable length struct"); - for (i =3D 0; i < ARRAY_SIZE(c->usage); i++) - acc_u64s_percpu((u64 *) &ret->u, (u64 __percpu *) c->usage[i], - __fs_usage_u64s(nr_replicas)); - } while (read_seqcount_retry(&c->usage_lock, seq)); - - return ret; -} - void 
bch2_fs_usage_acc_to_base(struct bch_fs *c, unsigned idx) { unsigned u64s =3D fs_usage_u64s(c); diff --git a/fs/bcachefs/buckets.h b/fs/bcachefs/buckets.h index 4e14615c770e..356f725a4fad 100644 --- a/fs/bcachefs/buckets.h +++ b/fs/bcachefs/buckets.h @@ -296,8 +296,6 @@ static inline unsigned dev_usage_u64s(void) =20 u64 bch2_fs_usage_read_one(struct bch_fs *, u64 *); =20 -struct bch_fs_usage_online *bch2_fs_usage_read(struct bch_fs *); - void bch2_fs_usage_acc_to_base(struct bch_fs *, unsigned); =20 void bch2_fs_usage_to_text(struct printbuf *, diff --git a/fs/bcachefs/chardev.c b/fs/bcachefs/chardev.c index 13ea852be153..03a1339e8f3b 100644 --- a/fs/bcachefs/chardev.c +++ b/fs/bcachefs/chardev.c @@ -502,9 +502,7 @@ static long bch2_ioctl_fs_usage(struct bch_fs *c, struct bch_ioctl_fs_usage __user *user_arg) { struct bch_ioctl_fs_usage arg; - struct bch_fs_usage_online *src =3D NULL; darray_char replicas =3D {}; - unsigned i; u32 replica_entries_bytes; int ret =3D 0; =20 @@ -520,25 +518,26 @@ static long bch2_ioctl_fs_usage(struct bch_fs *c, if (ret) goto err; =20 + struct bch_fs_usage_short u =3D bch2_fs_usage_read_short(c); arg.capacity =3D c->capacity; - arg.used =3D bch2_fs_sectors_used(c, src); - arg.online_reserved =3D src->online_reserved; + arg.used =3D u.used; + arg.online_reserved =3D percpu_u64_get(c->online_reserved); arg.replica_entries_bytes =3D replicas.nr; =20 - src =3D bch2_fs_usage_read(c); - if (!src) { - ret =3D -ENOMEM; - goto err; - } + for (unsigned i =3D 0; i < BCH_REPLICAS_MAX; i++) { + struct disk_accounting_key k =3D { + .type =3D BCH_DISK_ACCOUNTING_persistent_reserved, + .persistent_reserved.nr_replicas =3D i, + }; =20 - for (i =3D 0; i < BCH_REPLICAS_MAX; i++) - arg.persistent_reserved[i] =3D src->u.persistent_reserved[i]; - percpu_up_read(&c->mark_lock); + bch2_accounting_mem_read(c, + disk_accounting_key_to_bpos(&k), + &arg.persistent_reserved[i], 1); + } =20 ret =3D copy_to_user_errcode(user_arg, &arg, sizeof(arg)); err: 
darray_exit(&replicas); - kfree(src); return ret; } =20 diff --git a/fs/bcachefs/replicas.c b/fs/bcachefs/replicas.c index dde581a49e28..d02eb03d2ebd 100644 --- a/fs/bcachefs/replicas.c +++ b/fs/bcachefs/replicas.c @@ -319,13 +319,10 @@ static int replicas_table_update(struct bch_fs *c, struct bch_replicas_cpu *new_r) { struct bch_fs_usage __percpu *new_usage[JOURNAL_BUF_NR]; - struct bch_fs_usage_online *new_scratch =3D NULL; struct bch_fs_usage __percpu *new_gc =3D NULL; struct bch_fs_usage *new_base =3D NULL; unsigned i, bytes =3D sizeof(struct bch_fs_usage) + sizeof(u64) * new_r->nr; - unsigned scratch_bytes =3D sizeof(struct bch_fs_usage_online) + - sizeof(u64) * new_r->nr; int ret =3D 0; =20 memset(new_usage, 0, sizeof(new_usage)); @@ -336,7 +333,6 @@ static int replicas_table_update(struct bch_fs *c, goto err; =20 if (!(new_base =3D kzalloc(bytes, GFP_KERNEL)) || - !(new_scratch =3D kmalloc(scratch_bytes, GFP_KERNEL)) || (c->usage_gc && !(new_gc =3D __alloc_percpu_gfp(bytes, sizeof(u64), GFP_KERNEL)))) goto err; @@ -355,12 +351,10 @@ static int replicas_table_update(struct bch_fs *c, for (i =3D 0; i < ARRAY_SIZE(new_usage); i++) swap(c->usage[i], new_usage[i]); swap(c->usage_base, new_base); - swap(c->usage_scratch, new_scratch); swap(c->usage_gc, new_gc); swap(c->replicas, *new_r); out: free_percpu(new_gc); - kfree(new_scratch); for (i =3D 0; i < ARRAY_SIZE(new_usage); i++) free_percpu(new_usage[i]); kfree(new_base); @@ -1024,7 +1018,6 @@ void bch2_fs_replicas_exit(struct bch_fs *c) { unsigned i; =20 - kfree(c->usage_scratch); for (i =3D 0; i < ARRAY_SIZE(c->usage); i++) free_percpu(c->usage[i]); kfree(c->usage_base); diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c index 685d54d0ddbb..a26472f4620a 100644 --- a/fs/bcachefs/super.c +++ b/fs/bcachefs/super.c @@ -783,8 +783,6 @@ static struct bch_fs *bch2_fs_alloc(struct bch_sb *sb, = struct bch_opts opts) =20 INIT_LIST_HEAD(&c->list); =20 - mutex_init(&c->usage_scratch_lock); - 
mutex_init(&c->bio_bounce_pages_lock); mutex_init(&c->snapshot_table_lock); init_rwsem(&c->snapshot_create_lock); --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-189.mta1.migadu.com (out-189.mta1.migadu.com [95.215.58.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 50AA5EEC6 for ; Sun, 25 Feb 2024 02:38:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.189 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828726; cv=none; b=qmHV2z7ERKCqGCUnnRSioFbzab83i+m6QFOPcUHuwLttmkd8h67Q0wrSJNbUVM8ZcQadRK/Y+vf7bxI+Z4E+4C1x0PXv+gwInEUgrhnY58f1DdMyARH57vjOestvbrGemDlCWiNwysSuU+/wABBsT07CR/Bh1z14xHCo/ilHdaY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828726; c=relaxed/simple; bh=u2DvZcMqKB6og8Lq7jawbKRlhTvu+07YR4ER+eXC/KI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VCHbipdlgS7elrLoTJ7ObypNXCz+xGuwy/dRxDmMt/+SMA+DPy+9dfDdU4pB0jrIu+RxwFlkwa+Ysil/R7umr+t3qJQcN8mDW7lnU89tXl30Z/8d5+YkpRBrvH9pn1mBJ+CjPWRbAciDvgdt4SbT8zi7qfk7rv24S1zcjZrNIro= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=CHg59nt2; arc=none smtp.client-ip=95.215.58.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="CHg59nt2" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828722; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sochMrpvNfdQFlEjZm/ljf2msW7UTBhTHpI35Xx9Jzs=; b=CHg59nt2Qc17U0pd3Uar1bIzO02xmb907r29R+H1Ixq3v6WI/2XSZIJLjqKNjZ6nEm1rxD RhcM4nF1wtBLX7jXN0JoWApSoWtOFmwradLNdJ5wZqB+gnt8Bbs4UbGeDAU1nq6dOFNBYL gVrJR7dzjGlV4CQotVij2gybSgRaPH4= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 09/21] bcachefs: Kill writing old accounting to journal Date: Sat, 24 Feb 2024 21:38:11 -0500 Message-ID: <20240225023826.2413565-10-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" More ripping out of the old disk space accounting. Note that the new disk space accounting is incompatible with the old, and writing out old style disk space accounting with the new code is infeasible. This means upgrading and downgrading past this version requires regenerating accounting. 
Signed-off-by: Kent Overstreet --- fs/bcachefs/sb-clean.c | 45 ------------------------------------------ 1 file changed, 45 deletions(-) diff --git a/fs/bcachefs/sb-clean.c b/fs/bcachefs/sb-clean.c index a7f2cc774492..1af2785653f6 100644 --- a/fs/bcachefs/sb-clean.c +++ b/fs/bcachefs/sb-clean.c @@ -175,25 +175,6 @@ void bch2_journal_super_entries_add_common(struct bch_= fs *c, struct jset_entry **end, u64 journal_seq) { - percpu_down_read(&c->mark_lock); - - if (!journal_seq) { - for (unsigned i =3D 0; i < ARRAY_SIZE(c->usage); i++) - bch2_fs_usage_acc_to_base(c, i); - } else { - bch2_fs_usage_acc_to_base(c, journal_seq & JOURNAL_BUF_MASK); - } - - { - struct jset_entry_usage *u =3D - container_of(jset_entry_init(end, sizeof(*u)), - struct jset_entry_usage, entry); - - u->entry.type =3D BCH_JSET_ENTRY_usage; - u->entry.btree_id =3D BCH_FS_USAGE_inodes; - u->v =3D cpu_to_le64(c->usage_base->b.nr_inodes); - } - { struct jset_entry_usage *u =3D container_of(jset_entry_init(end, sizeof(*u)), @@ -204,32 +185,6 @@ void bch2_journal_super_entries_add_common(struct bch_= fs *c, u->v =3D cpu_to_le64(atomic64_read(&c->key_version)); } =20 - for (unsigned i =3D 0; i < BCH_REPLICAS_MAX; i++) { - struct jset_entry_usage *u =3D - container_of(jset_entry_init(end, sizeof(*u)), - struct jset_entry_usage, entry); - - u->entry.type =3D BCH_JSET_ENTRY_usage; - u->entry.btree_id =3D BCH_FS_USAGE_reserved; - u->entry.level =3D i; - u->v =3D cpu_to_le64(c->usage_base->persistent_reserved[i]); - } - - for (unsigned i =3D 0; i < c->replicas.nr; i++) { - struct bch_replicas_entry_v1 *e =3D - cpu_replicas_entry(&c->replicas, i); - struct jset_entry_data_usage *u =3D - container_of(jset_entry_init(end, sizeof(*u) + e->nr_devs), - struct jset_entry_data_usage, entry); - - u->entry.type =3D BCH_JSET_ENTRY_data_usage; - u->v =3D cpu_to_le64(c->usage_base->replicas[i]); - unsafe_memcpy(&u->r, e, replicas_entry_bytes(e), - "embedded variable length struct"); - } - - 
percpu_up_read(&c->mark_lock); - for (unsigned i =3D 0; i < 2; i++) { struct jset_entry_clock *clock =3D container_of(jset_entry_init(end, sizeof(*clock)), --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-170.mta1.migadu.com (out-170.mta1.migadu.com [95.215.58.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37DCFFC12 for ; Sun, 25 Feb 2024 02:38:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828727; cv=none; b=tsZI5o4MT4aHmR8lkeNNb+6yEmATpiefMnFKRgM5wuaTbkx4vjK0kTMAplGs3Qi4fNNYm7DNs5Nt7GKqPuDnoN5Ad2noZoVOR2cwitQQKQN+PFS3Mv7tBTjSPe5lCYnXmcwBqqRY9sb7JtYwMBlLyOxYSGBJlkpjYaTBqc5vRcs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828727; c=relaxed/simple; bh=4ofseTkI0NYTSy2SqN8qmDFsYCyfnkox/iwtJ5pAAVE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZutdYWYH7tqlUoJKEGttJCZnElpq16clbPPfgkiCQyXMUnEArDgvjHRgBzPNmtf+2yMvMX3e5JTS5AhH1yRIOBGraZGGFhIfsloeT6C9uUb18YQWYIRNhCqoeuy3iHDk95S998tj7a8zTIKRWJwsgRtDLgd0cdzsUPFFgPyjqfQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=A3/4XoC1; arc=none smtp.client-ip=95.215.58.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="A3/4XoC1" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828723; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=feltRpT2GfQrxVlgaHX2FneBv3ceqjBYGcQemtXMFOY=; b=A3/4XoC1dWRBf8ZzMlzDrBAEbZBt1AXPOutmDEPLieM+m75bBVp0nfoXVGtjp/bKQZRPpK 1CM8EAimSlVXFd7fO+2g2cg1EF2iHD7bbfpacZ0GjORDwdXORqRvt4iXXPLyrWv0Q+AZGo xSg8KXEbYVzTw1FmzAaSexVFA59CH5A= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 10/21] bcachefs: Delete journal-buf-sharded old style accounting Date: Sat, 24 Feb 2024 21:38:12 -0500 Message-ID: <20240225023826.2413565-11-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" More deletion of dead code. 
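For readers following along, here is a rough userspace sketch of the accounting shape this patch leaves behind: a single set of per-CPU counters that writers update locally and readers sum, with no per-journal-buffer shards and no separate base copy to fold them into. This is only an illustration under stated assumptions, not the kernel code: NR_CPUS_SKETCH, struct usage_sketch, usage_add_data() and usage_read_data() are hypothetical names, and a plain array stands in for __percpu storage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: stand-in for the number of possible CPUs. */
#define NR_CPUS_SKETCH 4

/* Hypothetical stand-in for the counters kept in struct bch_fs_usage_base. */
struct usage_sketch {
	uint64_t hidden;
	uint64_t btree;
	uint64_t data;
	uint64_t reserved;
	uint64_t nr_inodes;
};

/* One slot per CPU; in the kernel this would be __percpu storage. */
struct usage_sketch usage[NR_CPUS_SKETCH];

/*
 * Writer side: apply a signed delta to one CPU's slot only.
 * A single slot may wrap around when the delta is negative; that is
 * fine because only the sum over all slots (mod 2^64) is meaningful.
 */
void usage_add_data(unsigned cpu, int64_t sectors)
{
	assert(cpu < NR_CPUS_SKETCH);
	usage[cpu].data += (uint64_t) sectors;
}

/*
 * Reader side: sum every CPU's slot. There is no base value and no
 * per-journal-buffer shard to add in, which is the simplification
 * this sketch is meant to show.
 */
uint64_t usage_read_data(void)
{
	uint64_t sum = 0;

	for (unsigned cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += usage[cpu].data;
	return sum;
}
```

In this model, adding 10 sectors on CPU 0 and subtracting 3 on CPU 1 makes usage_read_data() return 7, even though CPU 1's own slot has wrapped. The real code additionally has to deal with preemption and locking (mark_lock, preempt_disable()), which the sketch leaves out.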
Signed-off-by: Kent Overstreet --- fs/bcachefs/bcachefs.h | 3 +- fs/bcachefs/btree_gc.c | 9 ++--- fs/bcachefs/buckets.c | 61 ++++----------------------------- fs/bcachefs/buckets.h | 4 --- fs/bcachefs/disk_accounting.c | 2 +- fs/bcachefs/recovery.c | 20 +---------- fs/bcachefs/replicas.c | 63 +++-------------------------------- fs/bcachefs/replicas.h | 4 --- fs/bcachefs/super.c | 2 ++ 9 files changed, 21 insertions(+), 147 deletions(-) diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h index 5824cf57defd..2e7c4d10c951 100644 --- a/fs/bcachefs/bcachefs.h +++ b/fs/bcachefs/bcachefs.h @@ -907,8 +907,7 @@ struct bch_fs { struct percpu_rw_semaphore mark_lock; =20 seqcount_t usage_lock; - struct bch_fs_usage *usage_base; - struct bch_fs_usage __percpu *usage[JOURNAL_BUF_NR]; + struct bch_fs_usage_base __percpu *usage; struct bch_fs_usage __percpu *usage_gc; u64 __percpu *online_reserved; =20 diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c index 93826749356e..15a8796197f3 100644 --- a/fs/bcachefs/btree_gc.c +++ b/fs/bcachefs/btree_gc.c @@ -1229,10 +1229,8 @@ static int bch2_gc_done(struct bch_fs *c, #define copy_fs_field(_err, _f, _msg, ...) 
\ copy_field(_err, _f, "fs has wrong " _msg, ##__VA_ARGS__) =20 - for (i =3D 0; i < ARRAY_SIZE(c->usage); i++) - bch2_fs_usage_acc_to_base(c, i); - __for_each_member_device(c, ca) { + /* XXX */ struct bch_dev_usage *dst =3D this_cpu_ptr(ca->usage); struct bch_dev_usage *src =3D (void *) bch2_acc_percpu_u64s((u64 __percpu *) ca->usage_gc, @@ -1249,8 +1247,10 @@ static int bch2_gc_done(struct bch_fs *c, } =20 { +#if 0 unsigned nr =3D fs_usage_u64s(c); - struct bch_fs_usage *dst =3D c->usage_base; + /* XX: */ + struct bch_fs_usage *dst =3D this_cpu_ptr(c->usage); struct bch_fs_usage *src =3D (void *) bch2_acc_percpu_u64s((u64 __percpu *) c->usage_gc, nr); =20 @@ -1290,6 +1290,7 @@ static int bch2_gc_done(struct bch_fs *c, copy_fs_field(fs_usage_replicas_wrong, replicas[i], "%s", buf.buf); } +#endif } =20 #undef copy_fs_field diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c index 24b53f449313..8476bd5cb3af 100644 --- a/fs/bcachefs/buckets.c +++ b/fs/bcachefs/buckets.c @@ -26,61 +26,12 @@ =20 #include =20 -static inline struct bch_fs_usage *fs_usage_ptr(struct bch_fs *c, - unsigned journal_seq, - bool gc) -{ - percpu_rwsem_assert_held(&c->mark_lock); - BUG_ON(!gc && !journal_seq); - - return this_cpu_ptr(gc - ? 
c->usage_gc - : c->usage[journal_seq & JOURNAL_BUF_MASK]); -} - void bch2_dev_usage_read_fast(struct bch_dev *ca, struct bch_dev_usage *us= age) { memset(usage, 0, sizeof(*usage)); acc_u64s_percpu((u64 *) usage, (u64 __percpu *) ca->usage, dev_usage_u64s= ()); } =20 -u64 bch2_fs_usage_read_one(struct bch_fs *c, u64 *v) -{ - ssize_t offset =3D v - (u64 *) c->usage_base; - unsigned i, seq; - u64 ret; - - BUG_ON(offset < 0 || offset >=3D fs_usage_u64s(c)); - percpu_rwsem_assert_held(&c->mark_lock); - - do { - seq =3D read_seqcount_begin(&c->usage_lock); - ret =3D *v; - - for (i =3D 0; i < ARRAY_SIZE(c->usage); i++) - ret +=3D percpu_u64_get((u64 __percpu *) c->usage[i] + offset); - } while (read_seqcount_retry(&c->usage_lock, seq)); - - return ret; -} - -void bch2_fs_usage_acc_to_base(struct bch_fs *c, unsigned idx) -{ - unsigned u64s =3D fs_usage_u64s(c); - - BUG_ON(idx >=3D ARRAY_SIZE(c->usage)); - - preempt_disable(); - write_seqcount_begin(&c->usage_lock); - - acc_u64s_percpu((u64 *) c->usage_base, - (u64 __percpu *) c->usage[idx], u64s); - percpu_memset(c->usage[idx], 0, u64s * sizeof(u64)); - - write_seqcount_end(&c->usage_lock); - preempt_enable(); -} - void bch2_fs_usage_to_text(struct printbuf *out, struct bch_fs *c, struct bch_fs_usage_online *fs_usage) @@ -142,17 +93,17 @@ __bch2_fs_usage_read_short(struct bch_fs *c) u64 data, reserved; =20 ret.capacity =3D c->capacity - - bch2_fs_usage_read_one(c, &c->usage_base->b.hidden); + percpu_u64_get(&c->usage->hidden); =20 - data =3D bch2_fs_usage_read_one(c, &c->usage_base->b.data) + - bch2_fs_usage_read_one(c, &c->usage_base->b.btree); - reserved =3D bch2_fs_usage_read_one(c, &c->usage_base->b.reserved) + + data =3D percpu_u64_get(&c->usage->data) + + percpu_u64_get(&c->usage->btree); + reserved =3D percpu_u64_get(&c->usage->reserved) + percpu_u64_get(c->online_reserved); =20 ret.used =3D min(ret.capacity, data + reserve_factor(reserved)); ret.free =3D ret.capacity - ret.used; =20 - ret.nr_inodes =3D 
bch2_fs_usage_read_one(c, &c->usage_base->b.nr_inodes); + ret.nr_inodes =3D percpu_u64_get(&c->usage->nr_inodes); =20 return ret; } @@ -461,7 +412,7 @@ void bch2_trans_account_disk_usage_change(struct btree_= trans *trans) =20 percpu_down_read(&c->mark_lock); preempt_disable(); - struct bch_fs_usage_base *dst =3D &fs_usage_ptr(c, trans->journal_res.seq= , false)->b; + struct bch_fs_usage_base *dst =3D this_cpu_ptr(c->usage); struct bch_fs_usage_base *src =3D &trans->fs_usage_delta; =20 s64 added =3D src->btree + src->data + src->reserved; diff --git a/fs/bcachefs/buckets.h b/fs/bcachefs/buckets.h index 356f725a4fad..dfdf1b3ee817 100644 --- a/fs/bcachefs/buckets.h +++ b/fs/bcachefs/buckets.h @@ -294,10 +294,6 @@ static inline unsigned dev_usage_u64s(void) return sizeof(struct bch_dev_usage) / sizeof(u64); } =20 -u64 bch2_fs_usage_read_one(struct bch_fs *, u64 *); - -void bch2_fs_usage_acc_to_base(struct bch_fs *, unsigned); - void bch2_fs_usage_to_text(struct printbuf *, struct bch_fs *, struct bch_fs_usage_online *); =20 diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c index e0114d8eb5a8..f898323f72c7 100644 --- a/fs/bcachefs/disk_accounting.c +++ b/fs/bcachefs/disk_accounting.c @@ -314,7 +314,7 @@ int bch2_accounting_read(struct bch_fs *c) =20 percpu_down_read(&c->mark_lock); preempt_disable(); - struct bch_fs_usage_base *usage =3D &c->usage_base->b; + struct bch_fs_usage_base *usage =3D this_cpu_ptr(c->usage); =20 for (unsigned i =3D 0; i < acc->k.nr; i++) { struct disk_accounting_key k; diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 4936b18e5a58..18fd71960d2e 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -344,28 +344,10 @@ static int journal_replay_entry_early(struct bch_fs *= c, container_of(entry, struct jset_entry_usage, entry); =20 switch (entry->btree_id) { - case BCH_FS_USAGE_reserved: - if (entry->level < BCH_REPLICAS_MAX) - c->usage_base->persistent_reserved[entry->level] =3D - 
le64_to_cpu(u->v); - break; - case BCH_FS_USAGE_inodes: - c->usage_base->b.nr_inodes =3D le64_to_cpu(u->v); - break; case BCH_FS_USAGE_key_version: - atomic64_set(&c->key_version, - le64_to_cpu(u->v)); + atomic64_set(&c->key_version, le64_to_cpu(u->v)); break; } - - break; - } - case BCH_JSET_ENTRY_data_usage: { - struct jset_entry_data_usage *u =3D - container_of(entry, struct jset_entry_data_usage, entry); - - ret =3D bch2_replicas_set_usage(c, &u->r, - le64_to_cpu(u->v)); break; } case BCH_JSET_ENTRY_blacklist: { diff --git a/fs/bcachefs/replicas.c b/fs/bcachefs/replicas.c index d02eb03d2ebd..6dca705eaf1f 100644 --- a/fs/bcachefs/replicas.c +++ b/fs/bcachefs/replicas.c @@ -318,46 +318,23 @@ static void __replicas_table_update_pcpu(struct bch_f= s_usage __percpu *dst_p, static int replicas_table_update(struct bch_fs *c, struct bch_replicas_cpu *new_r) { - struct bch_fs_usage __percpu *new_usage[JOURNAL_BUF_NR]; struct bch_fs_usage __percpu *new_gc =3D NULL; - struct bch_fs_usage *new_base =3D NULL; - unsigned i, bytes =3D sizeof(struct bch_fs_usage) + + unsigned bytes =3D sizeof(struct bch_fs_usage) + sizeof(u64) * new_r->nr; int ret =3D 0; =20 - memset(new_usage, 0, sizeof(new_usage)); - - for (i =3D 0; i < ARRAY_SIZE(new_usage); i++) - if (!(new_usage[i] =3D __alloc_percpu_gfp(bytes, - sizeof(u64), GFP_KERNEL))) - goto err; - - if (!(new_base =3D kzalloc(bytes, GFP_KERNEL)) || - (c->usage_gc && + if ((c->usage_gc && !(new_gc =3D __alloc_percpu_gfp(bytes, sizeof(u64), GFP_KERNEL)))) goto err; =20 - for (i =3D 0; i < ARRAY_SIZE(new_usage); i++) - if (c->usage[i]) - __replicas_table_update_pcpu(new_usage[i], new_r, - c->usage[i], &c->replicas); - if (c->usage_base) - __replicas_table_update(new_base, new_r, - c->usage_base, &c->replicas); if (c->usage_gc) __replicas_table_update_pcpu(new_gc, new_r, c->usage_gc, &c->replicas); =20 - for (i =3D 0; i < ARRAY_SIZE(new_usage); i++) - swap(c->usage[i], new_usage[i]); - swap(c->usage_base, new_base); swap(c->usage_gc, 
new_gc); swap(c->replicas, *new_r); out: free_percpu(new_gc); - for (i =3D 0; i < ARRAY_SIZE(new_usage); i++) - free_percpu(new_usage[i]); - kfree(new_base); return ret; err: bch_err(c, "error updating replicas table: memory allocation failure"); @@ -544,6 +521,8 @@ int bch2_replicas_gc_start(struct bch_fs *c, unsigned t= ypemask) */ int bch2_replicas_gc2(struct bch_fs *c) { + return 0; +#if 0 struct bch_replicas_cpu new =3D { 0 }; unsigned i, nr; int ret =3D 0; @@ -598,34 +577,7 @@ int bch2_replicas_gc2(struct bch_fs *c) mutex_unlock(&c->sb_lock); =20 return ret; -} - -int bch2_replicas_set_usage(struct bch_fs *c, - struct bch_replicas_entry_v1 *r, - u64 sectors) -{ - int ret, idx =3D bch2_replicas_entry_idx(c, r); - - if (idx < 0) { - struct bch_replicas_cpu n; - - n =3D cpu_replicas_add_entry(c, &c->replicas, r); - if (!n.entries) - return -BCH_ERR_ENOMEM_cpu_replicas; - - ret =3D replicas_table_update(c, &n); - if (ret) - return ret; - - kfree(n.entries); - - idx =3D bch2_replicas_entry_idx(c, r); - BUG_ON(ret < 0); - } - - c->usage_base->replicas[idx] =3D sectors; - - return 0; +#endif } =20 /* Replicas tracking - superblock: */ @@ -1016,11 +968,6 @@ unsigned bch2_dev_has_data(struct bch_fs *c, struct b= ch_dev *ca) =20 void bch2_fs_replicas_exit(struct bch_fs *c) { - unsigned i; - - for (i =3D 0; i < ARRAY_SIZE(c->usage); i++) - free_percpu(c->usage[i]); - kfree(c->usage_base); kfree(c->replicas.entries); kfree(c->replicas_gc.entries); } diff --git a/fs/bcachefs/replicas.h b/fs/bcachefs/replicas.h index f00c586f8cd9..eac2dff20423 100644 --- a/fs/bcachefs/replicas.h +++ b/fs/bcachefs/replicas.h @@ -54,10 +54,6 @@ int bch2_replicas_gc_end(struct bch_fs *, int); int bch2_replicas_gc_start(struct bch_fs *, unsigned); int bch2_replicas_gc2(struct bch_fs *); =20 -int bch2_replicas_set_usage(struct bch_fs *, - struct bch_replicas_entry_v1 *, - u64); - #define for_each_cpu_replicas_entry(_r, _i) \ for (_i =3D (_r)->entries; \ (void *) (_i) < (void *) (_r)->entries + 
(_r)->nr * (_r)->entry_size= ;\ diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c index a26472f4620a..30b41c8de309 100644 --- a/fs/bcachefs/super.c +++ b/fs/bcachefs/super.c @@ -567,6 +567,7 @@ static void __bch2_fs_free(struct bch_fs *c) =20 darray_exit(&c->btree_roots_extra); free_percpu(c->pcpu); + free_percpu(c->usage); mempool_exit(&c->large_bkey_pool); mempool_exit(&c->btree_bounce_pool); bioset_exit(&c->btree_bio); @@ -893,6 +894,7 @@ static struct bch_fs *bch2_fs_alloc(struct bch_sb *sb, = struct bch_opts opts) offsetof(struct btree_write_bio, wbio.bio)), BIOSET_NEED_BVECS) || !(c->pcpu =3D alloc_percpu(struct bch_fs_pcpu)) || + !(c->usage =3D alloc_percpu(struct bch_fs_usage_base)) || !(c->online_reserved =3D alloc_percpu(u64)) || mempool_init_kvmalloc_pool(&c->btree_bounce_pool, 1, c->opts.btree_node_size) || --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-181.mta1.migadu.com (out-181.mta1.migadu.com [95.215.58.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EB4CE11718 for ; Sun, 25 Feb 2024 02:38:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828728; cv=none; b=fg/YDDzcdYYsdF5kbfWI6qArGtPDTLc8qUXI5dfsM0SFUY1A3tU7mif07/ks4mnGOjkAPwe3z26wKS4mC5uSwih+nFjCplAf2TpvJ/1TFgEcUgKUs9cOs7sULt1sI733+dNNAmtyyTXHVACHdoH33U/8OCqRdRdG5z16Z7Agzv0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828728; c=relaxed/simple; bh=E/Dw+70hY8yzNuYISXbZNy7QhvNTNxoGY+Dru6pQDaY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=C+s0rnO+VUToOx3Fyk9Hwjm24jz+GRD7m5vxaW4MV0AUZiV48PVsytA+ePDGfUQPuZL9vh3W6nleXuCMdotrBobuo+3kV9rl7pUvvgau7/sgXUjV73FoV/6BKaJEz7fYmbehOtxb81jsXyCZ/WQChjXQTuX+iGvxplVL+ZM625Y= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=QX5sk03l; arc=none smtp.client-ip=95.215.58.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="QX5sk03l" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828724; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7nYmc8JzzayqwmwTcyeDB+fspGtJHDUPXZk3mSOrddE=; b=QX5sk03loVgVOzDEg/keMY5owPiZTJ3+TdpipV+jUiDa++8JEBtz5FnYXvCpNkH5KQT6gy bD7x6WgvlMGsCwMbJ5P8aWaS6aqKDIPsAaBUZtNstJtQXdmQtSwXcmktZ7uUjhhkyXIDZS q7BL40Ei0WhoEKhdATFkWEamlBHafuM= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 11/21] bcachefs: Kill bch2_fs_usage_to_text() Date: Sat, 24 Feb 2024 21:38:13 -0500 Message-ID: <20240225023826.2413565-12-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Dead code. 
Signed-off-by: Kent Overstreet --- fs/bcachefs/buckets.c | 39 --------------------------------------- fs/bcachefs/buckets.h | 3 --- 2 files changed, 42 deletions(-) diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c index 8476bd5cb3af..c261fa3a0273 100644 --- a/fs/bcachefs/buckets.c +++ b/fs/bcachefs/buckets.c @@ -32,45 +32,6 @@ void bch2_dev_usage_read_fast(struct bch_dev *ca, struct= bch_dev_usage *usage) acc_u64s_percpu((u64 *) usage, (u64 __percpu *) ca->usage, dev_usage_u64s= ()); } =20 -void bch2_fs_usage_to_text(struct printbuf *out, - struct bch_fs *c, - struct bch_fs_usage_online *fs_usage) -{ - unsigned i; - - prt_printf(out, "capacity:\t\t\t%llu\n", c->capacity); - - prt_printf(out, "hidden:\t\t\t\t%llu\n", - fs_usage->u.b.hidden); - prt_printf(out, "data:\t\t\t\t%llu\n", - fs_usage->u.b.data); - prt_printf(out, "cached:\t\t\t\t%llu\n", - fs_usage->u.b.cached); - prt_printf(out, "reserved:\t\t\t%llu\n", - fs_usage->u.b.reserved); - prt_printf(out, "nr_inodes:\t\t\t%llu\n", - fs_usage->u.b.nr_inodes); - prt_printf(out, "online reserved:\t\t%llu\n", - fs_usage->online_reserved); - - for (i =3D 0; - i < ARRAY_SIZE(fs_usage->u.persistent_reserved); - i++) { - prt_printf(out, "%u replicas:\n", i + 1); - prt_printf(out, "\treserved:\t\t%llu\n", - fs_usage->u.persistent_reserved[i]); - } - - for (i =3D 0; i < c->replicas.nr; i++) { - struct bch_replicas_entry_v1 *e =3D - cpu_replicas_entry(&c->replicas, i); - - prt_printf(out, "\t"); - bch2_replicas_entry_to_text(out, e); - prt_printf(out, ":\t%llu\n", fs_usage->u.replicas[i]); - } -} - static u64 reserve_factor(u64 r) { return r + (round_up(r, (1 << RESERVE_FACTOR)) >> RESERVE_FACTOR); diff --git a/fs/bcachefs/buckets.h b/fs/bcachefs/buckets.h index dfdf1b3ee817..ccf9813c65e7 100644 --- a/fs/bcachefs/buckets.h +++ b/fs/bcachefs/buckets.h @@ -294,9 +294,6 @@ static inline unsigned dev_usage_u64s(void) return sizeof(struct bch_dev_usage) / sizeof(u64); } =20 -void bch2_fs_usage_to_text(struct printbuf *, 
- struct bch_fs *, struct bch_fs_usage_online *); - u64 bch2_fs_sectors_used(struct bch_fs *, struct bch_fs_usage_online *); =20 struct bch_fs_usage_short --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-186.mta1.migadu.com (out-186.mta1.migadu.com [95.215.58.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D6D3A134A9 for ; Sun, 25 Feb 2024 02:38:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.186 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828728; cv=none; b=Jvl+UMWEuCNHFbqv37zbRwxPXaQKG0jCHM4rnAoJlyobW0lC2+NmjhOgj7tSs7NExL9VWtTVnrhVmnuUTyxPZXmoSGBMW3fVP3WqQqE5y9bGqMmqlcsJilR05CElZbWKK3xhwy2+wvCQ9AoG53PUcmOROp6JPPRj8xF9o4d/Bdw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828728; c=relaxed/simple; bh=DkSMbHGXD7jvgfcyN0h8VxzJwADDueaJ+dPGY9kVlEo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gmCkgsI+uOOFQThSN3hjB20YfJppfX/Gw8zGDVQ89te7cvzM2j8hsv0Q/mMMM6FZFLFe6gEs45DvZtCNZg272btDwnfy6xot1axIZDOA3EnIDoHLt6ChMxSJGor0kgfOVUZy69zgs5usEAk+qSowbUAbsckstwhgKKpHtUHp2Vo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=iRFP2ifW; arc=none smtp.client-ip=95.215.58.186 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="iRFP2ifW" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828725; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+XxTvZ+w+R//l1/8+X2sA3MHcm6KcV5Vz+RJqp7UJC8=; b=iRFP2ifW2GoApeZJ1zZOyXzDJTb4hPXB2Bn1jIWeMOr8zpgWXdwf76Iuwyr4tE5SoWJgWj TBRmLrUg3wHfNe1Vd4J6EHDyAKHn7yD/44qN6d2rr0Ro2rh64NN4lRTfppGIturZovj2V5 n0Fog7fhMrzvJqNAvbB89DYQBytzCcQ= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 12/21] bcachefs: Kill fs_usage_online Date: Sat, 24 Feb 2024 21:38:14 -0500 Message-ID: <20240225023826.2413565-13-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" More dead code deletion. 
Signed-off-by: Kent Overstreet --- fs/bcachefs/buckets.c | 10 ---------- fs/bcachefs/buckets.h | 12 ------------ fs/bcachefs/buckets_types.h | 5 ----- 3 files changed, 27 deletions(-) diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c index c261fa3a0273..5e2b9aa93241 100644 --- a/fs/bcachefs/buckets.c +++ b/fs/bcachefs/buckets.c @@ -37,16 +37,6 @@ static u64 reserve_factor(u64 r) return r + (round_up(r, (1 << RESERVE_FACTOR)) >> RESERVE_FACTOR); } =20 -u64 bch2_fs_sectors_used(struct bch_fs *c, struct bch_fs_usage_online *fs_= usage) -{ - return min(fs_usage->u.b.hidden + - fs_usage->u.b.btree + - fs_usage->u.b.data + - reserve_factor(fs_usage->u.b.reserved + - fs_usage->online_reserved), - c->capacity); -} - static struct bch_fs_usage_short __bch2_fs_usage_read_short(struct bch_fs *c) { diff --git a/fs/bcachefs/buckets.h b/fs/bcachefs/buckets.h index ccf9813c65e7..f9d8d7b9fbd1 100644 --- a/fs/bcachefs/buckets.h +++ b/fs/bcachefs/buckets.h @@ -279,23 +279,11 @@ static inline unsigned fs_usage_u64s(struct bch_fs *c) return __fs_usage_u64s(READ_ONCE(c->replicas.nr)); } =20 -static inline unsigned __fs_usage_online_u64s(unsigned nr_replicas) -{ - return sizeof(struct bch_fs_usage_online) / sizeof(u64) + nr_replicas; -} - -static inline unsigned fs_usage_online_u64s(struct bch_fs *c) -{ - return __fs_usage_online_u64s(READ_ONCE(c->replicas.nr)); -} - static inline unsigned dev_usage_u64s(void) { return sizeof(struct bch_dev_usage) / sizeof(u64); } =20 -u64 bch2_fs_sectors_used(struct bch_fs *, struct bch_fs_usage_online *); - struct bch_fs_usage_short bch2_fs_usage_read_short(struct bch_fs *); =20 diff --git a/fs/bcachefs/buckets_types.h b/fs/bcachefs/buckets_types.h index baa7e0924390..570acdf455bb 100644 --- a/fs/bcachefs/buckets_types.h +++ b/fs/bcachefs/buckets_types.h @@ -61,11 +61,6 @@ struct bch_fs_usage { u64 replicas[]; }; =20 -struct bch_fs_usage_online { - u64 online_reserved; - struct bch_fs_usage u; -}; - struct bch_fs_usage_short { u64 capacity; 
u64 used; --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-183.mta1.migadu.com (out-183.mta1.migadu.com [95.215.58.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C4D0D14290; Sun, 25 Feb 2024 02:38:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.183 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828729; cv=none; b=cP7nhWJ0qmYicwVOLYF/Hx8hOb01kDLg0BjHL5v7zx9On3xdZILsRkTRHQ2jDmESOKv+o/ExcP1glDOpZxb9uV7qr5UVff8R2NbObz1lkcwi42zAoTJ0Ke6e84yGb6/FWHxhbLY8i40ZSIRzBD+qJGh7x5Dji99hhU+AOnlbNvs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828729; c=relaxed/simple; bh=kMbO4D6+stofxnjriibFLbcNaDrcicjpLGuvPqCK11g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dNYFV971j0KB0OF8iDGjH/lVnoP/0+6B2YTV/VOQkeIBpmWubLLmZwKtF6o6i6rEKJfEdmPzpPOkLAHjj3rwovb7tebRc9B9rE8gNONj4gm9tpYeH6jAAaeFW14Srzq7sGdiehcon2ojrQR/54oXeCB+42OuGIohQHVSho6fz5g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=mQAl7rg9; arc=none smtp.client-ip=95.215.58.183 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="mQAl7rg9" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828726; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3edXSLQswq3o2Ph2NSZVb78JLMRJ7gBJFLnjH1EUAXg=; b=mQAl7rg9h1+1IHylCVADyGyzkZr/GSbWxQva1XfNkPAt2v8iF8j36ZGPKMZcH3FpO2ZJlV mqAKIe3WpFCaH+SHL9OLToeuVfDRsT+zrbxzLd2tHXiJkI/pe5HnsEWbzu6H0vjzjfiHiD mwcKfh5u+5OFSgvOG5fAeL9IaTyy7HM= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 13/21] bcachefs: Kill replicas_journal_res Date: Sat, 24 Feb 2024 21:38:15 -0500 Message-ID: <20240225023826.2413565-14-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" More dead code deletion Signed-off-by: Kent Overstreet --- fs/bcachefs/bcachefs.h | 2 -- fs/bcachefs/replicas.c | 34 ---------------------------------- fs/bcachefs/super.c | 21 --------------------- 3 files changed, 57 deletions(-) diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h index 2e7c4d10c951..22dc455cb436 100644 --- a/fs/bcachefs/bcachefs.h +++ b/fs/bcachefs/bcachefs.h @@ -768,9 +768,7 @@ struct bch_fs { struct mutex replicas_gc_lock; =20 struct journal_entry_res btree_root_journal_res; - struct journal_entry_res replicas_journal_res; struct journal_entry_res clock_journal_res; - struct journal_entry_res dev_usage_journal_res; =20 struct bch_disk_groups_cpu __rcu *disk_groups; =20 diff --git a/fs/bcachefs/replicas.c b/fs/bcachefs/replicas.c index 
6dca705eaf1f..427dc6711427 100644 --- a/fs/bcachefs/replicas.c +++ b/fs/bcachefs/replicas.c @@ -342,32 +342,6 @@ static int replicas_table_update(struct bch_fs *c, goto out; } =20 -static unsigned reserve_journal_replicas(struct bch_fs *c, - struct bch_replicas_cpu *r) -{ - struct bch_replicas_entry_v1 *e; - unsigned journal_res_u64s =3D 0; - - /* nr_inodes: */ - journal_res_u64s +=3D - DIV_ROUND_UP(sizeof(struct jset_entry_usage), sizeof(u64)); - - /* key_version: */ - journal_res_u64s +=3D - DIV_ROUND_UP(sizeof(struct jset_entry_usage), sizeof(u64)); - - /* persistent_reserved: */ - journal_res_u64s +=3D - DIV_ROUND_UP(sizeof(struct jset_entry_usage), sizeof(u64)) * - BCH_REPLICAS_MAX; - - for_each_cpu_replicas_entry(r, e) - journal_res_u64s +=3D - DIV_ROUND_UP(sizeof(struct jset_entry_data_usage) + - e->nr_devs, sizeof(u64)); - return journal_res_u64s; -} - noinline static int bch2_mark_replicas_slowpath(struct bch_fs *c, struct bch_replicas_entry_v1 *new_entry) @@ -401,10 +375,6 @@ static int bch2_mark_replicas_slowpath(struct bch_fs *= c, ret =3D bch2_cpu_replicas_to_sb_replicas(c, &new_r); if (ret) goto err; - - bch2_journal_entry_res_resize(&c->journal, - &c->replicas_journal_res, - reserve_journal_replicas(c, &new_r)); } =20 if (!new_r.entries && @@ -974,9 +944,5 @@ void bch2_fs_replicas_exit(struct bch_fs *c) =20 int bch2_fs_replicas_init(struct bch_fs *c) { - bch2_journal_entry_res_resize(&c->journal, - &c->replicas_journal_res, - reserve_journal_replicas(c, &c->replicas)); - return replicas_table_update(c, &c->replicas); } diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c index 30b41c8de309..89c481831608 100644 --- a/fs/bcachefs/super.c +++ b/fs/bcachefs/super.c @@ -221,22 +221,6 @@ struct bch_fs *bch2_uuid_to_fs(__uuid_t uuid) return c; } =20 -static void bch2_dev_usage_journal_reserve(struct bch_fs *c) -{ - unsigned nr =3D 0, u64s =3D - ((sizeof(struct jset_entry_dev_usage) + - sizeof(struct jset_entry_dev_usage_type) * BCH_DATA_NR)) / - 
sizeof(u64); - - rcu_read_lock(); - for_each_member_device_rcu(c, ca, NULL) - nr++; - rcu_read_unlock(); - - bch2_journal_entry_res_resize(&c->journal, - &c->dev_usage_journal_res, u64s * nr); -} - /* Filesystem RO/RW: */ =20 /* @@ -940,7 +924,6 @@ static struct bch_fs *bch2_fs_alloc(struct bch_sb *sb, = struct bch_opts opts) bch2_journal_entry_res_resize(&c->journal, &c->btree_root_journal_res, BTREE_ID_NR * (JSET_KEYS_U64s + BKEY_BTREE_PTR_U64s_MAX)); - bch2_dev_usage_journal_reserve(c); bch2_journal_entry_res_resize(&c->journal, &c->clock_journal_res, (sizeof(struct jset_entry_clock) / sizeof(u64)) * 2); @@ -1680,8 +1663,6 @@ int bch2_dev_remove(struct bch_fs *c, struct bch_dev = *ca, int flags) =20 mutex_unlock(&c->sb_lock); up_write(&c->state_lock); - - bch2_dev_usage_journal_reserve(c); return 0; err: if (ca->mi.state =3D=3D BCH_MEMBER_STATE_rw && @@ -1791,8 +1772,6 @@ int bch2_dev_add(struct bch_fs *c, const char *path) bch2_write_super(c); mutex_unlock(&c->sb_lock); =20 - bch2_dev_usage_journal_reserve(c); - ret =3D bch2_dev_usage_init(ca); if (ret) goto err_late; --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-188.mta1.migadu.com (out-188.mta1.migadu.com [95.215.58.188]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2E562168DA for ; Sun, 25 Feb 2024 02:38:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.188 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828733; cv=none; b=JB4sUai3dlJ8cyi+hObd9DEuZUhGQzx3/9b7Yuxb/N9ONwSHy+SogVWOWEyF4dhFjVXC6qlxl2umN9LmIPkP16qLIfdgyHVICnzCReEDvi3dyrCDsFk269+jjayLXtRHMDrZhQMiUt+s/JSi5GT5YXR9e/f8RoSjphP45rc7CBo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828733; c=relaxed/simple; bh=Na222wNarH0rNKP8Cur5R0WSBSTXS7OOq3fMMHi1W9w=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ifJw76oPuyh5dLBf7rkUZsBqVW7qZ7v2lah4JfOVMMcMOZxpHHipWu70b4rmsh0E6yPgfaVCcAne0bL4xD37pYB8HgNBFs9OoVLKABQe1Lwvd+gxtOsjyXW7/zIr714ZHcTRA6IXamh+zXNCqj7vJqFnGBxymet30nshVL981qs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=bn14oTRh; arc=none smtp.client-ip=95.215.58.188 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="bn14oTRh" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828727; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EbUhq312t/ujD/5UG8QAGFllKHwHozY0sW6mmTZ6ZAg=; b=bn14oTRhbAz1DzfQxtsQ0EIOmJJero91RqdDt5e6rIIKFno/DqERwr1CqAB+v7uzC3iGvp 3Tp4f2RIv9O1RxQko0h+Bif6mNgLnuVpCa2iY+j0sv9Ymk155LDxpur7zMYjx4NZiaH4FI wc25Q6Dayxmicj4npQRYDzT7dIQShqw= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 14/21] bcachefs: Convert gc to new accounting Date: Sat, 24 Feb 2024 21:38:16 -0500 Message-ID: <20240225023826.2413565-15-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Rewrite fsck/gc for the new accounting scheme. This adds a second set of in-memory accounting counters for gc to use; like with other parts of gc we run all triggers in TRIGGER_GC mode, then compare what we calculated to existing in-memory accounting at the end. Signed-off-by: Kent Overstreet --- fs/bcachefs/alloc_background.c | 181 +++++++++++----------- fs/bcachefs/alloc_background.h | 2 + fs/bcachefs/bcachefs.h | 4 +- fs/bcachefs/btree_gc.c | 257 ++++++++++--------------------- fs/bcachefs/btree_trans_commit.c | 4 +- fs/bcachefs/buckets.c | 182 ++++------------------ fs/bcachefs/buckets.h | 20 +-- fs/bcachefs/buckets_types.h | 7 - fs/bcachefs/disk_accounting.c | 171 +++++++++++++++++--- fs/bcachefs/disk_accounting.h | 81 ++++++---- fs/bcachefs/ec.c | 148 +++++++++--------- fs/bcachefs/inode.c | 43 ++---- fs/bcachefs/recovery.c | 3 +- fs/bcachefs/replicas.c | 86 +---------- fs/bcachefs/replicas.h | 1 - fs/bcachefs/super.c | 9 +- 16 files changed, 508 insertions(+), 691 deletions(-) diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c index d8ad5bb28a7f..54cb345b104c 100644 --- a/fs/bcachefs/alloc_background.c +++ b/fs/bcachefs/alloc_background.c @@ -731,6 +731,96 @@ static noinline int bch2_bucket_gen_update(struct btre= e_trans *trans, return ret; } =20 +static int bch2_alloc_key_to_dev_counters(struct btree_trans *trans, struc= t bch_dev *ca, + const struct bch_alloc_v4 *old_a, + const struct bch_alloc_v4 *new_a, + unsigned flags) +{ + bool gc =3D flags & BTREE_TRIGGER_GC; + + if ((flags & BTREE_TRIGGER_BUCKET_INVALIDATE) && + old_a->cached_sectors) { + int ret =3D bch2_mod_dev_cached_sectors(trans, ca->dev_idx, + -((s64) old_a->cached_sectors), gc); + if (ret) + return ret; + } + + if (old_a->data_type !=3D new_a->data_type || + old_a->dirty_sectors !=3D new_a->dirty_sectors) { + struct
disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_dev_data_type, + .dev_data_type.dev =3D ca->dev_idx, + .dev_data_type.data_type =3D new_a->data_type, + }; + s64 d[3]; + + if (old_a->data_type =3D=3D new_a->data_type) { + d[0] =3D 0; + d[1] =3D (s64) new_a->dirty_sectors - (s64) old_a->dirty_sectors; + d[2] =3D bucket_sectors_fragmented(ca, *new_a) - + bucket_sectors_fragmented(ca, *old_a); + + int ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3, gc); + if (ret) + return ret; + } else { + d[0] =3D 1; + d[1] =3D new_a->dirty_sectors; + d[2] =3D bucket_sectors_fragmented(ca, *new_a); + + int ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3, gc); + if (ret) + return ret; + + acc.dev_data_type.data_type =3D old_a->data_type; + d[0] =3D -1; + d[1] =3D -(s64) old_a->dirty_sectors; + d[2] =3D -bucket_sectors_fragmented(ca, *old_a); + + ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3, gc); + if (ret) + return ret; + } + } + + if (!!old_a->stripe !=3D !!new_a->stripe) { + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_dev_stripe_buckets, + .dev_stripe_buckets.dev =3D ca->dev_idx, + }; + u64 d[1]; + + d[0] =3D (s64) !!new_a->stripe - (s64) !!old_a->stripe; + int ret =3D bch2_disk_accounting_mod(trans, &acc, d, 1, gc); + if (ret) + return ret; + } + + return 0; +} + +static inline struct bch_alloc_v4 bucket_m_to_alloc(struct bucket b) +{ + return (struct bch_alloc_v4) { + .gen =3D b.gen, + .data_type =3D b.data_type, + .dirty_sectors =3D b.dirty_sectors, + .cached_sectors =3D b.cached_sectors, + .stripe =3D b.stripe, + }; +} + +int bch2_bucket_to_dev_counters(struct btree_trans *trans, struct bch_dev = *ca, + struct bucket *old, struct bucket *new, + unsigned flags) +{ + struct bch_alloc_v4 old_a =3D bucket_m_to_alloc(*old); + struct bch_alloc_v4 new_a =3D bucket_m_to_alloc(*new); + + return bch2_alloc_key_to_dev_counters(trans, ca, &old_a, &new_a, flags); +} + int bch2_trigger_alloc(struct btree_trans *trans, enum btree_id btree, 
unsigned level, struct bkey_s_c old, struct bkey_s new, @@ -807,70 +897,9 @@ int bch2_trigger_alloc(struct btree_trans *trans, return ret; } =20 - /* - * need to know if we're getting called from the invalidate path or - * not: - */ - - if ((flags & BTREE_TRIGGER_BUCKET_INVALIDATE) && - old_a->cached_sectors) { - ret =3D bch2_mod_dev_cached_sectors(trans, new.k->p.inode, - -((s64) old_a->cached_sectors)); - if (ret) - return ret; - } - - - if (old_a->data_type !=3D new_a->data_type || - old_a->dirty_sectors !=3D new_a->dirty_sectors) { - struct disk_accounting_key acc =3D { - .type =3D BCH_DISK_ACCOUNTING_dev_data_type, - .dev_data_type.dev =3D new.k->p.inode, - .dev_data_type.data_type =3D new_a->data_type, - }; - s64 d[3]; - - if (old_a->data_type =3D=3D new_a->data_type) { - d[0] =3D 0; - d[1] =3D (s64) new_a->dirty_sectors - (s64) old_a->dirty_sectors; - d[2] =3D bucket_sectors_fragmented(ca, *new_a) - - bucket_sectors_fragmented(ca, *old_a); - - ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3); - if (ret) - return ret; - } else { - d[0] =3D 1; - d[1] =3D new_a->dirty_sectors; - d[2] =3D bucket_sectors_fragmented(ca, *new_a); - - ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3); - if (ret) - return ret; - - acc.dev_data_type.data_type =3D old_a->data_type; - d[0] =3D -1; - d[1] =3D -(s64) old_a->dirty_sectors; - d[2] =3D -bucket_sectors_fragmented(ca, *old_a); - - ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3); - if (ret) - return ret; - } - } - - if (!!old_a->stripe !=3D !!new_a->stripe) { - struct disk_accounting_key acc =3D { - .type =3D BCH_DISK_ACCOUNTING_dev_stripe_buckets, - .dev_stripe_buckets.dev =3D new.k->p.inode, - }; - u64 d[1]; - - d[0] =3D (s64) !!new_a->stripe - (s64) !!old_a->stripe; - ret =3D bch2_disk_accounting_mod(trans, &acc, d, 1); - if (ret) - return ret; - } + ret =3D bch2_alloc_key_to_dev_counters(trans, ca, old_a, new_a, flags); + if (ret) + return ret; } =20 if ((flags & BTREE_TRIGGER_ATOMIC) && (flags & 
BTREE_TRIGGER_INSERT)) { @@ -938,30 +967,6 @@ int bch2_trigger_alloc(struct btree_trans *trans, bch2_do_gc_gens(c); } =20 - if ((flags & BTREE_TRIGGER_GC) && - (flags & BTREE_TRIGGER_BUCKET_INVALIDATE)) { - struct bch_alloc_v4 new_a_convert; - const struct bch_alloc_v4 *new_a =3D bch2_alloc_to_v4(new.s_c, &new_a_co= nvert); - - percpu_down_read(&c->mark_lock); - struct bucket *g =3D gc_bucket(ca, new.k->p.offset); - - bucket_lock(g); - - g->gen_valid =3D 1; - g->gen =3D new_a->gen; - g->data_type =3D new_a->data_type; - g->stripe =3D new_a->stripe; - g->stripe_redundancy =3D new_a->stripe_redundancy; - g->dirty_sectors =3D new_a->dirty_sectors; - g->cached_sectors =3D new_a->cached_sectors; - - bucket_unlock(g); - percpu_up_read(&c->mark_lock); - - bch2_dev_usage_update(c, ca, old_a, new_a); - } - return 0; } =20 diff --git a/fs/bcachefs/alloc_background.h b/fs/bcachefs/alloc_background.h index 052b2fac25d6..6f273a456a6d 100644 --- a/fs/bcachefs/alloc_background.h +++ b/fs/bcachefs/alloc_background.h @@ -228,6 +228,8 @@ static inline bool bkey_is_alloc(const struct bkey *k) =20 int bch2_alloc_read(struct bch_fs *); =20 +int bch2_bucket_to_dev_counters(struct btree_trans *, struct bch_dev *, + struct bucket *, struct bucket *, unsigned); int bch2_trigger_alloc(struct btree_trans *, enum btree_id, unsigned, struct bkey_s_c, struct bkey_s, unsigned); int bch2_check_alloc_info(struct bch_fs *); diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h index 22dc455cb436..41c436c608cf 100644 --- a/fs/bcachefs/bcachefs.h +++ b/fs/bcachefs/bcachefs.h @@ -577,7 +577,6 @@ struct bch_dev { struct rw_semaphore bucket_lock; =20 struct bch_dev_usage __percpu *usage; - struct bch_dev_usage __percpu *usage_gc; =20 /* Allocator: */ u64 new_fs_bucket_idx; @@ -761,7 +760,7 @@ struct bch_fs { =20 struct bch_dev __rcu *devs[BCH_SB_MEMBERS_MAX]; =20 - struct bch_accounting_mem accounting; + struct bch_accounting_mem accounting[2]; =20 struct bch_replicas_cpu replicas; struct 
bch_replicas_cpu replicas_gc; @@ -906,7 +905,6 @@ struct bch_fs { =20 seqcount_t usage_lock; struct bch_fs_usage_base __percpu *usage; - struct bch_fs_usage __percpu *usage_gc; u64 __percpu *online_reserved; =20 struct io_clock io_clock[2]; diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c index 15a8796197f3..54a90e88f5b8 100644 --- a/fs/bcachefs/btree_gc.c +++ b/fs/bcachefs/btree_gc.c @@ -18,6 +18,7 @@ #include "buckets.h" #include "clock.h" #include "debug.h" +#include "disk_accounting.h" #include "ec.h" #include "error.h" #include "extents.h" @@ -1115,10 +1116,10 @@ static int bch2_gc_btrees(struct bch_fs *c, bool in= itial, bool metadata_only) return ret; } =20 -static void mark_metadata_sectors(struct bch_fs *c, struct bch_dev *ca, - u64 start, u64 end, - enum bch_data_type type, - unsigned flags) +static int mark_metadata_sectors(struct btree_trans *trans, struct bch_dev= *ca, + u64 start, u64 end, + enum bch_data_type type, + unsigned flags) { u64 b =3D sector_to_bucket(ca, start); =20 @@ -1126,48 +1127,68 @@ static void mark_metadata_sectors(struct bch_fs *c,= struct bch_dev *ca, unsigned sectors =3D min_t(u64, bucket_to_sector(ca, b + 1), end) - start; =20 - bch2_mark_metadata_bucket(c, ca, b, type, sectors, - gc_phase(GC_PHASE_SB), flags); + int ret =3D bch2_mark_metadata_bucket(trans, ca, b, type, sectors, + gc_phase(GC_PHASE_SB), flags); + if (ret) + return ret; + b++; start +=3D sectors; } while (start < end); + + return 0; } =20 -static void bch2_mark_dev_superblock(struct bch_fs *c, struct bch_dev *ca, - unsigned flags) +static int bch2_mark_dev_superblock(struct btree_trans *trans, struct bch_= dev *ca, + unsigned flags) { struct bch_sb_layout *layout =3D &ca->disk_sb.sb->layout; - unsigned i; - u64 b; =20 - for (i =3D 0; i < layout->nr_superblocks; i++) { + for (unsigned i =3D 0; i < layout->nr_superblocks; i++) { u64 offset =3D le64_to_cpu(layout->sb_offset[i]); =20 - if (offset =3D=3D BCH_SB_SECTOR) - mark_metadata_sectors(c, ca, 0, 
BCH_SB_SECTOR, - BCH_DATA_sb, flags); + if (offset =3D=3D BCH_SB_SECTOR) { + int ret =3D mark_metadata_sectors(trans, ca, 0, BCH_SB_SECTOR, + BCH_DATA_sb, flags); + if (ret) + return ret; + } =20 - mark_metadata_sectors(c, ca, offset, + int ret =3D mark_metadata_sectors(trans, ca, offset, offset + (1 << layout->sb_max_size_bits), BCH_DATA_sb, flags); + if (ret) + return ret; } =20 - for (i =3D 0; i < ca->journal.nr; i++) { - b =3D ca->journal.buckets[i]; - bch2_mark_metadata_bucket(c, ca, b, BCH_DATA_journal, - ca->mi.bucket_size, + for (unsigned i =3D 0; i < ca->journal.nr; i++) { + int ret =3D bch2_mark_metadata_bucket(trans, ca, ca->journal.buckets[i], + BCH_DATA_journal, ca->mi.bucket_size, gc_phase(GC_PHASE_SB), flags); + if (ret) + return ret; } + + return 0; } =20 -static void bch2_mark_superblocks(struct bch_fs *c) +static int bch2_mark_superblocks(struct btree_trans *trans) { + struct bch_fs *c =3D trans->c; + mutex_lock(&c->sb_lock); gc_pos_set(c, gc_phase(GC_PHASE_SB)); =20 - for_each_online_member(c, ca) - bch2_mark_dev_superblock(c, ca, BTREE_TRIGGER_GC); + for_each_online_member(c, ca) { + int ret =3D bch2_mark_dev_superblock(trans, ca, BTREE_TRIGGER_GC); + if (ret) { + percpu_ref_put(&ca->io_ref); + return ret; + } + } mutex_unlock(&c->sb_lock); + + return 0; } =20 #if 0 @@ -1190,146 +1211,25 @@ static void bch2_mark_pending_btree_node_frees(str= uct bch_fs *c) =20 static void bch2_gc_free(struct bch_fs *c) { + bch2_accounting_free(&c->accounting[1]); + genradix_free(&c->reflink_gc_table); genradix_free(&c->gc_stripes); =20 for_each_member_device(c, ca) { kvfree(rcu_dereference_protected(ca->buckets_gc, 1)); ca->buckets_gc =3D NULL; - - free_percpu(ca->usage_gc); - ca->usage_gc =3D NULL; - } - - free_percpu(c->usage_gc); - c->usage_gc =3D NULL; -} - -static int bch2_gc_done(struct bch_fs *c, - bool initial, bool metadata_only) -{ - struct bch_dev *ca =3D NULL; - struct printbuf buf =3D PRINTBUF; - bool verify =3D !metadata_only && - 
!c->opts.reconstruct_alloc && - (!initial || (c->sb.compat & (1ULL << BCH_COMPAT_alloc_info))); - unsigned i; - int ret =3D 0; - - percpu_down_write(&c->mark_lock); - -#define copy_field(_err, _f, _msg, ...) \ - if (dst->_f !=3D src->_f && \ - (!verify || \ - fsck_err(c, _err, _msg ": got %llu, should be %llu" \ - , ##__VA_ARGS__, dst->_f, src->_f))) \ - dst->_f =3D src->_f -#define copy_dev_field(_err, _f, _msg, ...) \ - copy_field(_err, _f, "dev %u has wrong " _msg, ca->dev_idx, ##__VA_ARGS__) -#define copy_fs_field(_err, _f, _msg, ...) \ - copy_field(_err, _f, "fs has wrong " _msg, ##__VA_ARGS__) - - __for_each_member_device(c, ca) { - /* XXX */ - struct bch_dev_usage *dst =3D this_cpu_ptr(ca->usage); - struct bch_dev_usage *src =3D (void *) - bch2_acc_percpu_u64s((u64 __percpu *) ca->usage_gc, - dev_usage_u64s()); - - for (i =3D 0; i < BCH_DATA_NR; i++) { - copy_dev_field(dev_usage_buckets_wrong, - d[i].buckets, "%s buckets", bch2_data_type_str(i)); - copy_dev_field(dev_usage_sectors_wrong, - d[i].sectors, "%s sectors", bch2_data_type_str(i)); - copy_dev_field(dev_usage_fragmented_wrong, - d[i].fragmented, "%s fragmented", bch2_data_type_str(i)); - } - } - - { -#if 0 - unsigned nr =3D fs_usage_u64s(c); - /* XX: */ - struct bch_fs_usage *dst =3D this_cpu_ptr(c->usage); - struct bch_fs_usage *src =3D (void *) - bch2_acc_percpu_u64s((u64 __percpu *) c->usage_gc, nr); - - copy_fs_field(fs_usage_hidden_wrong, - b.hidden, "hidden"); - copy_fs_field(fs_usage_btree_wrong, - b.btree, "btree"); - - if (!metadata_only) { - copy_fs_field(fs_usage_data_wrong, - b.data, "data"); - copy_fs_field(fs_usage_cached_wrong, - b.cached, "cached"); - copy_fs_field(fs_usage_reserved_wrong, - b.reserved, "reserved"); - copy_fs_field(fs_usage_nr_inodes_wrong, - b.nr_inodes,"nr_inodes"); - - for (i =3D 0; i < BCH_REPLICAS_MAX; i++) - copy_fs_field(fs_usage_persistent_reserved_wrong, - persistent_reserved[i], - "persistent_reserved[%i]", i); - } - - for (i =3D 0; i < c->replicas.nr; i++) 
{ - struct bch_replicas_entry_v1 *e =3D - cpu_replicas_entry(&c->replicas, i); - - if (metadata_only && - (e->data_type =3D=3D BCH_DATA_user || - e->data_type =3D=3D BCH_DATA_cached)) - continue; - - printbuf_reset(&buf); - bch2_replicas_entry_to_text(&buf, e); - - copy_fs_field(fs_usage_replicas_wrong, - replicas[i], "%s", buf.buf); - } -#endif } - -#undef copy_fs_field -#undef copy_dev_field -#undef copy_stripe_field -#undef copy_field -fsck_err: - if (ca) - percpu_ref_put(&ca->ref); - bch_err_fn(c, ret); - - percpu_up_write(&c->mark_lock); - printbuf_exit(&buf); - return ret; } =20 static int bch2_gc_start(struct bch_fs *c) { - BUG_ON(c->usage_gc); - - c->usage_gc =3D __alloc_percpu_gfp(fs_usage_u64s(c) * sizeof(u64), - sizeof(u64), GFP_KERNEL); - if (!c->usage_gc) { - bch_err(c, "error allocating c->usage_gc"); - return -BCH_ERR_ENOMEM_gc_start; - } - for_each_member_device(c, ca) { - BUG_ON(ca->usage_gc); - - ca->usage_gc =3D alloc_percpu(struct bch_dev_usage); - if (!ca->usage_gc) { - bch_err(c, "error allocating ca->usage_gc"); + int ret =3D bch2_dev_usage_init(ca, true); + if (ret) { percpu_ref_put(&ca->ref); - return -BCH_ERR_ENOMEM_gc_start; + return ret; } - - this_cpu_write(ca->usage_gc->d[BCH_DATA_free].buckets, - ca->mi.nbuckets - ca->mi.first_bucket); } =20 return 0; @@ -1337,13 +1237,7 @@ static int bch2_gc_start(struct bch_fs *c) =20 static int bch2_gc_reset(struct bch_fs *c) { - for_each_member_device(c, ca) { - free_percpu(ca->usage_gc); - ca->usage_gc =3D NULL; - } - - free_percpu(c->usage_gc); - c->usage_gc =3D NULL; + bch2_accounting_free(&c->accounting[1]); =20 return bch2_gc_start(c); } @@ -1368,7 +1262,7 @@ static int bch2_alloc_write_key(struct btree_trans *t= rans, { struct bch_fs *c =3D trans->c; struct bch_dev *ca =3D bch_dev_bkey_exists(c, iter->pos.inode); - struct bucket gc, *b; + struct bucket gc; struct bkey_i_alloc_v4 *a; struct bch_alloc_v4 old_convert, new; const struct bch_alloc_v4 *old; @@ -1379,30 +1273,39 @@ static int 
bch2_alloc_write_key(struct btree_trans = *trans, new =3D *old; =20 percpu_down_read(&c->mark_lock); - b =3D gc_bucket(ca, iter->pos.offset); + gc =3D *gc_bucket(ca, iter->pos.offset); + percpu_up_read(&c->mark_lock); =20 /* * b->data_type doesn't yet include need_discard & need_gc_gen states - * fix that here: */ - type =3D __alloc_data_type(b->dirty_sectors, - b->cached_sectors, - b->stripe, + type =3D __alloc_data_type(gc.dirty_sectors, + gc.cached_sectors, + gc.stripe, *old, - b->data_type); - if (b->data_type !=3D type) { - struct bch_dev_usage *u; - - preempt_disable(); - u =3D this_cpu_ptr(ca->usage_gc); - u->d[b->data_type].buckets--; - b->data_type =3D type; - u->d[b->data_type].buckets++; - preempt_enable(); - } + gc.data_type); =20 - gc =3D *b; - percpu_up_read(&c->mark_lock); + if (gc.data_type !=3D type) { + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_dev_data_type, + .dev_data_type.dev =3D ca->dev_idx, + .dev_data_type.data_type =3D type, + }; + u64 d[3] =3D { 1, 0, 0 }; + + ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3, true); + if (ret) + return ret; + + acc.dev_data_type.data_type =3D gc.data_type; + d[0] =3D -1; + ret =3D bch2_disk_accounting_mod(trans, &acc, d, 3, true); + if (ret) + return ret; + + gc.data_type =3D type; + } =20 if (metadata_only && gc.data_type !=3D BCH_DATA_sb && @@ -1778,10 +1681,12 @@ int bch2_gc(struct bch_fs *c, bool initial, bool me= tadata_only) again: gc_pos_set(c, gc_phase(GC_PHASE_START)); =20 - bch2_mark_superblocks(c); + ret =3D bch2_trans_run(c, bch2_mark_superblocks(trans)); + bch_err_msg(c, ret, "marking superblocks"); + if (ret) + goto out; =20 ret =3D bch2_gc_btrees(c, initial, metadata_only); - if (ret) goto out; =20 @@ -1823,7 +1728,7 @@ int bch2_gc(struct bch_fs *c, bool initial, bool meta= data_only) ret =3D bch2_gc_stripes_done(c, metadata_only) ?: bch2_gc_reflink_done(c, metadata_only) ?: bch2_gc_alloc_done(c, metadata_only) ?: - bch2_gc_done(c, initial, metadata_only); + 
bch2_accounting_gc_done(c); =20 bch2_journal_unblock(&c->journal); } diff --git a/fs/bcachefs/btree_trans_commit.c b/fs/bcachefs/btree_trans_com= mit.c index b005e20039bb..eac9d45bcc8c 100644 --- a/fs/bcachefs/btree_trans_commit.c +++ b/fs/bcachefs/btree_trans_commit.c @@ -697,7 +697,7 @@ bch2_trans_commit_write_locked(struct btree_trans *tran= s, unsigned flags, a->k.version =3D journal_pos_to_bversion(&trans->journal_res, (u64 *) entry - (u64 *) trans->journal_entries); BUG_ON(bversion_zero(a->k.version)); - ret =3D bch2_accounting_mem_add(trans, accounting_i_to_s_c(a)); + ret =3D bch2_accounting_mem_add_locked(trans, accounting_i_to_s_c(a), f= alse); if (ret) goto revert_fs_usage; } @@ -784,7 +784,7 @@ bch2_trans_commit_write_locked(struct btree_trans *tran= s, unsigned flags, struct bkey_s_accounting a =3D bkey_i_to_s_accounting(entry2->start); =20 bch2_accounting_neg(a); - bch2_accounting_mem_add(trans, a.c); + bch2_accounting_mem_add_locked(trans, a.c, false); bch2_accounting_neg(a); } percpu_up_read(&c->mark_lock); diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c index 5e2b9aa93241..506bb580bff4 100644 --- a/fs/bcachefs/buckets.c +++ b/fs/bcachefs/buckets.c @@ -95,113 +95,12 @@ void bch2_dev_usage_to_text(struct printbuf *out, stru= ct bch_dev_usage *usage) } } =20 -void bch2_dev_usage_update(struct bch_fs *c, struct bch_dev *ca, - const struct bch_alloc_v4 *old, - const struct bch_alloc_v4 *new) -{ - struct bch_fs_usage *fs_usage; - struct bch_dev_usage *u; - - preempt_disable(); - fs_usage =3D this_cpu_ptr(c->usage_gc); - - if (data_type_is_hidden(old->data_type)) - fs_usage->b.hidden -=3D ca->mi.bucket_size; - if (data_type_is_hidden(new->data_type)) - fs_usage->b.hidden +=3D ca->mi.bucket_size; - - u =3D this_cpu_ptr(ca->usage_gc); - - u->d[old->data_type].buckets--; - u->d[new->data_type].buckets++; - - u->d[old->data_type].sectors -=3D bch2_bucket_sectors_dirty(*old); - u->d[new->data_type].sectors +=3D bch2_bucket_sectors_dirty(*new); - - 
u->d[BCH_DATA_cached].sectors +=3D new->cached_sectors; - u->d[BCH_DATA_cached].sectors -=3D old->cached_sectors; - - u->d[old->data_type].fragmented -=3D bch2_bucket_sectors_fragmented(ca, *= old); - u->d[new->data_type].fragmented +=3D bch2_bucket_sectors_fragmented(ca, *= new); - - preempt_enable(); -} - -static inline struct bch_alloc_v4 bucket_m_to_alloc(struct bucket b) -{ - return (struct bch_alloc_v4) { - .gen =3D b.gen, - .data_type =3D b.data_type, - .dirty_sectors =3D b.dirty_sectors, - .cached_sectors =3D b.cached_sectors, - .stripe =3D b.stripe, - }; -} - -void bch2_dev_usage_update_m(struct bch_fs *c, struct bch_dev *ca, - struct bucket *old, struct bucket *new) -{ - struct bch_alloc_v4 old_a =3D bucket_m_to_alloc(*old); - struct bch_alloc_v4 new_a =3D bucket_m_to_alloc(*new); - - bch2_dev_usage_update(c, ca, &old_a, &new_a); -} - -int bch2_update_replicas(struct bch_fs *c, struct bkey_s_c k, - struct bch_replicas_entry_v1 *r, s64 sectors) -{ - struct bch_fs_usage *fs_usage; - int idx, ret =3D 0; - struct printbuf buf =3D PRINTBUF; - - percpu_down_read(&c->mark_lock); - - idx =3D bch2_replicas_entry_idx(c, r); - if (idx < 0 && - fsck_err(c, ptr_to_missing_replicas_entry, - "no replicas entry\n while marking %s", - (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { - percpu_up_read(&c->mark_lock); - ret =3D bch2_mark_replicas(c, r); - percpu_down_read(&c->mark_lock); - - if (ret) - goto err; - idx =3D bch2_replicas_entry_idx(c, r); - } - if (idx < 0) { - ret =3D -1; - goto err; - } - - preempt_disable(); - fs_usage =3D this_cpu_ptr(c->usage_gc); - fs_usage_data_type_to_base(&fs_usage->b, r->data_type, sectors); - fs_usage->replicas[idx] +=3D sectors; - preempt_enable(); -err: -fsck_err: - percpu_up_read(&c->mark_lock); - printbuf_exit(&buf); - return ret; -} - -static inline int update_cached_sectors(struct bch_fs *c, - struct bkey_s_c k, - unsigned dev, s64 sectors) -{ - struct bch_replicas_padded r; - - bch2_replicas_entry_cached(&r.e, dev); - - return 
bch2_update_replicas(c, k, &r.e, sectors); -} - -int bch2_mark_metadata_bucket(struct bch_fs *c, struct bch_dev *ca, +int bch2_mark_metadata_bucket(struct btree_trans *trans, struct bch_dev *c= a, size_t b, enum bch_data_type data_type, unsigned sectors, struct gc_pos pos, unsigned flags) { + struct bch_fs *c =3D trans->c; struct bucket old, new, *g; int ret =3D 0; =20 @@ -242,12 +141,15 @@ int bch2_mark_metadata_bucket(struct bch_fs *c, struc= t bch_dev *ca, g->data_type =3D data_type; g->dirty_sectors +=3D sectors; new =3D *g; -err: bucket_unlock(g); - if (!ret) - bch2_dev_usage_update_m(c, ca, &old, &new); percpu_up_read(&c->mark_lock); + ret =3D bch2_bucket_to_dev_counters(trans, ca, &old, &new, flags); +out: return ret; +err: + bucket_unlock(g); + percpu_up_read(&c->mark_lock); + goto out; } =20 int bch2_check_bucket_ref(struct btree_trans *trans, @@ -496,8 +398,11 @@ static int bch2_trigger_pointer(struct btree_trans *tr= ans, g->data_type =3D bucket_data_type; struct bucket new =3D *g; bucket_unlock(g); - bch2_dev_usage_update_m(c, ca, &old, &new); percpu_up_read(&c->mark_lock); + + ret =3D bch2_bucket_to_dev_counters(trans, ca, &old, &new, flags); + if (ret) + return ret; } =20 return 0; @@ -539,7 +444,7 @@ static int bch2_trigger_stripe_ptr(struct btree_trans *= trans, }; bch2_bkey_to_replicas(&acc.replicas, bkey_i_to_s_c(&s->k_i)); acc.replicas.data_type =3D data_type; - ret =3D bch2_disk_accounting_mod(trans, &acc, &sectors, 1); + ret =3D bch2_disk_accounting_mod(trans, &acc, &sectors, 1, false); err: bch2_trans_iter_exit(trans, &iter); return ret; @@ -548,8 +453,6 @@ static int bch2_trigger_stripe_ptr(struct btree_trans *= trans, if (flags & BTREE_TRIGGER_GC) { struct bch_fs *c =3D trans->c; =20 - BUG_ON(!(flags & BTREE_TRIGGER_GC)); - struct gc_stripe *m =3D genradix_ptr_alloc(&c->gc_stripes, p.ec.idx, GFP= _KERNEL); if (!m) { bch_err(c, "error allocating memory for gc_stripes, idx %llu", @@ -572,11 +475,16 @@ static int bch2_trigger_stripe_ptr(struct
btree_trans= *trans, =20 m->block_sectors[p.ec.block] +=3D sectors; =20 - struct bch_replicas_padded r =3D m->r; + struct disk_accounting_key acc =3D { + .type =3D BCH_DISK_ACCOUNTING_replicas, + }; + memcpy(&acc.replicas, &m->r.e, replicas_entry_bytes(&m->r.e)); mutex_unlock(&c->ec_stripes_heap_lock); =20 - r.e.data_type =3D data_type; - bch2_update_replicas(c, k, &r.e, sectors); + acc.replicas.data_type =3D data_type; + int ret =3D bch2_disk_accounting_mod(trans, &acc, &sectors, 1, true); + if (ret) + return ret; } =20 return 0; @@ -587,7 +495,6 @@ static int __trigger_extent(struct btree_trans *trans, struct bkey_s_c k, unsigned flags) { bool gc =3D flags & BTREE_TRIGGER_GC; - struct bch_fs *c =3D trans->c; struct bkey_ptrs_c ptrs =3D bch2_bkey_ptrs_c(k); const union bch_extent_entry *entry; struct extent_ptr_decoded p; @@ -614,11 +521,7 @@ static int __trigger_extent(struct btree_trans *trans, =20 if (p.ptr.cached) { if (!stale) { - ret =3D !gc - ? bch2_mod_dev_cached_sectors(trans, p.ptr.dev, disk_sectors) - : update_cached_sectors(c, k, p.ptr.dev, disk_sectors); - bch2_fs_fatal_err_on(ret && gc, c, "%s(): no replicas entry while upda= ting cached sectors", - __func__); + ret =3D bch2_mod_dev_cached_sectors(trans, p.ptr.dev, disk_sectors, gc= ); if (ret) return ret; } @@ -640,16 +543,7 @@ static int __trigger_extent(struct btree_trans *trans, } =20 if (acc.replicas.nr_devs) { - ret =3D !gc - ?
bch2_disk_accounting_mod(trans, &acc, &dirty_sectors, 1) - : bch2_update_replicas(c, k, &acc.replicas, dirty_sectors); - if (unlikely(ret && gc)) { - struct printbuf buf =3D PRINTBUF; - - bch2_bkey_val_to_text(&buf, c, k); - bch2_fs_fatal_error(c, "%s(): no replicas entry for %s", __func__, buf.= buf); - printbuf_exit(&buf); - } + ret =3D bch2_disk_accounting_mod(trans, &acc, &dirty_sectors, 1, gc); if (ret) return ret; } @@ -699,36 +593,18 @@ static int __trigger_reservation(struct btree_trans *= trans, enum btree_id btree_id, unsigned level, struct bkey_s_c k, unsigned flags) { - struct bch_fs *c =3D trans->c; - unsigned replicas =3D bkey_s_c_to_reservation(k).v->nr_replicas; - s64 sectors =3D (s64) k.k->size; + if (flags & (BTREE_TRIGGER_TRANSACTIONAL|BTREE_TRIGGER_GC)) { + s64 sectors =3D k.k->size; =20 - if (flags & BTREE_TRIGGER_OVERWRITE) - sectors =3D -sectors; + if (flags & BTREE_TRIGGER_OVERWRITE) + sectors =3D -sectors; =20 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { struct disk_accounting_key acc =3D { .type =3D BCH_DISK_ACCOUNTING_persistent_reserved, - .persistent_reserved.nr_replicas =3D replicas, + .persistent_reserved.nr_replicas =3D bkey_s_c_to_reservation(k).v->nr_r= eplicas, }; =20 - return bch2_disk_accounting_mod(trans, &acc, &sectors, 1); - } - - if (flags & BTREE_TRIGGER_GC) { - sectors *=3D replicas; - - percpu_down_read(&c->mark_lock); - preempt_disable(); - - struct bch_fs_usage *fs_usage =3D this_cpu_ptr(c->usage_gc); - - replicas =3D min(replicas, ARRAY_SIZE(fs_usage->persistent_reserved)); - fs_usage->b.reserved +=3D sectors; - fs_usage->persistent_reserved[replicas - 1] +=3D sectors; - - preempt_enable(); - percpu_up_read(&c->mark_lock); + return bch2_disk_accounting_mod(trans, &acc, &sectors, 1, flags & BTREE_= TRIGGER_GC); } =20 return 0; diff --git a/fs/bcachefs/buckets.h b/fs/bcachefs/buckets.h index f9d8d7b9fbd1..7b8b10f74be0 100644 --- a/fs/bcachefs/buckets.h +++ b/fs/bcachefs/buckets.h @@ -269,16 +269,6 @@ static inline s64
bucket_sectors_fragmented(struct bch= _dev *ca, struct bch_alloc =20 /* Filesystem usage: */ =20 -static inline unsigned __fs_usage_u64s(unsigned nr_replicas) -{ - return sizeof(struct bch_fs_usage) / sizeof(u64) + nr_replicas; -} - -static inline unsigned fs_usage_u64s(struct bch_fs *c) -{ - return __fs_usage_u64s(READ_ONCE(c->replicas.nr)); -} - static inline unsigned dev_usage_u64s(void) { return sizeof(struct bch_dev_usage) / sizeof(u64); @@ -287,19 +277,11 @@ static inline unsigned dev_usage_u64s(void) struct bch_fs_usage_short bch2_fs_usage_read_short(struct bch_fs *); =20 -void bch2_dev_usage_update(struct bch_fs *, struct bch_dev *, - const struct bch_alloc_v4 *, - const struct bch_alloc_v4 *); -void bch2_dev_usage_update_m(struct bch_fs *, struct bch_dev *, - struct bucket *, struct bucket *); -int bch2_update_replicas(struct bch_fs *, struct bkey_s_c, - struct bch_replicas_entry_v1 *, s64); - int bch2_check_bucket_ref(struct btree_trans *, struct bkey_s_c, const struct bch_extent_ptr *, s64, enum bch_data_type, u8, u8, u32); =20 -int bch2_mark_metadata_bucket(struct bch_fs *, struct bch_dev *, +int bch2_mark_metadata_bucket(struct btree_trans *, struct bch_dev *, size_t, enum bch_data_type, unsigned, struct gc_pos, unsigned); =20 diff --git a/fs/bcachefs/buckets_types.h b/fs/bcachefs/buckets_types.h index 570acdf455bb..7bd1a117afe4 100644 --- a/fs/bcachefs/buckets_types.h +++ b/fs/bcachefs/buckets_types.h @@ -54,13 +54,6 @@ struct bch_fs_usage_base { u64 nr_inodes; }; =20 -struct bch_fs_usage { - /* all fields are in units of 512 byte sectors: */ - struct bch_fs_usage_base b; - u64 persistent_reserved[BCH_REPLICAS_MAX]; - u64 replicas[]; -}; - struct bch_fs_usage_short { u64 capacity; u64 used; diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c index f898323f72c7..2884615adc1e 100644 --- a/fs/bcachefs/disk_accounting.c +++ b/fs/bcachefs/disk_accounting.c @@ -19,7 +19,7 @@ static const char * const disk_accounting_type_strs[] =3D= { 
=20 int bch2_disk_accounting_mod(struct btree_trans *trans, struct disk_accounting_key *k, - s64 *d, unsigned nr) + s64 *d, unsigned nr, bool gc) { /* Normalize: */ switch (k->type) { @@ -40,11 +40,14 @@ int bch2_disk_accounting_mod(struct btree_trans *trans, =20 memcpy_u64s_small(acc->v.d, d, nr); =20 - return bch2_trans_update_buffered(trans, BTREE_ID_accounting, &acc->k_i); + return likely(!gc) + ? bch2_trans_update_buffered(trans, BTREE_ID_accounting, &acc->k_i) + : bch2_accounting_mem_add(trans, accounting_i_to_s_c(acc), true); } =20 int bch2_mod_dev_cached_sectors(struct btree_trans *trans, - unsigned dev, s64 sectors) + unsigned dev, s64 sectors, + bool gc) { struct disk_accounting_key acc =3D { .type =3D BCH_DISK_ACCOUNTING_replicas, @@ -52,7 +55,7 @@ int bch2_mod_dev_cached_sectors(struct btree_trans *trans, =20 bch2_replicas_entry_cached(&acc.replicas, dev); =20 - return bch2_disk_accounting_mod(trans, &acc, &sectors, 1); + return bch2_disk_accounting_mod(trans, &acc, &sectors, 1, gc); } =20 int bch2_accounting_invalid(struct bch_fs *c, struct bkey_s_c k, @@ -147,7 +150,7 @@ int bch2_accounting_update_sb(struct btree_trans *trans) return 0; } =20 -static int __bch2_accounting_mem_add_slowpath(struct bch_fs *c, struct bke= y_s_c_accounting a) +static int __bch2_accounting_mem_add_slowpath(struct bch_fs *c, struct bke= y_s_c_accounting a, bool gc) { struct bch_replicas_padded r; =20 @@ -155,7 +158,7 @@ static int __bch2_accounting_mem_add_slowpath(struct bc= h_fs *c, struct bkey_s_c_ !bch2_replicas_marked_locked(c, &r.e)) return -BCH_ERR_btree_insert_need_mark_replicas; =20 - struct bch_accounting_mem *acc =3D &c->accounting; + struct bch_accounting_mem *acc =3D &c->accounting[gc]; unsigned new_nr_counters =3D acc->nr_counters + bch2_accounting_counters(= a.k); =20 u64 __percpu *new_counters =3D __alloc_percpu_gfp(new_nr_counters * sizeo= f(u64), @@ -191,19 +194,64 @@ static int __bch2_accounting_mem_add_slowpath(struct = bch_fs *c, struct bkey_s_c_ return 0; } =20 
-int bch2_accounting_mem_add_slowpath(struct bch_fs *c, struct bkey_s_c_acc= ounting a) +int bch2_accounting_mem_add_slowpath(struct bch_fs *c, struct bkey_s_c_acc= ounting a, bool gc) { percpu_up_read(&c->mark_lock); percpu_down_write(&c->mark_lock); - int ret =3D __bch2_accounting_mem_add_slowpath(c, a); + int ret =3D __bch2_accounting_mem_add_slowpath(c, a, gc); percpu_up_write(&c->mark_lock); percpu_down_read(&c->mark_lock); return ret; } =20 +/* Ensures all counters in @src exist in @dst: */ +static int copy_counters(struct bch_accounting_mem *dst, + struct bch_accounting_mem *src) +{ + unsigned orig_dst_k_nr =3D dst->k.nr; + unsigned dst_counters =3D dst->nr_counters; + + darray_for_each(src->k, i) + if (eytzinger0_find(dst->k.data, orig_dst_k_nr, sizeof(dst->k.data[0]), + accounting_pos_cmp, &i->pos) >=3D orig_dst_k_nr) { + if (darray_push(&dst->k, ((struct accounting_pos_offset) { + .pos =3D i->pos, + .offset =3D dst_counters, + .nr_counters =3D i->nr_counters }))) + goto err; + + dst_counters +=3D i->nr_counters; + } + + if (dst->k.nr =3D=3D orig_dst_k_nr) + return 0; + + u64 __percpu *new_counters =3D __alloc_percpu_gfp(dst_counters * sizeof(u= 64), + sizeof(u64), GFP_KERNEL); + if (!new_counters) + goto err; + + preempt_disable(); + memcpy(this_cpu_ptr(new_counters), + bch2_acc_percpu_u64s(dst->v, dst->nr_counters), + dst->nr_counters * sizeof(u64)); + preempt_enable(); + + free_percpu(dst->v); + dst->v =3D new_counters; + dst->nr_counters =3D dst_counters; + + eytzinger0_sort(dst->k.data, dst->k.nr, sizeof(dst->k.data[0]), accountin= g_pos_cmp, NULL); + + return 0; +err: + dst->k.nr =3D orig_dst_k_nr; + return -BCH_ERR_ENOMEM_disk_accounting; +} + int bch2_fs_replicas_usage_read(struct bch_fs *c, darray_char *usage) { - struct bch_accounting_mem *acc =3D &c->accounting; + struct bch_accounting_mem *acc =3D &c->accounting[0]; int ret =3D 0; =20 darray_init(usage); @@ -234,6 +282,85 @@ int bch2_fs_replicas_usage_read(struct bch_fs *c, darr= ay_char 
*usage) return ret; } =20 +static int accounting_write_key(struct btree_trans *trans, struct bpos pos= , u64 *v, unsigned nr_counters) +{ + struct bkey_i_accounting *a =3D bch2_trans_kmalloc(trans, sizeof(*a) + si= zeof(*v) * nr_counters); + int ret =3D PTR_ERR_OR_ZERO(a); + if (ret) + return ret; + + bkey_accounting_init(&a->k_i); + a->k.p =3D pos; + set_bkey_val_bytes(&a->k, sizeof(a->v) + sizeof(*v) * nr_counters); + memcpy(a->v.d, v, sizeof(*v) * nr_counters); + + return bch2_btree_insert_trans(trans, BTREE_ID_accounting, &a->k_i, 0); +} + +int bch2_accounting_gc_done(struct bch_fs *c) +{ + struct bch_accounting_mem *dst =3D &c->accounting[0]; + struct bch_accounting_mem *src =3D &c->accounting[1]; + struct btree_trans *trans =3D bch2_trans_get(c); + struct printbuf buf =3D PRINTBUF; + int ret =3D 0; + + percpu_down_write(&c->mark_lock); + + ret =3D copy_counters(dst, src) ?: + copy_counters(src, dst); + if (ret) + goto err; + + BUG_ON(dst->k.nr !=3D src->k.nr); + + for (unsigned i =3D 0; i < src->k.nr; i++) { + BUG_ON(src->k.data[i].nr_counters !=3D dst->k.data[i].nr_counters); + BUG_ON(!bpos_eq(dst->k.data[i].pos, src->k.data[i].pos)); + + struct disk_accounting_key acc_k; + bpos_to_disk_accounting_key(&acc_k, src->k.data[i].pos); + + unsigned nr =3D src->k.data[i].nr_counters; + u64 src_v[BCH_ACCOUNTING_MAX_COUNTERS]; + u64 dst_v[BCH_ACCOUNTING_MAX_COUNTERS]; + + bch2_accounting_mem_read_counters(c, i, dst_v, nr, false); + bch2_accounting_mem_read_counters(c, i, src_v, nr, true); + + if (memcmp(dst_v, src_v, nr * sizeof(u64))) { + printbuf_reset(&buf); + prt_str(&buf, "accounting mismatch for "); + bch2_accounting_key_to_text(&buf, &acc_k); + + prt_str(&buf, ": got"); + for (unsigned j =3D 0; j < nr; j++) + prt_printf(&buf, " %llu", dst_v[j]); + + prt_str(&buf, " should be"); + for (unsigned j =3D 0; j < nr; j++) + prt_printf(&buf, " %llu", src_v[j]); + + if (fsck_err(c, accounting_mismatch, "%s", buf.buf)) { + for (unsigned j =3D 0; j < 
dst->k.data[i].nr_counters; j++) + percpu_u64_set(dst->v + dst->k.data[i].offset + j, src_v[j]); + + ret =3D commit_do(trans, NULL, NULL, 0, + accounting_write_key(trans, src->k.data[i].pos, src_v, nr)); + if (ret) + goto err; + } + } + } +err: +fsck_err: + percpu_up_write(&c->mark_lock); + printbuf_exit(&buf); + bch2_trans_put(trans); + bch_err_fn(c, ret); + return ret; +} + static bool accounting_key_is_zero(struct bkey_s_c_accounting a) { =20 @@ -251,7 +378,7 @@ static int accounting_read_key(struct bch_fs *c, struct= bkey_s_c k) return 0; =20 percpu_down_read(&c->mark_lock); - int ret =3D __bch2_accounting_mem_add(c, bkey_s_c_to_accounting(k)); + int ret =3D __bch2_accounting_mem_add(c, bkey_s_c_to_accounting(k), false= ); percpu_up_read(&c->mark_lock); =20 if (accounting_key_is_zero(bkey_s_c_to_accounting(k)) && @@ -274,7 +401,7 @@ static int accounting_read_key(struct bch_fs *c, struct= bkey_s_c k) =20 int bch2_accounting_read(struct bch_fs *c) { - struct bch_accounting_mem *acc =3D &c->accounting; + struct bch_accounting_mem *acc =3D &c->accounting[0]; =20 int ret =3D bch2_trans_run(c, for_each_btree_key(trans, iter, @@ -321,7 +448,7 @@ int bch2_accounting_read(struct bch_fs *c) bpos_to_disk_accounting_key(&k, acc->k.data[i].pos); =20 u64 v[BCH_ACCOUNTING_MAX_COUNTERS]; - bch2_accounting_mem_read_counters(c, i, v, ARRAY_SIZE(v)); + bch2_accounting_mem_read_counters(c, i, v, ARRAY_SIZE(v), false); =20 switch (k.type) { case BCH_DISK_ACCOUNTING_persistent_reserved: @@ -370,8 +497,9 @@ int bch2_dev_usage_remove(struct bch_fs *c, unsigned de= v) bch2_btree_write_buffer_flush_sync(trans)); } =20 -int bch2_dev_usage_init(struct bch_dev *ca) +int bch2_dev_usage_init(struct bch_dev *ca, bool gc) { + struct bch_fs *c =3D ca->fs; struct disk_accounting_key acc =3D { .type =3D BCH_DISK_ACCOUNTING_dev_data_type, .dev_data_type.dev =3D ca->dev_idx, @@ -379,14 +507,21 @@ int bch2_dev_usage_init(struct bch_dev *ca) }; u64 v[3] =3D { ca->mi.nbuckets - ca->mi.first_bucket, 
0, 0 }; =20 - return bch2_trans_do(ca->fs, NULL, NULL, 0, - bch2_disk_accounting_mod(trans, &acc, v, ARRAY_SIZE(v))); + int ret =3D bch2_trans_do(c, NULL, NULL, 0, + bch2_disk_accounting_mod(trans, &acc, v, ARRAY_SIZE(v), gc)); + bch_err_fn(c, ret); + return ret; } =20 -void bch2_fs_accounting_exit(struct bch_fs *c) +void bch2_accounting_free(struct bch_accounting_mem *acc) { - struct bch_accounting_mem *acc =3D &c->accounting; - darray_exit(&acc->k); free_percpu(acc->v); + acc->v =3D NULL; + acc->nr_counters =3D 0; +} + +void bch2_fs_accounting_exit(struct bch_fs *c) +{ + bch2_accounting_free(&c->accounting[0]); } diff --git a/fs/bcachefs/disk_accounting.h b/fs/bcachefs/disk_accounting.h index a8526bf43207..70ac67f4a3cb 100644 --- a/fs/bcachefs/disk_accounting.h +++ b/fs/bcachefs/disk_accounting.h @@ -78,11 +78,9 @@ static inline struct bpos disk_accounting_key_to_bpos(st= ruct disk_accounting_key return ret; } =20 -int bch2_disk_accounting_mod(struct btree_trans *, - struct disk_accounting_key *, - s64 *, unsigned); -int bch2_mod_dev_cached_sectors(struct btree_trans *trans, - unsigned dev, s64 sectors); +int bch2_disk_accounting_mod(struct btree_trans *, struct disk_accounting_= key *, + s64 *, unsigned, bool); +int bch2_mod_dev_cached_sectors(struct btree_trans *, unsigned, s64, bool); =20 int bch2_accounting_invalid(struct bch_fs *, struct bkey_s_c, enum bkey_invalid_flags, struct printbuf *); @@ -106,15 +104,15 @@ static inline int accounting_pos_cmp(const void *_l, = const void *_r) return bpos_cmp(*l, *r); } =20 -int bch2_accounting_mem_add_slowpath(struct bch_fs *, struct bkey_s_c_acco= unting); +int bch2_accounting_mem_add_slowpath(struct bch_fs *, struct bkey_s_c_acco= unting, bool); =20 -static inline int __bch2_accounting_mem_add(struct bch_fs *c, struct bkey_= s_c_accounting a) +static inline int __bch2_accounting_mem_add(struct bch_fs *c, struct bkey_= s_c_accounting a, bool gc) { - struct bch_accounting_mem *acc =3D &c->accounting; + struct 
bch_accounting_mem *acc =3D &c->accounting[gc]; unsigned idx =3D eytzinger0_find(acc->k.data, acc->k.nr, sizeof(acc->k.da= ta[0]), accounting_pos_cmp, &a.k->p); if (unlikely(idx >=3D acc->k.nr)) - return bch2_accounting_mem_add_slowpath(c, a); + return bch2_accounting_mem_add_slowpath(c, a, gc); =20 unsigned offset =3D acc->k.data[idx].offset; =20 @@ -125,37 +123,48 @@ static inline int __bch2_accounting_mem_add(struct bc= h_fs *c, struct bkey_s_c_ac return 0; } =20 -static inline int bch2_accounting_mem_add(struct btree_trans *trans, struc= t bkey_s_c_accounting a) +static inline int bch2_accounting_mem_add_locked(struct btree_trans *trans= , struct bkey_s_c_accounting a, bool gc) { struct bch_fs *c =3D trans->c; - struct disk_accounting_key acc_k; - bpos_to_disk_accounting_key(&acc_k, a.k->p); =20 - switch (acc_k.type) { - case BCH_DISK_ACCOUNTING_persistent_reserved: - trans->fs_usage_delta.reserved +=3D acc_k.persistent_reserved.nr_replica= s * a.v->d[0]; - break; - case BCH_DISK_ACCOUNTING_replicas: - fs_usage_data_type_to_base(&trans->fs_usage_delta, acc_k.replicas.data_t= ype, a.v->d[0]); - break; - case BCH_DISK_ACCOUNTING_dev_data_type: { - struct bch_dev *ca =3D bch_dev_bkey_exists(c, acc_k.dev_data_type.dev); - - this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].buckets, a.v->d= [0]); - this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].sectors, a.v->d= [1]); - this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].fragmented, a.v= ->d[2]); + if (!gc) { + struct disk_accounting_key acc_k; + bpos_to_disk_accounting_key(&acc_k, a.k->p); + + switch (acc_k.type) { + case BCH_DISK_ACCOUNTING_persistent_reserved: + trans->fs_usage_delta.reserved +=3D acc_k.persistent_reserved.nr_replic= as * a.v->d[0]; + break; + case BCH_DISK_ACCOUNTING_replicas: + fs_usage_data_type_to_base(&trans->fs_usage_delta, acc_k.replicas.data_= type, a.v->d[0]); + break; + case BCH_DISK_ACCOUNTING_dev_data_type: { + struct bch_dev *ca =3D bch_dev_bkey_exists(c, 
acc_k.dev_data_type.dev); + + this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].buckets, a.v->= d[0]); + this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].sectors, a.v->= d[1]); + this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].fragmented, a.= v->d[2]); + } + } } - } - return __bch2_accounting_mem_add(c, a); + + return __bch2_accounting_mem_add(c, a, gc); } =20 -static inline void bch2_accounting_mem_read_counters(struct bch_fs *c, - unsigned idx, - u64 *v, unsigned nr) +static inline int bch2_accounting_mem_add(struct btree_trans *trans, struc= t bkey_s_c_accounting a, bool gc) +{ + percpu_down_read(&trans->c->mark_lock); + int ret =3D bch2_accounting_mem_add_locked(trans, a, gc); + percpu_up_read(&trans->c->mark_lock); + return ret; +} + +static inline void bch2_accounting_mem_read_counters(struct bch_fs *c, uns= igned idx, + u64 *v, unsigned nr, bool gc) { memset(v, 0, sizeof(*v) * nr); =20 - struct bch_accounting_mem *acc =3D &c->accounting; + struct bch_accounting_mem *acc =3D &c->accounting[gc]; if (unlikely(idx >=3D acc->k.nr)) return; =20 @@ -169,19 +178,23 @@ static inline void bch2_accounting_mem_read_counters(= struct bch_fs *c, static inline void bch2_accounting_mem_read(struct bch_fs *c, struct bpos = p, u64 *v, unsigned nr) { - struct bch_accounting_mem *acc =3D &c->accounting; + struct bch_accounting_mem *acc =3D &c->accounting[0]; unsigned idx =3D eytzinger0_find(acc->k.data, acc->k.nr, sizeof(acc->k.da= ta[0]), accounting_pos_cmp, &p); =20 - bch2_accounting_mem_read_counters(c, idx, v, nr); + bch2_accounting_mem_read_counters(c, idx, v, nr, false); } =20 int bch2_fs_replicas_usage_read(struct bch_fs *, darray_char *); =20 +int bch2_accounting_gc_done(struct bch_fs *); + int bch2_accounting_read(struct bch_fs *); =20 int bch2_dev_usage_remove(struct bch_fs *, unsigned); -int bch2_dev_usage_init(struct bch_dev *); +int bch2_dev_usage_init(struct bch_dev *, bool); + +void bch2_accounting_free(struct bch_accounting_mem *); void 
bch2_fs_accounting_exit(struct bch_fs *); =20 #endif /* _BCACHEFS_DISK_ACCOUNTING_H */ diff --git a/fs/bcachefs/ec.c b/fs/bcachefs/ec.c index 38e5e882f4a4..bd435d385559 100644 --- a/fs/bcachefs/ec.c +++ b/fs/bcachefs/ec.c @@ -238,10 +238,8 @@ static int bch2_trans_mark_stripe_bucket(struct btree_= trans *trans, return ret; } =20 -static int mark_stripe_bucket(struct btree_trans *trans, - struct bkey_s_c k, - unsigned ptr_idx, - unsigned flags) +static int mark_stripe_bucket(struct btree_trans *trans, struct bkey_s_c k, + unsigned ptr_idx, unsigned flags) { struct bch_fs *c =3D trans->c; const struct bch_stripe *s =3D bkey_s_c_to_stripe(k).v; @@ -287,13 +285,16 @@ static int mark_stripe_bucket(struct btree_trans *tra= ns, g->stripe =3D k.k->p.offset; g->stripe_redundancy =3D s->nr_redundant; new =3D *g; -err: bucket_unlock(g); - if (!ret) - bch2_dev_usage_update_m(c, ca, &old, &new); percpu_up_read(&c->mark_lock); + ret =3D bch2_bucket_to_dev_counters(trans, ca, &old, &new, flags); +out: printbuf_exit(&buf); return ret; +err: + bucket_unlock(g); + percpu_up_read(&c->mark_lock); + goto out; } =20 int bch2_trigger_stripe(struct btree_trans *trans, @@ -309,7 +310,12 @@ int bch2_trigger_stripe(struct btree_trans *trans, const struct bch_stripe *new_s =3D new.k->type =3D=3D KEY_TYPE_stripe ? 
bkey_s_c_to_stripe(new).v : NULL; =20 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { + BUG_ON(new_s && old_s && + (new_s->nr_blocks !=3D old_s->nr_blocks || + new_s->nr_redundant !=3D old_s->nr_redundant)); + + + if (flags & (BTREE_TRIGGER_TRANSACTIONAL|BTREE_TRIGGER_GC)) { /* * If the pointers aren't changing, we don't need to do anything: */ @@ -320,9 +326,34 @@ int bch2_trigger_stripe(struct btree_trans *trans, new_s->nr_blocks * sizeof(struct bch_extent_ptr))) return 0; =20 - BUG_ON(new_s && old_s && - (new_s->nr_blocks !=3D old_s->nr_blocks || - new_s->nr_redundant !=3D old_s->nr_redundant)); + struct gc_stripe *gc =3D NULL; + if (flags & BTREE_TRIGGER_GC) { + gc =3D genradix_ptr_alloc(&c->gc_stripes, idx, GFP_KERNEL); + if (!gc) { + bch_err(c, "error allocating memory for gc_stripes, idx %llu", idx); + return -BCH_ERR_ENOMEM_mark_stripe; + } + + /* + * This will be wrong when we bring back runtime gc: we should + * be unmarking the old key and then marking the new key + * + * Also: when we bring back runtime gc, locking + */ + gc->alive =3D true; + gc->sectors =3D le16_to_cpu(new_s->sectors); + gc->nr_blocks =3D new_s->nr_blocks; + gc->nr_redundant =3D new_s->nr_redundant; + + for (unsigned i =3D 0; i < new_s->nr_blocks; i++) + gc->ptrs[i] =3D new_s->ptrs[i]; + + /* + * gc recalculates this field from stripe ptr + * references: + */ + memset(gc->block_sectors, 0, sizeof(gc->block_sectors)); + } =20 if (new_s) { s64 sectors =3D (u64) le16_to_cpu(new_s->sectors) * new_s->nr_redundant; @@ -331,9 +362,12 @@ int bch2_trigger_stripe(struct btree_trans *trans, .type =3D BCH_DISK_ACCOUNTING_replicas, }; bch2_bkey_to_replicas(&acc.replicas, new); - int ret =3D bch2_disk_accounting_mod(trans, &acc, &sectors, 1); + int ret =3D bch2_disk_accounting_mod(trans, &acc, &sectors, 1, gc); if (ret) return ret; + + if (gc) + memcpy(&gc->r.e, &acc.replicas, replicas_entry_bytes(&acc.replicas)); } =20 if (old_s) { @@ -343,29 +377,42 @@ int bch2_trigger_stripe(struct btree_trans *trans, 
.type =3D BCH_DISK_ACCOUNTING_replicas, }; bch2_bkey_to_replicas(&acc.replicas, old); - int ret =3D bch2_disk_accounting_mod(trans, &acc, &sectors, 1); + int ret =3D bch2_disk_accounting_mod(trans, &acc, &sectors, 1, gc); if (ret) return ret; } =20 unsigned nr_blocks =3D new_s ? new_s->nr_blocks : old_s->nr_blocks; - for (unsigned i =3D 0; i < nr_blocks; i++) { - if (new_s && old_s && - !memcmp(&new_s->ptrs[i], - &old_s->ptrs[i], - sizeof(new_s->ptrs[i]))) - continue; =20 - if (new_s) { - int ret =3D bch2_trans_mark_stripe_bucket(trans, - bkey_s_c_to_stripe(new), i, false); - if (ret) - return ret; + if (flags & BTREE_TRIGGER_TRANSACTIONAL) { + for (unsigned i =3D 0; i < nr_blocks; i++) { + if (new_s && old_s && + !memcmp(&new_s->ptrs[i], + &old_s->ptrs[i], + sizeof(new_s->ptrs[i]))) + continue; + + if (new_s) { + int ret =3D bch2_trans_mark_stripe_bucket(trans, + bkey_s_c_to_stripe(new), i, false); + if (ret) + return ret; + } + + if (old_s) { + int ret =3D bch2_trans_mark_stripe_bucket(trans, + bkey_s_c_to_stripe(old), i, true); + if (ret) + return ret; + } } + } =20 - if (old_s) { - int ret =3D bch2_trans_mark_stripe_bucket(trans, - bkey_s_c_to_stripe(old), i, true); + if (flags & BTREE_TRIGGER_GC) { + BUG_ON(old_s); + + for (unsigned i =3D 0; i < nr_blocks; i++) { + int ret =3D mark_stripe_bucket(trans, new, i, flags); if (ret) return ret; } @@ -411,53 +458,6 @@ int bch2_trigger_stripe(struct btree_trans *trans, } } =20 - if (flags & BTREE_TRIGGER_GC) { - struct gc_stripe *m =3D - genradix_ptr_alloc(&c->gc_stripes, idx, GFP_KERNEL); - - if (!m) { - bch_err(c, "error allocating memory for gc_stripes, idx %llu", - idx); - return -BCH_ERR_ENOMEM_mark_stripe; - } - /* - * This will be wrong when we bring back runtime gc: we should - * be unmarking the old key and then marking the new key - */ - m->alive =3D true; - m->sectors =3D le16_to_cpu(new_s->sectors); - m->nr_blocks =3D new_s->nr_blocks; - m->nr_redundant =3D new_s->nr_redundant; - - for (unsigned i =3D 0; i < 
new_s->nr_blocks; i++) - m->ptrs[i] =3D new_s->ptrs[i]; - - bch2_bkey_to_replicas(&m->r.e, new); - - /* - * gc recalculates this field from stripe ptr - * references: - */ - memset(m->block_sectors, 0, sizeof(m->block_sectors)); - - for (unsigned i =3D 0; i < new_s->nr_blocks; i++) { - int ret =3D mark_stripe_bucket(trans, new, i, flags); - if (ret) - return ret; - } - - int ret =3D bch2_update_replicas(c, new, &m->r.e, - ((s64) m->sectors * m->nr_redundant)); - if (ret) { - struct printbuf buf =3D PRINTBUF; - - bch2_bkey_val_to_text(&buf, c, new); - bch2_fs_fatal_error(c, "no replicas entry for %s", buf.buf); - printbuf_exit(&buf); - return ret; - } - } - return 0; } =20 diff --git a/fs/bcachefs/inode.c b/fs/bcachefs/inode.c index 3dfa9f77c739..e8f128d6b703 100644 --- a/fs/bcachefs/inode.c +++ b/fs/bcachefs/inode.c @@ -607,41 +607,26 @@ int bch2_trigger_inode(struct btree_trans *trans, struct bkey_s new, unsigned flags) { - s64 nr =3D bkey_is_inode(new.k) - bkey_is_inode(old.k); - - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { - if (nr) { - struct disk_accounting_key acc =3D { - .type =3D BCH_DISK_ACCOUNTING_nr_inodes - }; - - int ret =3D bch2_disk_accounting_mod(trans, &acc, &nr, 1); - if (ret) - return ret; - } - - bool old_deleted =3D bkey_is_deleted_inode(old); - bool new_deleted =3D bkey_is_deleted_inode(new.s_c); - if (old_deleted !=3D new_deleted) { - int ret =3D bch2_btree_bit_mod_buffered(trans, BTREE_ID_deleted_inodes, - new.k->p, new_deleted); - if (ret) - return ret; - } - } - if ((flags & BTREE_TRIGGER_ATOMIC) && (flags & BTREE_TRIGGER_INSERT)) { BUG_ON(!trans->journal_res.seq); - bkey_s_to_inode_v3(new).v->bi_journal_seq =3D cpu_to_le64(trans->journal= _res.seq); } =20 - if (flags & BTREE_TRIGGER_GC) { - struct bch_fs *c =3D trans->c; + s64 nr =3D bkey_is_inode(new.k) - bkey_is_inode(old.k); + if ((flags & (BTREE_TRIGGER_TRANSACTIONAL|BTREE_TRIGGER_GC)) && nr) { + struct disk_accounting_key acc =3D { .type =3D BCH_DISK_ACCOUNTING_nr_in= odes }; + int 
ret =3D bch2_disk_accounting_mod(trans, &acc, &nr, 1, flags & BTREE_= TRIGGER_GC); + if (ret) + return ret; + } =20 - percpu_down_read(&c->mark_lock); - this_cpu_add(c->usage_gc->b.nr_inodes, nr); - percpu_up_read(&c->mark_lock); + int deleted_delta =3D (int) bkey_is_deleted_inode(new.s_c) - + (int) bkey_is_deleted_inode(old); + if ((flags & BTREE_TRIGGER_TRANSACTIONAL) && deleted_delta) { + int ret =3D bch2_btree_bit_mod_buffered(trans, BTREE_ID_deleted_inodes, + new.k->p, deleted_delta > 0); + if (ret) + return ret; } =20 return 0; diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 18fd71960d2e..6a8b2c753688 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -1177,8 +1177,7 @@ int bch2_fs_initialize(struct bch_fs *c) goto err; =20 for_each_member_device(c, ca) { - ret =3D bch2_dev_usage_init(ca); - bch_err_msg(c, ret, "initializing device usage"); + ret =3D bch2_dev_usage_init(ca, false); if (ret) { percpu_ref_put(&ca->ref); goto err; diff --git a/fs/bcachefs/replicas.c b/fs/bcachefs/replicas.c index 427dc6711427..cba5ba44cfd8 100644 --- a/fs/bcachefs/replicas.c +++ b/fs/bcachefs/replicas.c @@ -275,73 +275,6 @@ bool bch2_replicas_marked(struct bch_fs *c, return ret; } =20 -static void __replicas_table_update(struct bch_fs_usage *dst, - struct bch_replicas_cpu *dst_r, - struct bch_fs_usage *src, - struct bch_replicas_cpu *src_r) -{ - int src_idx, dst_idx; - - *dst =3D *src; - - for (src_idx =3D 0; src_idx < src_r->nr; src_idx++) { - if (!src->replicas[src_idx]) - continue; - - dst_idx =3D __replicas_entry_idx(dst_r, - cpu_replicas_entry(src_r, src_idx)); - BUG_ON(dst_idx < 0); - - dst->replicas[dst_idx] =3D src->replicas[src_idx]; - } -} - -static void __replicas_table_update_pcpu(struct bch_fs_usage __percpu *dst= _p, - struct bch_replicas_cpu *dst_r, - struct bch_fs_usage __percpu *src_p, - struct bch_replicas_cpu *src_r) -{ - unsigned src_nr =3D sizeof(struct bch_fs_usage) / sizeof(u64) + src_r->nr; - struct bch_fs_usage *dst, 
*src =3D (void *) - bch2_acc_percpu_u64s((u64 __percpu *) src_p, src_nr); - - preempt_disable(); - dst =3D this_cpu_ptr(dst_p); - preempt_enable(); - - __replicas_table_update(dst, dst_r, src, src_r); -} - -/* - * Resize filesystem accounting: - */ -static int replicas_table_update(struct bch_fs *c, - struct bch_replicas_cpu *new_r) -{ - struct bch_fs_usage __percpu *new_gc =3D NULL; - unsigned bytes =3D sizeof(struct bch_fs_usage) + - sizeof(u64) * new_r->nr; - int ret =3D 0; - - if ((c->usage_gc && - !(new_gc =3D __alloc_percpu_gfp(bytes, sizeof(u64), GFP_KERNEL)))) - goto err; - - if (c->usage_gc) - __replicas_table_update_pcpu(new_gc, new_r, - c->usage_gc, &c->replicas); - - swap(c->usage_gc, new_gc); - swap(c->replicas, *new_r); -out: - free_percpu(new_gc); - return ret; -err: - bch_err(c, "error updating replicas table: memory allocation failure"); - ret =3D -BCH_ERR_ENOMEM_replicas_table; - goto out; -} - noinline static int bch2_mark_replicas_slowpath(struct bch_fs *c, struct bch_replicas_entry_v1 *new_entry) @@ -389,7 +322,7 @@ static int bch2_mark_replicas_slowpath(struct bch_fs *c, /* don't update in memory replicas until changes are persistent */ percpu_down_write(&c->mark_lock); if (new_r.entries) - ret =3D replicas_table_update(c, &new_r); + swap(c->replicas, new_r); if (new_gc.entries) swap(new_gc, c->replicas_gc); percpu_up_write(&c->mark_lock); @@ -424,8 +357,9 @@ int bch2_replicas_gc_end(struct bch_fs *c, int ret) percpu_down_write(&c->mark_lock); =20 ret =3D ret ?: - bch2_cpu_replicas_to_sb_replicas(c, &c->replicas_gc) ?: - replicas_table_update(c, &c->replicas_gc); + bch2_cpu_replicas_to_sb_replicas(c, &c->replicas_gc); + if (!ret) + swap(c->replicas, c->replicas_gc); =20 kfree(c->replicas_gc.entries); c->replicas_gc.entries =3D NULL; @@ -635,8 +569,7 @@ int bch2_sb_replicas_to_cpu_replicas(struct bch_fs *c) bch2_cpu_replicas_sort(&new_r); =20 percpu_down_write(&c->mark_lock); - - ret =3D replicas_table_update(c, &new_r); + swap(c->replicas, 
new_r); percpu_up_write(&c->mark_lock); =20 kfree(new_r.entries); @@ -927,10 +860,8 @@ unsigned bch2_sb_dev_has_data(struct bch_sb *sb, unsig= ned dev) =20 unsigned bch2_dev_has_data(struct bch_fs *c, struct bch_dev *ca) { - unsigned ret; - mutex_lock(&c->sb_lock); - ret =3D bch2_sb_dev_has_data(c->disk_sb.sb, ca->dev_idx); + unsigned ret =3D bch2_sb_dev_has_data(c->disk_sb.sb, ca->dev_idx); mutex_unlock(&c->sb_lock); =20 return ret; @@ -941,8 +872,3 @@ void bch2_fs_replicas_exit(struct bch_fs *c) kfree(c->replicas.entries); kfree(c->replicas_gc.entries); } - -int bch2_fs_replicas_init(struct bch_fs *c) -{ - return replicas_table_update(c, &c->replicas); -} diff --git a/fs/bcachefs/replicas.h b/fs/bcachefs/replicas.h index eac2dff20423..ab2d00e4865c 100644 --- a/fs/bcachefs/replicas.h +++ b/fs/bcachefs/replicas.h @@ -80,6 +80,5 @@ extern const struct bch_sb_field_ops bch_sb_field_ops_rep= licas; extern const struct bch_sb_field_ops bch_sb_field_ops_replicas_v0; =20 void bch2_fs_replicas_exit(struct bch_fs *); -int bch2_fs_replicas_init(struct bch_fs *); =20 #endif /* _BCACHEFS_REPLICAS_H */ diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c index 89c481831608..6617c8912e51 100644 --- a/fs/bcachefs/super.c +++ b/fs/bcachefs/super.c @@ -894,7 +894,6 @@ static struct bch_fs *bch2_fs_alloc(struct bch_sb *sb, = struct bch_opts opts) bch2_io_clock_init(&c->io_clock[READ]) ?: bch2_io_clock_init(&c->io_clock[WRITE]) ?: bch2_fs_journal_init(&c->journal) ?: - bch2_fs_replicas_init(c) ?: bch2_fs_btree_cache_init(c) ?: bch2_fs_btree_key_cache_init(&c->btree_key_cache) ?: bch2_fs_btree_iter_init(c) ?: @@ -1772,7 +1771,7 @@ int bch2_dev_add(struct bch_fs *c, const char *path) bch2_write_super(c); mutex_unlock(&c->sb_lock); =20 - ret =3D bch2_dev_usage_init(ca); + ret =3D bch2_dev_usage_init(ca, false); if (ret) goto err_late; =20 @@ -1946,9 +1945,9 @@ int bch2_dev_resize(struct bch_fs *c, struct bch_dev = *ca, u64 nbuckets) }; u64 v[3] =3D { nbuckets - old_nbuckets, 0, 0 }; 
=20 - ret =3D bch2_dev_freespace_init(c, ca, old_nbuckets, nbuckets) ?: - bch2_trans_do(ca->fs, NULL, NULL, 0, - bch2_disk_accounting_mod(trans, &acc, v, ARRAY_SIZE(v))); + ret =3D bch2_trans_do(ca->fs, NULL, NULL, 0, + bch2_disk_accounting_mod(trans, &acc, v, ARRAY_SIZE(v), false)) ?: + bch2_dev_freespace_init(c, ca, old_nbuckets, nbuckets); if (ret) goto err; } --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-173.mta1.migadu.com (out-173.mta1.migadu.com [95.215.58.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3814017756 for ; Sun, 25 Feb 2024 02:38:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828732; cv=none; b=goeLBd3I6gfBv+ar/1ETvfT37Ekl3UKqdSKWeK9rzixhc8ggmnzhfmuxjrTVW3shYWvbDVuMxkdGTXGRyEs15K5bNdTB+IkLgaji+C3SQ8FvKN8j+TLXKcyoU2liwdYV476f/VB/o7ZJznCDw7G1NIECtg2fSDzJklyFR9gJTl0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828732; c=relaxed/simple; bh=8Dzr04OBJMHQlzrTPGyog2ZCPLnWs2SakI+yYoIZVac=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mJyIUkcdrgnTcMGprniKIt0HNThF1cywJ38RhTZoeoDzHOwPtBlhNhRg6Xch7RASXNAMR7sAMyO+ICluQZO0Z8i3yPmbPU+sZorCsmdyfpOdWROHcTSd393qAkNxOqcHe28YmBcn+zSacwZeveCQOyzNbY/A2hBmVD8n8jzbPZs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=xf2Xkblz; arc=none smtp.client-ip=95.215.58.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) 
header.d=linux.dev header.i=@linux.dev header.b="xf2Xkblz" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828728; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y3FpSQO+5DeuDSgLy4pK3aklIcBNM9sDCfco3eXxsBU=; b=xf2Xkblz2aGS8hwBJXBPOzrrYlK07lSAMxX8n8o5PQQs8pqo7kZxNeCbl8l4YxHUTwjU4D CKMJpJCYI2oY55RXahByj51h/1LiFp8t5V8mhhNFeyU/a55ugr0omICPqdcmsDQRUoc1sW puYGne+ebIvnIdlMhYYEkyM+2zQMIw8= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 15/21] bcachefs: Convert bch2_replicas_gc2() to new accounting Date: Sat, 24 Feb 2024 21:38:17 -0500 Message-ID: <20240225023826.2413565-16-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" bch2_replicas_gc2() is used for garbage collecting superblock replicas entries that are empty; this converts it to the new accounting scheme. 
Signed-off-by: Kent Overstreet --- fs/bcachefs/replicas.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/fs/bcachefs/replicas.c b/fs/bcachefs/replicas.c index cba5ba44cfd8..18137abb1857 100644 --- a/fs/bcachefs/replicas.c +++ b/fs/bcachefs/replicas.c @@ -2,6 +2,7 @@ =20 #include "bcachefs.h" #include "buckets.h" +#include "disk_accounting.h" #include "journal.h" #include "replicas.h" #include "super-io.h" @@ -425,8 +426,6 @@ int bch2_replicas_gc_start(struct bch_fs *c, unsigned t= ypemask) */ int bch2_replicas_gc2(struct bch_fs *c) { - return 0; -#if 0 struct bch_replicas_cpu new =3D { 0 }; unsigned i, nr; int ret =3D 0; @@ -456,20 +455,26 @@ int bch2_replicas_gc2(struct bch_fs *c) struct bch_replicas_entry_v1 *e =3D cpu_replicas_entry(&c->replicas, i); =20 - if (e->data_type =3D=3D BCH_DATA_journal || - c->usage_base->replicas[i] || - percpu_u64_get(&c->usage[0]->replicas[i]) || - percpu_u64_get(&c->usage[1]->replicas[i]) || - percpu_u64_get(&c->usage[2]->replicas[i]) || - percpu_u64_get(&c->usage[3]->replicas[i])) + struct disk_accounting_key k =3D { + .type =3D BCH_DISK_ACCOUNTING_replicas, + }; + + memcpy(&k.replicas, e, replicas_entry_bytes(e)); + + u64 v =3D 0; + bch2_accounting_mem_read(c, disk_accounting_key_to_bpos(&k), &v, 1); + + if (e->data_type =3D=3D BCH_DATA_journal || v) memcpy(cpu_replicas_entry(&new, new.nr++), e, new.entry_size); } =20 bch2_cpu_replicas_sort(&new); =20 - ret =3D bch2_cpu_replicas_to_sb_replicas(c, &new) ?: - replicas_table_update(c, &new); + ret =3D bch2_cpu_replicas_to_sb_replicas(c, &new); + + if (!ret) + swap(c->replicas, new); =20 kfree(new.entries); =20 @@ -481,7 +486,6 @@ int bch2_replicas_gc2(struct bch_fs *c) mutex_unlock(&c->sb_lock); =20 return ret; -#endif } =20 /* Replicas tracking - superblock: */ --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-171.mta1.migadu.com (out-171.mta1.migadu.com [95.215.58.171]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF0E317BBB for ; Sun, 25 Feb 2024 02:38:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828732; cv=none; b=aaa47r7CRweTWVCw1cg+Up3mMUj+FvtoBi0dK+lljd0wX3Bsp5YVmGOP2nfBskhk0WdKdCIsRXDmIfSBZfbZLbrDlfhQ3GKn0PudWWUE4xqkJ8Ax75cKIe7gDb0AscvBZig8wUOQBFGHW3IxOyvo3EIC/Ssu4KRD4qnsHYZqEFI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828732; c=relaxed/simple; bh=693nN94qUSbbDkqHrUPV8kMgo4OMZv7SWEQcI8AOyfo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HPDPzzRrjxFQG2bayNX7etKnCs0ChDJsEaGuBnXa1I6GOPjZxspUE4QuXiIbxlifzcYgpQI+6OmF0x71YvEiOBPSkYpT1S6G1aRprIFz/xjxVVbsVs+Dw0JiGaok093q8yjevgHwzNy/KYISfOU+B8nYRtQfySClTisrpfjSaic= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=t5Uq6mW9; arc=none smtp.client-ip=95.215.58.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="t5Uq6mW9" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828729; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Qtt8N3DTXT8YcoliwHPFpPxKQXIz9TTQtDYUciFTLnM=; b=t5Uq6mW9GG3Iozxg1JY5qA7lforHDcbj+wg4MIkQsFH4ck90PVZ93xMlSgL4eMlDl2woyi EvPtgsLVWvtMuqILOQqRjhSeb3Dmsvw1RAoitJjHJDyZJ4aMCg5VZlKRoiFlMU1RXMTgZb bE7B3hd9Ir4lBJC32mIPIaOweIf8fqk= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 16/21] bcachefs: bch2_verify_accounting_clean() Date: Sat, 24 Feb 2024 21:38:18 -0500 Message-ID: <20240225023826.2413565-17-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Verify that the in-memory accounting matches the on-disk accounting after a clean shutdown.
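The comparison at the heart of this check can be sketched outside the kernel as follows. This is an illustration only: `counters_match()` and `MAX_COUNTERS` are hypothetical names (the latter standing in for `BCH_ACCOUNTING_MAX_COUNTERS`), and the real function walks the accounting btree and warns on a mismatch instead of returning a boolean.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit; stands in for BCH_ACCOUNTING_MAX_COUNTERS. */
#define MAX_COUNTERS 3

/*
 * Returns true if the first @nr counters of the on-disk key and of the
 * in-memory copy are bytewise identical.
 */
static bool counters_match(const uint64_t *on_disk, const uint64_t *in_mem,
			   unsigned nr)
{
	if (nr > MAX_COUNTERS)
		return false;

	return memcmp(on_disk, in_mem, nr * sizeof(uint64_t)) == 0;
}
```

A bytewise compare is sufficient here because both sides are plain arrays of u64 counters with no padding.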
Signed-off-by: Kent Overstreet --- fs/bcachefs/disk_accounting.c | 27 +++++++++++++++++++++++++++ fs/bcachefs/disk_accounting.h | 4 +++- fs/bcachefs/super.c | 1 + 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c index 2884615adc1e..8d7b6ab66e71 100644 --- a/fs/bcachefs/disk_accounting.c +++ b/fs/bcachefs/disk_accounting.c @@ -513,6 +513,33 @@ int bch2_dev_usage_init(struct bch_dev *ca, bool gc) return ret; } =20 +void bch2_verify_accounting_clean(struct bch_fs *c) +{ + bch2_trans_run(c, + for_each_btree_key(trans, iter, + BTREE_ID_accounting, POS_MIN, + BTREE_ITER_ALL_SNAPSHOTS, k, ({ + u64 v[BCH_ACCOUNTING_MAX_COUNTERS]; + struct bkey_s_c_accounting a =3D bkey_s_c_to_accounting(k); + unsigned nr =3D bch2_accounting_counters(k.k); + + bch2_accounting_mem_read(c, k.k->p, v, nr); + + if (memcmp(a.v->d, v, nr * sizeof(u64))) { + struct printbuf buf =3D PRINTBUF; + + bch2_bkey_val_to_text(&buf, c, k); + prt_str(&buf, " in mem"); + for (unsigned j =3D 0; j < nr; j++) + prt_printf(&buf, " %llu", v[j]); + + WARN(1, "accounting mismatch: %s", buf.buf); + printbuf_exit(&buf); + } + 0; + }))); +} + void bch2_accounting_free(struct bch_accounting_mem *acc) { darray_exit(&acc->k); diff --git a/fs/bcachefs/disk_accounting.h b/fs/bcachefs/disk_accounting.h index 70ac67f4a3cb..a0cf7a0b84a7 100644 --- a/fs/bcachefs/disk_accounting.h +++ b/fs/bcachefs/disk_accounting.h @@ -164,7 +164,7 @@ static inline void bch2_accounting_mem_read_counters(st= ruct bch_fs *c, unsigned { memset(v, 0, sizeof(*v) * nr); =20 - struct bch_accounting_mem *acc =3D &c->accounting[0]; + struct bch_accounting_mem *acc =3D &c->accounting[gc]; if (unlikely(idx >=3D acc->k.nr)) return; =20 @@ -194,6 +194,8 @@ int bch2_accounting_read(struct bch_fs *); int bch2_dev_usage_remove(struct bch_fs *, unsigned); int bch2_dev_usage_init(struct bch_dev *, bool); =20 +void bch2_verify_accounting_clean(struct bch_fs *c); + void 
bch2_accounting_free(struct bch_accounting_mem *); void bch2_fs_accounting_exit(struct bch_fs *); =20 diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c index 6617c8912e51..201d7767e478 100644 --- a/fs/bcachefs/super.c +++ b/fs/bcachefs/super.c @@ -355,6 +355,7 @@ void bch2_fs_read_only(struct bch_fs *c) BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty)); BUG_ON(c->btree_write_buffer.inc.keys.nr); BUG_ON(c->btree_write_buffer.flushing.keys.nr); + bch2_verify_accounting_clean(c); =20 bch_verbose(c, "marking filesystem clean"); bch2_fs_mark_clean(c); --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-185.mta1.migadu.com (out-185.mta1.migadu.com [95.215.58.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C6C3E1805A for ; Sun, 25 Feb 2024 02:38:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.185 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828734; cv=none; b=lPuBQpE6jLSUlAlRu1ujxBHZAPCLrF6pg+o7ZH+NQDfROtniHKy65/XVRAjU7kcolynZk+fl8ThgdQ4ty34gaCvJjCCvRdVcjqQOvQS4yXF0/s87ompN4JuFkam1f38axuss/0d9TpT5l3H51/nn6kntqBCySYbFrGEgwcM2VMU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828734; c=relaxed/simple; bh=9aDC/MHLqlfI7ZdF71omQwXEoDMiNVdOwQThfvn1sVw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uJ5Eq7O7GcmmxHyz9HTMRzbB/MNwSJK6aUV6RiROo1lkSaaW0Ia9tqNk1o2YTUd8VBLffQWSKzlsGgLC0yir3c7fZKbb598TMMXimLpaHsy/3hYItvMotOrLCb/gW/F1+D5lUZig4A/W7xIE+3NygmxUBKwumFY13PKjJiUHKxg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=P1ACEzCo; arc=none smtp.client-ip=95.215.58.185 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass 
(p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="P1ACEzCo" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828730; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DZ05Q5mbXYgDagN6r2B0XGx91E8emllsAKskdvguQ3M=; b=P1ACEzCotmgZtAdngBcAlZDwiID3fTZ23wEWyAp6dqymLfSg0PrfkPjzlmAqAK6csR0B4E mJVmcwwKGucTeMgN+bOT4aqQmHfnSfYgVYLffQCwjGTHCGwi3erHybuFlhHDg6UBMT/VLD nKLa/tn48WNpIGD2S25xiBzDYmsdnLk= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 17/21] bcachefs: Eytzinger accumulation for accounting keys Date: Sat, 24 Feb 2024 21:38:19 -0500 Message-ID: <20240225023826.2413565-18-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" The btree write buffer takes as input keys from the journal, sorts them, deduplicates them, and flushes them back to the btree in sorted order. 
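A minimal userspace sketch of that sort-and-deduplicate step, under stated assumptions: `struct sketch_key`, `sketch_key_cmp()` and `sketch_sort_dedup()` are hypothetical names, a position is reduced to a single integer, and "deduplicate" is assumed to mean that the key with the newest journal sequence number wins.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified key: the real one carries a btree id and a bpos. */
struct sketch_key {
	uint64_t pos;	/* btree position */
	uint64_t seq;	/* journal sequence number */
	int64_t  val;
};

/* Order by position, then by journal sequence number (oldest first). */
static int sketch_key_cmp(const void *_l, const void *_r)
{
	const struct sketch_key *l = _l, *r = _r;

	if (l->pos != r->pos)
		return l->pos < r->pos ? -1 : 1;
	if (l->seq != r->seq)
		return l->seq < r->seq ? -1 : 1;
	return 0;
}

/*
 * Sort @k, then drop all but the newest key at each position.
 * Returns the number of keys left, in sorted order.
 */
static size_t sketch_sort_dedup(struct sketch_key *k, size_t nr)
{
	size_t out = 0;

	qsort(k, nr, sizeof(*k), sketch_key_cmp);

	for (size_t i = 0; i < nr; i++) {
		if (out && k[out - 1].pos == k[i].pos)
			k[out - 1] = k[i];	/* newer key replaces older one */
		else
			k[out++] = k[i];
	}

	return out;
}
```

Every key pays for a copy and a share of the sort, so the cost grows with the number of keys pushed through the buffer.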
The disk space accounting rewrite is moving accounting to normal btree keys, with update (in this case deltas) accumulated in the write buffer and then flushed to the btree; but this is going to increase the number of keys handled by the write buffer by perhaps as much as a factor of 3x-5x. The overhead from copying around and sorting this many keys would cause a significant performance regression, but: there is huge locality in updates to accounting keys that we can take advantage of. Instead of appending accounting keys to the list of keys to be sorted, this patch adds an eytzinger search tree of recently seen accounting keys. We look up the accounting key in the eytzinger search tree and apply the delta directly, adding it if it doesn't exist, and periodically prune the eytzinger tree of unused entries. Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_write_buffer.c | 54 +++++++++++++++++++++++++- fs/bcachefs/btree_write_buffer.h | 50 ++++++++++++++++++++++-- fs/bcachefs/btree_write_buffer_types.h | 2 + fs/bcachefs/journal_io.c | 13 +++++-- 4 files changed, 110 insertions(+), 9 deletions(-) diff --git a/fs/bcachefs/btree_write_buffer.c b/fs/bcachefs/btree_write_buf= fer.c index 002a0762fc85..13f5f63e22b7 100644 --- a/fs/bcachefs/btree_write_buffer.c +++ b/fs/bcachefs/btree_write_buffer.c @@ -531,6 +531,29 @@ static void bch2_btree_write_buffer_flush_work(struct = work_struct *work) bch2_write_ref_put(c, BCH_WRITE_REF_btree_write_buffer); } =20 +static void wb_accounting_sort(struct btree_write_buffer *wb) +{ + eytzinger0_sort(wb->accounting.data, wb->accounting.nr, + sizeof(wb->accounting.data[0]), + wb_key_cmp, NULL); +} + +int bch2_accounting_key_to_wb_slowpath(struct bch_fs *c, enum btree_id btr= ee, + struct bkey_i_accounting *k) +{ + struct btree_write_buffer *wb =3D &c->btree_write_buffer; + struct btree_write_buffered_key new =3D { .btree =3D btree }; + + bkey_copy(&new.k, &k->k_i); + + int ret =3D darray_push(&wb->accounting, new); + if (ret) + 
return ret; + + wb_accounting_sort(wb); + return 0; +} + int bch2_journal_key_to_wb_slowpath(struct bch_fs *c, struct journal_keys_to_wb *dst, enum btree_id btree, struct bkey_i *k) @@ -600,11 +623,35 @@ void bch2_journal_keys_to_write_buffer_start(struct b= ch_fs *c, struct journal_ke =20 bch2_journal_pin_add(&c->journal, seq, &dst->wb->pin, bch2_btree_write_buffer_journal_flush); + + darray_for_each(wb->accounting, i) + memset(&i->k.v, 0, bkey_val_bytes(&i->k.k)); } =20 -void bch2_journal_keys_to_write_buffer_end(struct bch_fs *c, struct journa= l_keys_to_wb *dst) +int bch2_journal_keys_to_write_buffer_end(struct bch_fs *c, struct journal= _keys_to_wb *dst) { struct btree_write_buffer *wb =3D &c->btree_write_buffer; + unsigned live_accounting_keys =3D 0; + int ret =3D 0; + + darray_for_each(wb->accounting, i) + if (!bch2_accounting_key_is_zero(bkey_i_to_s_c_accounting(&i->k))) { + i->journal_seq =3D dst->seq; + live_accounting_keys++; + ret =3D __bch2_journal_key_to_wb(c, dst, i->btree, &i->k); + if (ret) + break; + } + + if (live_accounting_keys * 2 < wb->accounting.nr) { + struct btree_write_buffered_key *dst =3D wb->accounting.data; + + darray_for_each(wb->accounting, src) + if (!bch2_accounting_key_is_zero(bkey_i_to_s_c_accounting(&src->k))) + *dst++ =3D *src; + wb->accounting.nr =3D dst - wb->accounting.data; + wb_accounting_sort(wb); + } =20 if (!dst->wb->keys.nr) bch2_journal_pin_drop(&c->journal, &dst->wb->pin); @@ -617,6 +664,8 @@ void bch2_journal_keys_to_write_buffer_end(struct bch_f= s *c, struct journal_keys if (dst->wb =3D=3D &wb->flushing) mutex_unlock(&wb->flushing.lock); mutex_unlock(&wb->inc.lock); + + return ret; } =20 static int bch2_journal_keys_to_write_buffer(struct bch_fs *c, struct jour= nal_buf *buf) @@ -640,7 +689,7 @@ static int bch2_journal_keys_to_write_buffer(struct bch= _fs *c, struct journal_bu buf->need_flush_to_write_buffer =3D false; spin_unlock(&c->journal.lock); out: - bch2_journal_keys_to_write_buffer_end(c, &dst); + ret =3D 
bch2_journal_keys_to_write_buffer_end(c, &dst) ?: ret; return ret; } =20 @@ -672,6 +721,7 @@ void bch2_fs_btree_write_buffer_exit(struct bch_fs *c) BUG_ON((wb->inc.keys.nr || wb->flushing.keys.nr) && !bch2_journal_error(&c->journal)); =20 + darray_exit(&wb->accounting); darray_exit(&wb->sorted); darray_exit(&wb->flushing.keys); darray_exit(&wb->inc.keys); diff --git a/fs/bcachefs/btree_write_buffer.h b/fs/bcachefs/btree_write_buf= fer.h index eebcd2b15249..828e2deaaa3d 100644 --- a/fs/bcachefs/btree_write_buffer.h +++ b/fs/bcachefs/btree_write_buffer.h @@ -3,6 +3,8 @@ #define _BCACHEFS_BTREE_WRITE_BUFFER_H =20 #include "bkey.h" +#include "disk_accounting.h" +#include =20 static inline bool bch2_btree_write_buffer_should_flush(struct bch_fs *c) { @@ -29,16 +31,45 @@ struct journal_keys_to_wb { u64 seq; }; =20 +static inline int wb_key_cmp(const void *_l, const void *_r) +{ + const struct btree_write_buffered_key *l =3D _l; + const struct btree_write_buffered_key *r =3D _r; + + return cmp_int(l->btree, r->btree) ?: bpos_cmp(l->k.k.p, r->k.k.p); +} + +int bch2_accounting_key_to_wb_slowpath(struct bch_fs *, + enum btree_id, struct bkey_i_accounting *); + +static inline int bch2_accounting_key_to_wb(struct bch_fs *c, + enum btree_id btree, struct bkey_i_accounting *k) +{ + struct btree_write_buffer *wb =3D &c->btree_write_buffer; + struct btree_write_buffered_key search; + search.btree =3D btree; + search.k.k.p =3D k->k.p; + + unsigned idx =3D eytzinger0_find(wb->accounting.data, wb->accounting.nr, + sizeof(wb->accounting.data[0]), + wb_key_cmp, &search); + + if (idx >=3D wb->accounting.nr) + return bch2_accounting_key_to_wb_slowpath(c, btree, k); + + struct bkey_i_accounting *dst =3D bkey_i_to_accounting(&wb->accounting.da= ta[idx].k); + bch2_accounting_accumulate(dst, accounting_i_to_s_c(k)); + return 0; +} + int bch2_journal_key_to_wb_slowpath(struct bch_fs *, struct journal_keys_to_wb *, enum btree_id, struct bkey_i *); =20 -static inline int 
bch2_journal_key_to_wb(struct bch_fs *c, +static inline int __bch2_journal_key_to_wb(struct bch_fs *c, struct journal_keys_to_wb *dst, enum btree_id btree, struct bkey_i *k) { - EBUG_ON(!dst->seq); - if (unlikely(!dst->room)) return bch2_journal_key_to_wb_slowpath(c, dst, btree, k); =20 @@ -51,8 +82,19 @@ static inline int bch2_journal_key_to_wb(struct bch_fs *= c, return 0; } =20 +static inline int bch2_journal_key_to_wb(struct bch_fs *c, + struct journal_keys_to_wb *dst, + enum btree_id btree, struct bkey_i *k) +{ + EBUG_ON(!dst->seq); + + return k->k.type =3D=3D KEY_TYPE_accounting + ? bch2_accounting_key_to_wb(c, btree, bkey_i_to_accounting(k)) + : __bch2_journal_key_to_wb(c, dst, btree, k); +} + void bch2_journal_keys_to_write_buffer_start(struct bch_fs *, struct journ= al_keys_to_wb *, u64); -void bch2_journal_keys_to_write_buffer_end(struct bch_fs *, struct journal= _keys_to_wb *); +int bch2_journal_keys_to_write_buffer_end(struct bch_fs *, struct journal_= keys_to_wb *); =20 int bch2_btree_write_buffer_resize(struct bch_fs *, size_t); void bch2_fs_btree_write_buffer_exit(struct bch_fs *); diff --git a/fs/bcachefs/btree_write_buffer_types.h b/fs/bcachefs/btree_wri= te_buffer_types.h index 5f248873087c..d39d163c6ea9 100644 --- a/fs/bcachefs/btree_write_buffer_types.h +++ b/fs/bcachefs/btree_write_buffer_types.h @@ -52,6 +52,8 @@ struct btree_write_buffer { struct btree_write_buffer_keys inc; struct btree_write_buffer_keys flushing; struct work_struct flush_work; + + DARRAY(struct btree_write_buffered_key) accounting; }; =20 #endif /* _BCACHEFS_BTREE_WRITE_BUFFER_TYPES_H */ diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c index b37b75ccd602..3ea2be99d411 100644 --- a/fs/bcachefs/journal_io.c +++ b/fs/bcachefs/journal_io.c @@ -1815,7 +1815,8 @@ static int bch2_journal_write_prep(struct journal *j,= struct journal_buf *w) jset_entry_for_each_key(i, k) { ret =3D bch2_journal_key_to_wb(c, &wb, i->btree_id, k); if (ret) { - bch2_fs_fatal_error(c, 
"-ENOMEM flushing journal keys to btree write = buffer"); + bch2_fs_fatal_error(c, "error flushing journal keys to btree write bu= ffer: %s", + bch2_err_str(ret)); bch2_journal_keys_to_write_buffer_end(c, &wb); return ret; } @@ -1825,8 +1826,14 @@ static int bch2_journal_write_prep(struct journal *j= , struct journal_buf *w) } } =20 - if (wb.wb) - bch2_journal_keys_to_write_buffer_end(c, &wb); + if (wb.wb) { + ret =3D bch2_journal_keys_to_write_buffer_end(c, &wb); + if (ret) { + bch2_fs_fatal_error(c, "error flushing journal keys to btree write buff= er: %s", + bch2_err_str(ret)); + return ret; + } + } =20 spin_lock(&c->journal.lock); w->need_flush_to_write_buffer =3D false; --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-176.mta1.migadu.com (out-176.mta1.migadu.com [95.215.58.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9599C18AE0 for ; Sun, 25 Feb 2024 02:38:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828734; cv=none; b=QdPqLjNS3TcLf2fDJokjw5n95CQIO+hPp7h6oEWDhxBW50lk1L816wWTSnzKQRlNoyNxbJzmYXogHevkfCZXqSYTgpV3rF/ird/wSZ/Od3/Co0MItdlyT/EJS4uKfTb1YTLCVrFHZk1IBd04X+rUv/2Yjgs9/3ZeTnnkpXRD4Sw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828734; c=relaxed/simple; bh=CmRGISWqMl8OiQ/BrO+n0lHBIFvE2WpEgm7l++MU4x8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=g6eD9dEd/0VH6o5k3WmfBRWknQ+mmGtpZi6Lnqq0war1rb9tiz9s3Oyf91di5yWbZzi2vUYq4YsBitcBf0cU5ZcVic1qnZU3rfgaq5/JG8l+19KQBEqXfRkVBZXqpRX9ll9GJhILWR9u8r3vJdVIPB18iOCvYV44odaZavU2qCc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev 
header.i=@linux.dev header.b=Fjr7jqpM; arc=none smtp.client-ip=95.215.58.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="Fjr7jqpM" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828731; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KXqi/mUCPqaAmwNVCa/3pdqBU+XG+a15SZVPhvlslXA=; b=Fjr7jqpM1bfxbQZzfZEw2oEx+fC+2/tR+Hk4GtDWSMFSObqBrFPYCYyAx61DeCqFrB6BZV YN7kJt77NuFYbQv/Gjdssy8I0Qgx7U/GxmGEPG6PLtg35t0Z+WhBuwcXEv2NBjtttB3jDj B90A2do5+NkUf/ZfYL5c0dg/sNAKUlE= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 18/21] bcachefs: bch_acct_compression Date: Sat, 24 Feb 2024 21:38:20 -0500 Message-ID: <20240225023826.2413565-19-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" This adds per-compression-type accounting of compressed and uncompressed size as well as number of extents - meaning we can now see compression ratio (without walking the whole filesystem). 
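As an illustration of how such counters behave as deltas (a sketch, not the kernel code): inserting an extent adds to the three counters, and overwriting it applies the same delta negated. `struct compression_counters` and `compression_account()` are hypothetical names; the counter order (extents, uncompressed sectors, compressed sectors) follows the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-compression-type counters, in the order the patch uses. */
struct compression_counters {
	int64_t nr_extents;
	int64_t sectors_uncompressed;
	int64_t sectors_compressed;
};

/*
 * Apply one extent's contribution as a delta. An overwrite removes the
 * extent, so its contribution is subtracted instead of added.
 */
static void compression_account(struct compression_counters *c,
				int64_t uncompressed, int64_t compressed,
				bool overwrite)
{
	int64_t sign = overwrite ? -1 : 1;

	c->nr_extents		+= sign;
	c->sectors_uncompressed	+= sign * uncompressed;
	c->sectors_compressed	+= sign * compressed;
}
```

Because every update is a signed delta, the totals stay correct regardless of the order in which inserts and overwrites are accumulated.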
Signed-off-by: Kent Overstreet --- fs/bcachefs/buckets.c | 45 ++++++++++++++++++++++++---- fs/bcachefs/disk_accounting.c | 4 +++ fs/bcachefs/disk_accounting_format.h | 8 ++++- 3 files changed, 51 insertions(+), 6 deletions(-) diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c index 506bb580bff4..6078b67e51cf 100644 --- a/fs/bcachefs/buckets.c +++ b/fs/bcachefs/buckets.c @@ -503,6 +503,7 @@ static int __trigger_extent(struct btree_trans *trans, : BCH_DATA_user; s64 dirty_sectors =3D 0; int ret =3D 0; + u64 compression_acct[3] =3D { 1, 0, 0 }; =20 struct disk_accounting_key acc =3D { .type =3D BCH_DISK_ACCOUNTING_replicas, @@ -511,6 +512,10 @@ static int __trigger_extent(struct btree_trans *trans, .replicas.nr_required =3D 1, }; =20 + struct disk_accounting_key compression_key =3D { + .type =3D BCH_DISK_ACCOUNTING_compression, + }; + bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { s64 disk_sectors; ret =3D bch2_trigger_pointer(trans, btree_id, level, k, p, &disk_sectors= , flags); @@ -519,12 +524,13 @@ static int __trigger_extent(struct btree_trans *trans, =20 bool stale =3D ret > 0; =20 + if (p.ptr.cached && stale) + continue; + if (p.ptr.cached) { - if (!stale) { - ret =3D bch2_mod_dev_cached_sectors(trans, p.ptr.dev, disk_sectors, gc= ); - if (ret) - return ret; - } + ret =3D bch2_mod_dev_cached_sectors(trans, p.ptr.dev, disk_sectors, gc); + if (ret) + return ret; } else if (!p.has_ec) { dirty_sectors +=3D disk_sectors; acc.replicas.devs[acc.replicas.nr_devs++] =3D p.ptr.dev; @@ -540,6 +546,26 @@ static int __trigger_extent(struct btree_trans *trans, */ acc.replicas.nr_required =3D 0; } + + if (compression_key.compression.type && + compression_key.compression.type !=3D p.crc.compression_type) { + if (flags & BTREE_TRIGGER_OVERWRITE) + bch2_u64s_neg(compression_acct, 3); + + ret =3D bch2_disk_accounting_mod(trans, &compression_key, compression_a= cct, 2, gc); + if (ret) + return ret; + + compression_acct[0] =3D 1; + compression_acct[1] =3D 0; + 
compression_acct[2] =3D 0; + } + + compression_key.compression.type =3D p.crc.compression_type; + if (p.crc.compression_type) { + compression_acct[1] +=3D p.crc.uncompressed_size; + compression_acct[2] +=3D p.crc.compressed_size; + } } =20 if (acc.replicas.nr_devs) { @@ -548,6 +574,15 @@ static int __trigger_extent(struct btree_trans *trans, return ret; } =20 + if (compression_key.compression.type) { + if (flags & BTREE_TRIGGER_OVERWRITE) + bch2_u64s_neg(compression_acct, 3); + + ret =3D bch2_disk_accounting_mod(trans, &compression_key, compression_ac= ct, 3, gc); + if (ret) + return ret; + } + return 0; } =20 diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c index 8d7b6ab66e71..dc020d651d0a 100644 --- a/fs/bcachefs/disk_accounting.c +++ b/fs/bcachefs/disk_accounting.c @@ -5,6 +5,7 @@ #include "btree_update.h" #include "btree_write_buffer.h" #include "buckets.h" +#include "compress.h" #include "disk_accounting.h" #include "error.h" #include "journal_io.h" @@ -91,6 +92,9 @@ void bch2_accounting_key_to_text(struct printbuf *out, st= ruct disk_accounting_ke case BCH_DISK_ACCOUNTING_dev_stripe_buckets: prt_printf(out, "dev=3D%u", k->dev_stripe_buckets.dev); break; + case BCH_DISK_ACCOUNTING_compression: + bch2_prt_compression_type(out, k->compression.type); + break; } } =20 diff --git a/fs/bcachefs/disk_accounting_format.h b/fs/bcachefs/disk_accoun= ting_format.h index e06a42f0d578..75bfc9bce79f 100644 --- a/fs/bcachefs/disk_accounting_format.h +++ b/fs/bcachefs/disk_accounting_format.h @@ -95,7 +95,8 @@ static inline bool data_type_is_hidden(enum bch_data_type= type) x(persistent_reserved, 1) \ x(replicas, 2) \ x(dev_data_type, 3) \ - x(dev_stripe_buckets, 4) + x(dev_stripe_buckets, 4) \ + x(compression, 5) =20 enum disk_accounting_type { #define x(f, nr) BCH_DISK_ACCOUNTING_##f =3D nr, @@ -120,6 +121,10 @@ struct bch_dev_stripe_buckets { __u8 dev; }; =20 +struct bch_acct_compression { + __u8 type; +}; + struct disk_accounting_key { union { 
struct { @@ -130,6 +135,7 @@ struct disk_accounting_key { struct bch_replicas_entry_v1 replicas; struct bch_dev_data_type dev_data_type; struct bch_dev_stripe_buckets dev_stripe_buckets; + struct bch_acct_compression compression; }; }; struct bpos _pad; --=20 2.43.0 From nobody Sun Feb 8 05:58:57 2026 Received: from out-182.mta1.migadu.com (out-182.mta1.migadu.com [95.215.58.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 96A4E1AADD for ; Sun, 25 Feb 2024 02:38:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828737; cv=none; b=uHSpJWvLppPfH27zWvWv4jTvwzFxcqgBoDRBwv3MLFn1NUXosVsaknDIs0wtj4iM0kwrb/xvQ6nkB6AhQVuSmXcficRwcq8Tu4/6iEziSsu29o1mgQJ8xQUfQqk1FtDX9aY8P/rkmCu36Hf8WiqKLEgb36hKMv12/dP4nzWNMBY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708828737; c=relaxed/simple; bh=ZgM0zLMZFwJYqyvB4tXh9kF50cWRp9rLDmexM/GbfvM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=K8yU1fKcLEbGn9XDBWPWkIu38RE4zMe6a8SXSZiIwvFM//iKmPd1u68cik4aNQGUplzIHJifeddR5mR0nXr2RC1Q+8UuSD2GXKUmfGnTQ25HEjOk5DZ3vPlw3MVx2TwqqTOXZgH9EyNkmWQqmcBVsp8R06yWlIYDqJEce7bJhuU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=GtMroGI9; arc=none smtp.client-ip=95.215.58.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="GtMroGI9" X-Report-Abuse: Please report any abuse attempt to 
abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1708828732; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=weqtx2u84KusAKamrBTKZCBf70EK9pRqL6N4MsvPgW0=; b=GtMroGI9SLS2HtSssDe36Rm8zROfrOV84oqSilqNyQ2mrClrsaF9kMvGGXEIGcdMFQ+4ox yEVdD91npxWvtmF924xIjbT+LsABnJdd8Pk4fMKyoMRA3ZTxdonTPh4BrzJXvRlvxiIqHj JLCJDAQVT9YdZ0xYe8lCVAXTYVeoW0U= From: Kent Overstreet To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet , djwong@kernel.org, bfoster@redhat.com Subject: [PATCH 19/21] bcachefs: Convert bch2_compression_stats_to_text() to new accounting Date: Sat, 24 Feb 2024 21:38:21 -0500 Message-ID: <20240225023826.2413565-20-kent.overstreet@linux.dev> In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev> References: <20240225023826.2413565-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" We no longer have to walk the whole btree to calculate compression stats. 
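For illustration, the derived "average extent size" column can be computed from the three counters alone. This is a sketch assuming the counter order used in this series (v[0] = number of extents, v[1] = uncompressed sectors, v[2] = compressed sectors) and 512-byte sectors; `avg_extent_bytes()` is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Average uncompressed extent size in bytes.
 * Sectors are 512 bytes, hence the shift by 9; an empty counter set
 * yields 0 rather than dividing by zero.
 */
static uint64_t avg_extent_bytes(const uint64_t v[3])
{
	uint64_t nr_extents = v[0];
	uint64_t sectors_uncompressed = v[1];

	return nr_extents
		? (sectors_uncompressed << 9) / nr_extents
		: 0;
}
```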
Signed-off-by: Kent Overstreet --- fs/bcachefs/sysfs.c | 85 ++++++++++----------------------------------- 1 file changed, 18 insertions(+), 67 deletions(-) diff --git a/fs/bcachefs/sysfs.c b/fs/bcachefs/sysfs.c index c86a93a8d8fc..287a0bf920db 100644 --- a/fs/bcachefs/sysfs.c +++ b/fs/bcachefs/sysfs.c @@ -22,6 +22,7 @@ #include "buckets.h" #include "clock.h" #include "compress.h" +#include "disk_accounting.h" #include "disk_groups.h" #include "ec.h" #include "inode.h" @@ -256,63 +257,6 @@ static size_t bch2_btree_cache_size(struct bch_fs *c) =20 static int bch2_compression_stats_to_text(struct printbuf *out, struct bch= _fs *c) { - struct btree_trans *trans; - enum btree_id id; - struct compression_type_stats { - u64 nr_extents; - u64 sectors_compressed; - u64 sectors_uncompressed; - } s[BCH_COMPRESSION_TYPE_NR]; - u64 compressed_incompressible =3D 0; - int ret =3D 0; - - memset(s, 0, sizeof(s)); - - if (!test_bit(BCH_FS_started, &c->flags)) - return -EPERM; - - trans =3D bch2_trans_get(c); - - for (id =3D 0; id < BTREE_ID_NR; id++) { - if (!btree_type_has_ptrs(id)) - continue; - - ret =3D for_each_btree_key(trans, iter, id, POS_MIN, - BTREE_ITER_ALL_SNAPSHOTS, k, ({ - struct bkey_ptrs_c ptrs =3D bch2_bkey_ptrs_c(k); - struct bch_extent_crc_unpacked crc; - const union bch_extent_entry *entry; - bool compressed =3D false, incompressible =3D false; - - bkey_for_each_crc(k.k, ptrs, crc, entry) { - incompressible |=3D crc.compression_type =3D=3D BCH_COMPRESSION_TYPE_i= ncompressible; - compressed |=3D crc_is_compressed(crc); - - if (crc_is_compressed(crc)) { - s[crc.compression_type].nr_extents++; - s[crc.compression_type].sectors_compressed +=3D crc.compressed_size; - s[crc.compression_type].sectors_uncompressed +=3D crc.uncompressed_si= ze; - } - } - - compressed_incompressible +=3D compressed && incompressible; - - if (!compressed) { - unsigned t =3D incompressible ? 
BCH_COMPRESSION_TYPE_incompressible : 0;
-
-				s[t].nr_extents++;
-				s[t].sectors_compressed += k.k->size;
-				s[t].sectors_uncompressed += k.k->size;
-			}
-			0;
-		}));
-	}
-
-	bch2_trans_put(trans);
-
-	if (ret)
-		return ret;
-
 	prt_str(out, "type");
 	printbuf_tabstop_push(out, 12);
 	prt_tab(out);
@@ -330,28 +274,35 @@ static int bch2_compression_stats_to_text(struct printbuf *out, struct bch_fs *c
 	prt_tab_rjust(out);
 	prt_newline(out);
 
-	for (unsigned i = 0; i < ARRAY_SIZE(s); i++) {
+	for (unsigned i = 1; i < BCH_COMPRESSION_TYPE_NR; i++) {
+		struct disk_accounting_key a = {
+			.type = BCH_DISK_ACCOUNTING_compression,
+			.compression.type = i,
+		};
+		struct bpos p = disk_accounting_key_to_bpos(&a);
+		u64 v[3];
+		bch2_accounting_mem_read(c, p, v, ARRAY_SIZE(v));
+
+		u64 nr_extents = v[0];
+		u64 sectors_uncompressed = v[1];
+		u64 sectors_compressed = v[2];
+
 		bch2_prt_compression_type(out, i);
 		prt_tab(out);
 
-		prt_human_readable_u64(out, s[i].sectors_compressed << 9);
+		prt_human_readable_u64(out, sectors_compressed << 9);
 		prt_tab_rjust(out);
 
-		prt_human_readable_u64(out, s[i].sectors_uncompressed << 9);
+		prt_human_readable_u64(out, sectors_uncompressed << 9);
 		prt_tab_rjust(out);
 
-		prt_human_readable_u64(out, s[i].nr_extents
-				       ? div_u64(s[i].sectors_uncompressed << 9, s[i].nr_extents)
+		prt_human_readable_u64(out, nr_extents
+				       ? div_u64(sectors_uncompressed << 9, nr_extents)
 				       : 0);
 		prt_tab_rjust(out);
 		prt_newline(out);
 	}
 
-	if (compressed_incompressible) {
-		prt_printf(out, "%llu compressed & incompressible extents", compressed_incompressible);
-		prt_newline(out);
-	}
-
 	return 0;
 }
 
-- 
2.43.0

From nobody Sun Feb 8 05:58:57 2026
From: Kent Overstreet
To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Kent Overstreet, djwong@kernel.org, bfoster@redhat.com
Subject: [PATCH 20/21] bcachefs: bch2_fs_accounting_to_text()
Date: Sat, 24 Feb 2024 21:38:22 -0500
Message-ID: <20240225023826.2413565-21-kent.overstreet@linux.dev>
In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev>
References: <20240225023826.2413565-1-kent.overstreet@linux.dev>

Helper to show raw accounting in sysfs, mainly for debugging.

Signed-off-by: Kent Overstreet
---
 fs/bcachefs/disk_accounting.c | 26 ++++++++++++++++++++++++++
 fs/bcachefs/disk_accounting.h |  1 +
 fs/bcachefs/sysfs.c           |  5 +++++
 3 files changed, 32 insertions(+)

diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c
index dc020d651d0a..9d6ca2ea307b 100644
--- a/fs/bcachefs/disk_accounting.c
+++ b/fs/bcachefs/disk_accounting.c
@@ -286,6 +286,32 @@ int bch2_fs_replicas_usage_read(struct bch_fs *c, darray_char *usage)
 	return ret;
 }
 
+void bch2_fs_accounting_to_text(struct printbuf *out, struct bch_fs *c)
+{
+	struct bch_accounting_mem *acc = &c->accounting[0];
+
+	percpu_down_read(&c->mark_lock);
+	out->atomic++;
+
+	eytzinger0_for_each(i, acc->k.nr) {
+		struct disk_accounting_key acc_k;
+		bpos_to_disk_accounting_key(&acc_k, acc->k.data[i].pos);
+
+		bch2_accounting_key_to_text(out, &acc_k);
+
+		u64 v[BCH_ACCOUNTING_MAX_COUNTERS];
+		bch2_accounting_mem_read_counters(c, i, v, ARRAY_SIZE(v), false);
+
+		prt_str(out, ":");
+		for (unsigned j = 0; j < acc->k.data[i].nr_counters; j++)
+			prt_printf(out, " %llu", v[j]);
+		prt_newline(out);
+	}
+
+	--out->atomic;
+	percpu_up_read(&c->mark_lock);
+}
+
 static int accounting_write_key(struct btree_trans *trans, struct bpos pos, u64 *v, unsigned nr_counters)
 {
 	struct bkey_i_accounting *a = bch2_trans_kmalloc(trans, sizeof(*a) + sizeof(*v) * nr_counters);
diff --git a/fs/bcachefs/disk_accounting.h b/fs/bcachefs/disk_accounting.h
index a0cf7a0b84a7..c4a8b9cce6ba 100644
--- a/fs/bcachefs/disk_accounting.h
+++ b/fs/bcachefs/disk_accounting.h
@@ -186,6 +186,7 @@ static inline void bch2_accounting_mem_read(struct bch_fs *c, struct bpos p,
 }
 
 int bch2_fs_replicas_usage_read(struct bch_fs *, darray_char *);
+void bch2_fs_accounting_to_text(struct printbuf *, struct bch_fs *);
 
 int bch2_accounting_gc_done(struct bch_fs *);
 
diff --git a/fs/bcachefs/sysfs.c b/fs/bcachefs/sysfs.c
index 287a0bf920db..10470cef30f0 100644
--- a/fs/bcachefs/sysfs.c
+++ b/fs/bcachefs/sysfs.c
@@ -204,6 +204,7 @@ read_attribute(disk_groups);
 
 read_attribute(has_data);
 read_attribute(alloc_debug);
+read_attribute(accounting);
 
 #define x(t, n, ...) read_attribute(t);
 BCH_PERSISTENT_COUNTERS()
@@ -413,6 +414,9 @@ SHOW(bch2_fs)
 	if (attr == &sysfs_disk_groups)
 		bch2_disk_groups_to_text(out, c);
 
+	if (attr == &sysfs_accounting)
+		bch2_fs_accounting_to_text(out, c);
+
 	return 0;
 }
 
@@ -625,6 +629,7 @@ struct attribute *bch2_fs_internal_files[] = {
 	&sysfs_internal_uuid,
 
 	&sysfs_disk_groups,
+	&sysfs_accounting,
 	NULL
 };
 
-- 
2.43.0

From nobody Sun Feb 8 05:58:57 2026
From: Kent Overstreet
To: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Kent Overstreet, djwong@kernel.org, bfoster@redhat.com
Subject: [PATCH 21/21] bcachefs: bch2_fs_usage_base_to_text()
Date: Sat, 24 Feb 2024 21:38:23 -0500
Message-ID: <20240225023826.2413565-22-kent.overstreet@linux.dev>
In-Reply-To: <20240225023826.2413565-1-kent.overstreet@linux.dev>
References: <20240225023826.2413565-1-kent.overstreet@linux.dev>

Helper to show raw accounting in sysfs, mainly for debugging.

Signed-off-by: Kent Overstreet
---
 fs/bcachefs/sysfs.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/bcachefs/sysfs.c b/fs/bcachefs/sysfs.c
index 10470cef30f0..27aca70cb385 100644
--- a/fs/bcachefs/sysfs.c
+++ b/fs/bcachefs/sysfs.c
@@ -205,6 +205,7 @@ read_attribute(disk_groups);
 read_attribute(has_data);
 read_attribute(alloc_debug);
 read_attribute(accounting);
+read_attribute(usage_base);
 
 #define x(t, n, ...) read_attribute(t);
 BCH_PERSISTENT_COUNTERS()
@@ -329,6 +330,20 @@ static void bch2_btree_wakeup_all(struct bch_fs *c)
 	seqmutex_unlock(&c->btree_trans_lock);
 }
 
+static void bch2_fs_usage_base_to_text(struct printbuf *out, struct bch_fs *c)
+{
+	struct bch_fs_usage_base b = {};
+
+	acc_u64s_percpu(&b.hidden, &c->usage->hidden, sizeof(b) / sizeof(u64));
+
+	prt_printf(out, "hidden:\t\t%llu\n", b.hidden);
+	prt_printf(out, "btree:\t\t%llu\n", b.btree);
+	prt_printf(out, "data:\t\t%llu\n", b.data);
+	prt_printf(out, "cached:\t%llu\n", b.cached);
+	prt_printf(out, "reserved:\t\t%llu\n", b.reserved);
+	prt_printf(out, "nr_inodes:\t%llu\n", b.nr_inodes);
+}
+
 SHOW(bch2_fs)
 {
 	struct bch_fs *c = container_of(kobj, struct bch_fs, kobj);
@@ -417,6 +432,9 @@ SHOW(bch2_fs)
 	if (attr == &sysfs_accounting)
 		bch2_fs_accounting_to_text(out, c);
 
+	if (attr == &sysfs_usage_base)
+		bch2_fs_usage_base_to_text(out, c);
+
 	return 0;
 }
 
@@ -630,6 +648,7 @@ struct attribute *bch2_fs_internal_files[] = {
 
 	&sysfs_disk_groups,
 	&sysfs_accounting,
+	&sysfs_usage_base,
 	NULL
 };
 
-- 
2.43.0