From nobody Mon Feb 9 09:34:23 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=quarantine dis=none) header.from=suse.com ARC-Seal: i=1; a=rsa-sha256; t=1688973042; cv=none; d=zohomail.com; s=zohoarc; b=AKI+VdQG1nAWxqmEtVOBhdyIjlTOzXDBtJmZKsDJ04HzBP6/P0Et1fcxl1gH9Pt00spVggtswecZeBCG52XFLms4opIsjezaMz5VcniDU4TZGKV/xTIpYzoXEtMoxsvU/1eXOPmJxXi6IvIvcPwHdxMf8QDnufoxYzYm1zTFWXo= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1688973042; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=L9wkqTCyOjGjiu/WtVNnaatoSAz4S33Tyi60lW8cADE=; b=ZIR9WrVsrH7whUZIPkOacuHmU1h50oWQICzi/jwGjQBa5nQtZhOXiQS3PYupA/fXUs0IvtPPo8LWo4I7p/eZ+YRB+zF/m1wmspu3E0jUWWhEnxZfoBgjCFu2YwZd/TKq5KFKxli6E0WMA4sWqAEqDoqeeveEWZ5hOgzRkWSVJJc= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1688973042152587.9034573016899; Mon, 10 Jul 2023 00:10:42 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.561004.877286 (Exim 4.92) (envelope-from ) id 1qIl23-0004Hs-OO; Mon, 10 Jul 2023 07:10:03 +0000 Received: by outflank-mailman (output) from mailman id 561004.877286; Mon, 10 Jul 2023 07:10:03 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1qIl23-0004Gb-G0; Mon, 10 Jul 2023 07:10:03 +0000 Received: by outflank-mailman (input) for mailman id 561004; Mon, 10 Jul 2023 07:10:02 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1qIktB-0002xd-Un for xen-devel@lists.xenproject.org; Mon, 10 Jul 2023 07:00:53 +0000 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 8235cea3-1eef-11ee-b239-6b7b168915f2; Mon, 10 Jul 2023 09:00:52 +0200 (CEST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 7E48421E91; Mon, 10 Jul 2023 07:00:52 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 4CA0B1361C; Mon, 10 Jul 2023 07:00:52 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id pTM4EaSsq2QgYwAAMHmgww (envelope-from ); Mon, 10 Jul 2023 07:00:52 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 8235cea3-1eef-11ee-b239-6b7b168915f2 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1688972452; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=L9wkqTCyOjGjiu/WtVNnaatoSAz4S33Tyi60lW8cADE=; b=vXZvqyklLjIhkBkbKZ5EF4zNFYmCqZpjYoeGcJ14N7NFil/lzVSLDOmKAlUlFo30zU77Ah IESp0RORqr+USuB9SMnROHq1uwRCAZwUg8PlRa4FHh9KkJJMQCM/6q+FwPL18RIlvfWXgD Um3wQAYfNoztBzclilwiFVgOokMWIL8= From: Juergen Gross To: xen-devel@lists.xenproject.org Cc: Juergen Gross , Wei Liu , Julien Grall , Anthony PERARD Subject: [PATCH v2 11/18] tools/xenstore: drop use of tdb Date: Mon, 10 Jul 2023 08:59:40 +0200 Message-Id: <20230710065947.4201-12-jgross@suse.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230710065947.4201-1-jgross@suse.com> References: <20230710065947.4201-1-jgross@suse.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZohoMail-DKIM: pass (identity @suse.com) X-ZM-MESSAGEID: 1688973042447100001 Content-Type: text/plain; charset="utf-8" Today all Xenstore nodes are stored in a TDB data base. This data base has several disadvantages: - It is using a fixed sized hash table, resulting in high memory overhead for small installations with only very few VMs, and a rather large performance hit for systems with lots of VMs due to many collisions. The hash table size today is 7919 entries. This means that e.g. in case of a simple desktop use case with 2 or 3 VMs probably far less than 10% of the entries will be used (assuming roughly 100 nodes per VM). OTOH a setup on a large server with 500 VMs would result in heavy conflicts in the hash list with 5-10 nodes per hash table entry. - TDB is using a single large memory area for storing the nodes. It only ever increases this area and will never shrink it afterwards. This will result in more memory usage than necessary after a peak of Xenstore usage. - Xenstore is only single-threaded, while TDB is designed to be fit for multi-threaded use cases, resulting in much higher code complexity than needed. - Special use cases of Xenstore are not possible to implement with TDB in an effective way, while an implementation of a data base tailored for Xenstore could simplify some handling (e.g. transactions) a lot. So drop using TDB and store the nodes directly in memory making them easily accessible. Use a hash-based lookup mechanism for fast lookup of nodes by their full path. For now only replace TDB keeping the current access functions. Signed-off-by: Juergen Gross --- V2: - add const (Julien Grall) - use specific pointer type instead of void * (Julien Grall) - add comment to db_fetch() (Julien Grall) --- tools/xenstore/xenstored_core.c | 158 ++++++++++--------------- tools/xenstore/xenstored_core.h | 5 +- tools/xenstore/xenstored_transaction.c | 1 - 3 files changed, 64 insertions(+), 100 deletions(-) diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_cor= e.c index 70db5265ad..ff10659e65 100644 --- a/tools/xenstore/xenstored_core.c +++ b/tools/xenstore/xenstored_core.c @@ -53,7 +53,6 @@ #include "xenstored_domain.h" #include "xenstored_control.h" #include "xenstored_lu.h" -#include "tdb.h" =20 #ifndef NO_SOCKETS #if defined(HAVE_SYSTEMD) @@ -85,7 +84,7 @@ bool keep_orphans =3D false; static int reopen_log_pipe[2]; static int reopen_log_pipe0_pollfd_idx =3D -1; char *tracefile =3D NULL; -static TDB_CONTEXT *tdb_ctx =3D NULL; +static struct hashtable *nodes; unsigned int trace_flags =3D TRACE_OBJ | TRACE_IO; =20 static const char *sockmsg_string(enum xsd_sockmsg_type type); @@ -556,32 +555,32 @@ static void initialize_fds(int *p_sock_pollfd_idx, in= t *ptimeout) } } =20 -static void set_tdb_key(const char *name, TDB_DATA *key) -{ - /* - * Dropping const is fine here, as the key will never be modified - * by TDB. - */ - key->dptr =3D (char *)name; - key->dsize =3D strlen(name); -} - struct xs_tdb_record_hdr *db_fetch(const char *db_name, size_t *size) { - TDB_DATA key, data; + const struct xs_tdb_record_hdr *hdr; + struct xs_tdb_record_hdr *p; =20 - set_tdb_key(db_name, &key); - data =3D tdb_fetch(tdb_ctx, key); - if (!data.dptr) { - errno =3D (tdb_error(tdb_ctx) =3D=3D TDB_ERR_NOEXIST) ? ENOENT : EIO; - *size =3D 0; - } else { - *size =3D data.dsize; - trace_tdb("read %s size %zu\n", db_name, - *size + strlen(db_name)); + hdr =3D hashtable_search(nodes, db_name); + if (!hdr) { + errno =3D ENOENT; + return NULL; } =20 - return (struct xs_tdb_record_hdr *)data.dptr; + *size =3D sizeof(*hdr) + hdr->num_perms * sizeof(hdr->perms[0]) + + hdr->datalen + hdr->childlen; + + p =3D talloc_size(NULL, *size); + if (!p) { + errno =3D ENOMEM; + return NULL; + } + + trace_tdb("read %s size %zu\n", db_name, *size + strlen(db_name)); + + /* Return a copy, avoiding a potential modification in the DB. */ + memcpy(p, hdr, *size); + + return p; } =20 static void get_acc_data(const char *name, struct node_account_data *acc) @@ -621,12 +620,10 @@ int db_write(struct connection *conn, const char *db_= name, void *data, struct xs_tdb_record_hdr *hdr =3D data; struct node_account_data old_acc =3D {}; unsigned int old_domid, new_domid; + size_t name_len =3D strlen(db_name); + const char *name; int ret; - TDB_DATA key, dat; =20 - set_tdb_key(db_name, &key); - dat.dptr =3D data; - dat.dsize =3D size; if (!acc) old_acc.memory =3D -1; else @@ -642,29 +639,36 @@ int db_write(struct connection *conn, const char *db_= name, void *data, */ if (old_acc.memory) domain_memory_add_nochk(conn, old_domid, - -old_acc.memory - key.dsize); - ret =3D domain_memory_add(conn, new_domid, size + key.dsize, + -old_acc.memory - name_len); + ret =3D domain_memory_add(conn, new_domid, size + name_len, no_quota_check); if (ret) { /* Error path, so no quota check. */ if (old_acc.memory) domain_memory_add_nochk(conn, old_domid, - old_acc.memory + key.dsize); + old_acc.memory + name_len); return ret; } =20 - /* TDB should set errno, but doesn't even set ecode AFAICT. */ - if (tdb_store(tdb_ctx, key, dat, - (mode =3D=3D NODE_CREATE) ? TDB_INSERT : TDB_MODIFY) !=3D 0) { - domain_memory_add_nochk(conn, new_domid, -size - key.dsize); + if (mode =3D=3D NODE_CREATE) { + /* db_name could be modified later, so allocate a copy. */ + name =3D talloc_strdup(data, db_name); + ret =3D name ? hashtable_add(nodes, name, data) : ENOMEM; + } else + ret =3D hashtable_replace(nodes, db_name, data); + + if (ret) { + /* Free data, as it isn't owned by hashtable now. */ + talloc_free(data); + domain_memory_add_nochk(conn, new_domid, -size - name_len); /* Error path, so no quota check. */ if (old_acc.memory) domain_memory_add_nochk(conn, old_domid, - old_acc.memory + key.dsize); - errno =3D EIO; + old_acc.memory + name_len); + errno =3D ret; return errno; } - trace_tdb("store %s size %zu\n", db_name, size + key.dsize); + trace_tdb("store %s size %zu\n", db_name, size + name_len); =20 if (acc) { /* Don't use new_domid, as it might be a transaction node. */ @@ -680,9 +684,6 @@ int db_delete(struct connection *conn, const char *name, { struct node_account_data tmp_acc; unsigned int domid; - TDB_DATA key; - - set_tdb_key(name, &key); =20 if (!acc) { acc =3D &tmp_acc; @@ -691,15 +692,13 @@ int db_delete(struct connection *conn, const char *na= me, =20 get_acc_data(name, acc); =20 - if (tdb_delete(tdb_ctx, key)) { - errno =3D EIO; - return errno; - } + hashtable_remove(nodes, name); trace_tdb("delete %s\n", name); =20 if (acc->memory) { domid =3D get_acc_domid(conn, name, acc->domid); - domain_memory_add_nochk(conn, domid, -acc->memory - key.dsize); + domain_memory_add_nochk(conn, domid, + -acc->memory - strlen(name)); } =20 return 0; @@ -2349,43 +2348,29 @@ static void manual_node(const char *name, const cha= r *child) talloc_free(node); } =20 -static void tdb_logger(TDB_CONTEXT *tdb, int level, const char * fmt, ...) +static unsigned int hash_from_key_fn(const void *k) { - va_list ap; - char *s; - int saved_errno =3D errno; + const char *str =3D k; + unsigned int hash =3D 5381; + char c; =20 - va_start(ap, fmt); - s =3D talloc_vasprintf(NULL, fmt, ap); - va_end(ap); + while ((c =3D *str++)) + hash =3D ((hash << 5) + hash) + (unsigned int)c; =20 - if (s) { - trace("TDB: %s\n", s); - syslog(LOG_ERR, "TDB: %s", s); - if (verbose) - xprintf("TDB: %s", s); - talloc_free(s); - } else { - trace("talloc failure during logging\n"); - syslog(LOG_ERR, "talloc failure during logging\n"); - } + return hash; +} =20 - errno =3D saved_errno; +static int keys_equal_fn(const void *key1, const void *key2) +{ + return 0 =3D=3D strcmp(key1, key2); } =20 void setup_structure(bool live_update) { - char *tdbname; - - tdbname =3D talloc_strdup(talloc_autofree_context(), "/dev/mem"); - if (!tdbname) - barf_perror("Could not create tdbname"); - - tdb_ctx =3D tdb_open_ex(tdbname, 7919, TDB_INTERNAL | TDB_NOLOCK, - O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, - 0640, &tdb_logger, NULL); - if (!tdb_ctx) - barf_perror("Could not create tdb file %s", tdbname); + nodes =3D create_hashtable(NULL, "nodes", hash_from_key_fn, keys_equal_fn, + HASHTABLE_FREE_KEY | HASHTABLE_FREE_VALUE); + if (!nodes) + barf_perror("Could not create nodes hashtable"); =20 if (live_update) manual_node("/", NULL); @@ -2399,24 +2384,6 @@ void setup_structure(bool live_update) } } =20 -static unsigned int hash_from_key_fn(const void *k) -{ - const char *str =3D k; - unsigned int hash =3D 5381; - char c; - - while ((c =3D *str++)) - hash =3D ((hash << 5) + hash) + (unsigned int)c; - - return hash; -} - - -static int keys_equal_fn(const void *key1, const void *key2) -{ - return 0 =3D=3D strcmp(key1, key2); -} - int remember_string(struct hashtable *hash, const char *str) { char *k =3D talloc_strdup(NULL, str); @@ -2476,12 +2443,11 @@ static int check_store_enoent(const void *ctx, stru= ct connection *conn, /** * Helper to clean_store below. */ -static int clean_store_(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA val, - void *private) +static int clean_store_(const void *key, void *val, void *private) { struct hashtable *reachable =3D private; char *slash; - char * name =3D talloc_strndup(NULL, key.dptr, key.dsize); + char *name =3D talloc_strdup(NULL, key); =20 if (!name) { log("clean_store: ENOMEM"); @@ -2511,7 +2477,7 @@ static int clean_store_(TDB_CONTEXT *tdb, TDB_DATA ke= y, TDB_DATA val, */ static void clean_store(struct check_store_data *data) { - tdb_traverse(tdb_ctx, &clean_store_, data->reachable); + hashtable_iterate(nodes, clean_store_, data->reachable); domain_check_acc(data->domains); } =20 diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_cor= e.h index 590d1b0a1f..5d96799c60 100644 --- a/tools/xenstore/xenstored_core.h +++ b/tools/xenstore/xenstored_core.h @@ -33,7 +33,6 @@ #include "xenstore_lib.h" #include "xenstore_state.h" #include "list.h" -#include "tdb.h" #include "hashtable.h" =20 #ifndef O_CLOEXEC @@ -236,7 +235,7 @@ static inline unsigned int get_node_owner(const struct = node *node) return node->perms.p[0].id; } =20 -/* Write a node to the tdb data base. */ +/* Write a node to the data base. */ enum write_node_mode { NODE_CREATE, NODE_MODIFY @@ -246,7 +245,7 @@ int write_node_raw(struct connection *conn, const char = *db_name, struct node *node, enum write_node_mode mode, bool no_quota_check); =20 -/* Get a node from the tdb data base. */ +/* Get a node from the data base. */ struct node *read_node(struct connection *conn, const void *ctx, const char *name); =20 diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xensto= red_transaction.c index 7d8c0afc6a..420645e55d 100644 --- a/tools/xenstore/xenstored_transaction.c +++ b/tools/xenstore/xenstored_transaction.c @@ -396,7 +396,6 @@ static int finalize_transaction(struct connection *conn, ? NODE_CREATE : NODE_MODIFY; *is_corrupt |=3D db_write(conn, i->node, hdr, size, NULL, flag, true); - talloc_free(hdr); if (db_delete(conn, i->trans_name, NULL)) *is_corrupt =3D true; } else { --=20 2.35.3