From: Bernhard Kaindl
To: xen-devel@lists.xenproject.org
Cc: Alejandro Vallejo, Bernhard Kaindl, Andrew Cooper, Anthony PERARD,
 Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
 Stefano Stabellini
Subject: [PATCH v3 4/7] xen/page_alloc: Add staking a NUMA node claim for a domain
Date: Sun, 7 Sep
 2025 18:15:19 +0200
X-Mailer: git-send-email 2.43.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Rename domain_set_outstanding_pages() to domain_claim_pages() and extend
it to stake claims for domains on NUMA nodes:

domain_claim_pages() is a handler for claiming pages, whereas its former
name suggested that it only sets the domain's outstanding claims.

Three different code locations actually perform just that task, so also
add a helper that only updates the outstanding claims. It removes the
need to repeat the same sequence of operations in three different places
and provides a single location for adding multi-node claims. It also
makes the code shorter and easier to follow.

Fix the meaning of the claim argument of domain_claim_pages() for
NUMA-node claims:

- For NUMA-node claims, we need to claim defined amounts of memory on
  different NUMA nodes. Previously, the argument was a "reservation" and
  the claim was made on the difference between the reservation and
  d->tot_pages, so the argument had to be greater than d->tot_pages.

  This interacts badly with NUMA claims: NUMA node claims are not related
  to memory that may already be allocated, and reducing the claim by the
  already allocated memory would not work when d->tot_pages is non-zero.

- Fix this by simply claiming the given number of pages.

- Update the legacy caller of domain_claim_pages() accordingly by moving
  the reduction of the claim by d->tot_pages into it: no change for users
  of the legacy hypercall, and a usable interface for staking NUMA claims.

Signed-off-by: Bernhard Kaindl
Signed-off-by: Alejandro Vallejo
---
Changes in v3:
- Renamed domain_set_outstanding_pages() and added a check from review.
- Reorganized v3, v4 and v5 as per review to avoid non-functional changes:
  - Combined patch v2#2 with v2#5 into a consolidated patch.
  - Moved the unrelated changes for domain_adjust_tot_pages() to #5.
---
 xen/common/domain.c     |  2 +-
 xen/common/memory.c     | 15 ++++++-
 xen/common/page_alloc.c | 93 ++++++++++++++++++++++++++++-------------
 xen/include/xen/mm.h    |  3 +-
 xen/include/xen/sched.h |  1 +
 5 files changed, 81 insertions(+), 33 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 775c339285..6ee9f23b10 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1247,7 +1247,7 @@ int domain_kill(struct domain *d)
         rspin_barrier(&d->domain_lock);
         argo_destroy(d);
         vnuma_destroy(d->vnuma);
-        domain_set_outstanding_pages(d, 0);
+        domain_claim_pages(d, NUMA_NO_NODE, 0);
         /* fallthrough */
     case DOMDYING_dying:
         rc = domain_teardown(d);
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 3688e6dd50..3371edec11 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1682,7 +1682,20 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = xsm_claim_pages(XSM_PRIV, d);
 
         if ( !rc )
-            rc = domain_set_outstanding_pages(d, reservation.nr_extents);
+        {
+            unsigned long new_claim = reservation.nr_extents;
+
+            /*
+             * For backwards compatibility, keep the meaning of nr_extents:
+             * it is the target number of pages for the domain.
+             * In case memory for the domain was allocated before, we must
+             * subtract the already allocated pages from the reservation.
+             */
+            if ( new_claim )
+                new_claim -= domain_tot_pages(d);
+
+            rc = domain_claim_pages(d, NUMA_NO_NODE, new_claim);
+        }
 
         rcu_unlock_domain(d);
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index b8acb500da..bbb34994b7 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -492,6 +492,30 @@ DEFINE_PER_NODE(unsigned long, avail_pages);
 
 static DEFINE_SPINLOCK(heap_lock);
 static long outstanding_claims; /* total outstanding claims by all domains */
+DECLARE_PER_NODE(long, outstanding_claims);
+DEFINE_PER_NODE(long, outstanding_claims);
+
+#define domain_has_node_claim(d) (d->claim_node != NUMA_NO_NODE)
+
+static inline bool insufficient_memory(unsigned long request, nodeid_t node)
+{
+    return per_node(avail_pages, node) -
+           per_node(outstanding_claims, node) < request;
+}
+
+/*
+ * Adjust the claim of a domain host-wide and if set, for the claimed node
+ *
+ * All callers already hold d->page_alloc_lock and the heap_lock.
+ */
+static inline void domain_adjust_outstanding_claim(struct domain *d, long pages)
+{
+    outstanding_claims += pages;   /* Update the host-wide-outstanding claims */
+    d->outstanding_pages += pages; /* Update the domain's outstanding claims */
+
+    if ( domain_has_node_claim(d) ) /* Update the claims of that node */
+        per_node(outstanding_claims, d->claim_node) += pages;
+}
 
 static unsigned long avail_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi, unsigned int node)
@@ -529,7 +553,7 @@ unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
     /*
      * can test d->outstanding_pages race-free because it can only change
      * if d->page_alloc_lock and heap_lock are both held, see also
-     * domain_set_outstanding_pages below
+     * domain_claim_pages below
      *
      * If the domain has no outstanding claims (or we freed pages instead),
      * we don't update outstanding claims and skip the claims adjustment.
@@ -544,18 +568,37 @@ unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
      * If allocated > outstanding, reduce the claims only by outstanding pages.
      */
     adjustment = min(d->outstanding_pages + 0UL, pages + 0UL);
-    d->outstanding_pages -= adjustment;
-    outstanding_claims -= adjustment;
+
+    domain_adjust_outstanding_claim(d, -adjustment);
     spin_unlock(&heap_lock);
 
 out:
     return d->tot_pages;
 }
 
-int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
+/*
+ * Stake a claim for memory for future allocations of a domain.
+ *
+ * The claim is an abstract stake on future memory allocations;
+ * no actual memory is allocated at this point. Instead, it guarantees
+ * that future allocations up to the claim's size will succeed.
+ *
+ * If node == NUMA_NO_NODE, the claim is host-wide.
+ * Otherwise, it is local to the specific NUMA node defined by d->claim_node.
+ *
+ * It should normally only ever be called before allocating the domain's
+ * memory. When libxenguest code has finished populating the memory of the
+ * domain, it passes a claim of 0 to release any remaining outstanding claim.
+ *
+ * Returns 0 on success, -EINVAL if the request is invalid,
+ * or -ENOMEM if the claim cannot be satisfied in available memory.
+ */
+int domain_claim_pages(struct domain *d, nodeid_t node, unsigned long claim)
 {
-    int ret = -ENOMEM;
-    unsigned long claim, avail_pages;
+    int ret = -EINVAL;
+
+    if ( node != NUMA_NO_NODE && !node_online(node) )
+        return ret; /* passed node is not valid, no locks are held yet */
 
     /*
      * take the domain's page_alloc_lock, else all d->tot_page adjustments
@@ -565,45 +608,35 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
     nrspin_lock(&d->page_alloc_lock);
     spin_lock(&heap_lock);
 
-    /* pages==0 means "unset" the claim. */
-    if ( pages == 0 )
+    /* claim==0 means "unset" the claim. */
+    if ( claim == 0 )
     {
-        outstanding_claims -= d->outstanding_pages;
-        d->outstanding_pages = 0;
+        domain_adjust_outstanding_claim(d, -(long)d->outstanding_pages);
         ret = 0;
         goto out;
     }
 
     /* only one active claim per domain please */
     if ( d->outstanding_pages )
-    {
-        ret = -EINVAL;
         goto out;
-    }
 
-    /* disallow a claim not exceeding domain_tot_pages() or above max_pages */
-    if ( (pages <= domain_tot_pages(d)) || (pages > d->max_pages) )
-    {
-        ret = -EINVAL;
+    /* If we allocated for the domain already, the claim is on top of that. */
+    if ( (domain_tot_pages(d) + claim) > d->max_pages )
         goto out;
-    }
 
-    /* how much memory is available? */
-    avail_pages = total_avail_pages;
-
-    avail_pages -= outstanding_claims;
+    ret = -ENOMEM;
+    /* Check if the host-wide available memory is sufficient for this claim */
+    if ( claim > total_avail_pages - outstanding_claims )
+        goto out;
 
-    /*
-     * Note, if domain has already allocated memory before making a claim
-     * then the claim must take domain_tot_pages() into account
-     */
-    claim = pages - domain_tot_pages(d);
-    if ( claim > avail_pages )
+    /* Check if the node's available memory is insufficient for this claim */
+    if ( node != NUMA_NO_NODE && insufficient_memory(claim, node) )
         goto out;
 
     /* yay, claim fits in available memory, stake the claim, success! */
-    d->outstanding_pages = claim;
-    outstanding_claims += d->outstanding_pages;
+    d->claim_node = node;
+    domain_adjust_outstanding_claim(d, claim);
+    ret = 0;
 
  out:
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index b968f47b87..52c12c5783 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -65,6 +65,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -131,7 +132,7 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns);
 /* Claim handling */
 unsigned long __must_check domain_adjust_tot_pages(struct domain *d,
     long pages);
-int domain_set_outstanding_pages(struct domain *d, unsigned long pages);
+int domain_claim_pages(struct domain *d, nodeid_t node, unsigned long pages);
 void get_outstanding_claims(uint64_t *free_pages, uint64_t *outstanding_pages);
 
 /* Domain suballocator. These functions are *not* interrupt-safe.*/
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 02bdc256ce..9b91261f20 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -405,6 +405,7 @@ struct domain
     unsigned int     outstanding_pages; /* pages claimed but not possessed */
     unsigned int     max_pages;         /* maximum value for domain_tot_pages() */
     unsigned int     extra_pages;       /* pages not included in domain_tot_pages() */
+    nodeid_t         claim_node;        /* NUMA_NO_NODE for host-wide claims */
 
 #ifdef CONFIG_MEM_SHARING
     atomic_t         shr_pages;         /* shared pages */
-- 
2.43.0
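
For illustration only, the accounting described in the commit message can
be modelled outside of Xen roughly as follows. This is a minimal sketch,
not the hypervisor code: struct host, struct dom, NO_NODE, adjust_claim(),
claim_pages() and legacy_claim() are hypothetical stand-ins, locking is
left out, and the explicit "target <= tot_pages" guard in the legacy
wrapper is an assumption that mirrors the check the old function made.

```c
#include <assert.h>
#include <errno.h>

#define NR_NODES 2
#define NO_NODE  (-1)              /* hypothetical stand-in for NUMA_NO_NODE */

/* Hypothetical, simplified model of the host-side counters. */
struct host {
    long total_avail;              /* free pages on the whole host */
    long total_claims;             /* host-wide outstanding claims */
    long node_avail[NR_NODES];     /* free pages per node */
    long node_claims[NR_NODES];    /* outstanding claims per node */
};

/* Hypothetical, simplified model of the per-domain fields. */
struct dom {
    unsigned long tot_pages;       /* pages already allocated */
    unsigned long max_pages;       /* upper bound for tot_pages */
    long outstanding;              /* pages claimed but not yet allocated */
    int claim_node;                /* NO_NODE for a host-wide claim */
};

/* Single place that keeps host, domain and node counters in step. */
static void adjust_claim(struct host *h, struct dom *d, long pages)
{
    assert(d->outstanding + pages >= 0);
    h->total_claims += pages;
    d->outstanding += pages;
    if ( d->claim_node != NO_NODE )
        h->node_claims[d->claim_node] += pages;
}

/* New-style interface: 'claim' is the number of pages to claim. */
static int claim_pages(struct host *h, struct dom *d, int node,
                       unsigned long claim)
{
    if ( node != NO_NODE && (node < 0 || node >= NR_NODES) )
        return -EINVAL;

    if ( claim == 0 )              /* release whatever is still claimed */
    {
        adjust_claim(h, d, -d->outstanding);
        return 0;
    }

    if ( d->outstanding )          /* only one active claim per domain */
        return -EINVAL;

    /* The claim comes on top of what is already allocated. */
    if ( claim > d->max_pages || d->tot_pages > d->max_pages - claim )
        return -EINVAL;

    if ( (long)claim > h->total_avail - h->total_claims )
        return -ENOMEM;

    if ( node != NO_NODE &&
         (long)claim > h->node_avail[node] - h->node_claims[node] )
        return -ENOMEM;

    d->claim_node = node;
    adjust_claim(h, d, (long)claim);
    return 0;
}

/* Legacy caller: 'target' is the domain's target size in pages. */
static int legacy_claim(struct host *h, struct dom *d, unsigned long target)
{
    /* Assumption: reject targets that would make the claim non-positive. */
    if ( target && target <= d->tot_pages )
        return -EINVAL;

    return claim_pages(h, d, NO_NODE, target ? target - d->tot_pages : 0);
}
```

With 10 pages already allocated, legacy_claim(&h, &d, 30) stakes a
host-wide claim of 20 pages, whereas claim_pages(&h, &d, 1, 30) claims
exactly 30 pages on node 1 regardless of what is already allocated.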