From nobody Mon Feb 9 22:57:20 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1749743907; cv=none; d=zohomail.com; s=zohoarc; b=VNzU9eOZ+dYkkTyWxdXDo0ajZd15u4lsBOwV/uZvlu5r/2wGhfC8b0hYEosGDHqGS7OSBNtf8gVfCdXSncnPi7rxEOsX/cgQ/E9zEam4bDkSP7AYPGr8yA7CyJVAXWhNTSpJ6X3fA6CZ3XHgyF/b41JkZVGqvROQuQUszVwYnAY= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1749743907; h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:Reply-To:Reply-To:References:Sender:Subject:Subject:To:To:Message-Id; bh=haTeecOOVHBXVYn2smSl6hzHISA/crP9RcvVIlJMJ8U=; b=PATtuy3QN6JaqxhsRx0X4dzBRMDsCdBaaQQTnbJD9JnfxVJVB88Y14TWPWRCgAIh9dBXzivh2CTt3Xq1MedbDgrlBfrjhsKXPpfE/6usGHKQzE8r4YN5z4FtpTUpQd3j+aK2qbQ2raXL9kKoh/sRpbsh010Rdqobu66PORwX3jI= ARC-Authentication-Results: i=1; mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1749743907728779.7452797476088; Thu, 12 Jun 2025 08:58:27 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1uPkJb-0003iv-8B; Thu, 12 Jun 2025 11:58:07 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uPkJV-0003f6-7I for qemu-devel@nongnu.org; Thu, 12 Jun 2025 11:58:03 -0400 Received: from [185.176.79.56] 
(helo=frasgout.his.huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uPkJS-0003KU-S0 for qemu-devel@nongnu.org; Thu, 12 Jun 2025 11:58:00 -0400 Received: from mail.maildlp.com (unknown [172.18.186.231]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4bJ6cC28sTz6M4sT; Thu, 12 Jun 2025 23:57:31 +0800 (CST) Received: from frapeml500008.china.huawei.com (unknown [7.182.85.71]) by mail.maildlp.com (Postfix) with ESMTPS id 475E41402F1; Thu, 12 Jun 2025 23:57:57 +0800 (CST) Received: from SecurePC-101-06.china.huawei.com (10.122.19.247) by frapeml500008.china.huawei.com (7.182.85.71) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Thu, 12 Jun 2025 17:57:56 +0200 To: Pierrick Bouvier , , , =?UTF-8?q?Alex=20Benn=C3=A9e?= , Alexandre Iooss , Mahmoud Mandour , Bowman Terry CC: , , , , , , , , , , Bharata B Rao Subject: [RFC PATCH v2 QEMU 1/4] hw/cxl: Switch to using an array for CXLRegisterLocator base addresses. 
Date: Thu, 12 Jun 2025 16:57:21 +0100 Message-ID: <20250612155724.1887266-2-Jonathan.Cameron@huawei.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250612155724.1887266-1-Jonathan.Cameron@huawei.com> References: <20250612155724.1887266-1-Jonathan.Cameron@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [10.122.19.247] X-ClientProxiedBy: lhrpeml500012.china.huawei.com (7.191.174.4) To frapeml500008.china.huawei.com (7.182.85.71) X-Host-Lookup-Failed: Reverse DNS lookup failed for 185.176.79.56 (deferred) Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=185.176.79.56; envelope-from=jonathan.cameron@huawei.com; helo=frasgout.his.huawei.com X-Spam_score_int: -33 X-Spam_score: -3.4 X-Spam_bar: --- X-Spam_report: (-3.4 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-to: Jonathan Cameron From: Jonathan Cameron via Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1749743910143116600 Content-Type: text/plain; charset="utf-8" Allows for easier looping over entries when adding CHMU and CPMU instances. Signed-off-by: Jonathan Cameron --- CHMU RFC v2: New patch to simplify a few code paths. 
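For readers skimming the series, a minimal standalone sketch of why an array of { lo, hi } pairs is easier to work with than numbered reg0/reg1/reg2 fields. This is not QEMU code: reg_loc_entry, reg_loc_set() and reg_loc_count() are hypothetical names used only for illustration, and the BAR indices in the usage note are assumed values. The RBI_* constants use the same values as cxl_pci.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register block identifiers, same values as the RBI_* macros in cxl_pci.h */
#define RBI_COMPONENT_REG   (1 << 8)
#define RBI_CXL_DEVICE_REG  (3 << 8)

enum reg_loc_idx {
    REG_LOC_IDX_COMPONENT,
    REG_LOC_IDX_DEVICE,
    NR_REG_LOC_IDX
};

/* Hypothetical stand-in for the { lo, hi } pair in CXLDVSECRegisterLocator */
struct reg_loc_entry {
    uint32_t lo;
    uint32_t hi;
};

/* Hypothetical helper: describe one register block that lives in BAR 'bar' */
static void reg_loc_set(struct reg_loc_entry *tbl, size_t idx,
                        uint32_t block_id, uint32_t bar)
{
    assert(idx < NR_REG_LOC_IDX);
    tbl[idx].lo = block_id | bar;
    tbl[idx].hi = 0;
}

/* Hypothetical helper: one loop covers every entry, however many exist */
static size_t reg_loc_count(const struct reg_loc_entry *tbl)
{
    size_t i, n = 0;

    for (i = 0; i < NR_REG_LOC_IDX; i++) {
        if (tbl[i].lo || tbl[i].hi) {
            n++;
        }
    }
    return n;
}
```

With this layout, calling reg_loc_set(tbl, REG_LOC_IDX_DEVICE, RBI_CXL_DEVICE_REG, 2) fills one slot (BAR 2 is an assumed value), and a further block such as a CHMU needs only one more enum value and one more call. The numbered-field layout would need a new pair of struct members each time.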
---
 include/hw/cxl/cxl_pci.h       | 17 ++++++++++-------
 hw/cxl/switch-mailbox-cci.c    |  4 ++--
 hw/mem/cxl_type3.c             | 12 ++++++++----
 hw/pci-bridge/cxl_downstream.c |  4 ++--
 hw/pci-bridge/cxl_root_port.c  |  4 ++--
 hw/pci-bridge/cxl_upstream.c   |  4 ++--
 6 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h
index d0855ed78b..00a0335d55 100644
--- a/include/hw/cxl/cxl_pci.h
+++ b/include/hw/cxl/cxl_pci.h
@@ -161,6 +161,12 @@ typedef struct CXLDVSECPortFlexBus {
 } CXLDVSECPortFlexBus;
 QEMU_BUILD_BUG_ON(sizeof(CXLDVSECPortFlexBus) != 0x20);
 
+/* Only applies to the type 3 device emulation */
+enum register_locator_indicies {
+    REG_LOC_IDX_COMPONENT,
+    REG_LOC_IDX_DEVICE,
+    NR_REG_LOC_IDX
+};
 /*
  * CXL r3.1 Section 8.1.9: Register Locator DVSEC
  * DVSEC ID: 8, Revision 0
@@ -168,14 +174,11 @@ QEMU_BUILD_BUG_ON(sizeof(CXLDVSECPortFlexBus) != 0x20);
 typedef struct CXLDVSECRegisterLocator {
     DVSECHeader hdr;
     uint16_t rsvd;
-    uint32_t reg0_base_lo;
-    uint32_t reg0_base_hi;
-    uint32_t reg1_base_lo;
-    uint32_t reg1_base_hi;
-    uint32_t reg2_base_lo;
-    uint32_t reg2_base_hi;
+    struct {
+        uint32_t lo;
+        uint32_t hi;
+    } reg_base[NR_REG_LOC_IDX];
 } CXLDVSECRegisterLocator;
-QEMU_BUILD_BUG_ON(sizeof(CXLDVSECRegisterLocator) != 0x24);
 
 /* BAR Equivalence Indicator */
 #define BEI_BAR_10H 0
diff --git a/hw/cxl/switch-mailbox-cci.c b/hw/cxl/switch-mailbox-cci.c
index 223f220433..af91525445 100644
--- a/hw/cxl/switch-mailbox-cci.c
+++ b/hw/cxl/switch-mailbox-cci.c
@@ -50,8 +50,8 @@ static void cswbcci_realize(PCIDevice *pci_dev, Error **errp)
                      &cxl_dstate->device_registers);
     regloc_dvsec = &(CXLDVSECRegisterLocator) {
         .rsvd = 0,
-        .reg0_base_lo = RBI_CXL_DEVICE_REG | 0,
-        .reg0_base_hi = 0,
+        .reg_base[0].lo = RBI_CXL_DEVICE_REG | 0,
+        .reg_base[0].hi = 0,
     };
     cxl_component_create_dvsec(cxl_cstate, CXL3_SWITCH_MAILBOX_CCI,
                                REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index ca9fe89e4f..dcefd41088 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -386,10 +386,14 @@ static void build_dvsecs(CXLType3Dev *ct3d)
 
     dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){
         .rsvd = 0,
-        .reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
-        .reg0_base_hi = 0,
-        .reg1_base_lo = RBI_CXL_DEVICE_REG | CXL_DEVICE_REG_BAR_IDX,
-        .reg1_base_hi = 0,
+        .reg_base[REG_LOC_IDX_COMPONENT] = {
+            .lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
+            .hi = 0,
+        },
+        .reg_base[REG_LOC_IDX_DEVICE] = {
+            .lo = RBI_CXL_DEVICE_REG | CXL_DEVICE_REG_BAR_IDX,
+            .hi = 0,
+        },
     };
     cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
                                REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
diff --git a/hw/pci-bridge/cxl_downstream.c b/hw/pci-bridge/cxl_downstream.c
index 1065245a8b..387cebbb98 100644
--- a/hw/pci-bridge/cxl_downstream.c
+++ b/hw/pci-bridge/cxl_downstream.c
@@ -126,8 +126,8 @@ static void build_dvsecs(CXLComponentState *cxl)
 
     dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){
         .rsvd = 0,
-        .reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
-        .reg0_base_hi = 0,
+        .reg_base[0].lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
+        .reg_base[0].hi = 0,
     };
     cxl_component_create_dvsec(cxl, CXL2_DOWNSTREAM_PORT,
                                REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
index e6a4035d26..d955f3bcc5 100644
--- a/hw/pci-bridge/cxl_root_port.c
+++ b/hw/pci-bridge/cxl_root_port.c
@@ -136,8 +136,8 @@ static void build_dvsecs(CXLComponentState *cxl)
 
     dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){
         .rsvd = 0,
-        .reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
-        .reg0_base_hi = 0,
+        .reg_base[0].lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
+        .reg_base[0].hi = 0,
     };
     cxl_component_create_dvsec(cxl, CXL2_ROOT_PORT,
                                REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
index
208e0c6172..28f7542814 100644
--- a/hw/pci-bridge/cxl_upstream.c
+++ b/hw/pci-bridge/cxl_upstream.c
@@ -129,8 +129,8 @@ static void build_dvsecs(CXLComponentState *cxl)
 
     dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){
         .rsvd = 0,
-        .reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
-        .reg0_base_hi = 0,
+        .reg_base[0].lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
+        .reg_base[0].hi = 0,
     };
     cxl_component_create_dvsec(cxl, CXL2_UPSTREAM_PORT,
                                REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
-- 
2.48.1

From nobody Mon Feb 9 22:57:20 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1749743936; cv=none; d=zohomail.com; s=zohoarc; b=bhH1KPpAuLh3pwLDj/uVZxEIqfOG+zUlGckqW8kCDjUkja8Pu27Hv2FFQiuGwOVL5WFxuLrKJumyHh5aYqAQuYcwL7XP7UPNbGF+kVhR6rINg1ZX+qC0RApQGmiKuz+XUHWpzpUMyxHoltKEnAMTqsrl9+NtyZ7aXV5iWcZSkV4= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1749743936; h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:Reply-To:Reply-To:References:Sender:Subject:Subject:To:To:Message-Id; bh=HWuhVplPu3y2noykuidg0TJu/1Guaof/3ad9J3Wsu5U=; b=AUT5k6lodAUrQmxo3b3yB/k56HdDqUbmpiy45xjlJjBKob1CBcPouh9PR/Y+sUfHn4Rt1bZMnZI/9ogqRw6N7BRkYyaIw8OB/ghmsrLui9Z3md2jyXTEUmeY6vwYPuhN1nauZSYEKmKt7I1Ym0TaSrmQuXcW9iNTogI4p+jRGY8= ARC-Authentication-Results: i=1; mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by
mx.zohomail.com with SMTPS id 1749743936498579.9619410421977; Thu, 12 Jun 2025 08:58:56 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1uPkK7-0004kL-RW; Thu, 12 Jun 2025 11:58:39 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uPkK6-0004k1-5X for qemu-devel@nongnu.org; Thu, 12 Jun 2025 11:58:38 -0400 Received: from [185.176.79.56] (helo=frasgout.his.huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uPkK2-0003MN-Ae for qemu-devel@nongnu.org; Thu, 12 Jun 2025 11:58:37 -0400 Received: from mail.maildlp.com (unknown [172.18.186.31]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4bJ6XG4wpDz6K5xK; Thu, 12 Jun 2025 23:54:06 +0800 (CST) Received: from frapeml500008.china.huawei.com (unknown [7.182.85.71]) by mail.maildlp.com (Postfix) with ESMTPS id 9C39D1400D3; Thu, 12 Jun 2025 23:58:28 +0800 (CST) Received: from SecurePC-101-06.china.huawei.com (10.122.19.247) by frapeml500008.china.huawei.com (7.182.85.71) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Thu, 12 Jun 2025 17:58:27 +0200 To: Pierrick Bouvier , , , =?UTF-8?q?Alex=20Benn=C3=A9e?= , Alexandre Iooss , Mahmoud Mandour , Bowman Terry CC: , , , , , , , , , , Bharata B Rao Subject: [RFC PATCH v2 QEMU 2/4] hw/cxl: Add emulation of a CXL Hotness Monitoring Unit (CHMU) Date: Thu, 12 Jun 2025 16:57:22 +0100 Message-ID: <20250612155724.1887266-3-Jonathan.Cameron@huawei.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250612155724.1887266-1-Jonathan.Cameron@huawei.com> References: <20250612155724.1887266-1-Jonathan.Cameron@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [10.122.19.247] X-ClientProxiedBy: lhrpeml500012.china.huawei.com (7.191.174.4) To 
frapeml500008.china.huawei.com (7.182.85.71) X-Host-Lookup-Failed: Reverse DNS lookup failed for 185.176.79.56 (deferred) Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=185.176.79.56; envelope-from=jonathan.cameron@huawei.com; helo=frasgout.his.huawei.com X-Spam_score_int: -33 X-Spam_score: -3.4 X-Spam_bar: --- X-Spam_report: (-3.4 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-to: Jonathan Cameron From: Jonathan Cameron via Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1749743938722116600 Content-Type: text/plain; charset="utf-8"

CXL r3.2 defines a CXL Hotness Monitoring Unit. This allows a CXL device to do on-device estimation of which 'granules' of data are 'hot', that is, accessed a lot. For a typical application, hot data on a CXL device both wastes potentially limited bandwidth and may have latency impacts. Access counts are therefore a measurable proxy on which to base memory placement decisions.

Typical use cases include:
1 - Establishing which data to move to faster RAM in a tiered memory system. Discussions on how to do this in Linux are ongoing, so use case 2 will likely happen first.
2 - Providing detailed data (at low overhead) on what memory in an application is hot, allowing for optimization of initial data placement on future runs of the application.
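To make the idea of per-granule hotness concrete, a minimal sketch of threshold-based access counting over one measurement interval (an epoch). It is illustrative only and is not how the emulated device or the plugin is implemented: hot_tracker, hot_record_access() and hot_end_epoch() are hypothetical names, the 4 KiB granule and 8-unit device are arbitrary, and treating "count >= threshold" as hot is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_UNITS  8   /* illustrative: tiny device with 8 tracking units */
#define UNIT_SHIFT 12  /* illustrative: 4 KiB tracking granule */

/* Hypothetical tracker state: one access counter per granule */
struct hot_tracker {
    uint32_t counts[NUM_UNITS];
    uint32_t threshold; /* assumed: count >= threshold means hot */
};

/* Count one access to device physical address 'dpa' */
static void hot_record_access(struct hot_tracker *t, uint64_t dpa)
{
    uint64_t unit = dpa >> UNIT_SHIFT;

    if (unit < NUM_UNITS) {
        t->counts[unit]++;
    }
}

/*
 * End of epoch: store the index of every hot unit in out[] (up to 'max'
 * entries), reset all counters, and return how many units were reported.
 */
static size_t hot_end_epoch(struct hot_tracker *t, uint64_t *out, size_t max)
{
    size_t i, n = 0;

    for (i = 0; i < NUM_UNITS; i++) {
        if (t->counts[i] >= t->threshold && n < max) {
            out[n++] = i;
        }
        t->counts[i] = 0;
    }
    return n;
}
```

A consumer of such a list would then decide which granules to migrate or where to place data on the next run; that policy is outside the scope of the sketch.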

The focus of this emulation is providing a way to capture 'real' data in order to help us develop and tune the kernel stack.

This emulated device will be fed with data from a QEMU plugin. That plugin is responsible for the actual tracking and counting part of hotness tracking. This device simply provides a timebase (epoch end point) along with configuration and data retrieval.

The connection to the QEMU plugin providing the data is via a socket. Supply the cxl-type3 device parameter chmu-port=4443 to specify the network port as 4443 and ensure the plugin is loaded (see later patch).

Signed-off-by: Jonathan Cameron
---
 include/hw/cxl/cxl.h        |   1 +
 include/hw/cxl/cxl_chmu.h   | 187 +++++++++++++
 include/hw/cxl/cxl_device.h |  24 +-
 include/hw/cxl/cxl_pci.h    |   3 +
 hw/cxl/cxl-chmu.c           | 516 ++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          | 103 ++++++-
 hw/cxl/meson.build          |   1 +
 7 files changed, 831 insertions(+), 4 deletions(-)

diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index de66ab8c35..12844d3418 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -16,6 +16,7 @@
 #include "hw/pci/pci_host.h"
 #include "cxl_pci.h"
 #include "cxl_component.h"
+#include "cxl_chmu.h"
 #include "cxl_device.h"
 
 #define CXL_CACHE_LINE_SIZE 64
diff --git a/include/hw/cxl/cxl_chmu.h b/include/hw/cxl/cxl_chmu.h
new file mode 100644
index 0000000000..2186e11a31
--- /dev/null
+++ b/include/hw/cxl/cxl_chmu.h
@@ -0,0 +1,187 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * QEMU CXL Hotness Monitoring Unit
+ *
+ * Copyright (c) 2025 Huawei
+ */
+
+#include "hw/register.h"
+
+#ifndef _CXL_CHMU_H_
+#define _CXL_CHMU_H_
+
+/* Emulated parameters - arbitrary choices */
+#define CXL_CHMU_INSTANCES_PER_BLOCK 1
+#define CXL_HOTLIST_ENTRIES 1024
+
+/* 1TB - should be enough for anyone, right?
*/
+#define CXL_MAX_DRAM_CAPACITY 0x10000000000UL
+
+/* Relative to per instance base address */
+#define CXL_CHMU_HL_START (0x70 + (CXL_MAX_DRAM_CAPACITY / (0x10000000UL * 8)))
+#define CXL_CHMU_INSTANCE_SIZE (CXL_CHMU_HL_START + CXL_HOTLIST_ENTRIES * 8)
+#define CXL_CHMU_SIZE \
+    (0x10 + CXL_CHMU_INSTANCE_SIZE * CXL_CHMU_INSTANCES_PER_BLOCK)
+
+/*
+ * Many of these registers are documented as being a multiple of 64 bits long.
+ * Reading then can only be done in 64 bit chunks though so specify them here
+ * as multiple registers.
+ */
+REG64(CXL_CHMU_COMMON_CAP0, 0x0)
+    FIELD(CXL_CHMU_COMMON_CAP0, VERSION, 0, 4)
+    FIELD(CXL_CHMU_COMMON_CAP0, NUM_INSTANCES, 8, 8)
+REG64(CXL_CHMU_COMMON_CAP1, 0x8)
+    FIELD(CXL_CHMU_COMMON_CAP1, INSTANCE_LENGTH, 0, 16)
+
+/* Per instance registers for instance 0 in CHMU main address space */
+REG64(CXL_CHMU0_CAP0, 0x10)
+    FIELD(CXL_CHMU0_CAP0, MSI_N, 0, 4)
+    FIELD(CXL_CHMU0_CAP0, OVERFLOW_INT, 4, 1)
+    FIELD(CXL_CHMU0_CAP0, LEVEL_INT, 5, 1)
+    FIELD(CXL_CHMU0_CAP0, EPOCH_TYPE, 6, 2)
+#define CXL_CHMU0_CAP0_EPOCH_TYPE_GLOBAL 0
+#define CXL_CHMU0_CAP0_EPOCH_TYPE_PERCNT 1
+    /* Break up the Tracked M2S Request field into flags */
+    FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_R, 8, 1)
+    FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_W, 9, 1)
+    FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_RW, 10, 1)
+    FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_ALL_R, 11, 1)
+    FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_ALL_W, 12, 1)
+    FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_ALL_RW, 13, 1)
+
+    FIELD(CXL_CHMU0_CAP0, MAX_EPOCH_LENGTH_SCALE, 16, 4)
+#define CXL_CHMU_EPOCH_LENGTH_SCALE_100USEC 1
+#define CXL_CHMU_EPOCH_LENGTH_SCALE_1MSEC 2
+#define CXL_CHMU_EPOCH_LENGTH_SCALE_10MSEC 3
+#define CXL_CHMU_EPOCH_LENGTH_SCALE_100MSEC 4
+#define CXL_CHMU_EPOCH_LENGTH_SCALE_1SEC 5
+    FIELD(CXL_CHMU0_CAP0, MAX_EPOCH_LENGTH_VAL, 20, 12)
+    FIELD(CXL_CHMU0_CAP0, MIN_EPOCH_LENGTH_SCALE, 32, 4)
+    FIELD(CXL_CHMU0_CAP0, MIN_EPOCH_LENGTH_VAL, 36, 12)
+    FIELD(CXL_CHMU0_CAP0,
HOTLIST_SIZE, 48, 16)
+REG64(CXL_CHMU0_CAP1, 0x18)
+    FIELD(CXL_CHMU0_CAP1, UNIT_SIZES, 0, 32)
+    FIELD(CXL_CHMU0_CAP1, DOWN_SAMPLING_FACTORS, 32, 16)
+    /* Split up Flags */
+    FIELD(CXL_CHMU0_CAP1, FLAGS_EPOCH_BASED, 48, 1)
+    FIELD(CXL_CHMU0_CAP1, FLAGS_ALWAYS_ON, 49, 1)
+    FIELD(CXL_CHMU0_CAP1, FLAGS_RANDOMIZED_DOWN_SAMPLING, 50, 1)
+    FIELD(CXL_CHMU0_CAP1, FLAGS_OVERLAPPING_ADDRESS_RANGES, 51, 1)
+    FIELD(CXL_CHMU0_CAP1, FLAGS_INSERT_AFTER_CLEAR, 52, 1)
+REG64(CXL_CHMU0_CAP2, 0x20)
+    FIELD(CXL_CHMU0_CAP2, BITMAP_REG_OFFSET, 0, 64)
+REG64(CXL_CHMU0_CAP3, 0x28)
+    FIELD(CXL_CHMU0_CAP3, HOTLIST_REG_OFFSET, 0, 64)
+
+REG64(CXL_CHMU0_CONF0, 0x50)
+    FIELD(CXL_CHMU0_CONF0, M2S_REQ_TO_TRACK, 0, 8)
+    FIELD(CXL_CHMU0_CONF0, FLAGS_RANDOMIZE_DOWNSAMPLING, 8, 1)
+    FIELD(CXL_CHMU0_CONF0, FLAGS_INT_ON_OVERFLOW, 9, 1)
+    FIELD(CXL_CHMU0_CONF0, FLAGS_INT_ON_FILL_THRESH, 10, 1)
+    FIELD(CXL_CHMU0_CONF0, CONTROL_ENABLE, 16, 1)
+    FIELD(CXL_CHMU0_CONF0, CONTROL_RESET, 17, 1)
+    FIELD(CXL_CHMU0_CONF0, HOTNESS_THRESHOLD, 32, 32)
+REG64(CXL_CHMU0_CONF1, 0x58)
+    FIELD(CXL_CHMU0_CONF1, UNIT_SIZE, 0, 32)
+    FIELD(CXL_CHMU0_CONF1, DOWN_SAMPLING_FACTOR, 32, 8)
+    FIELD(CXL_CHMU0_CONF1, REPORTING_MODE, 40, 8)
+    FIELD(CXL_CHMU0_CONF1, EPOCH_LENGTH_SCALE, 48, 4)
+    FIELD(CXL_CHMU0_CONF1, EPOCH_LENGTH_VAL, 52, 12)
+REG64(CXL_CHMU0_CONF2, 0x60)
+    FIELD(CXL_CHMU0_CONF2, NOTIFICATION_THRESHOLD, 0, 16)
+
+REG64(CXL_CHMU0_STATUS, 0x70)
+    /* Break up status field into separate flags */
+    FIELD(CXL_CHMU0_STATUS, STATUS_ENABLED, 0, 1)
+    FIELD(CXL_CHMU0_STATUS, OPERATION_IN_PROG, 16, 16)
+    FIELD(CXL_CHMU0_STATUS, COUNTER_WIDTH, 32, 8)
+    /* Break up oddly named overflow interrupt stats */
+    FIELD(CXL_CHMU0_STATUS, OVERFLOW_INT, 40, 1)
+    FIELD(CXL_CHMU0_STATUS, LEVEL_INT, 41, 1)
+
+REG16(CXL_CHMU0_HEAD, 0x78)
+REG16(CXL_CHMU0_TAIL, 0x7A)
+
+/* Provide first few of these so we can calculate the size */
+REG64(CXL_CHMU0_RANGE_CONFIG_BITMAP0, 0x80)
+REG64(CXL_CHMU0_RANGE_CONFIG_BITMAP1, 0x88)
+
+REG64(CXL_CHMU0_HOTLIST0,
CXL_CHMU_HL_START + 0x10)
+REG64(CXL_CHMU0_HOTLIST1, CXL_CHMU_HL_START + 0x10)
+
+REG64(CXL_CHMU1_CAP0, 0x10 + CXL_CHMU_INSTANCE_SIZE)
+
+typedef struct CHMUState CHMUState;
+
+/*
+ * Each device may have multiple CHMUs (CHMUState) with each CHMU having
+ * multiple hotness tracker instances (CHMUInstance).
+ */
+typedef struct CHMUInstance {
+    /* The reference to the PCIDevice is needed for MSI */
+    Object *private;
+    /* Number of counts in an epoch to be considered hot */
+    uint32_t hotness_thresh;
+    /* Tracking unit in bytes of DPA space as power of 2 */
+    uint32_t unit_size;
+    /*
+     * Ring buffer pointers
+     * - head is the offset in the ring of the oldest hot unit
+     * - tail is the offset in the ring of where the next hot unit will be
+     *   saved.
+     *
+     * Ring empty if head == tail.
+     * Ring full if (tail + 1) % length == head
+     */
+    uint16_t head, tail;
+    /* Ring buffer event threshold. Interrupt of first exceeding */
+    uint16_t fill_thresh;
+    /* Down sampling factor */
+    uint8_t ds_factor;
+    /* Type of request to track */
+    uint8_t what;
+
+    /* Interrupt controls and status */
+    bool int_on_overflow;
+    bool int_on_fill_thresh;
+    bool overflow_set;
+    bool fill_thresh_set;
+    uint8_t msi_n;
+
+    bool enabled;
+    uint64_t hotlist[CXL_HOTLIST_ENTRIES];
+    QEMUTimer *timer;
+    uint32_t epoch_ms;
+    uint8_t epoch_scale;
+    uint16_t epoch_val;
+    /* Reference needed for timer */
+    CHMUState *parent;
+} CHMUInstance;
+
+typedef struct CHMUState {
+    CHMUInstance inst[CXL_CHMU_INSTANCES_PER_BLOCK];
+    int socket;
+    /* Hack updated on first HDM decoder only */
+    uint16_t port;
+
+    /*
+     * Routing of accesses depends on interleave settings of the
+     * relevant memory range. That must be passed to the cache plugin.
+     */
+    struct {
+        uint64_t base;
+        uint64_t size;
+        uint64_t dpa_base;
+        uint16_t interleave_gran;
+        uint8_t ways;
+        uint8_t way;
+    } decoder[CXL_HDM_DECODER_COUNT];
+} CHMUState;
+
+typedef struct cxl_device_state CXLDeviceState;
+int cxl_chmu_register_block_init(Object *obj, CXLDeviceState *cxl_dstte,
+                                 int id, uint8_t msi_n, Error **errp);
+
+#endif /* _CXL_CHMU_H_ */
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 9cc08da4cf..c4c092d77e 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -13,6 +13,7 @@
 #include "hw/cxl/cxl_component.h"
 #include "hw/pci/pci_device.h"
 #include "hw/register.h"
+#include "hw/cxl/cxl_chmu.h"
 #include "hw/cxl/cxl_events.h"
 
 /*
@@ -91,9 +92,21 @@
     (CXL_MAILBOX_REGISTERS_OFFSET + CXL_MAILBOX_REGISTERS_LENGTH)
 #define CXL_MEMORY_DEVICE_REGISTERS_LENGTH 0x8
 
+#define CXL_NUM_CHMU_INSTANCES 1
+#define CXL_CHMU_OFFSET(x) \
+    QEMU_ALIGN_UP(CXL_MEMORY_DEVICE_REGISTERS_OFFSET + \
+                  CXL_MEMORY_DEVICE_REGISTERS_LENGTH + \
+                  (x) * QEMU_ALIGN_UP(CXL_CHMU_SIZE, 1 << 16), \
+                  1 << 16)
+
 #define CXL_MMIO_SIZE \
-    (CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH + \
-     CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
+    QEMU_ALIGN_UP(CXL_DEVICE_CAP_REG_SIZE + \
+                  CXL_DEVICE_STATUS_REGISTERS_LENGTH + \
+                  CXL_MAILBOX_REGISTERS_LENGTH + \
+                  CXL_MEMORY_DEVICE_REGISTERS_LENGTH + \
+                  CXL_NUM_CHMU_INSTANCES * \
+                  QEMU_ALIGN_UP(CXL_CHMU_SIZE, 1 << 16), \
+                  (1 << 16))
 
 /* CXL r3.1 Table 8-34: Command Return Codes */
 typedef enum {
@@ -236,6 +249,7 @@ typedef struct CXLCCI {
 
 typedef struct cxl_device_state {
     MemoryRegion device_registers;
+    MemoryRegion chmu_registers[1];
 
     /* CXL r3.1 Section 8.2.8.3: Device Status Registers */
     struct {
@@ -285,6 +299,7 @@ typedef struct cxl_device_state {
     uint64_t vmem_size;
 
     const struct cxl_cmd (*cxl_cmd_set)[256];
+    CHMUState chmu[1];
     CXLEventLog event_logs[CXL_EVENT_TYPE_MAX];
 } CXLDeviceState;
 
@@ -698,6 +713,11 @@
MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
                             unsigned size, MemTxAttrs attrs);
 
+bool cxl_type3_get_hdm_interleave_props(CXLType3Dev *ct3d, int which,
+                                        uint64_t *hpa_base, uint16_t *granual,
+                                        uint8_t *ways);
+void cxl_type3_set_hdm_isp(CXLType3Dev *ctrd, int which, uint8_t isp);
+
 uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds);
 
 void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num);
diff --git a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h
index 00a0335d55..5af10e8ce0 100644
--- a/include/hw/cxl/cxl_pci.h
+++ b/include/hw/cxl/cxl_pci.h
@@ -165,6 +165,7 @@ QEMU_BUILD_BUG_ON(sizeof(CXLDVSECPortFlexBus) != 0x20);
 enum register_locator_indicies {
     REG_LOC_IDX_COMPONENT,
     REG_LOC_IDX_DEVICE,
+    REG_LOC_IDX_CHMU0,
     NR_REG_LOC_IDX
 };
 /*
@@ -193,5 +194,7 @@ typedef struct CXLDVSECRegisterLocator {
 #define RBI_COMPONENT_REG  (1 << 8)
 #define RBI_BAR_VIRT_ACL   (2 << 8)
 #define RBI_CXL_DEVICE_REG (3 << 8)
+#define RBI_CXL_CPMU_REG   (4 << 8)
+#define RBI_CXL_CHMU_REG   (5 << 8)
 
 #endif
diff --git a/hw/cxl/cxl-chmu.c b/hw/cxl/cxl-chmu.c
new file mode 100644
index 0000000000..2e50eff5f8
--- /dev/null
+++ b/hw/cxl/cxl-chmu.c
@@ -0,0 +1,516 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * CXL Hotness Monitoring Unit
+ *
+ * Copyright(C) 2025 Huawei
+ *
+ * TODO:
+ * - Support bitmap of 256MiB ranges to track.
+ * - Downsampling
+ * - Multiple instances per block (CXL_CHMU_INSTANCES_PER_BLOCK > 1)
+ * - Read / Write only filtering
+ * - Cleanup error logging.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/guest-random.h"
+#include "hw/cxl/cxl.h"
+#include "hw/cxl/cxl_chmu.h"
+
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+
+#define CHMU_HOTLIST_LENGTH 1024
+
+/* Must match enum in plugin */
+enum chmu_consumer_request {
+    QUERY_TAIL,
+    QUERY_HEAD,
+    SET_THRESHOLD,
+    SET_HEAD,
+    SET_HOTLIST_SIZE,
+    QUERY_HOTLIST_ENTRY,
+    SIGNAL_EPOCH_END,
+    SET_ENABLED,
+    SET_GRANUAL_SIZE,
+    SET_HPA_BASE,
+    SET_HPA_SIZE,
+    SET_DPA_BASE,
+    SET_INTERLEAVE_WAYS,
+    SET_INTERLEAVE_WAY,
+    SET_INTERLEAVE_GRAN,
+};
+
+static int chmu_send(CHMUState *chmu, uint64_t instance,
+                     enum chmu_consumer_request command,
+                     uint64_t param, uint64_t param2, uint64_t *response)
+{
+    uint64_t request[4] = { instance, command, param, param2 };
+    uint64_t temp;
+    uint64_t *reply = response ?: &temp;
+
+    send(chmu->socket, request, sizeof(request), 0);
+    if (recv(chmu->socket, reply, sizeof(*reply), 0) < sizeof(reply)) {
+        return -1;
+    }
+    return 0;
+}
+
+static uint64_t chmu_read(void *opaque, hwaddr offset, unsigned size)
+{
+    const hwaddr chmu_stride = A_CXL_CHMU1_CAP0 - A_CXL_CHMU0_CAP0;
+    CHMUState *chmu = opaque;
+    CHMUInstance *chmui;
+    uint64_t val = 0;
+    int instance = 0;
+    int rc;
+
+    if (offset >= A_CXL_CHMU0_CAP0) {
+        instance = (offset - A_CXL_CHMU0_CAP0) / chmu_stride;
+        /*
+         * Offset allows register defs for CHMU instance 0 to be used
+         * for all instances. Includes COMMON_CAP.
+         */
+        offset -= chmu_stride * instance;
+    }
+
+    if (instance >= CXL_CHMU_INSTANCES_PER_BLOCK) {
+        return 0;
+    }
+
+    chmui = &chmu->inst[instance];
+    switch (offset) {
+    case A_CXL_CHMU_COMMON_CAP0:
+        val = FIELD_DP64(val, CXL_CHMU_COMMON_CAP0, VERSION, 1);
+        val = FIELD_DP64(val, CXL_CHMU_COMMON_CAP0, NUM_INSTANCES,
+                         CXL_CHMU_INSTANCES_PER_BLOCK);
+        break;
+    case A_CXL_CHMU_COMMON_CAP1:
+        val = FIELD_DP64(val, CXL_CHMU_COMMON_CAP1, INSTANCE_LENGTH,
+                         A_CXL_CHMU1_CAP0 - A_CXL_CHMU0_CAP0);
+        break;
+    case A_CXL_CHMU0_CAP0:
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, MSI_N, chmui->msi_n);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, OVERFLOW_INT, 1);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, LEVEL_INT, 1);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, EPOCH_TYPE,
+                         CXL_CHMU0_CAP0_EPOCH_TYPE_GLOBAL);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_R, 1);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_W, 1);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_RW, 1);
+        /* No emulation of TEE modes yet so don't pretend to support them */
+
+        /* Epoch length from 100 milliseconds to 100 second */
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, MAX_EPOCH_LENGTH_SCALE,
+                         CXL_CHMU_EPOCH_LENGTH_SCALE_1SEC);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, MAX_EPOCH_LENGTH_VAL, 100);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, MIN_EPOCH_LENGTH_SCALE,
+                         CXL_CHMU_EPOCH_LENGTH_SCALE_100MSEC);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, MIN_EPOCH_LENGTH_VAL, 1);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP0, HOTLIST_SIZE,
+                         CXL_HOTLIST_ENTRIES);
+        break;
+    case A_CXL_CHMU0_CAP1:
+        /* 4KiB and 8KiB only - 2^N * 256 for each bit set */
+        val = FIELD_DP64(val, CXL_CHMU0_CAP1, UNIT_SIZES, BIT(4) | BIT(5));
+        /* No downsampling - 2^(N - 1) for each bit set */
+        val = FIELD_DP64(val, CXL_CHMU0_CAP1, DOWN_SAMPLING_FACTORS, BIT(1));
+        val = FIELD_DP64(val, CXL_CHMU0_CAP1, FLAGS_EPOCH_BASED, 1);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP1,
FLAGS_ALWAYS_ON, 0);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP1, FLAGS_RANDOMIZED_DOWN_SAMPLING,
+                         1);
+        val = FIELD_DP64(val, CXL_CHMU0_CAP1, FLAGS_OVERLAPPING_ADDRESS_RANGES,
+                         1);
+        /*
+         * Feature to enable a backlog of entries that immediately fill the list
+         * once space is available. Only relevant if reading list infrequently
+         * and concerned about stale data. (Not implemented)
+         */
+        val = FIELD_DP64(val, CXL_CHMU0_CAP1, FLAGS_INSERT_AFTER_CLEAR, 0);
+        break;
+    case A_CXL_CHMU0_CAP2:
+        val = FIELD_DP64(val, CXL_CHMU0_CAP2, BITMAP_REG_OFFSET,
+                         A_CXL_CHMU0_RANGE_CONFIG_BITMAP0 - A_CXL_CHMU0_CAP0);
+        break;
+    case A_CXL_CHMU0_CAP3:
+        val = FIELD_DP64(val, CXL_CHMU0_CAP3, HOTLIST_REG_OFFSET,
+                         A_CXL_CHMU0_HOTLIST0 - A_CXL_CHMU0_CAP0);
+        break;
+    case A_CXL_CHMU0_STATUS:
+        val = FIELD_DP64(val, CXL_CHMU0_STATUS, STATUS_ENABLED,
+                         chmui->enabled ? 1 : 0);
+        val = FIELD_DP64(val, CXL_CHMU0_STATUS, OPERATION_IN_PROG,
+                         0); /* All operations effectively instantaneous */
+        val = FIELD_DP64(val, CXL_CHMU0_STATUS, COUNTER_WIDTH, 16);
+        val = FIELD_DP64(val, CXL_CHMU0_STATUS, OVERFLOW_INT,
+                         chmui->overflow_set ? 1 : 0);
+        val = FIELD_DP64(val, CXL_CHMU0_STATUS, LEVEL_INT,
+                         chmui->fill_thresh_set ?
1 : 0);
+        break;
+    case A_CXL_CHMU0_CONF0:
+        val = FIELD_DP64(val, CXL_CHMU0_CONF0, M2S_REQ_TO_TRACK, chmui->what);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF0, FLAGS_RANDOMIZE_DOWNSAMPLING, 0);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF0, FLAGS_INT_ON_OVERFLOW,
+                         chmui->int_on_overflow);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF0, FLAGS_INT_ON_FILL_THRESH,
+                         chmui->int_on_fill_thresh);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF0, CONTROL_ENABLE,
+                         chmui->enabled);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF0, CONTROL_RESET, 0);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF0, HOTNESS_THRESHOLD,
+                         chmui->hotness_thresh);
+        break;
+    case A_CXL_CHMU0_CONF1:
+        val = FIELD_DP64(val, CXL_CHMU0_CONF1, UNIT_SIZE,
+                         chmui->unit_size);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF1, DOWN_SAMPLING_FACTOR, 0);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF1, REPORTING_MODE, 0);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF1, EPOCH_LENGTH_SCALE,
+                         chmui->epoch_scale);
+        val = FIELD_DP64(val, CXL_CHMU0_CONF1, EPOCH_LENGTH_VAL,
+                         chmui->epoch_val);
+        break;
+    case A_CXL_CHMU0_CONF2:
+        val = FIELD_DP64(val, CXL_CHMU0_CONF2, NOTIFICATION_THRESHOLD,
+                         chmui->fill_thresh);
+        break;
+    case A_CXL_CHMU0_TAIL:
+        if (chmu->socket) {
+            rc = chmu_send(chmu, instance, QUERY_TAIL, 0, 0, &val);
+            if (rc < 0) {
+                printf("Failed to read tail\n");
+                return 0;
+            }
+        } else {
+            val = chmui->tail;
+        }
+        break;
+    case A_CXL_CHMU0_HEAD:
+        if (chmu->socket) {
+            rc = chmu_send(chmu, instance, QUERY_HEAD, 0, 0, &val);
+            if (rc < 0) {
+                printf("Failed to read head\n");
+                return 0;
+            }
+        } else {
+            val = chmui->head;
+        }
+        break;
+    case A_CXL_CHMU0_HOTLIST0...(8 * (A_CXL_CHMU0_HOTLIST0 +
+                                      CHMU_HOTLIST_LENGTH)):
+        if (chmu->socket) {
+            rc = chmu_send(chmu, instance, QUERY_HOTLIST_ENTRY,
+                           (offset - A_CXL_CHMU0_HOTLIST0) / 8, 0, &val);
+            if (rc < 0) {
+                printf("Failed to read a hotlist entry\n");
+                return 0;
+            }
+        } else {
+            val = chmui->hotlist[(offset - A_CXL_CHMU0_HOTLIST0) / 8];
+        }
+        break;
+    }
+    return val;
+}
+
+static void chmu_write(void *opaque, hwaddr offset, uint64_t value,
+                       unsigned size)
+{
+    CHMUState *chmu = opaque;
+    CHMUInstance *chmui;
+    hwaddr chmu_stride = A_CXL_CHMU1_CAP0 - A_CXL_CHMU0_CAP0;
+    int instance = 0;
+    int i, rc;
+
+    if (offset >= A_CXL_CHMU0_CAP0) {
+        instance = (offset - A_CXL_CHMU0_CAP0) / chmu_stride;
+        /* offset as if in chmu0 so includes the common caps */
+        offset -= chmu_stride * instance;
+    }
+    if (instance >= CXL_CHMU_INSTANCES_PER_BLOCK) {
+        return;
+    }
+
+    chmui = &chmu->inst[instance];
+
+    switch (offset) {
+    case A_CXL_CHMU0_STATUS:
+        /* The interrupt fields are RW1C */
+        if (FIELD_EX64(value, CXL_CHMU0_STATUS, OVERFLOW_INT)) {
+            chmui->overflow_set = false;
+        }
+        if (FIELD_EX64(value, CXL_CHMU0_STATUS, LEVEL_INT)) {
+            chmui->fill_thresh_set = false;
+        }
+        break;
+    case A_CXL_CHMU0_RANGE_CONFIG_BITMAP0 ... (A_CXL_CHMU0_HOTLIST0 - 8):
+        /* TODO - wire this up */
+        printf("Bitmap write %lx %lx\n",
+               offset - A_CXL_CHMU0_RANGE_CONFIG_BITMAP0, value);
+        break;
+    case A_CXL_CHMU0_CONF0:
+        if (FIELD_EX64(value, CXL_CHMU0_CONF0, CONTROL_ENABLE)) {
+            chmui->enabled = true;
+            timer_mod(chmui->timer,
+                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + chmui->epoch_ms);
+        } else {
+            timer_del(chmui->timer);
+            chmui->enabled = false;
+        }
+        if (chmu->socket) {
+            bool enabled = FIELD_EX64(value, CXL_CHMU0_CONF0, CONTROL_ENABLE);
+
+            if (enabled) {
+                int d;
+                for (d = 0; d < CXL_HDM_DECODER_COUNT; d++) {
+                    /* Should loop over ranges + the base addresses */
+
+                    rc = chmu_send(chmu, instance, SET_HPA_BASE,
+                                   chmu->decoder[d].base, d, NULL);
+                    if (rc < 0) {
+                        printf("Failed to set HPA base\n");
+                    }
+                    rc = chmu_send(chmu, instance, SET_HPA_SIZE,
+                                   chmu->decoder[d].size, d, NULL);
+                    if (rc < 0) {
+                        printf("Failed to set HPA size\n");
+                    }
+                    rc = chmu_send(chmu, instance, SET_DPA_BASE,
+                                   chmu->decoder[d].dpa_base, d, NULL);
+                    if (rc < 0) {
+                        printf("Failed to set DPA base\n");
+                    }
+
+                    rc = chmu_send(chmu, instance, SET_INTERLEAVE_WAYS,
+                                   chmu->decoder[d].ways, d, NULL);
+                    if (rc < 0) {
+                        printf("Failed to set interleave ways\n");
+                    }
+                    rc = chmu_send(chmu, instance, SET_INTERLEAVE_WAY,
+                                   chmu->decoder[d].way, d, NULL);
+                    if (rc < 0) {
+                        printf("Failed to set interleave way\n");
+                    }
+                    rc = chmu_send(chmu, instance, SET_INTERLEAVE_GRAN,
+                                   chmu->decoder[d].interleave_gran, d, NULL);
+                    if (rc < 0) {
+                        printf("Failed to set interleave granularity\n");
+                    }
+                }
+            }
+            rc = chmu_send(chmu, instance, SET_THRESHOLD,
+                           FIELD_EX64(value, CXL_CHMU0_CONF0,
+                                      HOTNESS_THRESHOLD),
+                           0, NULL);
+            if (rc < 0) {
+                printf("Failed to set threshold\n");
+            }
+            rc = chmu_send(chmu, instance, SET_ENABLED, enabled ? 1 : 0, 0,
+                           NULL);
+            if (rc < 0) {
+                printf("Failed to set enabled\n");
+            }
+        }
+
+        if (FIELD_EX64(value, CXL_CHMU0_CONF0, CONTROL_RESET)) {
+            chmui->head = 0;
+            chmui->tail = 0;
+            for (i = 0; i < CXL_HOTLIST_ENTRIES; i++) {
+                chmui->hotlist[i] = 0;
+            }
+        }
+        chmui->what = FIELD_EX64(value, CXL_CHMU0_CONF0, M2S_REQ_TO_TRACK);
+        chmui->int_on_overflow =
+            FIELD_EX64(value, CXL_CHMU0_CONF0, FLAGS_INT_ON_OVERFLOW);
+        chmui->int_on_fill_thresh =
+            FIELD_EX64(value, CXL_CHMU0_CONF0, FLAGS_INT_ON_FILL_THRESH);
+        chmui->hotness_thresh =
+            FIELD_EX64(value, CXL_CHMU0_CONF0, HOTNESS_THRESHOLD);
+        break;
+    case A_CXL_CHMU0_CONF1: {
+        chmui->unit_size = FIELD_EX64(value, CXL_CHMU0_CONF1, UNIT_SIZE);
+        chmui->ds_factor =
+            FIELD_EX64(value, CXL_CHMU0_CONF1, DOWN_SAMPLING_FACTOR);
+
+        /* TODO: Sanity check value in supported range */
+        chmui->epoch_scale =
+            FIELD_EX64(value, CXL_CHMU0_CONF1, EPOCH_LENGTH_SCALE);
+        chmui->epoch_val = FIELD_EX64(value, CXL_CHMU0_CONF1, EPOCH_LENGTH_VAL);
+        switch (chmui->epoch_scale) {
+            /* TODO: Implement maths, not lookup */
+        case 1: /* 100usec */
+            chmui->epoch_ms = chmui->epoch_val / 10;
+            break;
+        case 2:
+            chmui->epoch_ms = chmui->epoch_val;
+            break;
+        case 3:
+            chmui->epoch_ms = chmui->epoch_val * 10;
+            break;
+        case 4:
+            chmui->epoch_ms = chmui->epoch_val * 100;
+            break;
+        case 5:
+            chmui->epoch_ms = chmui->epoch_val * 1000;
+            break;
+        default:
+            /* Unknown value so ignore */
+            break;
+        }
+        break;
+    }
+    case A_CXL_CHMU0_CONF2:
+        chmui->fill_thresh = FIELD_EX64(value, CXL_CHMU0_CONF2,
+                                        NOTIFICATION_THRESHOLD);
+        break;
+    case A_CXL_CHMU0_HEAD:
+        chmui->head = value;
+        if (chmu->socket) {
+            rc = chmu_send(chmu, instance, SET_HEAD, value, 0, NULL);
+            if (rc < 0) {
+                printf("Failed to set head pointer\n");
+            }
+        }
+        break;
+    case A_CXL_CHMU0_TAIL: /* Not sure why this is writeable! */
+        chmui->tail = value;
+        break;
+    }
+}
+
+static const MemoryRegionOps chmu_ops = {
+    .read = chmu_read,
+    .write = chmu_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+        .unaligned = false,
+    },
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+};
+
+static void chmu_timer_update(void *opaque)
+{
+    CHMUInstance *chmui = opaque;
+    PCIDevice *pdev = PCI_DEVICE(chmui->private);
+    bool interrupt_needed = false;
+    uint64_t reply;
+    int rc;
+
+    timer_del(chmui->timer);
+
+    /* FIXME: instance always 0! */
+    rc = chmu_send(chmui->parent, 0, SIGNAL_EPOCH_END, 0, 0, &reply);
+    if (rc < 0) {
+        error_setg(&error_fatal, "Epoch signalling failed");
+        return;
+    }
+
+    rc = chmu_send(chmui->parent, 0, QUERY_TAIL, 0, 0, &reply);
+    if (rc < 0) {
+        error_setg(&error_fatal, "Tail read failed");
+        return;
+    }
+    chmui->tail = reply;
+    printf("After epoch tail is %x\n", chmui->tail);
+
+    /* All interrupt code is kept in here whatever the data source */
+    if (chmui->int_on_fill_thresh && !chmui->fill_thresh_set) {
+        if (((chmui->tail > chmui->head) &&
+             (chmui->tail - chmui->head > chmui->fill_thresh)) ||
+            ((chmui->tail < chmui->head) &&
+             (CXL_HOTLIST_ENTRIES - chmui->head + chmui->tail >
+              chmui->fill_thresh))) {
+            chmui->fill_thresh_set = true;
+            interrupt_needed = true;
+        }
+    }
+    if (chmui->int_on_overflow && !chmui->overflow_set) {
+        if ((chmui->tail + 1) % CXL_HOTLIST_ENTRIES == chmui->head) {
+            chmui->overflow_set = true;
+            interrupt_needed = true;
+        }
+    }
+
+    if (interrupt_needed) {
+        if (msix_enabled(pdev)) {
+            msix_notify(pdev, chmui->msi_n);
+        } else if (msi_enabled(pdev)) {
+            msi_notify(pdev, chmui->msi_n);
+        }
+    }
+
+    timer_mod(chmui->timer,
+              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + chmui->epoch_ms);
+}
+
+int cxl_chmu_register_block_init(Object *obj, CXLDeviceState *cxl_dstate,
+                                 int id, uint8_t msi_n, Error **errp)
+{
+    CHMUState *chmu = &cxl_dstate->chmu[id];
+    MemoryRegion *registers = &cxl_dstate->chmu_registers[id];
+    g_autofree gchar *name = g_strdup_printf("chmu%d-registers", id);
+    int i;
+
+    memory_region_init_io(registers, obj, &chmu_ops, chmu, name,
+                          pow2ceil(CXL_CHMU_SIZE));
+    memory_region_add_subregion(&cxl_dstate->device_registers,
+                                CXL_CHMU_OFFSET(id), registers);
+
+    for (i = 0; i < CXL_CHMU_INSTANCES_PER_BLOCK; i++) {
+        CHMUInstance *chmui = &chmu->inst[i];
+
+        chmui->parent = chmu; /* Back reference needed for timer */
+        chmui->private = obj; /* Reference to PCIDevice needed for MSI/MSI-X */
+        chmui->msi_n = msi_n + i;
+        chmui->timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, chmu_timer_update,
+                                    chmui);
+    }
+
+    /* No port means fake non-functional hardware only */
+    if (chmu->port) {
+        struct sockaddr_in server_addr = {};
+
+        chmu->socket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
+        if (chmu->socket < 0) {
+            error_setg(errp, "Failed to create a socket");
+            return -1;
+        }
+
+        server_addr.sin_family = AF_INET;
+        server_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+        server_addr.sin_port = htons(chmu->port);
+        if (connect(chmu->socket, (struct sockaddr *)&server_addr,
+                    sizeof(server_addr)) < 0) {
+            close(chmu->socket);
+            error_setg(errp, "Socket connect failed");
+            return -1;
+        }
+
+        for (i = 0; i < CXL_CHMU_INSTANCES_PER_BLOCK; i++) {
+            uint64_t granual_size = (1ULL << chmu->inst[i].unit_size);
+            int rc;
+
+            rc = chmu_send(chmu, i, SET_HOTLIST_SIZE, CHMU_HOTLIST_LENGTH, 0,
+                           NULL);
+            if (rc) {
+                error_setg(errp, "Failed to set hotlist size");
+                return rc;
+            }
+
+            rc = chmu_send(chmu, i, SET_GRANUAL_SIZE, granual_size, 0, NULL);
+            if (rc) {
+                error_setg(errp, "Failed to set granule size");
+                return rc;
+            }
+        }
+    }
+
+    return 0;
+}
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index dcefd41088..43f4cd8023 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -36,7 +36,10 @@ enum CXL_T3_MSIX_VECTOR {
     CXL_T3_MSIX_PCIE_DOE_TABLE_ACCESS = 0,
     CXL_T3_MSIX_EVENT_START = 2,
     CXL_T3_MSIX_MBOX = CXL_T3_MSIX_EVENT_START + CXL_EVENT_TYPE_MAX,
-    CXL_T3_MSIX_VECTOR_NR
+    CXL_T3_MSIX_CHMU0_BASE,
+    /* One interrupt per CHMU instance in the block */
+    CXL_T3_MSIX_VECTOR_NR =
+        CXL_T3_MSIX_CHMU0_BASE + CXL_CHMU_INSTANCES_PER_BLOCK,
 };
 
 #define DWORD_BYTE 4
@@ -394,7 +397,13 @@ static void build_dvsecs(CXLType3Dev *ct3d)
             .lo = RBI_CXL_DEVICE_REG | CXL_DEVICE_REG_BAR_IDX,
             .hi = 0,
         },
+        .reg_base[REG_LOC_IDX_CHMU0] = {
+            .lo = CXL_CHMU_OFFSET(0) | RBI_CXL_CHMU_REG |
+                  CXL_DEVICE_REG_BAR_IDX,
+            .hi = 0,
+        },
     };
+
     cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
                                REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
                                REG_LOC_DVSEC_REVID, dvsec);
@@ -418,19 +427,101 @@ static void build_dvsecs(CXLType3Dev *ct3d)
                                PCIE_CXL3_FLEXBUS_PORT_DVSEC_REVID, dvsec);
 }
 
+bool cxl_type3_get_hdm_interleave_props(CXLType3Dev *ct3d, int which,
+                                        uint64_t *hpa_base, uint16_t *granual,
+                                        uint8_t *ways)
+{
+    int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
+    ComponentRegisters *cregs = &ct3d->cxl_cstate.crb;
+    uint32_t *cache_mem = cregs->cache_mem_registers;
+    uint32_t ctrl, low, high;
+
+    ctrl = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + which * hdm_inc);
+    /* TODO: Sanity checks that the decoder is possible */
+    if (!FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) {
+        return false;
+    }
+
+    *granual = cxl_decode_ig(FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG));
+    *ways = cxl_interleave_ways_dec(FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IW),
+                                    NULL);
+    low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_LO + which * hdm_inc);
+    high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_HI + which * hdm_inc);
+    *hpa_base = ((uint64_t)high << 32) | (low & 0xf0000000);
+
+    return true;
+}
+
+/* Only the CHMU needs to know the way */
+void cxl_type3_set_hdm_isp(CXLType3Dev *ct3d, int which, uint8_t isp)
+{
+    ct3d->cxl_dstate.chmu[0].decoder[which].way = isp;
+}
+
 static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
 {
     int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
     ComponentRegisters *cregs = &ct3d->cxl_cstate.crb;
     uint32_t *cache_mem = cregs->cache_mem_registers;
-    uint32_t ctrl;
+    uint32_t ctrl, low, high;
+    uint64_t dpa_base = 0;
+    uint8_t iws;
+    uint16_t ig;
+    int d;
 
     ctrl = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + which * hdm_inc);
     /* TODO: Sanity checks that the decoder is possible */
     ctrl = FIELD_DP32(ctrl, CXL_HDM_DECODER0_CTRL, ERR, 0);
     ctrl = FIELD_DP32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
 
+    /* Get interleave details for chmu */
+    ig = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG);
+    ct3d->cxl_dstate.chmu[0].decoder[which].interleave_gran = cxl_decode_ig(ig);
+
+    iws = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IW);
+    ct3d->cxl_dstate.chmu[0].decoder[which].ways =
+        cxl_interleave_ways_dec(iws, NULL);
+
     stl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + which * hdm_inc, ctrl);
+
+    low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_LO + which * hdm_inc);
+    high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_HI + which * hdm_inc);
+    ct3d->cxl_dstate.chmu[0].decoder[which].base =
+        ((uint64_t)high << 32) | (low & 0xf0000000);
+
+    low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_LO + which * hdm_inc);
+    high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_HI + which * hdm_inc);
+    ct3d->cxl_dstate.chmu[0].decoder[which].size =
+        ((uint64_t)high << 32) | (low & 0xf0000000);
+
+    /*
+     * To figure out the DPA start, add size / ways + skip for all earlier
+     * decoders + skip for the current one.
+     */
+    for (d = 0; d < which; d++) {
+        ctrl = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + d * hdm_inc);
+
+        low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_LO +
+                       d * hdm_inc);
+        high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_HI +
+                        d * hdm_inc);
+        dpa_base += ((uint64_t)high << 32) | (low & 0xf0000000);
+
+        iws = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IW);
+        low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_LO + d * hdm_inc);
+        high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_HI + d * hdm_inc);
+        /* DPA space used is size / ways */
+        dpa_base += (((uint64_t)high << 32) | (low & 0xf0000000)) /
+            cxl_interleave_ways_dec(iws, NULL);
+    }
+    low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_LO +
+                   which * hdm_inc);
+    high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_HI +
+                    which * hdm_inc);
+    dpa_base += ((uint64_t)high << 32) | (low & 0xf0000000);
+
+
+    ct3d->cxl_dstate.chmu[0].decoder[which].dpa_base = dpa_base;
 }
 
 static void hdm_decoder_uncommit(CXLType3Dev *ct3d, int which)
@@ -913,6 +1004,13 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 
     cxl_device_register_block_init(OBJECT(pci_dev), &ct3d->cxl_dstate,
                                    &ct3d->cci);
+
+    rc = cxl_chmu_register_block_init(OBJECT(pci_dev), &ct3d->cxl_dstate,
+                                      0, CXL_T3_MSIX_CHMU0_BASE, errp);
+    if (rc) {
+        goto err_free_special_ops;
+    }
+
     pci_register_bar(pci_dev, CXL_DEVICE_REG_BAR_IDX,
                      PCI_BASE_ADDRESS_SPACE_MEMORY |
                          PCI_BASE_ADDRESS_MEM_TYPE_64,
@@ -1288,6 +1386,7 @@ static const Property ct3_props[] = {
                                 speed, PCIE_LINK_SPEED_32),
     DEFINE_PROP_PCIE_LINK_WIDTH("x-width", CXLType3Dev,
                                 width, PCIE_LINK_WIDTH_16),
+    DEFINE_PROP_UINT16("chmu-port", CXLType3Dev, cxl_dstate.chmu[0].port, 0),
 };
 
 static uint64_t get_lsa_size(CXLType3Dev *ct3d)
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 3e375f61a9..e3abb49d27 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -6,6 +6,7 @@ system_ss.add(when: 'CONFIG_CXL',
         'cxl-host.c',
         'cxl-cdat.c',
         'cxl-events.c',
+        'cxl-chmu.c',
         'switch-mailbox-cci.c',
     ),
     if_false: files(
-- 
2.48.1


From nobody Mon Feb 9 22:57:20 2026
From: Jonathan Cameron via qemu-devel@nongnu.org
Reply-to: Jonathan Cameron
Subject: [RFC PATCH v2 QEMU 3/4] hw/cxl: Provide a means to get the interleave set position for an EP
Date: Thu, 12 Jun 2025 16:57:23 +0100
Message-ID: <20250612155724.1887266-4-Jonathan.Cameron@huawei.com>
In-Reply-To: <20250612155724.1887266-1-Jonathan.Cameron@huawei.com>
References: <20250612155724.1887266-1-Jonathan.Cameron@huawei.com>

CXL interleave decoding is hierarchical in a fashion that means the CXL
memory devices only need to know the interleave granularity and
interleave ways to figure out which address bits to drop from incoming
transactions. Unfortunately, to provide the right information to the
hotness monitoring plugin, which filters transactions in Host Physical
Address (HPA) space, it is necessary to know which interleave set
position a given device is in. I tried various more sophisticated
solutions to provide this information but they were all rather complex.

The solution used here is the brute force one. Every time an HDM Decoder
is committed (these are the address routing elements), every Type 3
Device HDM Decoder is checked to find an HPA range. The hierarchical
HPA-to-device address routing is then walked at a series of addresses
corresponding to the first byte of each interleave position. When the
same device is reached, we know we have the correct Interleave Set
Position and pass that to the CHMU.

Signed-off-by: Jonathan Cameron
---
RFC v2: New patch. Note lightly tested only so far.
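
The probing described above can be sketched in isolation. This is a toy
model only, not the code in this patch: struct toy_window,
toy_find_device() and toy_probe_isp() are hypothetical names, and a
single level of modulo interleave stands in for the real hierarchical
routing walk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-level interleaved window (illustration only). */
struct toy_window {
    uint64_t base;     /* HPA base of the window */
    uint16_t granule;  /* interleave granularity in bytes */
    uint8_t ways;      /* number of interleave ways, 1..8 here */
    int targets[8];    /* device id found at each interleave position */
};

/* Stand-in for the routing walk: which device does this HPA decode to? */
static int toy_find_device(const struct toy_window *win, uint64_t hpa)
{
    uint64_t granule_idx = (hpa - win->base) / win->granule;

    return win->targets[granule_idx % win->ways];
}

/*
 * Brute force: probe the first byte of each interleave position and
 * return the position that routes to dev_id, or -1 if none does.
 */
static int toy_probe_isp(const struct toy_window *win, int dev_id)
{
    uint8_t w;

    assert(win->ways > 0 && win->ways <= 8);
    for (w = 0; w < win->ways; w++) {
        uint64_t probe = win->base + (uint64_t)w * win->granule;

        if (toy_find_device(win, probe) == dev_id) {
            return w;
        }
    }
    return -1;
}
```

A return of -1 corresponds to the case where an upstream decoder is not
yet committed: nothing matches, and the probe is simply retried on the
next commit.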
---
 include/hw/cxl/cxl.h         |  1 +
 hw/cxl/cxl-component-utils.c |  4 ++
 hw/cxl/cxl-host.c            | 72 ++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c           |  2 +
 4 files changed, 79 insertions(+)

diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 12844d3418..b4b83c0b63 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -71,4 +71,5 @@ CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp);
 typedef struct CXLDownstreamPort CXLDownstreamPort;
 DECLARE_INSTANCE_CHECKER(CXLDownstreamPort, CXL_DSP, TYPE_CXL_DSP)
 
+void cxl_update_isp(void);
 #endif
diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index 473895948b..f53ce1ebaa 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -116,6 +116,10 @@ static void dumb_hdm_handler(CXLComponentState *cxl_cstate, hwaddr offset,
         value = FIELD_DP32(value, CXL_HDM_DECODER0_CTRL, COMMITTED, 0);
     }
     stl_le_p((uint8_t *)cache_mem + offset, value);
+
+    if (should_commit) {
+        cxl_update_isp();
+    }
 }
 
 static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index 5239555f6c..893ef7f7fa 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -279,6 +279,78 @@ static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr,
     return cxl_type3_write(d, addr + fw->base, data, size, attrs);
 }
 
+/*
+ * Updating the end point decoder stashed Interleave Set Positions (ISP)
+ * that is needed for the CHMU to pass to the cache plugin + hotness tracker
+ * is tricky as the decoders can be committed in any order.
+ *
+ * Brute force the problem by finding any endpoints below a cfmws and for
+ * each enabled decoder, probing until we get a match - if any upstream
+ * decoders are not committed this will fail but that is fine as we try again
+ * later when the situation is resolved by committing upstream decoders.
+ *
+ * This is a rare operation, so not worth complexity of walking down from
+ * the Fixed memory windows. Just compare all with all.
+ */
+
+/* Update ISP for a given Type 3 memory device */
+static int cxl_type3_update_isp(Object *obj, void *opaque)
+{
+    CXLType3Dev *ct3d;
+    CXLFixedWindow *fw = opaque;
+    int i;
+
+    /*
+     * From the CXL Type 3 HDM decoders we need interleave info.
+     * That will let us then find out for each decoder what hits it....
+     */
+    if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+        return 0;
+    }
+
+    ct3d = CXL_TYPE3(obj);
+
+    for (i = 0; i < CXL_HDM_DECODER_COUNT; i++) {
+        uint64_t hpa_base;
+        uint16_t granual;
+        uint8_t ways, w;
+        PCIDevice *d;
+
+        if (!cxl_type3_get_hdm_interleave_props(ct3d, i, &hpa_base, &granual,
+                                                &ways)) {
+            continue; /* commit in order, but teardown can be messy */
+        }
+
+        for (w = 0; w < ways; w++) {
+            d = cxl_cfmws_find_device(fw, hpa_base + w * granual - fw->base);
+            if (d == PCI_DEVICE(ct3d)) {
+                cxl_type3_set_hdm_isp(ct3d, i, w);
+            }
+        }
+    }
+    return 0;
+}
+
+static int cxl_fmw_update_isp(Object *obj, void *priv)
+{
+    struct CXLFixedWindow *fw;
+
+    if (!object_dynamic_cast(obj, TYPE_CXL_FMW)) {
+        return 0;
+    }
+    fw = CXL_FMW(obj);
+    object_child_foreach_recursive(object_get_root(),
+                                   cxl_type3_update_isp, fw);
+    return 0;
+}
+
+/* Update all Interleave Set Positions on all EP HDM decoders */
+void cxl_update_isp(void)
+{
+    object_child_foreach_recursive(object_get_root(),
+                                   cxl_fmw_update_isp, NULL);
+}
+
 const MemoryRegionOps cfmws_ops = {
     .read_with_attrs = cxl_read_cfmws,
     .write_with_attrs = cxl_write_cfmws,
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 43f4cd8023..8e9f76a07a 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -484,6 +484,8 @@ static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
 
     stl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + which * hdm_inc, ctrl);
 
+    cxl_update_isp();
+
     low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_LO + which * hdm_inc);
     high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_HI + which * hdm_inc);
     ct3d->cxl_dstate.chmu[0].decoder[which].base =
-- 
2.48.1


From nobody Mon Feb 9 22:57:20 2026
From: Jonathan Cameron via qemu-devel@nongnu.org
Reply-to: Jonathan Cameron
Subject: [RFC PATCH v2 QEMU 4/4] plugins: cache: Add a hotness tracker for cache misses with socket connection to device emulation
Date: Thu, 12 Jun 2025 16:57:24 +0100
Message-ID: <20250612155724.1887266-5-Jonathan.Cameron@huawei.com>
In-Reply-To: <20250612155724.1887266-1-Jonathan.Cameron@huawei.com>
References: <20250612155724.1887266-1-Jonathan.Cameron@huawei.com>

This adds simple hotness tracker instances suitable for pairing with the
CXL Hotness Monitoring Unit (CHMU) emulation, with control and data
transfer via a socket (port 4443).

A typical command line is:

  -plugin ../qemu/bin/native/contrib/plugins/libcache.so,hotness=1,\
  dcachesize=8192,dassoc=4,dblksize=64,icachesize=8192,iassoc=4,\
  iblksize=64,l2cachesize=32768,l2assoc=16,l2blksize=64

Most of the parameters are concerned with configuring the cache topology
so that the accesses that reach the hotness monitor (which is pretending
to be on the CXL device) reflect those that did not hit in cache. The
only hotness specific parameter is hotness=1 to turn on hotness tracking
and allow connections from consuming device emulation.

There are many approximations in this cache model but it is closer than
not modelling the caches at all. More sophisticated modeling is easy to
add but will come with a performance cost.
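
As a rough illustration of why the cache topology matters, the sketch
below forwards only missing accesses to a counter standing in for the
hotness monitor. It is a toy, not the plugin's cache model: the
direct-mapped cache, struct toy_cache, toy_mem_access() and
toy_forwarded are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LINES 4     /* hypothetical: tiny direct-mapped cache */
#define TOY_BLKSIZE 64  /* bytes per line, matching dblksize=64 above */

struct toy_cache {
    bool valid[TOY_LINES];
    uint64_t tag[TOY_LINES];
};

/* Number of accesses that got past the cache to the (toy) tracker. */
static unsigned toy_forwarded;

/* Returns true on a hit; on a miss the line is installed. */
static bool toy_access(struct toy_cache *c, uint64_t paddr)
{
    uint64_t blk = paddr / TOY_BLKSIZE;
    unsigned idx = blk % TOY_LINES;

    if (c->valid[idx] && c->tag[idx] == blk) {
        return true;
    }
    c->valid[idx] = true;
    c->tag[idx] = blk;
    return false;
}

/* Only misses are counted as reaching the hotness monitor. */
static void toy_mem_access(struct toy_cache *c, uint64_t paddr)
{
    assert(c != NULL);
    if (!toy_access(c, paddr)) {
        toy_forwarded++;
    }
}
```

Repeated accesses to one line are counted once, while two addresses that
conflict in the cache are counted on every alternation, so the access
stream seen by the monitor depends strongly on cache size and
associativity.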

The hotness tracker is based on an oracle counter array (1 counter per
granule) + hotness threshold (supplied from the emulated device). Real
devices will be resource constrained and are likely to implement either
a limited number of precise counters, or an imprecise counting method.
Emulating any of these should be easy to add.

The device emulation sends an end-of-epoch signal based on the emulated
machine's idea of time. At that point entries are added to the reported
hotlist for any counters that are over the threshold set via hotness=X.
That hotlist is queried by the device side. All configuration is
provided over the socket from the emulated CXL Hotness Monitoring Unit.

RFC question: Should I split this off as a separate plugin that
duplicates all of the cache plugin logic as well as providing the
hotness monitor?

Signed-off-by: Jonathan Cameron
---
RFCv2: Bring the hotness server element into the plugin. Still an RFC
because there are more features to implement. Looking for feedback on
the overall approach.
---
 contrib/plugins/cache.c | 434 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 427 insertions(+), 7 deletions(-)

diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
index 56508587d3..26185c52b0 100644
--- a/contrib/plugins/cache.c
+++ b/contrib/plugins/cache.c
@@ -7,10 +7,64 @@
 
 #include
 #include
+#include
 #include
+#include
+#include
+#include
 
 #include
 
+/* ? Where to put a header with this stuff that the CHMU and plugin need? */
+#define HOTNESS_SERVER_PORT 4443
+enum consumer_request {
+    QUERY_TAIL,
+    QUERY_HEAD,
+    SET_THRESHOLD,
+    SET_HEAD,
+    SET_HOTLIST_SIZE,
+    QUERY_HOTLIST_ENTRY,
+    SIGNAL_EPOCH_END,
+    SET_ENABLED,
+    SET_GRANUAL_SIZE, /* Granularity of DPA blocks to track (1 << unit size) */
+    SET_HPA_BASE,
+    SET_HPA_SIZE,
+    SET_DPA_BASE,
+    SET_INTERLEAVE_WAYS,
+    SET_INTERLEAVE_WAY,
+    SET_INTERLEAVE_GRAN,
+};
+
+#define HOTNESS_NUM_RANGES 8
+struct tracking_instance {
+    /*
+     * Some checks are done first without lock and then repeated with
+     * lock to avoid contention. (TODO show that matters)
+     */
+    pthread_mutex_t lock;
+    struct tracking_range {
+        uint64_t base;
+        uint64_t size;
+        uint64_t dpa_offset;
+        uint8_t ways;
+        uint8_t way;
+        uint64_t interleave_granual;
+    } ranges[HOTNESS_NUM_RANGES];
+    uint16_t head, tail;
+    uint32_t granual_size;
+    uint16_t hotlist_length;
+    uint64_t threshold;
+    uint64_t *hotlist;
+    uint32_t *counters;
+    size_t num_counters;
+    bool enabled;
+};
+
+#define MAX_INSTANCES 16
+pthread_mutex_t instances_lock;
+static int num_tracking_instances;
+static struct tracking_instance *instances[MAX_INSTANCES] = {};
+
 #define STRTOLL(x) g_ascii_strtoll(x, NULL, 10)
 
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
@@ -104,6 +158,7 @@ static Cache **l2_ucaches;
 static GMutex *l1_dcache_locks;
 static GMutex *l1_icache_locks;
 static GMutex *l2_ucache_locks;
+static GMutex *socket_lock;
 
 static uint64_t l1_dmem_accesses;
 static uint64_t l1_imem_accesses;
@@ -385,6 +440,80 @@ static bool access_cache(Cache *cache, uint64_t addr)
     return false;
 }
 
+static bool match_range(struct tracking_range *range, uint64_t paddr)
+{
+    uint64_t offset;
+
+    if (!range->size ||
+        paddr < range->base ||
+        paddr >= range->base + range->size / range->ways) {
+        return false;
+    }
+    if (range->ways == 0 || range->ways == 1) { /* no interleave */
+        return true;
+    }
+
+    /* Offset in granules */
+    offset = (paddr - range->base) / range->interleave_granual;
+    if (offset % range->ways != range->way) {
+        return false;
+    }
+    return true;
+}
+
+/* Takes the instance lock */
+static void notify_tracker(struct tracking_instance *inst, uint64_t paddr)
+{
+    uint64_t offset;
+    int i;
+
+    /*
+     * This check may be wrong if racing with enabled, but
+     * we don't use the data until we have the lock and recheck.
+     * If we drop an access due to a race on an enable/disable/enable
+     * then meh.
+     */
+    for (i = 0; i < HOTNESS_NUM_RANGES; i++) {
+        if (!match_range(&inst->ranges[i], paddr)) {
+            continue;
+        }
+        break;
+    }
+    if (i == HOTNESS_NUM_RANGES) {
+        return;
+    }
+
+    pthread_mutex_lock(&inst->lock);
+    /* recheck under the lock */
+    if (!inst->enabled || !inst->counters ||
+        !match_range(&inst->ranges[i], paddr)) {
+        goto err;
+    }
+
+    offset = (paddr - inst->ranges[i].base + inst->ranges[i].dpa_offset) /
+        (inst->granual_size * inst->ranges[i].ways);
+
+    /* TODO - check masking */
+    if (offset >= inst->num_counters) {
+        fprintf(stderr, "Out of range? %lx %lx\n", offset, inst->num_counters);
+        goto err;
+    }
+    inst->counters[offset]++;
+ err:
+    pthread_mutex_unlock(&inst->lock);
+}
+
+static void miss(uint64_t paddr)
+{
+    int i;
+
+    for (i = 0; i < num_tracking_instances; i++) {
+        if (instances[i]->enabled) {
+            notify_tracker(instances[i], paddr);
+        }
+    }
+}
+
 static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
                             uint64_t vaddr, void *userdata)
 {
@@ -395,9 +524,6 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
     bool hit_in_l1;
 
     hwaddr = qemu_plugin_get_hwaddr(info, vaddr);
-    if (hwaddr && qemu_plugin_hwaddr_is_io(hwaddr)) {
-        return;
-    }
 
     effective_addr = hwaddr ? qemu_plugin_hwaddr_phys_addr(hwaddr) : vaddr;
     cache_idx = vcpu_index % cores;
@@ -412,7 +538,11 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
     l1_dcaches[cache_idx]->accesses++;
     g_mutex_unlock(&l1_dcache_locks[cache_idx]);
 
-    if (hit_in_l1 || !use_l2) {
+    if (hit_in_l1) {
+        return;
+    }
+    if (!use_l2) {
+        miss(effective_addr);
         /* No need to access L2 */
         return;
     }
@@ -422,6 +552,7 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
         insn = userdata;
         __atomic_fetch_add(&insn->l2_misses, 1, __ATOMIC_SEQ_CST);
         l2_ucaches[cache_idx]->misses++;
+        miss(effective_addr);
     }
     l2_ucaches[cache_idx]->accesses++;
     g_mutex_unlock(&l2_ucache_locks[cache_idx]);
@@ -447,8 +578,12 @@ static void vcpu_insn_exec(unsigned int vcpu_index, void *userdata)
     l1_icaches[cache_idx]->accesses++;
     g_mutex_unlock(&l1_icache_locks[cache_idx]);
 
-    if (hit_in_l1 || !use_l2) {
-        /* No need to access L2 */
+    if (hit_in_l1) {
+        return;
+    }
+
+    if (!use_l2) {
+        miss(insn_addr);
         return;
     }
 
@@ -735,15 +870,286 @@ static void policy_init(void)
     }
 }
 
+static int register_tracker(struct tracking_instance *inst)
+{
+    pthread_mutex_lock(&instances_lock);
+    if (num_tracking_instances >= MAX_INSTANCES) {
+        pthread_mutex_unlock(&instances_lock);
+        return -1;
+    }
+    instances[num_tracking_instances++] = inst;
+    pthread_mutex_unlock(&instances_lock);
+
+    return 0;
+}
+
+/* Per hotness monitoring unit thread */
+static void *consumer_innerloop(void *_socket)
+{
+    int socket = *(int *)_socket;
+    struct tracking_instance inst = {};
+    /* Instance, command, parameter, parameter2 */
+    uint64_t paddr[4];
+    int rc;
+
+    pthread_mutex_init(&inst.lock, NULL);
+
+    /* For now only handle a single instance per block */
+    rc = register_tracker(&inst);
+    if (rc) {
+        fprintf(stderr, "Failed to register tracker\n");
+        return NULL;
+    }
+
+    while (1) {
+        uint64_t reply, param, param2;
+        enum consumer_request request;
+
+        rc = read(socket,
paddr, sizeof(paddr)); + if (rc < sizeof(paddr)) { + fprintf(stderr, "short message %x\n", rc); + continue; + } + if (paddr[0] > 0) { + fprintf(stderr, "Instance out of range\n"); + continue; + } + request =3D paddr[1]; + param =3D paddr[2]; + param2 =3D paddr[3]; + + pthread_mutex_lock(&inst.lock); + switch (request) { + case QUERY_TAIL: + reply =3D inst.tail; + break; + case QUERY_HEAD: + reply =3D inst.head; + break; + case SET_HEAD: + reply =3D param; + inst.head =3D param; + break; + case SET_HOTLIST_SIZE: { + uint64_t *newlist; + + reply =3D param; + inst.hotlist_length =3D param; + newlist =3D realloc(inst.hotlist, sizeof(*inst.hotlist) * para= m); + if (!newlist) { + fprintf(stderr, "failed to allocate hotlist\n"); + break; + } + inst.hotlist =3D newlist; + break; + } + case QUERY_HOTLIST_ENTRY: + if (param >=3D inst.hotlist_length) { + fprintf(stderr, "out of range hotlist read?\n"); + break; + } + reply =3D inst.hotlist[param]; + break; + case SIGNAL_EPOCH_END: { + int space; + int added =3D 0; + int max =3D 0; + + reply =3D param; + + /* Head is read location, tail write */ + /* If the rdad location is after the tail then gap */ + if (inst.head > inst.tail) { + space =3D inst.head - inst.tail - 1; + } else { + space =3D inst.hotlist_length - inst.tail + inst.head - 1; + } + printf("Epoch end, space %d given %d %d %d\n", + space, inst.hotlist_length, inst.head, inst.tail); + if (!inst.counters) { + fprintf(stderr, + "How did we reach end of an epoque without counter= s?\n"); + break; + } + for (int i =3D 0; i < inst.num_counters; i++) { + /* + * This helps tune tests - unfortunately no such thing in = the + * CXL spec + */ + if (inst.counters[i] > max) { + max =3D inst.counters[i]; + } + if (!(inst.counters[i] > inst.threshold)) { + continue; + } + inst.hotlist[inst.tail] =3D (uint64_t)inst.counters[i] | + ((uint64_t)i << 32); + inst.tail =3D (inst.tail + 1) % inst.hotlist_length; + added++; + if (added =3D=3D space) { + break; + } + } + 
memset(inst.counters, 0, + inst.num_counters * sizeof(*inst.counters)); + + printf("End of epoch %u %u %d\n", inst.head, inst.tail, max); + break; + } + case SET_ENABLED: + reply =3D param; + if (param && !inst.enabled) { + uint32_t *new_counters; + uint32_t num_counters; + uint64_t full_range =3D 0; + int i; + + for (i =3D 0; i < HOTNESS_NUM_RANGES; i++) { + uint64_t end; + + /* Skip disabled ranges */ + if (inst.ranges[i].size =3D=3D 0 || inst.ranges[i].way= s =3D=3D 0) { + continue; + } + end =3D inst.ranges[i].dpa_offset + + inst.ranges[i].size / inst.ranges[i].ways; + if (end > full_range) { + full_range =3D end; + } + } + num_counters =3D full_range / inst.granual_size; + new_counters =3D realloc(inst.counters, + sizeof(*inst.counters) * num_counter= s); + if (!new_counters) { + fprintf(stderr, "Failed to allocate counter storage\n"= ); + break; + } + inst.counters =3D new_counters; + inst.num_counters =3D num_counters; + } + inst.enabled =3D !!param; + break; + case SET_THRESHOLD: + reply =3D param; + if (!inst.enabled) { + inst.threshold =3D param; + } + break; + case SET_GRANUAL_SIZE: + reply =3D param; + if (!inst.enabled) { + inst.granual_size =3D param; + } + break; + case SET_HPA_BASE: + reply =3D param; + if (!inst.enabled) { + inst.ranges[param2].base =3D param; + } + break; + case SET_HPA_SIZE: + reply =3D param; + if (!inst.enabled) { + inst.ranges[param2].size =3D param; + } + break; + case SET_DPA_BASE: + reply =3D param; + if (!inst.enabled) { + inst.ranges[param2].dpa_offset =3D param; + } + break; + case SET_INTERLEAVE_WAYS: + reply =3D param; + if (!inst.enabled) { + inst.ranges[param2].ways =3D param; + } + break; + case SET_INTERLEAVE_WAY: + reply =3D param; + if (!inst.enabled) { + inst.ranges[param2].way =3D param; + } + break; + case SET_INTERLEAVE_GRAN: + reply =3D param; + if (!inst.enabled) { + inst.ranges[param2].interleave_granual =3D param; + } + break; + default: + fprintf(stderr, "Unexpected command to hotness monitor\n"); + 
break; + } + rc =3D write(socket, &reply, sizeof(reply)); + if (rc < 0) { + fprintf(stderr, "write failed - muddle on\n"); + } else if (rc !=3D sizeof(reply)) { + fprintf(stderr, "partial write? %d\n", rc); + } + pthread_mutex_unlock(&inst.lock); + } +} + +/* Outer thread that is responsible for spinning off individual server thr= ead */ +static void *hotness_serverloop(void *private) +{ + int server_fd, new_socket; + int opt =3D 1; + struct sockaddr_in address; + socklen_t addrlen =3D sizeof(address); + int rc; + + server_fd =3D socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (server_fd =3D=3D 0) { + return NULL; + } + + if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR | SO_REUSEPORT, + &opt, sizeof(opt))) { + return NULL; + } + + address.sin_family =3D AF_INET; + address.sin_addr.s_addr =3D INADDR_ANY; + address.sin_port =3D htons(HOTNESS_SERVER_PORT); + + rc =3D bind(server_fd, (struct sockaddr *)&address, sizeof(address)); + if (rc < 0) { + return NULL; + } + + if (listen(server_fd, 3) < 0) { + return NULL; + } + + while (1) { + pthread_t thread; + + new_socket =3D accept(server_fd, (struct sockaddr *)&address, &add= rlen); + if (new_socket < 0) { + return NULL; + } + + if (pthread_create(&thread, NULL, consumer_innerloop, &new_socket)= ) { + fprintf(stderr, "thread create fail\n"); + return NULL; + } + } + + return NULL; +} + QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info, int argc, char **argv) { - int i; + int i, hotness; int l1_iassoc, l1_iblksize, l1_icachesize; int l1_dassoc, l1_dblksize, l1_dcachesize; int l2_assoc, l2_blksize, l2_cachesize; =20 + hotness =3D 0; /* No hotness server */ limit =3D 32; sys =3D info->system_emulation; =20 @@ -808,6 +1214,8 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qem= u_info_t *info, fprintf(stderr, "invalid eviction policy: %s\n", opt); return -1; } + } else if (g_strcmp0(tokens[0], "hotness") =3D=3D 0) { + hotness =3D STRTOLL(tokens[1]); } else { fprintf(stderr, 
"option parsing failed: %s\n", opt); return -1; @@ -840,6 +1248,8 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qem= u_info_t *info, return -1; } =20 + socket_lock =3D g_new0(GMutex, 1); + l1_dcache_locks =3D g_new0(GMutex, cores); l1_icache_locks =3D g_new0(GMutex, cores); l2_ucache_locks =3D use_l2 ? g_new0(GMutex, cores) : NULL; @@ -849,5 +1259,15 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qe= mu_info_t *info, =20 miss_ht =3D g_hash_table_new_full(g_int64_hash, g_int64_equal, NULL, i= nsn_free); =20 + if (hotness) { + pthread_t server_thread; + + pthread_mutex_init(&instances_lock, NULL); + if (pthread_create(&server_thread, NULL, hotness_serverloop, NULL)= ) { + fprintf(stderr, "Hotness server failed\n"); + return -1; + } + } + return 0; } --=20 2.48.1