From nobody Tue Feb 10 21:38:50 2026
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang
Subject: [RFC PATCH V2 01/13] perf/x86: Use x86_perf_regs in the x86 nmi handler
Date: Thu, 26 Jun 2025
12:55:58 -0700 Message-Id: <20250626195610.405379-2-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com> References: <20250626195610.405379-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang More and more regs will be supported in the overflow, e.g., more vector registers, SSP, etc. The generic pt_regs struct cannot store all of them. Use a X86 specific x86_perf_regs instead. The struct pt_regs *regs is still passed to x86_pmu_handle_irq(). There is no functional change for the existing code. AMD IBS's NMI handler doesn't utilize the static call x86_pmu_handle_irq(). The x86_perf_regs struct doesn't apply to the AMD IBS. It can be added separately later when AMD IBS supports more regs. Signed-off-by: Kan Liang --- arch/x86/events/core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 7610f26dfbd9..64a7a8aa2e38 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -1752,6 +1752,7 @@ void perf_events_lapic_init(void) static int perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs) { + struct x86_perf_regs x86_regs; u64 start_clock; u64 finish_clock; int ret; @@ -1764,7 +1765,8 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_re= gs *regs) return NMI_DONE; =20 start_clock =3D sched_clock(); - ret =3D static_call(x86_pmu_handle_irq)(regs); + x86_regs.regs =3D *regs; + ret =3D static_call(x86_pmu_handle_irq)(&x86_regs.regs); finish_clock =3D sched_clock(); =20 perf_sample_event_took(finish_clock - start_clock); --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8BCFE2EFDA5 for ; Thu, 26 Jun 2025 19:56:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967821; cv=none; b=mmSES21jzflS5JOdjZzrKkXn0myjlqN5NxCowNMalcC4C9HioyrfgDjhbpD0g2DiYX5zrqHEmT5PR7fX9k5KwAf0somCUGoYXAoruonDBTowE3craR7+Qs0CLNsoc+gDRqbaFwIyE4s1/1+4A3joMSOr4VZnSHk9ELMvX54sFDk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967821; c=relaxed/simple; bh=jiUU/CO/dIeSSSoaGHlhX1BWyie8LcKytE6z9fFLDBo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=i/5w7d/61gU3zOQ+dTKuTTXamshiGOpmbq7znA9j5oXCz/nbn2xj83QlWBboR7ls0GghietCnaDTfgxp5cLBXvjtHP+EN1PWN2ulFq/doJA0ONdGAs+fGBoQne54+md1fkE/z9EAlzuMPxroXqV8B//na/joURomM6IUK3gQEks= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=aujfukfF; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aujfukfF" DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967819; x=1782503819; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jiUU/CO/dIeSSSoaGHlhX1BWyie8LcKytE6z9fFLDBo=; b=aujfukfFRlBOiWcy289i1GD2OCam5RyScPuzQLSOcwIbM392i8ycgTqr gm1DeJuV3kHnqr4kSWkN0yV7CrAnYPqHrUaLvZS6R7DbbkCBovaZ2JqKU vgbN0Tcshf5GWfgVO0WrWbS1TZtkg6q6wCn/Q3JEdFtb2nYEYtiZttTOS D+BWKWlhEvbd3EvXQCkhxrDAa1TxBCUgYHM28GZj0t1taA1DbZUOuKnbT 9AqMPc8+Ea0//rIrLVaYR3vCw5JAXmnpbtcDa3uaqWgdTXEN7631L78A1 joLEhUGVRQRDDWS2xELkSc4b0Jm3QnHpq5NB2PytbE6ZRu7IiLzikkRN7 A==; X-CSE-ConnectionGUID: epXjcKTZSkW1/Zc6+OYH6g== X-CSE-MsgGUID: VINT20ZRRN+f6f9tgOfrpA== X-IronPort-AV: E=McAfee;i="6800,10657,11476"; a="53002132" X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="53002132" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Jun 2025 12:56:58 -0700 X-CSE-ConnectionGUID: ONNkOTygRrupnby8dcXAPw== X-CSE-MsgGUID: fpGKqIczRa2jTf8sqdmkbA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="156902899" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by fmviesa005.fm.intel.com with ESMTP; 26 Jun 2025 12:56:57 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang Subject: [RFC PATCH V2 02/13] perf/x86: Setup the regs data Date: Thu, 26 Jun 2025 12:55:59 -0700 Message-Id: <20250626195610.405379-3-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com> References: <20250626195610.405379-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang The current code relies on the generic code to setup the regs data. It will not work well when there are more regs introduced. Introduce a X86-specific x86_pmu_setup_regs_data(). Now, it's the same as the generic code. More X86-specific codes will be added later when the new regs. 
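For context, a minimal sketch of how an overflow handler is expected to invoke the new helper. The actual call site in handle_pmi_common() is only wired up later in this series, so the surrounding function below is illustrative only, not code from this patch:

	/* Illustrative only: the real call is added to handle_pmi_common() later in the series. */
	static void overflow_one_event(struct perf_event *event, struct pt_regs *regs)
	{
		struct perf_sample_data data;

		perf_sample_data_init(&data, 0, event->hw.last_period);

		/* Let the x86 code, rather than the generic core, fill REGS_USER/REGS_INTR. */
		x86_pmu_setup_regs_data(event, &data, regs);

		perf_event_overflow(event, &data, regs);
	}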
Signed-off-by: Kan Liang --- arch/x86/events/core.c | 32 ++++++++++++++++++++++++++++++++ arch/x86/events/intel/ds.c | 4 +++- arch/x86/events/perf_event.h | 4 ++++ 3 files changed, 39 insertions(+), 1 deletion(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 64a7a8aa2e38..c601ad761534 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -1685,6 +1685,38 @@ static void x86_pmu_del(struct perf_event *event, in= t flags) static_call_cond(x86_pmu_del)(event); } =20 +void x86_pmu_setup_regs_data(struct perf_event *event, + struct perf_sample_data *data, + struct pt_regs *regs) +{ + u64 sample_type =3D event->attr.sample_type; + + if (sample_type & PERF_SAMPLE_REGS_USER) { + if (user_mode(regs)) { + data->regs_user.abi =3D perf_reg_abi(current); + data->regs_user.regs =3D regs; + } else if (!(current->flags & PF_KTHREAD)) { + perf_get_regs_user(&data->regs_user, regs); + } else { + data->regs_user.abi =3D PERF_SAMPLE_REGS_ABI_NONE; + data->regs_user.regs =3D NULL; + } + data->dyn_size +=3D sizeof(u64); + if (data->regs_user.regs) + data->dyn_size +=3D hweight64(event->attr.sample_regs_user) * sizeof(u6= 4); + data->sample_flags |=3D PERF_SAMPLE_REGS_USER; + } + + if (sample_type & PERF_SAMPLE_REGS_INTR) { + data->regs_intr.regs =3D regs; + data->regs_intr.abi =3D perf_reg_abi(current); + data->dyn_size +=3D sizeof(u64); + if (data->regs_intr.regs) + data->dyn_size +=3D hweight64(event->attr.sample_regs_intr) * sizeof(u6= 4); + data->sample_flags |=3D PERF_SAMPLE_REGS_INTR; + } +} + int x86_pmu_handle_irq(struct pt_regs *regs) { struct perf_sample_data data; diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index c0b7ac1c7594..e67d8a03ddfe 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -2126,8 +2126,10 @@ static void setup_pebs_adaptive_sample_data(struct p= erf_event *event, regs->flags &=3D ~PERF_EFLAGS_EXACT; } =20 - if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) + if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) { adaptive_pebs_save_regs(regs, gprs); + x86_pmu_setup_regs_data(event, data, regs); + } } =20 if (format_group & PEBS_DATACFG_MEMINFO) { diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 2b969386dcdd..12682a059608 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -1278,6 +1278,10 @@ void x86_pmu_enable_event(struct perf_event *event); =20 int x86_pmu_handle_irq(struct pt_regs *regs); =20 +void x86_pmu_setup_regs_data(struct perf_event *event, + struct perf_sample_data *data, + struct pt_regs *regs); + void x86_pmu_show_pmu_cap(struct pmu *pmu); =20 static inline int x86_pmu_num_counters(struct pmu *pmu) --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2BB622F0057 for ; Thu, 26 Jun 2025 19:56:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967821; cv=none; b=PqIeTNRQfJ5fnpc1jZWHS48zPsXqo7QU9fzqFE8aANS5FDVU/DDaxIOMKaXxJT1OM/d4iKM5RMQ491hNtwDvm3UE/TbHbcsZWpFdggYOvhDCUU1NyIghuvYlh3e9hRWal/Vvql1PzZnIgDFzkbRbR5J1zK/aGBGnDoZuiIBiozs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967821; c=relaxed/simple; 
bh=LCO5Fdp7rj+IB5i/Rg82/lrS2S+GA3QhWc5ciizjtWI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=oNrDdaiGNzjw8bTY54re/MfpFFoEHBCbbqWJqakr5rr8JlYVPgC9btUtWKKpQdQdwmfP69MVnadKRjd3S0a5kACm/3PIWpIw9xxVoITnjzX/wPL+DxvptaE56WuWShdc9pUimo9FmXZUAUX8Xsc/X+1sAMONpCKxsdODNoyhrM4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=MOGFDb4B; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MOGFDb4B" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967820; x=1782503820; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LCO5Fdp7rj+IB5i/Rg82/lrS2S+GA3QhWc5ciizjtWI=; b=MOGFDb4BGDJS5kolyz2M23W/IzRLpX1jXOxsrMvKt9mAr0dzNAuFo+hb Cqbbziidghp1Md8RxFeHGvIuswZ5UNuIArTf2eHcbfBBv8pGZmRwW/VhW QMVkCjxFWAGpismmH/lIoGrwkI9PERkx4FQ5Rr/E6mhCwC5OeJ7eKLdiU M7IBLWtc2pDg2LFtPSC7i2P4Txqd2g+o20APCSBJgAWHEeCz+zscWqNSe Zhtt6ePq0y5BM4OreNDJFpwSpjk4u+puo9VS3mcUCNcQYv2XrwNeA570R 7m4lw1sCDxXfvESvJjoMpEuUDmcEHQKwjxjvVPjlLA5BwO0go5A8+uu86 w==; X-CSE-ConnectionGUID: 3Qg5xULcQ02myxRouRkorA== X-CSE-MsgGUID: ujd5qns6TvC7l936QfnteQ== X-IronPort-AV: E=McAfee;i="6800,10657,11476"; a="53002140" X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="53002140" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Jun 2025 12:56:58 -0700 X-CSE-ConnectionGUID: eiL4cqtuRt2o8UTh4lIZsQ== X-CSE-MsgGUID: 8tJBfgVeQ+6vyygsG6IZXg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="156902904" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by fmviesa005.fm.intel.com with ESMTP; 26 Jun 2025 12:56:58 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang Subject: [RFC PATCH V2 03/13] x86/fpu/xstate: Add xsaves_nmi Date: Thu, 26 Jun 2025 12:56:00 -0700 Message-Id: <20250626195610.405379-4-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com> References: <20250626195610.405379-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang There is a hardware feature (Intel PEBS XMMs group), which can handle XSAVE "snapshots" from random code running. This just provides another XSAVE data source at a random time. Add an interface to retrieve the actual register contents when the NMI hit. The interface is different from the other interfaces of FPU. 
The other mechanisms that deal with xstate try to get something coherent. But this interface is *in*coherent. There's no telling what was in the registers when a NMI hits. It writes whatever was in the registers when the NMI hit. It's the invoker's responsibility to make sure the contents are properly filtered before exposing them to the end user. The support of the supervisor state components is required. The compacted storage format is preferred. So the XSAVES is used. Suggested-by: Dave Hansen Signed-off-by: Kan Liang --- arch/x86/include/asm/fpu/xstate.h | 1 + arch/x86/kernel/fpu/xstate.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/x= state.h index b308a76afbb7..0c8b9251c29f 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -107,6 +107,7 @@ int xfeature_size(int xfeature_nr); =20 void xsaves(struct xregs_state *xsave, u64 mask); void xrstors(struct xregs_state *xsave, u64 mask); +void xsaves_nmi(struct xregs_state *xsave, u64 mask); =20 int xfd_enable_feature(u64 xfd_err); =20 diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 9aa9ac8399ae..8602683fcb12 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -1448,6 +1448,36 @@ void xrstors(struct xregs_state *xstate, u64 mask) WARN_ON_ONCE(err); } =20 +/** + * xsaves_nmi - Save selected components to a kernel xstate buffer in NMI + * @xstate: Pointer to the buffer + * @mask: Feature mask to select the components to save + * + * The @xstate buffer must be 64 byte aligned. + * + * Caution: The interface is different from the other interfaces of FPU. + * The other mechanisms that deal with xstate try to get something coheren= t. + * But this interface is *in*coherent. There's no telling what was in the + * registers when a NMI hits. It writes whatever was in the registers when + * the NMI hit. + * The only user for the interface is perf_event. There is already a + * hardware feature (See Intel PEBS XMMs group), which can handle XSAVE + * "snapshots" from random code running. This just provides another XSAVE + * data source at a random time. + * This function can only be invoked in an NMI. It returns the *ACTUAL* + * register contents when the NMI hit. 
+ */ +void xsaves_nmi(struct xregs_state *xstate, u64 mask) +{ + int err; + + if (!in_nmi()) + return; + + XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err); + WARN_ON_ONCE(err); +} + #if IS_ENABLED(CONFIG_KVM) void fpstate_clear_xstate_component(struct fpstate *fpstate, unsigned int = xfeature) { --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A26B82F0C7D for ; Thu, 26 Jun 2025 19:57:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967822; cv=none; b=UNJyEynRIulZB63Xep/2gShe0TO2TbDOMMzbVi2GvaPgL7i8B1tmhI89ShtUk85rKAZBHE0OHJIw9m3k+qMahTCG2r54/r9t+BsZUeDwuzjLeq9Zpw+2DVMxzWiEW+PZs7dZBYeHTTE8n8nJqpCSc0rZGdEGVhYdUcEjyGiIf6Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967822; c=relaxed/simple; bh=QJCPyGq/8+1SlLUHgMpK6zoVk7NJb9wK746eGcBYWM8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=nkMU14gq5K6Faw3CBqnPuGedOVA19BFT3PYV8JF1FqOys6AOGlGKpRWXT2UjZaNG0/r5IccZyLDsfg8t+3kM73oVcJbi/cN3vsWt88JMSZ7ZsEwCZRmVzZ/T0hCe2+Uica4X169ahnvjFCJJZe+hbGsqUT/7HwxNPhayWzP83Lw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=CkfUjg2z; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="CkfUjg2z" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967820; x=1782503820; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=QJCPyGq/8+1SlLUHgMpK6zoVk7NJb9wK746eGcBYWM8=; b=CkfUjg2zhY1Da+1kzWqWcEw6wnW8TUMHh/xUuubonYNnCfR4M5HaOnfd PgyZ0RKv9gW4ZwDUtysI6N2Qtddy8JoZz99bTUSnUs9bWgbd6GHdaNEek j3RIc/kW8xMYX82ws0Pu5JrW3kCGHWf/KOt1jecHmsIbNKtN8YxYXLDJ9 ywp3y42Pu5hJlGPvSG5ojFSan8DYyr7b8F8gj72KK/G5yAPrQhvpw49dh LLhPcnh1U20xJvIsquf0kHSsLrphIVUUV0VRb44O5He1AB7W1I8RBI74B 8lNzwFghRFpES8uITzlRqcYUkMxpSeetKwss2MJfP/H0C8gvhltBm1M8i A==; X-CSE-ConnectionGUID: jTlh2OTAS8OVMCss3Cd9rA== X-CSE-MsgGUID: XNVM6b8+QbafY0vnjQfL6A== X-IronPort-AV: E=McAfee;i="6800,10657,11476"; a="53002148" X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="53002148" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Jun 2025 12:56:59 -0700 X-CSE-ConnectionGUID: QvoswWgQSoSHYWDuV4y1og== X-CSE-MsgGUID: 9AKpfC8mT/+lzPGggIBr5g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="156902907" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by fmviesa005.fm.intel.com with ESMTP; 26 Jun 2025 12:56:58 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, 
alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang Subject: [RFC PATCH V2 04/13] perf: Move has_extended_regs() to header file Date: Thu, 26 Jun 2025 12:56:01 -0700 Message-Id: <20250626195610.405379-5-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com> References: <20250626195610.405379-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang The function will also be used in the ARCH-specific code. Rename it to follow the naming rule of the existing functions. No functional change. Signed-off-by: Kan Liang --- include/linux/perf_event.h | 8 ++++++++ kernel/events/core.c | 8 +------- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 52dc7cfab0e0..74c188a699e4 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1488,6 +1488,14 @@ perf_event__output_id_sample(struct perf_event *even= t, extern void perf_log_lost_samples(struct perf_event *event, u64 lost); =20 +static inline bool event_has_extended_regs(struct perf_event *event) +{ + struct perf_event_attr *attr =3D &event->attr; + + return (attr->sample_regs_user & PERF_REG_EXTENDED_MASK) || + (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK); +} + static inline bool event_has_any_exclude_flag(struct perf_event *event) { struct perf_event_attr *attr =3D &event->attr; diff --git a/kernel/events/core.c b/kernel/events/core.c index cc77f127e11a..7f0d98d73629 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -12502,12 +12502,6 @@ int perf_pmu_unregister(struct pmu *pmu) } EXPORT_SYMBOL_GPL(perf_pmu_unregister); =20 -static inline bool has_extended_regs(struct perf_event *event) -{ - return (event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK) || - (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK); -} - static int perf_try_init_event(struct pmu *pmu, struct perf_event *event) { struct perf_event_context *ctx =3D NULL; @@ -12542,7 +12536,7 @@ static int perf_try_init_event(struct pmu *pmu, str= uct perf_event *event) goto err_pmu; =20 if (!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS) && - has_extended_regs(event)) { + event_has_extended_regs(event)) { ret =3D -EOPNOTSUPP; goto err_destroy; } --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 175442F0E2E for ; Thu, 26 Jun 2025 19:57:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967822; cv=none; b=CmvI0v9q2EUFpF3jS79/iAUvSLkpc3E/iEcmU36d+KJFRBqs+QyCSCIVQRMbvhkFkWjppZ/MhCo3I0yZGuc3FV2klcjtOUKl1hE0jzbz85DngWS5B4be1fFKJ7d+gOvGjUCQ+L28HMEqAAWAdZ18ClLGVt2esImddf5fmOXg1pw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967822; c=relaxed/simple; bh=wZgEjj3If4fylfgepZcFZAdQUGPrMwiwihqqfNK2M7c=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=g0kV92UcEDXynXsMHBvReG+CYROod4uXn6CbIXwcCOzZFT1StAuj+DBSYMCqfFw85Hq5/gecFwiWJt3g6rHO4h0J9rhRof+iODYtTK4niGZ0oA6sxosOce42ZhUgcru1rF6iUQSdRmLngd5BY/iTm6IvnKwXTltmHy5THOaIgJg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=EeihQiOz; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="EeihQiOz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967821; x=1782503821; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wZgEjj3If4fylfgepZcFZAdQUGPrMwiwihqqfNK2M7c=; b=EeihQiOzgbozF09JIpQVQqKloBDgnu3COP7hYBYzlTcV4PaMRmWGD55Q oZBsRpY/T3I46L6V5r8OBZbe6LX+RYw+LhfXDmXqEepAaKeLOpN6bM74N zg5hmVSOdAylx9FEWM5KnSwxmhWw3xZh2mIV1pk3vrxiuzTz61hGWyVXV l9wYxtKl9eigUkdwBudVry1lseLuQa8nQszUc6TjnhHZeacbbaqydNvj+ YQwY7ms3kRkvNnXb3WNKCYSzG+iOyXzUSqqrSEntEahgjfNv9s1HyOARw HRUQQJinOC5jkmMl7XvFdaikNjr0ZX/+H/PE0rwNYnQerWZsZPG3sB0X9 Q==; X-CSE-ConnectionGUID: I0XYL2XCQc2RFZX0l9P2og== X-CSE-MsgGUID: TbYLbu44T6+T6jDlFnrD5Q== X-IronPort-AV: E=McAfee;i="6800,10657,11476"; a="53002156" X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="53002156" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Jun 2025 12:56:59 -0700 X-CSE-ConnectionGUID: agdH+hyyRO6+YaotbPOAVA== X-CSE-MsgGUID: cc5W/s+YTHGyFgGBZsEX7A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="156902910" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by fmviesa005.fm.intel.com with ESMTP; 26 Jun 2025 12:56:59 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang Subject: [RFC PATCH V2 05/13] perf/x86: Support XMM register for non-PEBS and REGS_USER Date: Thu, 26 Jun 2025 12:56:02 -0700 Message-Id: <20250626195610.405379-6-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com> References: <20250626195610.405379-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang Collecting the XMM registers in a PEBS record has been supported since the Icelake. But non-PEBS events don't support the feature. It's possible to retrieve the XMM registers from the XSAVE for non-PEBS. Add it to make the feature complete. To utilize the XSAVE, a 64-byte aligned buffer is required. Add a per-CPU ext_regs_buf to store the vector registers. The size of the buffer is ~2K. 
kzalloc_node() is used because there's a _guarantee_ that all kmalloc()'s with powers of 2 are naturally aligned and also 64b aligned. Extend the support for both REGS_USER and REGS_INTR. For REGS_USER, the perf_get_regs_user() returns the regs from the task_pt_regs(current), which is struct pt_regs. Need to move it to local struct x86_perf_regs x86_user_regs. For PEBS, the HW support is still preferred. The XMM should be retrieved from PEBS records. There could be more vector registers supported later. Add ext_regs_mask to track the supported vector register group. Signed-off-by: Kan Liang --- arch/x86/events/core.c | 128 +++++++++++++++++++++++++----- arch/x86/events/intel/core.c | 27 +++++++ arch/x86/events/intel/ds.c | 10 ++- arch/x86/events/perf_event.h | 12 ++- arch/x86/include/asm/fpu/xstate.h | 2 + arch/x86/include/asm/perf_event.h | 5 +- arch/x86/kernel/fpu/xstate.c | 2 +- 7 files changed, 161 insertions(+), 25 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index c601ad761534..899bd5680f6b 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -406,6 +406,62 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf= _event *event) return x86_pmu_extra_regs(val, event); } =20 +static DEFINE_PER_CPU(struct xregs_state *, ext_regs_buf); + +static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask) +{ + struct xregs_state *xsave =3D per_cpu(ext_regs_buf, smp_processor_id()); + + if (WARN_ON_ONCE(!xsave)) + return; + + xsaves_nmi(xsave, mask); + + if (mask & XFEATURE_MASK_SSE && + xsave->header.xfeatures & BIT_ULL(XFEATURE_SSE)) + perf_regs->xmm_space =3D xsave->i387.xmm_space; +} + +static void release_ext_regs_buffers(void) +{ + int cpu; + + if (!x86_pmu.ext_regs_mask) + return; + + for_each_possible_cpu(cpu) { + kfree(per_cpu(ext_regs_buf, cpu)); + per_cpu(ext_regs_buf, cpu) =3D NULL; + } +} + +static void reserve_ext_regs_buffers(void) +{ + unsigned int size; + u64 mask =3D 0; + int cpu; + + if (!x86_pmu.ext_regs_mask) + return; + + if (x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM) + mask |=3D XFEATURE_MASK_SSE; + + size =3D xstate_calculate_size(mask, true); + + for_each_possible_cpu(cpu) { + per_cpu(ext_regs_buf, cpu) =3D kzalloc_node(size, GFP_KERNEL, + cpu_to_node(cpu)); + if (!per_cpu(ext_regs_buf, cpu)) + goto err; + } + + return; + +err: + release_ext_regs_buffers(); +} + int x86_reserve_hardware(void) { int err =3D 0; @@ -418,6 +474,7 @@ int x86_reserve_hardware(void) } else { reserve_ds_buffers(); reserve_lbr_buffers(); + reserve_ext_regs_buffers(); } } if (!err) @@ -434,6 +491,7 @@ void x86_release_hardware(void) release_pmc_hardware(); release_ds_buffers(); release_lbr_buffers(); + release_ext_regs_buffers(); mutex_unlock(&pmc_reserve_mutex); } } @@ -642,21 +700,18 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; } =20 - /* sample_regs_user never support XMM registers */ - if (unlikely(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK)) - return -EINVAL; - /* - * Besides the general purpose registers, XMM registers may - * be collected in PEBS on some platforms, e.g. Icelake - */ - if (unlikely(event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK)) { - if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS)) - return -EINVAL; - - if (!event->attr.precise_ip) - return -EINVAL; + if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_U= SER)) { + /* + * Besides the general purpose registers, XMM registers may + * be collected as well. 
+ */ + if (event_has_extended_regs(event)) { + if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS)) + return -EINVAL; + if (!(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM)) + return -EINVAL; + } } - return x86_setup_perfctr(event); } =20 @@ -1685,25 +1740,51 @@ static void x86_pmu_del(struct perf_event *event, i= nt flags) static_call_cond(x86_pmu_del)(event); } =20 +static DEFINE_PER_CPU(struct x86_perf_regs, x86_user_regs); + +static struct x86_perf_regs * +x86_pmu_perf_get_regs_user(struct perf_sample_data *data, + struct pt_regs *regs) +{ + struct x86_perf_regs *x86_regs_user =3D this_cpu_ptr(&x86_user_regs); + struct perf_regs regs_user; + + perf_get_regs_user(®s_user, regs); + data->regs_user.abi =3D regs_user.abi; + if (regs_user.regs) { + x86_regs_user->regs =3D *regs_user.regs; + data->regs_user.regs =3D &x86_regs_user->regs; + } else + data->regs_user.regs =3D NULL; + return x86_regs_user; +} + void x86_pmu_setup_regs_data(struct perf_event *event, struct perf_sample_data *data, - struct pt_regs *regs) + struct pt_regs *regs, + u64 ignore_mask) { - u64 sample_type =3D event->attr.sample_type; + struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); + struct perf_event_attr *attr =3D &event->attr; + u64 sample_type =3D attr->sample_type; + u64 mask =3D 0; + + if (!(attr->sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)= )) + return; =20 if (sample_type & PERF_SAMPLE_REGS_USER) { if (user_mode(regs)) { data->regs_user.abi =3D perf_reg_abi(current); data->regs_user.regs =3D regs; } else if (!(current->flags & PF_KTHREAD)) { - perf_get_regs_user(&data->regs_user, regs); + perf_regs =3D x86_pmu_perf_get_regs_user(data, regs); } else { data->regs_user.abi =3D PERF_SAMPLE_REGS_ABI_NONE; data->regs_user.regs =3D NULL; } data->dyn_size +=3D sizeof(u64); if (data->regs_user.regs) - data->dyn_size +=3D hweight64(event->attr.sample_regs_user) * sizeof(u6= 4); + data->dyn_size +=3D hweight64(attr->sample_regs_user) * sizeof(u64); data->sample_flags |=3D PERF_SAMPLE_REGS_USER; } =20 @@ -1712,9 +1793,18 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, data->regs_intr.abi =3D perf_reg_abi(current); data->dyn_size +=3D sizeof(u64); if (data->regs_intr.regs) - data->dyn_size +=3D hweight64(event->attr.sample_regs_intr) * sizeof(u6= 4); + data->dyn_size +=3D hweight64(attr->sample_regs_intr) * sizeof(u64); data->sample_flags |=3D PERF_SAMPLE_REGS_INTR; } + + if (event_has_extended_regs(event)) { + perf_regs->xmm_regs =3D NULL; + mask |=3D XFEATURE_MASK_SSE; + } + + mask &=3D ~ignore_mask; + if (mask) + x86_pmu_get_ext_regs(perf_regs, mask); } =20 int x86_pmu_handle_irq(struct pt_regs *regs) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index edebc8dfbc96..c73c2e57d71b 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3285,6 +3285,8 @@ static int handle_pmi_common(struct pt_regs *regs, u6= 4 status) if (has_branch_stack(event)) intel_pmu_lbr_save_brstack(&data, cpuc, event); =20 + x86_pmu_setup_regs_data(event, &data, regs, 0); + perf_event_overflow(event, &data, regs); } =20 @@ -5273,6 +5275,29 @@ static inline bool intel_pmu_broken_perf_cap(void) return false; } =20 +static void intel_extended_regs_init(struct pmu *pmu) +{ + /* + * Extend the vector registers support to non-PEBS. + * The feature is limited to newer Intel machines with + * PEBS V4+ or archPerfmonExt (0x23) enabled for now. + * In theory, the vector registers can be retrieved as + * long as the CPU supports. 
The support for the old + * generations may be added later if there is a + * requirement. + * Only support the extension when XSAVES is available. + */ + if (!boot_cpu_has(X86_FEATURE_XSAVES)) + return; + + if (!boot_cpu_has(X86_FEATURE_XMM) || + !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL)) + return; + + x86_pmu.ext_regs_mask |=3D X86_EXT_REGS_XMM; + x86_get_pmu(smp_processor_id())->capabilities |=3D PERF_PMU_CAP_EXTENDED_= REGS; +} + static void update_pmu_cap(struct pmu *pmu) { unsigned int cntr, fixed_cntr, ecx, edx; @@ -5307,6 +5332,8 @@ static void update_pmu_cap(struct pmu *pmu) /* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration = */ rdmsrq(MSR_IA32_PERF_CAPABILITIES, hybrid(pmu, intel_cap).capabilities); } + + intel_extended_regs_init(pmu); } =20 static void intel_pmu_check_hybrid_pmus(struct x86_hybrid_pmu *pmu) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index e67d8a03ddfe..8437730abfb7 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1415,8 +1415,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event= *event) if (gprs || (attr->precise_ip < 2) || tsx_weight) pebs_data_cfg |=3D PEBS_DATACFG_GP; =20 - if ((sample_type & PERF_SAMPLE_REGS_INTR) && - (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK)) + if (event_has_extended_regs(event)) pebs_data_cfg |=3D PEBS_DATACFG_XMMS; =20 if (sample_type & PERF_SAMPLE_BRANCH_STACK) { @@ -2127,8 +2126,12 @@ static void setup_pebs_adaptive_sample_data(struct p= erf_event *event, } =20 if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) { + u64 mask =3D 0; + adaptive_pebs_save_regs(regs, gprs); - x86_pmu_setup_regs_data(event, data, regs); + if (format_group & PEBS_DATACFG_XMMS) + mask |=3D XFEATURE_MASK_SSE; + x86_pmu_setup_regs_data(event, data, regs, mask); } } =20 @@ -2755,6 +2758,7 @@ void __init intel_pebs_init(void) x86_pmu.flags |=3D PMU_FL_PEBS_ALL; x86_pmu.pebs_capable =3D ~0ULL; pebs_qual =3D "-baseline"; + x86_pmu.ext_regs_mask |=3D X86_EXT_REGS_XMM; x86_get_pmu(smp_processor_id())->capabilities |=3D PERF_PMU_CAP_EXTEND= ED_REGS; } else { /* Only basic record supported */ diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 12682a059608..37ed46cafa53 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -687,6 +687,10 @@ enum { x86_lbr_exclusive_max, }; =20 +enum { + X86_EXT_REGS_XMM =3D BIT_ULL(0), +}; + #define PERF_PEBS_DATA_SOURCE_MAX 0x100 #define PERF_PEBS_DATA_SOURCE_MASK (PERF_PEBS_DATA_SOURCE_MAX - 1) #define PERF_PEBS_DATA_SOURCE_GRT_MAX 0x10 @@ -992,6 +996,11 @@ struct x86_pmu { struct extra_reg *extra_regs; unsigned int flags; =20 + /* + * Extended regs, e.g., vector registers + */ + u64 ext_regs_mask; + /* * Intel host/guest support (KVM) */ @@ -1280,7 +1289,8 @@ int x86_pmu_handle_irq(struct pt_regs *regs); =20 void x86_pmu_setup_regs_data(struct perf_event *event, struct perf_sample_data *data, - struct pt_regs *regs); + struct pt_regs *regs, + u64 ignore_mask); =20 void x86_pmu_show_pmu_cap(struct pmu *pmu); =20 diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/x= state.h index 0c8b9251c29f..58bbdf9226d1 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -109,6 +109,8 @@ void xsaves(struct xregs_state *xsave, u64 mask); void xrstors(struct xregs_state *xsave, u64 mask); void xsaves_nmi(struct xregs_state *xsave, u64 mask); =20 +unsigned int xstate_calculate_size(u64 xfeatures, bool compacted); + int 
xfd_enable_feature(u64 xfd_err); =20 #ifdef CONFIG_X86_64 diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 70d1d94aca7e..f36f04bc95f1 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -592,7 +592,10 @@ extern void perf_events_lapic_init(void); struct pt_regs; struct x86_perf_regs { struct pt_regs regs; - u64 *xmm_regs; + union { + u64 *xmm_regs; + u32 *xmm_space; /* for xsaves */ + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 8602683fcb12..4747b29608cd 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -583,7 +583,7 @@ static bool __init check_xstate_against_struct(int nr) return true; } =20 -static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted) +unsigned int xstate_calculate_size(u64 xfeatures, bool compacted) { unsigned int topmost =3D fls64(xfeatures) - 1; unsigned int offset, i; --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BA9F92F0E57 for ; Thu, 26 Jun 2025 19:57:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967823; cv=none; b=FJLFzfjDEXickQtwDiPSk8WeXlivllVlYVb8JrHS+12GDtv/b+Q60PrGvNoPDWztZOTCvY04tPMgZ2RtFKzUJF5sCYp2SaAJuMEVeLhuoveCBnVpwBxzJ+BzaHyqVu3YcqY7PP1rd07KqHyTtBMa7kCnnaqcuHYHpZI9+oUbvGY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967823; c=relaxed/simple; bh=1qGhHvhK/NugQdUvFGT3y430McBRET3ozZ1kbREKaAI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=F1DiGmEPOmMeYREu7uSg2il6+ZhIkJQ9/uC3PapT1uA6IgqqBMTt8le3c4BziD900BOaFKsapvX5l3ZKelXRDrUAZeT/LFyXh0nnIsyLmSHPK/GP3780gmczGKFkUV8saN7JY0ImrVetCEFedaWwFBJrQfzwEui9qZulcKamwpw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=bjyem9IV; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="bjyem9IV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967821; x=1782503821; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1qGhHvhK/NugQdUvFGT3y430McBRET3ozZ1kbREKaAI=; b=bjyem9IVlmYSj+tecpcDir9lo+paQlWEcH1uUZOv5tGZ++JWpMIxm5QZ K5UFx+M+w+JVg9URtLPefj0I1sWjDlLN9VtbtXHKOWqg8md6YSmVaWdKI QkZAtRcvKwzCNt+4rcrBTzPuIBCQ4Kt9g0cFFpvbIUad6EOaId97P6o2C qgPws51I9Eh4w3XGspwbZD+GN5DqYTClEOPByetwosnTAiGAe/yd0uooQ LEfYQ7ZTgntCJBT8M7WxN0aS1fJXfbTiWSCIXyXGhXQBl85BiSvr30VnN BworEjFU9jrD+NlpdUsRd668wbrVvb8o60X5O1XdIAJVgpPFGt9x1dNHd Q==; X-CSE-ConnectionGUID: R1Z+eOchQpy2hwVb59EFoQ== X-CSE-MsgGUID: hnVkmlUTSWalX7aat2KBcQ== X-IronPort-AV: E=McAfee;i="6800,10657,11476"; a="53002164" X-IronPort-AV: 
E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="53002164" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Jun 2025 12:57:00 -0700 X-CSE-ConnectionGUID: 79lGNJ13ToClEofa/FA5GQ== X-CSE-MsgGUID: vkc8SucWRim4NF8J1X5EYg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="156902913" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by fmviesa005.fm.intel.com with ESMTP; 26 Jun 2025 12:56:59 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang Subject: [RFC PATCH V2 06/13] perf: Support SIMD registers Date: Thu, 26 Jun 2025 12:56:03 -0700 Message-Id: <20250626195610.405379-7-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com> References: <20250626195610.405379-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang The users may be interested in the SIMD registers in a sample while profiling. The current sample_regs_XXX doesn't have enough space for all SIMD registers. Add sets of the sample_simd_{pred,vec}_reg_* in the struct perf_event_attr to define a set of SIMD registers to dump on samples. The current X86 supports the XMM registers in sample_regs_XXX. To utilize the new SIMD registers configuration method, the sample_simd_regs_enabled should always be set. If so, the XMM space in the sample_regs_XXX is reserved for other usage. The SIMD registers are wider than 64. A new output format is introduced. The number and width of SIMD registers will be dumped first, following the register values. The number and width are the same as the user's configuration now. If, for some reason (e.g., ARM) they are different, an ARCH-specific perf_output_sample_simd_regs can be implemented later separately. Add a new ABI, PERF_SAMPLE_REGS_ABI_SIMD, to indicate the new format. The enum perf_sample_regs_abi becomes a bitmap now. There should be no impact on the existing tool, since the version and bitmap are the same for 1 and 2. Add two new __weak functions to validate the configuration of the SIMD registers and retrieve the SIMD registers. The ARCH-specific functions will be implemented in the following patches. Add a new flag PERF_PMU_CAP_SIMD_REGS to indicate that the PMU has the capability to support SIMD registers dumping. Error out if the sample_simd_{pred,vec}_reg_* mistakenly set for a PMU that doesn't have the capability. 
Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Kan Liang --- include/linux/perf_event.h | 13 +++++ include/linux/perf_regs.h | 5 ++ include/uapi/linux/perf_event.h | 47 +++++++++++++++-- kernel/events/core.c | 89 +++++++++++++++++++++++++++++++-- 4 files changed, 146 insertions(+), 8 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 74c188a699e4..56bcb073100f 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -305,6 +305,7 @@ struct perf_event_pmu_context; #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100 #define PERF_PMU_CAP_AUX_PAUSE 0x0200 #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400 +#define PERF_PMU_CAP_SIMD_REGS 0x0800 =20 /** * pmu::scope @@ -1488,6 +1489,18 @@ perf_event__output_id_sample(struct perf_event *even= t, extern void perf_log_lost_samples(struct perf_event *event, u64 lost); =20 +static inline bool event_has_simd_regs(struct perf_event *event) +{ + struct perf_event_attr *attr =3D &event->attr; + + return attr->sample_simd_regs_enabled !=3D 0 || + attr->sample_simd_pred_reg_intr !=3D 0 || + attr->sample_simd_pred_reg_user !=3D 0 || + attr->sample_simd_vec_reg_qwords !=3D 0 || + attr->sample_simd_vec_reg_intr !=3D 0 || + attr->sample_simd_vec_reg_user !=3D 0; +} + static inline bool event_has_extended_regs(struct perf_event *event) { struct perf_event_attr *attr =3D &event->attr; diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h index f632c5725f16..38d11f152753 100644 --- a/include/linux/perf_regs.h +++ b/include/linux/perf_regs.h @@ -9,6 +9,11 @@ struct perf_regs { struct pt_regs *regs; }; =20 +int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, + u16 pred_qwords, u32 pred_mask); +u64 perf_simd_reg_value(struct pt_regs *regs, int idx, + u16 qwords_idx, bool pred); + #ifdef CONFIG_HAVE_PERF_REGS #include =20 diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_even= t.h index 78a362b80027..2e9b16acbed6 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -313,9 +313,10 @@ enum { * Values to determine ABI of the registers dump. */ enum perf_sample_regs_abi { - PERF_SAMPLE_REGS_ABI_NONE =3D 0, - PERF_SAMPLE_REGS_ABI_32 =3D 1, - PERF_SAMPLE_REGS_ABI_64 =3D 2, + PERF_SAMPLE_REGS_ABI_NONE =3D 0x00, + PERF_SAMPLE_REGS_ABI_32 =3D 0x01, + PERF_SAMPLE_REGS_ABI_64 =3D 0x02, + PERF_SAMPLE_REGS_ABI_SIMD =3D 0x04, }; =20 /* @@ -382,6 +383,7 @@ enum perf_event_read_format { #define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */ #define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */ #define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */ +#define PERF_ATTR_SIZE_VER9 168 /* Add: sample_simd_{pred,vec}_reg_* */ =20 /* * 'struct perf_event_attr' contains various attributes that define @@ -543,6 +545,25 @@ struct perf_event_attr { __u64 sig_data; =20 __u64 config3; /* extension of config2 */ + + + /* + * Defines set of SIMD registers to dump on samples. + * The sample_simd_regs_enabled !=3D0 implies the + * set of SIMD registers is used to config all SIMD registers. + * If !sample_simd_regs_enabled, sample_regs_XXX may be used to + * config some SIMD registers on X86. 
+ */ + union { + __u16 sample_simd_regs_enabled; + __u16 sample_simd_pred_reg_qwords; + }; + __u32 sample_simd_pred_reg_intr; + __u32 sample_simd_pred_reg_user; + __u16 sample_simd_vec_reg_qwords; + __u64 sample_simd_vec_reg_intr; + __u64 sample_simd_vec_reg_user; + __u32 __reserved_4; }; =20 /* @@ -1016,7 +1037,15 @@ enum perf_event_type { * } && PERF_SAMPLE_BRANCH_STACK * * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_USER * * { u64 size; * char data[size]; @@ -1043,7 +1072,15 @@ enum perf_event_type { * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_INTR * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR * { u64 cgroup;} && PERF_SAMPLE_CGROUP * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE diff --git a/kernel/events/core.c b/kernel/events/core.c index 7f0d98d73629..14ae43694833 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7390,6 +7390,43 @@ perf_output_sample_regs(struct perf_output_handle *h= andle, } } =20 +static void +perf_output_sample_simd_regs(struct perf_output_handle *handle, + struct perf_event *event, + struct pt_regs *regs, + u64 mask, u16 pred_mask) +{ + u16 pred_qwords =3D event->attr.sample_simd_pred_reg_qwords; + u16 vec_qwords =3D event->attr.sample_simd_vec_reg_qwords; + u16 nr_pred =3D hweight16(pred_mask); + u16 nr_vectors =3D hweight64(mask); + int bit; + u64 val; + u16 i; + + perf_output_put(handle, nr_vectors); + perf_output_put(handle, vec_qwords); + perf_output_put(handle, nr_pred); + perf_output_put(handle, pred_qwords); + + if (nr_vectors) { + for_each_set_bit(bit, (unsigned long *)&mask, sizeof(mask) * BITS_PER_BY= TE) { + for (i =3D 0; i < vec_qwords; i++) { + val =3D perf_simd_reg_value(regs, bit, i, false); + perf_output_put(handle, val); + } + } + } + if (nr_pred) { + for_each_set_bit(bit, (unsigned long *)&pred_mask, sizeof(pred_mask) * B= ITS_PER_BYTE) { + for (i =3D 0; i < pred_qwords; i++) { + val =3D perf_simd_reg_value(regs, bit, i, true); + perf_output_put(handle, val); + } + } + } +} + static void perf_sample_regs_user(struct perf_regs *regs_user, struct pt_regs *regs) { @@ -7411,6 +7448,17 @@ static void perf_sample_regs_intr(struct perf_regs *= regs_intr, regs_intr->abi =3D perf_reg_abi(current); } =20 +int __weak perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, + u16 pred_qwords, u32 pred_mask) +{ + return vec_qwords || vec_mask || pred_qwords || pred_mask ? -ENOSYS : 0; +} + +u64 __weak perf_simd_reg_value(struct pt_regs *regs, int idx, + u16 qwords_idx, bool pred) +{ + return 0; +} =20 /* * Get remaining task size from user stack pointer. 
@@ -7939,10 +7987,17 @@ void perf_output_sample(struct perf_output_handle *= handle, perf_output_put(handle, abi); =20 if (abi) { - u64 mask =3D event->attr.sample_regs_user; + struct perf_event_attr *attr =3D &event->attr; + u64 mask =3D attr->sample_regs_user; perf_output_sample_regs(handle, data->regs_user.regs, mask); + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) { + perf_output_sample_simd_regs(handle, event, + data->regs_user.regs, + attr->sample_simd_vec_reg_user, + attr->sample_simd_pred_reg_user); + } } } =20 @@ -7970,11 +8025,18 @@ void perf_output_sample(struct perf_output_handle *= handle, perf_output_put(handle, abi); =20 if (abi) { - u64 mask =3D event->attr.sample_regs_intr; + struct perf_event_attr *attr =3D &event->attr; + u64 mask =3D attr->sample_regs_intr; =20 perf_output_sample_regs(handle, data->regs_intr.regs, mask); + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) { + perf_output_sample_simd_regs(handle, event, + data->regs_intr.regs, + attr->sample_simd_vec_reg_intr, + attr->sample_simd_pred_reg_intr); + } } } =20 @@ -12535,6 +12597,12 @@ static int perf_try_init_event(struct pmu *pmu, st= ruct perf_event *event) if (ret) goto err_pmu; =20 + if (!(pmu->capabilities & PERF_PMU_CAP_SIMD_REGS) && + event_has_simd_regs(event)) { + ret =3D -EOPNOTSUPP; + goto err_destroy; + } + if (!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS) && event_has_extended_regs(event)) { ret =3D -EOPNOTSUPP; @@ -13076,6 +13144,12 @@ static int perf_copy_attr(struct perf_event_attr _= _user *uattr, ret =3D perf_reg_validate(attr->sample_regs_user); if (ret) return ret; + ret =3D perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords, + attr->sample_simd_vec_reg_user, + attr->sample_simd_pred_reg_qwords, + attr->sample_simd_pred_reg_user); + if (ret) + return ret; } =20 if (attr->sample_type & PERF_SAMPLE_STACK_USER) { @@ -13096,8 +13170,17 @@ static int perf_copy_attr(struct perf_event_attr _= _user *uattr, if (!attr->sample_max_stack) attr->sample_max_stack =3D sysctl_perf_event_max_stack; =20 - if (attr->sample_type & PERF_SAMPLE_REGS_INTR) + if (attr->sample_type & PERF_SAMPLE_REGS_INTR) { ret =3D perf_reg_validate(attr->sample_regs_intr); + if (ret) + return ret; + ret =3D perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords, + attr->sample_simd_vec_reg_intr, + attr->sample_simd_pred_reg_qwords, + attr->sample_simd_pred_reg_intr); + if (ret) + return ret; + } =20 #ifndef CONFIG_CGROUP_PERF if (attr->sample_type & PERF_SAMPLE_CGROUP) --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 33D9B2F1997 for ; Thu, 26 Jun 2025 19:57:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967823; cv=none; b=Enf5ib0MeswAyjpWRI3YA/B/hxfou5wr6+hlLGt2AWw3dZvpixO8cfbzVQWyw8ebJnSUzaxlhu1HqBYBPsX6sSo2xVj9xjpvVrqyklhWZLBxTArOIjWH2qRKO4y1+2PhUJC/RfQGZpyIFIdpv8u7mvtfF6TvCyLsAMzAZkNVp7Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967823; c=relaxed/simple; bh=amQzdgQn1nBGZUehLLz6xzFzzEqkFIbXu0BcytcPHK8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=jFQiVIrO+kRKbBWgoaFv/4KFUw6QN14KBXVAo6oh9UrSWDl2e5StlFOooEVGPOEznOMeBDHq70zCnAX1hqX8BZKLAF7bxcBugqw+EP3T826r80cLKMJoUpsUvWj/iiq2BIPHqQQBZcboaS1ux9xMOl8HApMTxzBvM7QTaix3qXo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=nt67gnlJ; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="nt67gnlJ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967822; x=1782503822; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=amQzdgQn1nBGZUehLLz6xzFzzEqkFIbXu0BcytcPHK8=; b=nt67gnlJZqumtI9DoOPoKuhdPvMwNIyFZQJjlVqtk2OPkvwhQANCbPPf zw4Wl3ckhdCdTvwmasH/RqgPfECEaACAxbn7aVqcBZZFnIIDG3ZQgmOGd bKhALKMFBprsDjklFMLfr8Uj6oolRe57xnnz5vI7u4cs2LlXRbJ+RC0DY PG3W077eQ6T9lLA5JJyhKXlG2aTKaW+gnSYRPXz6XoK6pPv+myBACDWlO lEJZSyZYd1lTVQfynL9gmkD2+eMTnLUn3MTFDju6WaVmvel+QGmUzKouV kCfynkSn4QeYjQzDEhoBH2zlDAe/lfM/L5g5wPtmqWtFJdoKXlHfjOAkG w==; X-CSE-ConnectionGUID: V7BdgTTyQrWEaDC7zl6zfg== X-CSE-MsgGUID: D+O+2c2zRFGEebozAGxfqg== X-IronPort-AV: E=McAfee;i="6800,10657,11476"; a="53002172" X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="53002172" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Jun 2025 12:57:00 -0700 X-CSE-ConnectionGUID: C61qElHxQ6qRJzgShF9rKQ== X-CSE-MsgGUID: Q12Lm5HwQRiwIIKuH6maOw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="156902917" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by fmviesa005.fm.intel.com with ESMTP; 26 Jun 2025 12:57:00 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang Subject: [RFC PATCH V2 07/13] perf/x86: Move XMM to sample_simd_vec_regs Date: Thu, 26 Jun 2025 12:56:04 -0700 Message-Id: <20250626195610.405379-8-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com> References: <20250626195610.405379-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang The XMM0-15 are SIMD registers. Move them from sample_regs to sample_simd_vec_regs. Reject access to the extended space of the sample_regs if the new sample_simd_vec_regs is used. The perf_reg_value requires the abi to understand the layout of the sample_regs. Add the abi information in the struct x86_perf_regs. Implement the X86-specific perf_simd_reg_validate to validate the SIMD registers configuration from the user tool. 
Only the XMM0-15 is supported now. More registers will be added in the following patches. Implement the X86-specific perf_simd_reg_value to retrieve the XMM value. Signed-off-by: Kan Liang --- arch/x86/events/core.c | 38 ++++++++++++++++++++- arch/x86/events/intel/ds.c | 2 +- arch/x86/events/perf_event.h | 12 +++++++ arch/x86/include/asm/perf_event.h | 1 + arch/x86/include/uapi/asm/perf_regs.h | 6 ++++ arch/x86/kernel/perf_regs.c | 49 ++++++++++++++++++++++++++- 6 files changed, 105 insertions(+), 3 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 899bd5680f6b..2515179ac664 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -710,6 +710,22 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; if (!(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM)) return -EINVAL; + if (event->attr.sample_simd_regs_enabled) + return -EINVAL; + } + + if (event_has_simd_regs(event)) { + if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS)) + return -EINVAL; + /* Not require any vector registers but set width */ + if (event->attr.sample_simd_vec_reg_qwords && + !event->attr.sample_simd_vec_reg_intr && + !event->attr.sample_simd_vec_reg_user) + return -EINVAL; + /* The vector registers set is not supported */ + if (event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_XMM_QWORDS && + !(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM)) + return -EINVAL; } } return x86_setup_perfctr(event); @@ -1785,6 +1801,16 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, data->dyn_size +=3D sizeof(u64); if (data->regs_user.regs) data->dyn_size +=3D hweight64(attr->sample_regs_user) * sizeof(u64); + if (attr->sample_simd_regs_enabled && data->regs_user.abi) { + /* num and qwords of vector and pred registers */ + data->dyn_size +=3D sizeof(u64); + /* data[] */ + data->dyn_size +=3D hweight64(attr->sample_simd_vec_reg_user) * + sizeof(u64) * + attr->sample_simd_vec_reg_qwords; + data->regs_user.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; + } + perf_regs->abi =3D data->regs_user.abi; data->sample_flags |=3D PERF_SAMPLE_REGS_USER; } =20 @@ -1794,10 +1820,20 @@ void x86_pmu_setup_regs_data(struct perf_event *eve= nt, data->dyn_size +=3D sizeof(u64); if (data->regs_intr.regs) data->dyn_size +=3D hweight64(attr->sample_regs_intr) * sizeof(u64); + if (attr->sample_simd_regs_enabled && data->regs_intr.abi) { + /* num and qwords of vector and pred registers */ + data->dyn_size +=3D sizeof(u64); + /* data[] */ + data->dyn_size +=3D hweight64(attr->sample_simd_vec_reg_intr) * + sizeof(u64) * + attr->sample_simd_vec_reg_qwords; + data->regs_intr.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; + } + perf_regs->abi =3D data->regs_intr.abi; data->sample_flags |=3D PERF_SAMPLE_REGS_INTR; } =20 - if (event_has_extended_regs(event)) { + if (event_needs_xmm(event)) { perf_regs->xmm_regs =3D NULL; mask |=3D XFEATURE_MASK_SSE; } diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 8437730abfb7..849136bef336 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1415,7 +1415,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event= *event) if (gprs || (attr->precise_ip < 2) || tsx_weight) pebs_data_cfg |=3D PEBS_DATACFG_GP; =20 - if (event_has_extended_regs(event)) + if (event_needs_xmm(event)) pebs_data_cfg |=3D PEBS_DATACFG_XMMS; =20 if (sample_type & PERF_SAMPLE_BRANCH_STACK) { diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 37ed46cafa53..69964433a245 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ 
-133,6 +133,18 @@ static inline bool is_acr_event_group(struct perf_even= t *event) return check_leader_group(event->group_leader, PERF_X86_EVENT_ACR); } =20 +static inline bool event_needs_xmm(struct perf_event *event) +{ + if (event->attr.sample_simd_regs_enabled && + event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_XMM_QWORDS) + return true; + + if (!event->attr.sample_simd_regs_enabled && + event_has_extended_regs(event)) + return true; + return false; +} + struct amd_nb { int nb_id; /* NorthBridge id */ int refcnt; /* reference count */ diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index f36f04bc95f1..538219c59979 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -592,6 +592,7 @@ extern void perf_events_lapic_init(void); struct pt_regs; struct x86_perf_regs { struct pt_regs regs; + u64 abi; union { u64 *xmm_regs; u32 *xmm_space; /* for xsaves */ diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index 7c9d2bb3833b..bd8af802f757 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -55,4 +55,10 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 +#define PERF_X86_SIMD_VEC_REGS_MAX 16 +#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) + +#define PERF_X86_XMM_QWORDS 2 +#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_XMM_QWORDS + #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 624703af80a1..638b9e186c50 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -63,6 +63,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) =20 if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { perf_regs =3D container_of(regs, struct x86_perf_regs, regs); + /* SIMD registers are moved to dedicated sample_simd_vec_reg */ + if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) + return 0; if (!perf_regs->xmm_regs) return 0; return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; @@ -74,6 +77,49 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) return regs_get_register(regs, pt_regs_offset[idx]); } =20 +u64 perf_simd_reg_value(struct pt_regs *regs, int idx, + u16 qwords_idx, bool pred) +{ + struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); + + if (pred) + return 0; + + if (WARN_ON_ONCE(idx >=3D PERF_X86_SIMD_VEC_REGS_MAX || + qwords_idx >=3D PERF_X86_SIMD_QWORDS_MAX)) + return 0; + + if (qwords_idx < PERF_X86_XMM_QWORDS) { + if (!perf_regs->xmm_regs) + return 0; + return perf_regs->xmm_regs[idx * PERF_X86_XMM_QWORDS + qwords_idx]; + } + + return 0; +} + +int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, + u16 pred_qwords, u32 pred_mask) +{ + /* pred_qwords implies sample_simd_{pred,vec}_reg_* are supported */ + if (!pred_qwords) + return 0; + + if (!vec_qwords) { + if (vec_mask) + return -EINVAL; + } else { + if (vec_qwords !=3D PERF_X86_XMM_QWORDS) + return -EINVAL; + if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) + return -EINVAL; + } + if (pred_mask) + return -EINVAL; + + return 0; +} + #define PERF_REG_X86_RESERVED (((1ULL << PERF_REG_X86_XMM0) - 1) & \ ~((1ULL << PERF_REG_X86_MAX) - 1)) =20 @@ -114,7 +160,8 @@ void perf_get_regs_user(struct perf_regs *regs_user, =20 int perf_reg_validate(u64 mask) { - if (!mask || (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED))) + /* The mask could be 0 if only the SIMD registers are interested */ + if 
(mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED)) return -EINVAL; =20 return 0; --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 07FB22F2360 for ; Thu, 26 Jun 2025 19:57:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967824; cv=none; b=FfQhxr2e08c4Rb9/Z6MatZYnGaP024Kg2CUe5E2gY7hj5dD4xc1SbbyPsxS985b+C2mCFcXgCt02RzM6bAZnb/6ph+vHVTD7dZCKr7fn0J1Eign2ch34YezQ9BwGAkApvuzvUj1iUo81rnx1wC0/Uvo9fzHoQfK4PUw9kMwh/co= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967824; c=relaxed/simple; bh=ZxzlL/+/sTE7rS8L9N1ph6EFfIM8Tc5SpvtuIBJHoLQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=G7zg91Sg7wNwhAGsQcb9/0u4QngGoguoq1jCxlTrIUOZmVLvE2qcM/Ph096alyDoJ0ijAjBqs3vI56mXy5CONi9VbWZ1sM2wifyw06CEwTeN3QpwviayCsEfWrlAS/hPjMFAe/qYfQAfbRKvYSZlRdiHHvwZIqSUfJgsaqs2N18= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=I6fTwYhz; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="I6fTwYhz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967823; x=1782503823; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZxzlL/+/sTE7rS8L9N1ph6EFfIM8Tc5SpvtuIBJHoLQ=; b=I6fTwYhzusFtoYPlR4iNfA638Z31aCDgq+FmKTYGeVTzEEyeQQfbTGTY xnl9UO5nAwkuUBvzVwEYYIJjyyS/bmZb+Rt0/KS1rIHWup26YRPfS15Rb LJ/P0fg+dkKkzgvhj74l/FoPW5o5oLAcl+9fY524vEcj8Mbkwko3Nl6Ye k/CXCJGBaIv9JIl9fdrJJ9SwVybTVV5Udg/ltOwh42rGCWd0fmrBNYjhj Wcq3l+bxSSLx/VBAUh8974315hojv2ZWfwetzHO2djtjJoH0z9omqvKUJ yW7MKy8jAo+gKYbo5uoG08PnYIICAUGqK3uH5M9Ke/FKyw0ThpKuLtY7+ A==; X-CSE-ConnectionGUID: G1uGWbi4QUKlF5Z+MPYV0w== X-CSE-MsgGUID: KTdLNBVhQLuuiWLQW204eA== X-IronPort-AV: E=McAfee;i="6800,10657,11476"; a="53002180" X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="53002180" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Jun 2025 12:57:01 -0700 X-CSE-ConnectionGUID: xmJi9pPzS4mmfryv91NG3w== X-CSE-MsgGUID: x9aerAo3SjyOM9vNpeLQOg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,268,1744095600"; d="scan'208";a="156902920" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by fmviesa005.fm.intel.com with ESMTP; 26 Jun 2025 12:57:00 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang Subject: [RFC 
PATCH V2 08/13] perf/x86: Add YMM into sample_simd_vec_regs Date: Thu, 26 Jun 2025 12:56:05 -0700 Message-Id: <20250626195610.405379-9-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com> References: <20250626195610.405379-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang The YMM0-15 is composed of XMM and YMMH. It requires 2 XSAVE commands to get the complete value. Internally, the XMM and YMMH are stored in different structures, which follow the XSAVE format. But the output dumps the YMM as a whole. The qwords 4 imply YMM. Signed-off-by: Kan Liang --- arch/x86/events/core.c | 15 +++++++++++++++ arch/x86/events/perf_event.h | 1 + arch/x86/include/asm/perf_event.h | 4 ++++ arch/x86/include/uapi/asm/perf_regs.h | 4 +++- arch/x86/kernel/perf_regs.c | 7 ++++++- 5 files changed, 29 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 2515179ac664..20c825e83a3f 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -420,6 +420,9 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *= perf_regs, u64 mask) if (mask & XFEATURE_MASK_SSE && xsave->header.xfeatures & BIT_ULL(XFEATURE_SSE)) perf_regs->xmm_space =3D xsave->i387.xmm_space; + + if (mask & XFEATURE_MASK_YMM) + perf_regs->ymmh =3D get_xsave_addr(xsave, XFEATURE_YMM); } =20 static void release_ext_regs_buffers(void) @@ -446,6 +449,8 @@ static void reserve_ext_regs_buffers(void) =20 if (x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM) mask |=3D XFEATURE_MASK_SSE; + if (x86_pmu.ext_regs_mask & X86_EXT_REGS_YMM) + mask |=3D XFEATURE_MASK_YMM; =20 size =3D xstate_calculate_size(mask, true); =20 @@ -726,6 +731,9 @@ int x86_pmu_hw_config(struct perf_event *event) if (event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_XMM_QWORDS && !(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM)) return -EINVAL; + if (event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_YMM_QWORDS && + !(x86_pmu.ext_regs_mask & X86_EXT_REGS_YMM)) + return -EINVAL; } } return x86_setup_perfctr(event); @@ -1838,6 +1846,13 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, mask |=3D XFEATURE_MASK_SSE; } =20 + if (attr->sample_simd_regs_enabled) { + if (attr->sample_simd_vec_reg_qwords >=3D PERF_X86_YMM_QWORDS) { + perf_regs->ymmh_regs =3D NULL; + mask |=3D XFEATURE_MASK_YMM; + } + } + mask &=3D ~ignore_mask; if (mask) x86_pmu_get_ext_regs(perf_regs, mask); diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 69964433a245..7d332d0247ed 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -701,6 +701,7 @@ enum { =20 enum { X86_EXT_REGS_XMM =3D BIT_ULL(0), + X86_EXT_REGS_YMM =3D BIT_ULL(1), }; =20 #define PERF_PEBS_DATA_SOURCE_MAX 0x100 diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 538219c59979..81e3143fd91a 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -597,6 +597,10 @@ struct x86_perf_regs { u64 *xmm_regs; u32 *xmm_space; /* for xsaves */ }; + union { + u64 *ymmh_regs; + struct ymmh_struct *ymmh; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index bd8af802f757..feb3e8f80761 100644 --- 
a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -59,6 +59,8 @@ enum perf_event_x86_regs { #define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) =20 #define PERF_X86_XMM_QWORDS 2 -#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_XMM_QWORDS +#define PERF_X86_YMM_QWORDS 4 +#define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2) +#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_YMM_QWORDS =20 #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 638b9e186c50..37cf0a282915 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -93,6 +93,10 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx, if (!perf_regs->xmm_regs) return 0; return perf_regs->xmm_regs[idx * PERF_X86_XMM_QWORDS + qwords_idx]; + } else if (qwords_idx < PERF_X86_YMM_QWORDS) { + if (!perf_regs->ymmh_regs) + return 0; + return perf_regs->ymmh_regs[idx * PERF_X86_YMMH_QWORDS + qwords_idx - PE= RF_X86_XMM_QWORDS]; } =20 return 0; @@ -109,7 +113,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, if (vec_mask) return -EINVAL; } else { - if (vec_qwords !=3D PERF_X86_XMM_QWORDS) + if (vec_qwords !=3D PERF_X86_XMM_QWORDS && + vec_qwords !=3D PERF_X86_YMM_QWORDS) return -EINVAL; if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) return -EINVAL; --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6F8C2ECE8E for ; Thu, 26 Jun 2025 19:57:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967825; cv=none; b=cwY8dgGz6U0VDJJkVQEczHKHtbAy0ntrdNLKKOOQUfv+07Jo05p7Vu9LAAoYZghNsVziyOo4SIU38aqNL20FyGcI3c5MSIyCECL43RekcpRhRizzwbzPXQhDesBeJmfn5WrlopQ2uBoRxfYDHhTRpv3H9XVWAjbjpT6SWsVRvu4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967825; c=relaxed/simple; bh=bEBghAayTWVTiz8GLTNBDauW0F84HxRjwynMyY9tdQE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=hyovFGmdjKgXuVJi2jdi3iE10SDN6peX8zepFpJ9Vp7LIZ4IDKJK91pJdaY1jH8F8S99zFhMlOmAncBPyiuvKisrI5B0O18oRVG4hwXNBpBzt1rowwLJNDtwp7N413l5jcTvOFSYO5V7s1+LYyh4+Efq5PWQcpZ3UeCuY9W/DVQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=EYBdmJAD; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="EYBdmJAD" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967823; x=1782503823; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bEBghAayTWVTiz8GLTNBDauW0F84HxRjwynMyY9tdQE=; b=EYBdmJADH240zwZ7dcptHqzE6rdRMOWYF8q5/5OUqh1GCeSZtr8/c7j1 wBG6ruWo25Jn5X3gQtCn5EGsYNlTR4PtUyhXROR1kVh1AFSps7f+h91IG EMrLZ+goNdHglw0YW5xhOI1DyVWh+vdtFtt23fgf9G1f16XDc1DNyYydm 
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang
Subject: [RFC PATCH V2 09/13] perf/x86: Add ZMM into sample_simd_vec_regs
Date: Thu, 26 Jun 2025 12:56:06 -0700
Message-Id: <20250626195610.405379-10-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com>
References: <20250626195610.405379-1-kan.liang@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

The ZMM0-15 registers are composed of the XMM, YMMH, and ZMMH components,
so retrieving a complete value requires three XSAVE components. The
ZMM16-31 (and the corresponding YMM16-31/XMM16-31) registers are also
supported; they only require the Hi16_ZMM XSAVE component.

Internally, the XMM, YMMH, ZMMH, and Hi16_ZMM components are stored in
separate structures, which follow the XSAVE format, but the output dumps
each ZMM or Hi16 XMM/YMM/ZMM register as a whole. A qwords value of 8
selects ZMM.
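As a reading aid (illustrative only, the helper name is made up), the
fragment below mirrors how perf_simd_reg_value() in this patch assembles one
ZMM0-15 value qword by qword from the three XSAVE components; the NULL
checks on the component pointers are omitted here but present in the patch:

	/* Illustrative reading aid, not part of the patch. */
	static u64 zmm_qword(struct x86_perf_regs *perf_regs, int reg, int q)
	{
		if (q < PERF_X86_XMM_QWORDS)	/* qwords 0-1: XMM (SSE component) */
			return perf_regs->xmm_regs[reg * PERF_X86_XMM_QWORDS + q];
		if (q < PERF_X86_YMM_QWORDS)	/* qwords 2-3: YMMH component */
			return perf_regs->ymmh_regs[reg * PERF_X86_YMMH_QWORDS + q - PERF_X86_XMM_QWORDS];
		/* qwords 4-7: ZMM_Hi256 component */
		return perf_regs->zmmh_regs[reg * PERF_X86_ZMMH_QWORDS + q - PERF_X86_YMM_QWORDS];
	}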
Signed-off-by: Kan Liang --- arch/x86/events/core.c | 24 ++++++++++++++++++++++++ arch/x86/events/perf_event.h | 2 ++ arch/x86/include/asm/perf_event.h | 8 ++++++++ arch/x86/include/uapi/asm/perf_regs.h | 8 ++++++-- arch/x86/kernel/perf_regs.c | 13 ++++++++++++- 5 files changed, 52 insertions(+), 3 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 20c825e83a3f..3c05ca98ec3f 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -423,6 +423,10 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs = *perf_regs, u64 mask) =20 if (mask & XFEATURE_MASK_YMM) perf_regs->ymmh =3D get_xsave_addr(xsave, XFEATURE_YMM); + if (mask & XFEATURE_MASK_ZMM_Hi256) + perf_regs->zmmh =3D get_xsave_addr(xsave, XFEATURE_ZMM_Hi256); + if (mask & XFEATURE_MASK_Hi16_ZMM) + perf_regs->h16zmm =3D get_xsave_addr(xsave, XFEATURE_Hi16_ZMM); } =20 static void release_ext_regs_buffers(void) @@ -451,6 +455,10 @@ static void reserve_ext_regs_buffers(void) mask |=3D XFEATURE_MASK_SSE; if (x86_pmu.ext_regs_mask & X86_EXT_REGS_YMM) mask |=3D XFEATURE_MASK_YMM; + if (x86_pmu.ext_regs_mask & X86_EXT_REGS_ZMMH) + mask |=3D XFEATURE_MASK_ZMM_Hi256; + if (x86_pmu.ext_regs_mask & X86_EXT_REGS_H16ZMM) + mask |=3D XFEATURE_MASK_Hi16_ZMM; =20 size =3D xstate_calculate_size(mask, true); =20 @@ -734,6 +742,13 @@ int x86_pmu_hw_config(struct perf_event *event) if (event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_YMM_QWORDS && !(x86_pmu.ext_regs_mask & X86_EXT_REGS_YMM)) return -EINVAL; + if (event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_ZMM_QWORDS && + !(x86_pmu.ext_regs_mask & X86_EXT_REGS_ZMMH)) + return -EINVAL; + if ((fls64(event->attr.sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE= || + fls64(event->attr.sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE= ) && + !(x86_pmu.ext_regs_mask & X86_EXT_REGS_H16ZMM)) + return -EINVAL; } } return x86_setup_perfctr(event); @@ -1851,6 +1866,15 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, perf_regs->ymmh_regs =3D NULL; mask |=3D XFEATURE_MASK_YMM; } + if (attr->sample_simd_vec_reg_qwords >=3D PERF_X86_ZMM_QWORDS) { + perf_regs->zmmh_regs =3D NULL; + mask |=3D XFEATURE_MASK_ZMM_Hi256; + } + if (fls64(attr->sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE || + fls64(attr->sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE) { + perf_regs->h16zmm_regs =3D NULL; + mask |=3D XFEATURE_MASK_Hi16_ZMM; + } } =20 mask &=3D ~ignore_mask; diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 7d332d0247ed..cc42e9d3e13d 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -702,6 +702,8 @@ enum { enum { X86_EXT_REGS_XMM =3D BIT_ULL(0), X86_EXT_REGS_YMM =3D BIT_ULL(1), + X86_EXT_REGS_ZMMH =3D BIT_ULL(2), + X86_EXT_REGS_H16ZMM =3D BIT_ULL(3), }; =20 #define PERF_PEBS_DATA_SOURCE_MAX 0x100 diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 81e3143fd91a..2d78bd9649bd 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -601,6 +601,14 @@ struct x86_perf_regs { u64 *ymmh_regs; struct ymmh_struct *ymmh; }; + union { + u64 *zmmh_regs; + struct avx_512_zmm_uppers_state *zmmh; + }; + union { + u64 *h16zmm_regs; + struct avx_512_hi16_state *h16zmm; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index feb3e8f80761..f74e3ba65be2 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ 
b/arch/x86/include/uapi/asm/perf_regs.h @@ -55,12 +55,16 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 -#define PERF_X86_SIMD_VEC_REGS_MAX 16 +#define PERF_X86_SIMD_VEC_REGS_MAX 32 #define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) =20 +#define PERF_X86_H16ZMM_BASE 16 + #define PERF_X86_XMM_QWORDS 2 #define PERF_X86_YMM_QWORDS 4 #define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2) -#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_YMM_QWORDS +#define PERF_X86_ZMM_QWORDS 8 +#define PERF_X86_ZMMH_QWORDS (PERF_X86_ZMM_QWORDS / 2) +#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_ZMM_QWORDS =20 #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 37cf0a282915..74e05e2e5c90 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -89,6 +89,12 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx, qwords_idx >=3D PERF_X86_SIMD_QWORDS_MAX)) return 0; =20 + if (idx >=3D PERF_X86_H16ZMM_BASE) { + if (!perf_regs->h16zmm_regs) + return 0; + return perf_regs->h16zmm_regs[idx * PERF_X86_ZMM_QWORDS + qwords_idx]; + } + if (qwords_idx < PERF_X86_XMM_QWORDS) { if (!perf_regs->xmm_regs) return 0; @@ -97,6 +103,10 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx, if (!perf_regs->ymmh_regs) return 0; return perf_regs->ymmh_regs[idx * PERF_X86_YMMH_QWORDS + qwords_idx - PE= RF_X86_XMM_QWORDS]; + } else if (qwords_idx < PERF_X86_ZMM_QWORDS) { + if (!perf_regs->zmmh_regs) + return 0; + return perf_regs->zmmh_regs[idx * PERF_X86_ZMMH_QWORDS + qwords_idx - PE= RF_X86_YMM_QWORDS]; } =20 return 0; @@ -114,7 +124,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, return -EINVAL; } else { if (vec_qwords !=3D PERF_X86_XMM_QWORDS && - vec_qwords !=3D PERF_X86_YMM_QWORDS) + vec_qwords !=3D PERF_X86_YMM_QWORDS && + vec_qwords !=3D PERF_X86_ZMM_QWORDS) return -EINVAL; if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) return -EINVAL; --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 02C512F271D for ; Thu, 26 Jun 2025 19:57:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967825; cv=none; b=HtI5Y/FnjEiqFKDUoWECqdqE93qBPao3UGrVqeC8FF7YHJBDJQVhx5M9Yo/0o9bIN1R4Uu7buY7AoBheNhfQVETsbhZPpk3yjSRnPw1H1DVvXR7aYs9Lm8Y3ZAmwWt8hQBQukXli2/+4x69/A4qZZxEyBTN7iuN8NPASu9W29Wo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967825; c=relaxed/simple; bh=7nUh86q/nRmzY0g6gz2a8jMcMeRzIyFvhvxqJbELXTA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=bSXlyKhO6WhdJSVxXY2qJMIDU6eDCA8ipz9CMw2ZYlcfLLMEEx4ujTKIEAY60gMqF2RYzJ2N0U2zD+k0lJsTBvmcbtX8+9Khl+79Ag73dahdK8GDLvoHd0B5eooe7kyG645Na4HfQXFs2f8ivzXaYuPKPhicjOV3O2XEdKENHlg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=H0Ay/UA0; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none 
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang
Subject: [RFC PATCH V2 10/13] perf/x86: Add OPMASK into sample_simd_pred_reg
Date: Thu, 26 Jun 2025 12:56:07 -0700
Message-Id: <20250626195610.405379-11-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com>
References: <20250626195610.405379-1-kan.liang@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

The OPMASK registers are the SIMD predicate registers. Add them to
sample_simd_pred_reg. There are 8 OPMASK registers, each one qword wide.
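As an illustration (not part of this patch), sampling all eight OPMASK
registers on the interrupt regs would look roughly like the fragment below;
the values follow perf_simd_reg_validate() in this patch, which requires a
predicate qwords value of 1 and a mask within PERF_X86_SIMD_PRED_MASK:

	/* Sketch only: the sample_simd_pred_* fields come from this series. */
	attr.sample_type		 |= PERF_SAMPLE_REGS_INTR;
	attr.sample_simd_pred_reg_qwords  = 1;		/* PERF_X86_OPMASK_QWORDS */
	attr.sample_simd_pred_reg_intr	  = 0xff;	/* all 8 predicate registers */
	/* Each selected OPMASK register adds one u64 to the sample payload. */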
Signed-off-by: Kan Liang --- arch/x86/events/core.c | 15 +++++++++++++++ arch/x86/events/perf_event.h | 1 + arch/x86/include/asm/perf_event.h | 4 ++++ arch/x86/include/uapi/asm/perf_regs.h | 3 +++ arch/x86/kernel/perf_regs.c | 15 ++++++++++++--- 5 files changed, 35 insertions(+), 3 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 3c05ca98ec3f..d4710edce2e9 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -427,6 +427,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *= perf_regs, u64 mask) perf_regs->zmmh =3D get_xsave_addr(xsave, XFEATURE_ZMM_Hi256); if (mask & XFEATURE_MASK_Hi16_ZMM) perf_regs->h16zmm =3D get_xsave_addr(xsave, XFEATURE_Hi16_ZMM); + if (mask & XFEATURE_MASK_OPMASK) + perf_regs->opmask =3D get_xsave_addr(xsave, XFEATURE_OPMASK); } =20 static void release_ext_regs_buffers(void) @@ -459,6 +461,8 @@ static void reserve_ext_regs_buffers(void) mask |=3D XFEATURE_MASK_ZMM_Hi256; if (x86_pmu.ext_regs_mask & X86_EXT_REGS_H16ZMM) mask |=3D XFEATURE_MASK_Hi16_ZMM; + if (x86_pmu.ext_regs_mask & X86_EXT_REGS_OPMASK) + mask |=3D XFEATURE_MASK_OPMASK; =20 size =3D xstate_calculate_size(mask, true); =20 @@ -1831,6 +1835,9 @@ void x86_pmu_setup_regs_data(struct perf_event *event, data->dyn_size +=3D hweight64(attr->sample_simd_vec_reg_user) * sizeof(u64) * attr->sample_simd_vec_reg_qwords; + data->dyn_size +=3D hweight32(attr->sample_simd_pred_reg_user) * + sizeof(u64) * + attr->sample_simd_pred_reg_qwords; data->regs_user.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; } perf_regs->abi =3D data->regs_user.abi; @@ -1850,6 +1857,9 @@ void x86_pmu_setup_regs_data(struct perf_event *event, data->dyn_size +=3D hweight64(attr->sample_simd_vec_reg_intr) * sizeof(u64) * attr->sample_simd_vec_reg_qwords; + data->dyn_size +=3D hweight32(attr->sample_simd_pred_reg_intr) * + sizeof(u64) * + attr->sample_simd_pred_reg_qwords; data->regs_intr.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; } perf_regs->abi =3D data->regs_intr.abi; @@ -1875,6 +1885,11 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, perf_regs->h16zmm_regs =3D NULL; mask |=3D XFEATURE_MASK_Hi16_ZMM; } + if (attr->sample_simd_pred_reg_intr || + attr->sample_simd_pred_reg_user) { + perf_regs->opmask_regs =3D NULL; + mask |=3D XFEATURE_MASK_OPMASK; + } } =20 mask &=3D ~ignore_mask; diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index cc42e9d3e13d..cc0bd9479fa7 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -704,6 +704,7 @@ enum { X86_EXT_REGS_YMM =3D BIT_ULL(1), X86_EXT_REGS_ZMMH =3D BIT_ULL(2), X86_EXT_REGS_H16ZMM =3D BIT_ULL(3), + X86_EXT_REGS_OPMASK =3D BIT_ULL(4), }; =20 #define PERF_PEBS_DATA_SOURCE_MAX 0x100 diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 2d78bd9649bd..dda677022882 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -609,6 +609,10 @@ struct x86_perf_regs { u64 *h16zmm_regs; struct avx_512_hi16_state *h16zmm; }; + union { + u64 *opmask_regs; + struct avx_512_opmask_state *opmask; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index f74e3ba65be2..dd7bd1dd8d39 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -55,11 +55,14 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 +#define 
PERF_X86_SIMD_PRED_REGS_MAX 8 +#define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, = 0) #define PERF_X86_SIMD_VEC_REGS_MAX 32 #define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) =20 #define PERF_X86_H16ZMM_BASE 16 =20 +#define PERF_X86_OPMASK_QWORDS 1 #define PERF_X86_XMM_QWORDS 2 #define PERF_X86_YMM_QWORDS 4 #define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2) diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 74e05e2e5c90..b569368743a4 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -82,8 +82,14 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx, { struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); =20 - if (pred) - return 0; + if (pred) { + if (WARN_ON_ONCE(idx >=3D PERF_X86_SIMD_PRED_REGS_MAX || + qwords_idx >=3D PERF_X86_OPMASK_QWORDS)) + return 0; + if (!perf_regs->opmask_regs) + return 0; + return perf_regs->opmask_regs[idx]; + } =20 if (WARN_ON_ONCE(idx >=3D PERF_X86_SIMD_VEC_REGS_MAX || qwords_idx >=3D PERF_X86_SIMD_QWORDS_MAX)) @@ -130,7 +136,10 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mas= k, if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) return -EINVAL; } - if (pred_mask) + + if (pred_qwords !=3D PERF_X86_OPMASK_QWORDS) + return -EINVAL; + if (pred_mask & ~PERF_X86_SIMD_PRED_MASK) return -EINVAL; =20 return 0; --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BF39D2F2C55 for ; Thu, 26 Jun 2025 19:57:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967826; cv=none; b=rOcOYVEV5NFUdqjZmEpZmR0/CFMCaMVP2dZack9EH/CHgjcCACiEs+RhTVKF2l7UeGK97cITO4BUZgE2YM6jCRkUm1QSWqUiVwRaDUUz1UQR19ukEI2C8LKyWMbso2sMmPiVuNJmE5S007yW2AVp1CGJiPORJjLByUe17XuBqAM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967826; c=relaxed/simple; bh=glk68Jbo3iXxlobGAf5SkFMLYktxFZc/qGKRnZxka8w=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=FXvQyxJpWQTTeOdJmFn9fEgZ20mXe84mHZmcEdyXmYsjVvPcTSU3phSUjCRs2qJMPrwSIXupOAdehkSIu4i83EK+OHufqYIS5nP6eZsWWK8zEe1PYMpzC8nG6DE40VKCZt+h5NVoORdzJ+tWK+XgtrR3X1PqPCfdPtevddCEU5Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=DaoJEHR8; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="DaoJEHR8" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967824; x=1782503824; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=glk68Jbo3iXxlobGAf5SkFMLYktxFZc/qGKRnZxka8w=; b=DaoJEHR8XKfoz5bwMdw35lcmkXVF7GxJRwReqr1YecfBJsOE4oyaAauP wtd0NJVIGcAx3k+v+Y2PLlw9AkSSgBscAoTwFplFbpIAhCCbOH+m3D8kg 
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang
Subject: [RFC PATCH V2 11/13] perf/x86: Add eGPRs into sample_regs
Date: Thu, 26 Jun 2025 12:56:08 -0700
Message-Id: <20250626195610.405379-12-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com>
References: <20250626195610.405379-1-kan.liang@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

The eGPRs are only supported when the new SIMD register configuration
method is used, which moves the XMM registers to sample_simd_vec_regs, so
that bit space in sample_regs can be reclaimed for the eGPRs.

The eGPRs are retrieved by XSAVE. Only support the eGPRs on X86_64.
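For illustration (not part of this patch), once the SIMD ABI is in use the
reclaimed sample_regs bits can be requested as sketched below; per
x86_pmu_hw_config() in this patch, the request is rejected with -EINVAL
unless the APX xstate is available (X86_EXT_REGS_EGPRS):

	/* Sketch only: PERF_X86_EGPRS_MASK is added by this patch. */
	attr.sample_type	|= PERF_SAMPLE_REGS_INTR;
	attr.sample_regs_intr	|= PERF_X86_EGPRS_MASK;	/* R16-R31 */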
Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Kan Liang --- arch/x86/events/core.c | 41 +++++++++++++++++++++------ arch/x86/events/perf_event.h | 1 + arch/x86/include/asm/perf_event.h | 4 +++ arch/x86/include/uapi/asm/perf_regs.h | 26 +++++++++++++++-- arch/x86/kernel/perf_regs.c | 31 ++++++++++---------- 5 files changed, 78 insertions(+), 25 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index d4710edce2e9..1da18886e1f3 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -429,6 +429,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *= perf_regs, u64 mask) perf_regs->h16zmm =3D get_xsave_addr(xsave, XFEATURE_Hi16_ZMM); if (mask & XFEATURE_MASK_OPMASK) perf_regs->opmask =3D get_xsave_addr(xsave, XFEATURE_OPMASK); + if (mask & XFEATURE_MASK_APX) + perf_regs->egpr =3D get_xsave_addr(xsave, XFEATURE_APX); } =20 static void release_ext_regs_buffers(void) @@ -463,6 +465,8 @@ static void reserve_ext_regs_buffers(void) mask |=3D XFEATURE_MASK_Hi16_ZMM; if (x86_pmu.ext_regs_mask & X86_EXT_REGS_OPMASK) mask |=3D XFEATURE_MASK_OPMASK; + if (x86_pmu.ext_regs_mask & X86_EXT_REGS_EGPRS) + mask |=3D XFEATURE_MASK_APX; =20 size =3D xstate_calculate_size(mask, true); =20 @@ -718,17 +722,33 @@ int x86_pmu_hw_config(struct perf_event *event) } =20 if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_U= SER)) { - /* - * Besides the general purpose registers, XMM registers may - * be collected as well. - */ - if (event_has_extended_regs(event)) { - if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS)) + if (event->attr.sample_simd_regs_enabled) { + u64 reserved =3D ~GENMASK_ULL(PERF_REG_X86_64_MAX - 1, 0); + + if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS)) return -EINVAL; - if (!(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM)) + /* + * The XMM space in the perf_event_x86_regs is reclaimed + * for eGPRs and other general registers. + */ + if (event->attr.sample_regs_user & reserved || + event->attr.sample_regs_intr & reserved) return -EINVAL; - if (event->attr.sample_simd_regs_enabled) + if ((event->attr.sample_regs_user & PERF_X86_EGPRS_MASK || + event->attr.sample_regs_intr & PERF_X86_EGPRS_MASK) && + !(x86_pmu.ext_regs_mask & X86_EXT_REGS_EGPRS)) return -EINVAL; + } else { + /* + * Besides the general purpose registers, XMM registers may + * be collected as well. 
+ */ + if (event_has_extended_regs(event)) { + if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS)) + return -EINVAL; + if (!(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM)) + return -EINVAL; + } } =20 if (event_has_simd_regs(event)) { @@ -1890,6 +1910,11 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, perf_regs->opmask_regs =3D NULL; mask |=3D XFEATURE_MASK_OPMASK; } + if (attr->sample_regs_user & PERF_X86_EGPRS_MASK || + attr->sample_regs_intr & PERF_X86_EGPRS_MASK) { + perf_regs->egpr_regs =3D NULL; + mask |=3D XFEATURE_MASK_APX; + } } =20 mask &=3D ~ignore_mask; diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index cc0bd9479fa7..4dd1e7344021 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -705,6 +705,7 @@ enum { X86_EXT_REGS_ZMMH =3D BIT_ULL(2), X86_EXT_REGS_H16ZMM =3D BIT_ULL(3), X86_EXT_REGS_OPMASK =3D BIT_ULL(4), + X86_EXT_REGS_EGPRS =3D BIT_ULL(5), }; =20 #define PERF_PEBS_DATA_SOURCE_MAX 0x100 diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index dda677022882..4400cb66bc8e 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -613,6 +613,10 @@ struct x86_perf_regs { u64 *opmask_regs; struct avx_512_opmask_state *opmask; }; + union { + u64 *egpr_regs; + struct apx_state *egpr; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index dd7bd1dd8d39..cd0f6804debf 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -27,11 +27,31 @@ enum perf_event_x86_regs { PERF_REG_X86_R13, PERF_REG_X86_R14, PERF_REG_X86_R15, + /* Extended GPRs (EGPRs) */ + PERF_REG_X86_R16, + PERF_REG_X86_R17, + PERF_REG_X86_R18, + PERF_REG_X86_R19, + PERF_REG_X86_R20, + PERF_REG_X86_R21, + PERF_REG_X86_R22, + PERF_REG_X86_R23, + PERF_REG_X86_R24, + PERF_REG_X86_R25, + PERF_REG_X86_R26, + PERF_REG_X86_R27, + PERF_REG_X86_R28, + PERF_REG_X86_R29, + PERF_REG_X86_R30, + PERF_REG_X86_R31, /* These are the limits for the GPRs. */ PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, - PERF_REG_X86_64_MAX =3D PERF_REG_X86_R15 + 1, + PERF_REG_X86_64_MAX =3D PERF_REG_X86_R31 + 1, =20 - /* These all need two bits set because they are 128bit */ + /* + * These all need two bits set because they are 128bit. 
+ * These are only available when !PERF_SAMPLE_REGS_ABI_SIMD + */ PERF_REG_X86_XMM0 =3D 32, PERF_REG_X86_XMM1 =3D 34, PERF_REG_X86_XMM2 =3D 36, @@ -55,6 +75,8 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 +#define PERF_X86_EGPRS_MASK GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R1= 6) + #define PERF_X86_SIMD_PRED_REGS_MAX 8 #define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, = 0) #define PERF_X86_SIMD_VEC_REGS_MAX 32 diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index b569368743a4..3780a7b0e021 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -61,14 +61,22 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) { struct x86_perf_regs *perf_regs; =20 - if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { + if (idx > PERF_REG_X86_R15) { perf_regs =3D container_of(regs, struct x86_perf_regs, regs); - /* SIMD registers are moved to dedicated sample_simd_vec_reg */ - if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) - return 0; - if (!perf_regs->xmm_regs) - return 0; - return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; + + if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { + if (idx <=3D PERF_REG_X86_R31) { + if (!perf_regs->egpr_regs) + return 0; + return perf_regs->egpr_regs[idx - PERF_REG_X86_R16]; + } + } else { + if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { + if (!perf_regs->xmm_regs) + return 0; + return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; + } + } } =20 if (WARN_ON_ONCE(idx >=3D ARRAY_SIZE(pt_regs_offset))) @@ -149,14 +157,7 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mas= k, ~((1ULL << PERF_REG_X86_MAX) - 1)) =20 #ifdef CONFIG_X86_32 -#define REG_NOSUPPORT ((1ULL << PERF_REG_X86_R8) | \ - (1ULL << PERF_REG_X86_R9) | \ - (1ULL << PERF_REG_X86_R10) | \ - (1ULL << PERF_REG_X86_R11) | \ - (1ULL << PERF_REG_X86_R12) | \ - (1ULL << PERF_REG_X86_R13) | \ - (1ULL << PERF_REG_X86_R14) | \ - (1ULL << PERF_REG_X86_R15)) +#define REG_NOSUPPORT GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R8) =20 int perf_reg_validate(u64 mask) { --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DD5C2F2C7B for ; Thu, 26 Jun 2025 19:57:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967827; cv=none; b=qnE135n8PGNes/ffpfFKDEAfYfMlyIBL7NB15cHGxmZlpHw6ULe84fwYufhhE+fht4fkMzujuH03PPQMGWPpKT1B0sxtOc6wxAr7mBToXAcdMEkqo6GdHazoNN1dsRk5bkR4cJIzZ0yUdUDa+d49o+Bc/ZEA0Uy6rfiF8sKJ19E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967827; c=relaxed/simple; bh=s6rdrhi9FvkLruDv4uOvKkPwleSqXPdXEoKEuRLbtlY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=HLSTU5AuC99wTeVOANn9u9QXw4Y3UyPQzoazwg9zFy3RqVLBlsaR/oZrlM7yzTH4fsnie1dK5p7KUjSIGI4c5mYto1ECeRLzPwb/U95L68absRBTaPccvDlyn5abrpUQ4+RqcLxZo7WT1+agv6A+qWRDgt8v0f+jEsauFpcw4fQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=UaVEotVw; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: 
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang
Subject: [RFC PATCH V2 12/13] perf/x86: Add SSP into sample_regs
Date: Thu, 26 Jun 2025 12:56:09 -0700
Message-Id: <20250626195610.405379-13-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com>
References: <20250626195610.405379-1-kan.liang@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

The SSP is only supported when the new SIMD register configuration method
is used, which moves the XMM registers to sample_simd_vec_regs, so that
bit space in sample_regs can be reclaimed for the SSP.

The SSP is retrieved by XSAVE. Only support the SSP on X86_64.
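For reference, the XSAVE component layout assumed here is the kernel's
existing struct cet_user_state: SSP is its second u64, which is why
perf_reg_value() in this patch reads cet_regs[1]:

	/* From arch/x86/include/asm/fpu/types.h (shown here for reference). */
	struct cet_user_state {
		u64	user_cet;	/* user control-flow settings (MSR_IA32_U_CET) */
		u64	user_ssp;	/* user shadow stack pointer (MSR_IA32_PL3_SSP) */
	};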
Signed-off-by: Kan Liang --- arch/x86/events/core.c | 16 +++++++++++++++- arch/x86/events/perf_event.h | 1 + arch/x86/include/asm/perf_event.h | 4 ++++ arch/x86/include/uapi/asm/perf_regs.h | 3 +++ arch/x86/kernel/perf_regs.c | 8 +++++++- 5 files changed, 30 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 1da18886e1f3..b35b5695e42f 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -431,6 +431,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *= perf_regs, u64 mask) perf_regs->opmask =3D get_xsave_addr(xsave, XFEATURE_OPMASK); if (mask & XFEATURE_MASK_APX) perf_regs->egpr =3D get_xsave_addr(xsave, XFEATURE_APX); + if (mask & XFEATURE_MASK_CET_USER) + perf_regs->cet =3D get_xsave_addr(xsave, XFEATURE_CET_USER); } =20 static void release_ext_regs_buffers(void) @@ -467,6 +469,8 @@ static void reserve_ext_regs_buffers(void) mask |=3D XFEATURE_MASK_OPMASK; if (x86_pmu.ext_regs_mask & X86_EXT_REGS_EGPRS) mask |=3D XFEATURE_MASK_APX; + if (x86_pmu.ext_regs_mask & X86_EXT_REGS_CET) + mask |=3D XFEATURE_MASK_CET_USER; =20 size =3D xstate_calculate_size(mask, true); =20 @@ -723,7 +727,7 @@ int x86_pmu_hw_config(struct perf_event *event) =20 if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_U= SER)) { if (event->attr.sample_simd_regs_enabled) { - u64 reserved =3D ~GENMASK_ULL(PERF_REG_X86_64_MAX - 1, 0); + u64 reserved =3D ~GENMASK_ULL(PERF_REG_MISC_MAX - 1, 0); =20 if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS)) return -EINVAL; @@ -738,6 +742,11 @@ int x86_pmu_hw_config(struct perf_event *event) event->attr.sample_regs_intr & PERF_X86_EGPRS_MASK) && !(x86_pmu.ext_regs_mask & X86_EXT_REGS_EGPRS)) return -EINVAL; + if ((event->attr.sample_regs_user & BIT_ULL(PERF_REG_X86_SSP) || + event->attr.sample_regs_intr & BIT_ULL(PERF_REG_X86_SSP)) && + !(x86_pmu.ext_regs_mask & X86_EXT_REGS_CET)) + return -EINVAL; + } else { /* * Besides the general purpose registers, XMM registers may @@ -1915,6 +1924,11 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, perf_regs->egpr_regs =3D NULL; mask |=3D XFEATURE_MASK_APX; } + if (attr->sample_regs_user & BIT_ULL(PERF_REG_X86_SSP) || + attr->sample_regs_intr & BIT_ULL(PERF_REG_X86_SSP)) { + perf_regs->cet_regs =3D NULL; + mask |=3D XFEATURE_MASK_CET_USER; + } } =20 mask &=3D ~ignore_mask; diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 4dd1e7344021..1d958059db07 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -706,6 +706,7 @@ enum { X86_EXT_REGS_H16ZMM =3D BIT_ULL(3), X86_EXT_REGS_OPMASK =3D BIT_ULL(4), X86_EXT_REGS_EGPRS =3D BIT_ULL(5), + X86_EXT_REGS_CET =3D BIT_ULL(6), }; =20 #define PERF_PEBS_DATA_SOURCE_MAX 0x100 diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 4400cb66bc8e..28ddff38d232 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -617,6 +617,10 @@ struct x86_perf_regs { u64 *egpr_regs; struct apx_state *egpr; }; + union { + u64 *cet_regs; + struct cet_user_state *cet; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index cd0f6804debf..4d88cb18acb9 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -48,6 +48,9 @@ enum perf_event_x86_regs { PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, PERF_REG_X86_64_MAX =3D PERF_REG_X86_R31 + 1, 
=20 + PERF_REG_X86_SSP, + PERF_REG_MISC_MAX =3D PERF_REG_X86_SSP + 1, + /* * These all need two bits set because they are 128bit. * These are only available when !PERF_SAMPLE_REGS_ABI_SIMD diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 3780a7b0e021..f985765a799a 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -70,6 +70,11 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) return 0; return perf_regs->egpr_regs[idx - PERF_REG_X86_R16]; } + if (idx =3D=3D PERF_REG_X86_SSP) { + if (!perf_regs->cet_regs) + return 0; + return perf_regs->cet_regs[1]; + } } else { if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { if (!perf_regs->xmm_regs) @@ -157,7 +162,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, ~((1ULL << PERF_REG_X86_MAX) - 1)) =20 #ifdef CONFIG_X86_32 -#define REG_NOSUPPORT GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R8) +#define REG_NOSUPPORT (GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R8) | \ + BIT_ULL(PERF_REG_X86_SSP)) =20 int perf_reg_validate(u64 mask) { --=20 2.38.1 From nobody Tue Feb 10 21:38:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C4CD32F362D for ; Thu, 26 Jun 2025 19:57:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967827; cv=none; b=k5pToAnupb1zJeUodZZbOnGc8P9EI96eA+HU+04l1XCWF6E4uXoscBRuuReBu+Jta62XG7WaxI9503nXV/UTK4yjUKy6TWgaS7Q2AAKUG4wVm7MTN/xuE+L9wR1jYEvaHBuhnFnHrA7p8i5Wr434gZg3Dtro/BZU3SRRwgK7i5A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750967827; c=relaxed/simple; bh=b7ddsw4B1TR/9B431hCDX6tkbkGcagN0Lz2oT/iRcg4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=fJlYD9Q4hcpRezg+gKH3LbPIvrW0/592ELt1Tf1JOnzPV1YSfI1pQu59XQufaET3m6hNe+o2BjTv3GncXLbCxlxmzbnD9clTZFgy5Guknuwz3VWj/p3xKY8E3QOoTSxAnYV/7GDYO97HvwuE/zZQDDTbUvfMWPSM2LiBLDWmNxs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ZECG+uUS; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ZECG+uUS" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750967825; x=1782503825; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=b7ddsw4B1TR/9B431hCDX6tkbkGcagN0Lz2oT/iRcg4=; b=ZECG+uUSHResyy8wZ5VGP8Sh7Twyfe/kw/1D2msEoErCmnbok5dlInqQ 74p1+SK9W62Xs7iEoywprESok1CJjS/7axxy9/3XqlGvkif/PB6l8OyEg S3Am8uWIC4k5ScZiTrLOr86LkCPMIFPnslopBOEXEqFw78djKIRX1v9gA 2vuxFtuCEaEjXMTgLm7hHgp6a4tj9OnF1wiChVf1j4+0TuO2gIQZSOdz1 Ig+oIwEjq88Tz8Ny/WM90dhMM1zhulARACSxqVwjHw8G05SICa4C5Nj8u n6pae5tkCdhWYeB1bM8ngZujrsLkU7hl+djZQtNP+Y9283SZuFiMYjmV1 w==; X-CSE-ConnectionGUID: CJBSd7nnRfu8dtZ8x9A4HA== X-CSE-MsgGUID: 4iLyGbHOQhyJIIzBcBN3kA== X-IronPort-AV: E=McAfee;i="6800,10657,11476"; a="53002220" 
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org,
 tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com,
 adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com,
 linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com,
 mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, Kan Liang
Subject: [RFC PATCH V2 13/13] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS
Date: Thu, 26 Jun 2025 12:56:10 -0700
Message-Id: <20250626195610.405379-14-kan.liang@linux.intel.com>
In-Reply-To: <20250626195610.405379-1-kan.liang@linux.intel.com>
References: <20250626195610.405379-1-kan.liang@linux.intel.com>

From: Kan Liang

Enable PERF_PMU_CAP_SIMD_REGS if there is XSAVES support for YMM, ZMM,
OPMASK, eGPRs, or SSP.

Disable large PEBS for these registers since PEBS HW doesn't support
them yet.

Signed-off-by: Kan Liang
---
 arch/x86/events/intel/core.c | 46 ++++++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c73c2e57d71b..8dc638f9efd2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4034,8 +4034,30 @@ static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event)
 		flags &= ~PERF_SAMPLE_TIME;
 	if (!event->attr.exclude_kernel)
 		flags &= ~PERF_SAMPLE_REGS_USER;
-	if (event->attr.sample_regs_user & ~PEBS_GP_REGS)
-		flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR);
+	if (event->attr.sample_simd_regs_enabled) {
+		u64 nolarge = PERF_X86_EGPRS_MASK | BIT_ULL(PERF_REG_X86_SSP);
+
+		/*
+		 * PEBS HW can only collect the XMM0-XMM15 for now.
+		 * Disable large PEBS for other vector registers, predicate
+		 * registers, eGPRs, and SSP.
+		 */
+		if (event->attr.sample_regs_user & nolarge ||
+		    fls64(event->attr.sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE ||
+		    event->attr.sample_simd_pred_reg_user)
+			flags &= ~PERF_SAMPLE_REGS_USER;
+
+		if (event->attr.sample_regs_intr & nolarge ||
+		    fls64(event->attr.sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE ||
+		    event->attr.sample_simd_pred_reg_intr)
+			flags &= ~PERF_SAMPLE_REGS_INTR;
+
+		if (event->attr.sample_simd_vec_reg_qwords > PERF_X86_XMM_QWORDS)
+			flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR);
+	} else {
+		if (event->attr.sample_regs_user & ~PEBS_GP_REGS)
+			flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR);
+	}
 	return flags;
 }
 
@@ -5296,6 +5318,26 @@ static void intel_extended_regs_init(struct pmu *pmu)
 
 	x86_pmu.ext_regs_mask |= X86_EXT_REGS_XMM;
 	x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
+
+	if (boot_cpu_has(X86_FEATURE_AVX) &&
+	    cpu_has_xfeatures(XFEATURE_MASK_YMM, NULL))
+		x86_pmu.ext_regs_mask |= X86_EXT_REGS_YMM;
+	if (boot_cpu_has(X86_FEATURE_APX) &&
+	    cpu_has_xfeatures(XFEATURE_MASK_APX, NULL))
+		x86_pmu.ext_regs_mask |= X86_EXT_REGS_EGPRS;
+	if (boot_cpu_has(X86_FEATURE_AVX512F)) {
+		if (cpu_has_xfeatures(XFEATURE_MASK_OPMASK, NULL))
+			x86_pmu.ext_regs_mask |= X86_EXT_REGS_OPMASK;
+		if (cpu_has_xfeatures(XFEATURE_MASK_ZMM_Hi256, NULL))
+			x86_pmu.ext_regs_mask |= X86_EXT_REGS_ZMMH;
+		if (cpu_has_xfeatures(XFEATURE_MASK_Hi16_ZMM, NULL))
+			x86_pmu.ext_regs_mask |= X86_EXT_REGS_H16ZMM;
+	}
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+		x86_pmu.ext_regs_mask |= X86_EXT_REGS_CET;
+
+	if (x86_pmu.ext_regs_mask != X86_EXT_REGS_XMM)
+		x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_SIMD_REGS;
 }
 
 static void update_pmu_cap(struct pmu *pmu)
-- 
2.38.1
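To exercise the new capability from user space, a sampling tool could
request the shadow stack pointer roughly as sketched below. This is an
illustrative sketch only: PERF_REG_X86_SSP and the sample_simd_regs_enabled
attribute bit are new ABI introduced by this series, so it assumes uapi
headers and a kernel built with these patches; the helper name and the
sampling period are arbitrary.

/* Hypothetical usage sketch; relies on uapi headers from this series. */
#include <linux/perf_event.h>
#include <asm/perf_regs.h>	/* PERF_REG_X86_SSP, added by this series */
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int open_cycles_sampling_ssp(pid_t pid)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period = 100003;
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_REGS_USER;
	attr.exclude_kernel = 1;
	/*
	 * Both fields below are new ABI from this series: the opt-in for
	 * the SIMD/misc register layout, and the SSP request itself.
	 * The kernel rejects the SSP bit when X86_EXT_REGS_CET is not set,
	 * i.e. when the CPU lacks user shadow stack support.
	 */
	attr.sample_simd_regs_enabled = 1;
	attr.sample_regs_user = 1ULL << PERF_REG_X86_SSP;

	/* monitor 'pid' on any CPU; no group leader, no flags */
	return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
}

Each sample would then carry the user SSP in its PERF_SAMPLE_REGS_USER
payload, and, per the intel_pmu_large_pebs_flags() change above, requesting
SSP keeps the event off the large (multi-record) PEBS path.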