From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner, Dave Hansen, Ian Rogers, Adrian Hunter, Jiri Olsa, Alexander Shishkin, Andi Kleen, Eranian Stephane
Cc: Mark Rutland, broonie@kernel.org, Ravi Bangoria, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao, Dapeng Mi, Kan Liang
Subject: [Patch v6 11/22] perf/x86: Enable XMM register sampling for REGS_USER case
Date: Mon, 9 Feb 2026 15:20:36 +0800
Message-Id: <20260209072047.2180332-12-dapeng1.mi@linux.intel.com>
In-Reply-To: <20260209072047.2180332-1-dapeng1.mi@linux.intel.com>
References: <20260209072047.2180332-1-dapeng1.mi@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds support for XMM register sampling in the REGS_USER case.

To handle simultaneous sampling of XMM registers for both REGS_INTR and
REGS_USER cases, a per-CPU `x86_user_regs` is introduced to store
REGS_USER-specific XMM registers. This prevents REGS_USER-specific XMM
register data from being overwritten by REGS_INTR-specific data if they
share the same `x86_perf_regs` structure.

To sample user-space XMM registers, the `x86_pmu_update_user_ext_regs()`
helper function is added. It checks if the `TIF_NEED_FPU_LOAD` flag is
set. If so, the user-space XMM register data can be directly retrieved
from the cached task FPU state, as the corresponding hardware registers
have been cleared or switched to kernel-space data. Otherwise, the data
must be read from the hardware registers using the `xsaves` instruction.

For PEBS events, `x86_pmu_update_user_ext_regs()` checks if the
PEBS-sampled XMM register data belongs to user-space. If so, no further
action is needed. Otherwise, the user-space XMM register data needs to be
re-sampled using the same method as for non-PEBS events.

Co-developed-by: Kan Liang
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---

V6: New patch, partly split from previous patch. Fully support user-regs
sampling for SIMD registers as Peter suggested.

 arch/x86/events/core.c | 99 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 85 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 3c0987e13edc..36b4bc413938 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -696,7 +696,7 @@ int x86_pmu_hw_config(struct perf_event *event)
 			return -EINVAL;
 	}
 
-	if (event->attr.sample_type & PERF_SAMPLE_REGS_INTR) {
+	if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
 		/*
 		 * Besides the general purpose registers, XMM registers may
 		 * be collected as well.
@@ -707,15 +707,6 @@ int x86_pmu_hw_config(struct perf_event *event)
 		}
 	}
 
-	if (event->attr.sample_type & PERF_SAMPLE_REGS_USER) {
-		/*
-		 * Currently XMM registers sampling for REGS_USER is not
-		 * supported yet.
-		 */
-		if (event_has_extended_regs(event))
-			return -EINVAL;
-	}
-
 	return x86_setup_perfctr(event);
 }
 
@@ -1745,6 +1736,28 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	static_call_cond(x86_pmu_del)(event);
 }
 
+/*
+ * When both PERF_SAMPLE_REGS_INTR and PERF_SAMPLE_REGS_USER are set,
+ * an additional x86_perf_regs is required to save user-space registers.
+ * Without this, user-space register data may be overwritten by kernel-space
+ * registers.
+ */
+static DEFINE_PER_CPU(struct x86_perf_regs, x86_user_regs);
+static void x86_pmu_perf_get_regs_user(struct perf_sample_data *data,
+				       struct pt_regs *regs)
+{
+	struct x86_perf_regs *x86_regs_user = this_cpu_ptr(&x86_user_regs);
+	struct perf_regs regs_user;
+
+	perf_get_regs_user(&regs_user, regs);
+	data->regs_user.abi = regs_user.abi;
+	if (regs_user.regs) {
+		x86_regs_user->regs = *regs_user.regs;
+		data->regs_user.regs = &x86_regs_user->regs;
+	} else
+		data->regs_user.regs = NULL;
+}
+
 static void x86_pmu_setup_basic_regs_data(struct perf_event *event,
 					  struct perf_sample_data *data,
 					  struct pt_regs *regs)
@@ -1757,7 +1770,14 @@ static void x86_pmu_setup_basic_regs_data(struct perf_event *event,
 			data->regs_user.abi = perf_reg_abi(current);
 			data->regs_user.regs = regs;
 		} else if (!(current->flags & PF_KTHREAD)) {
-			perf_get_regs_user(&data->regs_user, regs);
+			/*
+			 * It cannot guarantee that the kernel will never
+			 * touch the registers outside of the pt_regs,
+			 * especially when more and more registers
+			 * (e.g., SIMD, eGPR) are added. The live data
+			 * cannot be used.
+			 */
+			x86_pmu_perf_get_regs_user(data, regs);
 		} else {
 			data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
 			data->regs_user.regs = NULL;
@@ -1810,6 +1830,47 @@ static inline void x86_pmu_update_ext_regs(struct x86_perf_regs *perf_regs,
 	perf_regs->xmm_space = xsave->i387.xmm_space;
 }
 
+/*
+ * This function retrieves cached user-space fpu registers (XMM/YMM/ZMM).
+ * If TIF_NEED_FPU_LOAD is set, the data is taken from the cached task FPU state.
+ * Otherwise, the data should be read directly from the hardware registers.
+ */
+static inline u64 x86_pmu_update_user_ext_regs(struct perf_sample_data *data,
+					       struct pt_regs *regs,
+					       u64 mask, u64 ignore_mask)
+{
+	struct x86_perf_regs *perf_regs;
+	struct xregs_state *xsave;
+	struct fpu *fpu;
+	struct fpstate *fps;
+	u64 sample_mask = 0;
+
+	if (data->regs_user.abi == PERF_SAMPLE_REGS_ABI_NONE)
+		return 0;
+
+	if (user_mode(regs))
+		sample_mask = mask & ~ignore_mask;
+
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		perf_regs = container_of(data->regs_user.regs,
+					 struct x86_perf_regs, regs);
+		fpu = x86_task_fpu(current);
+		/*
+		 * If __task_fpstate is set, it holds the right pointer,
+		 * otherwise fpstate will.
+		 */
+		fps = READ_ONCE(fpu->__task_fpstate);
+		if (!fps)
+			fps = fpu->fpstate;
+		xsave = &fps->regs.xsave;
+
+		x86_pmu_update_ext_regs(perf_regs, xsave, mask);
+		sample_mask = 0;
+	}
+
+	return sample_mask;
+}
+
 static void x86_pmu_sample_extended_regs(struct perf_event *event,
 					 struct perf_sample_data *data,
 					 struct pt_regs *regs,
@@ -1818,6 +1879,7 @@ static void x86_pmu_sample_extended_regs(struct perf_event *event,
 	u64 sample_type = event->attr.sample_type;
 	struct x86_perf_regs *perf_regs;
 	struct xregs_state *xsave;
+	u64 user_mask = 0;
 	u64 intr_mask = 0;
 	u64 mask = 0;
 
@@ -1827,15 +1889,24 @@ static void x86_pmu_sample_extended_regs(struct perf_event *event,
 		mask |= XFEATURE_MASK_SSE;
 
 	mask &= x86_pmu.ext_regs_mask;
+	if (sample_type & PERF_SAMPLE_REGS_USER) {
+		user_mask = x86_pmu_update_user_ext_regs(data, regs,
+							 mask, ignore_mask);
+	}
 
 	if (sample_type & PERF_SAMPLE_REGS_INTR)
 		intr_mask = mask & ~ignore_mask;
 
-	if (intr_mask) {
-		__x86_pmu_sample_ext_regs(intr_mask);
+	if (user_mask | intr_mask) {
+		__x86_pmu_sample_ext_regs(user_mask | intr_mask);
 		xsave = per_cpu(ext_regs_buf, smp_processor_id());
-		x86_pmu_update_ext_regs(perf_regs, xsave, intr_mask);
 	}
+
+	if (user_mask)
+		x86_pmu_update_ext_regs(perf_regs, xsave, user_mask);
+
+	if (intr_mask)
+		x86_pmu_update_ext_regs(perf_regs, xsave, intr_mask);
 }
 
 void x86_pmu_setup_regs_data(struct perf_event *event,
-- 
2.34.1
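
The source-selection logic of x86_pmu_update_user_ext_regs() in the diff above can be modeled in plain user-space C, which may help when reasoning about which path a given sample takes. This is only a sketch under stated assumptions, not kernel code: `user_xmm_decision`, `decide_user_xmm()` and `SKETCH_MASK_SSE` are hypothetical names that do not exist in the kernel, the three booleans stand in for the checks on `data->regs_user.abi`, `user_mode(regs)` and `TIF_NEED_FPU_LOAD`, and `ignore_mask` is assumed to carry the features that PEBS has already supplied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XFEATURE_MASK_SSE; not a kernel definition. */
#define SKETCH_MASK_SSE (UINT64_C(1) << 1)

/* Hypothetical result type: where the user-space XMM data comes from. */
struct user_xmm_decision {
	bool     copy_from_cache; /* take data from the saved task FPU state */
	uint64_t hw_mask;         /* features still to be read from hardware */
};

/*
 * Model of the decision order in the patch:
 *  1. no user ABI          -> nothing to sample
 *  2. sample hit user mode -> live registers hold user data, minus
 *                             whatever ignore_mask says is already captured
 *  3. TIF_NEED_FPU_LOAD    -> live registers are not user data, so use the
 *                             cached copy and cancel the hardware read
 */
struct user_xmm_decision decide_user_xmm(bool has_user_abi, bool in_user_mode,
					 bool need_fpu_load, uint64_t mask,
					 uint64_t ignore_mask)
{
	struct user_xmm_decision d = { false, 0 };

	if (!has_user_abi)
		return d;

	if (in_user_mode)
		d.hw_mask = mask & ~ignore_mask;

	if (need_fpu_load) {
		d.copy_from_cache = true;
		d.hw_mask = 0;
	}

	return d;
}
```

Calling `decide_user_xmm(true, true, false, SKETCH_MASK_SSE, 0)` models a sample taken in user mode with live registers: nothing is copied from the cache and the SSE bit is left for the hardware read. Note that, as the patch is written, a sample taken in kernel mode without `TIF_NEED_FPU_LOAD` yields neither a cached copy nor a hardware mask in this model.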