From nobody Sat Feb 7 18:42:25 2026 Received: from mail-pf1-f173.google.com (mail-pf1-f173.google.com [209.85.210.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D076285419 for ; Mon, 10 Nov 2025 03:32:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762745564; cv=none; b=jSESfKfo+r3TfRLggGJoro9llWYiIMLS4hpZ5ekLbujW4Pm7d0F+1CJufxm12BTNKeIDT+a+TXlrfnTTmBeCWgrv7IP01inKEAsaw8rzl1fotZQYoY+VmsOCin0FXo2/l75xqrj6lwrmYO6UJtVwoLhe1Rf5x/JRctt4OSU8MCo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762745564; c=relaxed/simple; bh=nRWXZzHnr+QYjtzY+ghJMKruLFzKkLqJHp3ZfSMEgjE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MWmBnkl/od5TyXDYqqc5a0a4x7UB0UAQl3d8/YXjuuZn5buS03XPl787guMLoHQOdQGUOusfc1NxcowF3jB+FwBof2+orMyUsVDC/hbaKlEOtrwFDB7Flw9tUMt1UQi/ID/FpVZ4PMcc+cWsiBzIPD+bQZaV56iy5sFl5hH9XJE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gmNFqNYG; arc=none smtp.client-ip=209.85.210.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gmNFqNYG" Received: by mail-pf1-f173.google.com with SMTP id d2e1a72fcca58-78af3fe5b17so1870283b3a.2 for ; Sun, 09 Nov 2025 19:32:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1762745563; x=1763350363; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=X9qw7YzFGlFaoDmp/Ld9yRvNLNmzyUxB788+zSCyjy8=; b=gmNFqNYGxiXKyjaVyQPTmYzjpzJQmYQP/BtXM/uzTpfG7ous3+8lRr/Escuc+9b3Yh vpbBfdDMEezAYKpTkGo6y7mcoHwwzX719HAaG6nFLON4EYpYme3NIgmsod3rFVf8r2id OikUrXeG+kpSRpB2zOY8HjfwddIYcimkAP/G3oA9AuuvccUnESsD6OKRSaIZFbR/awHi VrMInQ6mPQvuxdbvxXMul5KNNMufGThrwD8m3plnGLOmGW995t/BpsXSaAmdrHmnce2H jXf4D5Lctck5+E8PO/zGXK4jziF/TSf5dOdug9P+ske10rSkMocTI4qyqOof+Cks9MX+ 6YZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762745563; x=1763350363; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=X9qw7YzFGlFaoDmp/Ld9yRvNLNmzyUxB788+zSCyjy8=; b=tjzFFs7ZujSy77x3fQI+o5oyZnvTA3/xoUwwftZLu4XVzl1eDI2UTWyNgm5ZiRaFVn XY/WUtLp4syIMLlcE6VKTnZiKWZTcM4qYF1cj8xEHGQFemuK3Tmz6Zug1FMGX2yiZYmK pvGHnSf03eIoXBD2GzzCrOv0+PnZR/e8jL0klsYkQo4jfJhJM5RkAP2cqFgeJITiWY6h all3XILKOesbEEtrLcxIYBcAbhZ64xLmPme4rTwfBHz8DcgG5/AQ3QCdA95nUIO9mKOD OtmOf4w1Zx42S2/kiR2qQ52WzGu7O0vdgMv/0ojHY8JkSS4CDSXUNemmmv02S15s6CgZ ijaw== X-Forwarded-Encrypted: i=1; AJvYcCX1J5VNna7xiCeDb/o8LRr7DS0Wvbrs8r584Rl4LZji8nf5Fn/wNfg4LWUS6pLPrkKSjuQzlzSDvypl//k=@vger.kernel.org X-Gm-Message-State: AOJu0YyS0nYfpvLShKjkQRu4gEb0Y9LL0CO0qIxEK+5Ot/53FRCVbuHq 2kVeh+Q25nYrOp7LIOdXQysOL7Lf4iYpoVIIv3TlbV6CA8WGDEln9aux X-Gm-Gg: ASbGncvaXS1wEj8uFb+4I9W4hXuGF6fYyo4Bn6NF5UrgBfpfAirt8CXcbX7UoiMuP7/ KM/ATP9fj1KjW3ttbPGkRmLyX1bT0yyIUE6u3EfoRPlJF7J0CAZhhZaPVEEOQSfkon02qfZe1b/ cjb77nqEkZ6R4RVHhiNJec2oYIcpGLJmAeFfq8pAO+bmzaSKoBrs8dGAdPf/kuh++FMPE2QfkCm 08FbvJBGx1zckBljrf088613QIitwdy60FLNE+QUFtA2qzoZq/VcPWRStvrml2vF4H2yMLYtUy3 DOkISppQB1adnYrOUUDzz9h8a9qgSJwDcxve3l4+I2ULc8ht36NUGIA021CAkSUdZ27UTuRqhKX T6NtHBKPOLfBYgKxnb1VuA1segwXquuIvN139J8eR/SfZ/ptqbKXbpl+LyBB878AhDu0Urp1Htw == X-Google-Smtp-Source: 
AGHT+IGHSkz/u8h+id2h+2kiN6Hn0irMMkmqsw1GYisIjjIpCXgcmApZJ3oJAtOVfvQCuYhZnaVvuQ== X-Received: by 2002:a05:6a20:2450:b0:342:a7cd:9221 with SMTP id adf61e73a8af0-353a18b4d2amr8121683637.20.1762745562607; Sun, 09 Nov 2025 19:32:42 -0800 (PST) Received: from wanpengli.. ([124.93.80.37]) by smtp.googlemail.com with ESMTPSA id 41be03b00d2f7-ba900fa571esm10913877a12.26.2025.11.09.19.32.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 09 Nov 2025 19:32:42 -0800 (PST) From: Wanpeng Li To: Peter Zijlstra , Ingo Molnar , Thomas Gleixner , Paolo Bonzini , Sean Christopherson Cc: Steven Rostedt , Vincent Guittot , Juri Lelli , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li Subject: [PATCH 01/10] sched: Add vCPU debooster infrastructure Date: Mon, 10 Nov 2025 11:32:22 +0800 Message-ID: <20251110033232.12538-2-kernellwp@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com> References: <20251110033232.12538-1-kernellwp@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Wanpeng Li From: Wanpeng Li Introduce foundational infrastructure for the vCPU debooster mechanism to improve yield_to() effectiveness in virtualization workloads. Add per-rq tracking fields for rate limiting (yield_deboost_last_time_ns) and debouncing (yield_deboost_last_src/dst_pid, last_pair_time_ns). Introduce global sysctl knob sysctl_sched_vcpu_debooster_enabled for runtime control, defaulting to enabled. Add debugfs interface for observability and initialization in sched_init(). The infrastructure is inert at this stage as no deboost logic is implemented yet, allowing independent verification that existing behavior remains unchanged. 
Signed-off-by: Wanpeng Li --- kernel/sched/core.c | 7 +++++-- kernel/sched/debug.c | 3 +++ kernel/sched/fair.c | 5 +++++ kernel/sched/sched.h | 9 +++++++++ 4 files changed, 22 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f754a60de848..03380790088b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8706,9 +8706,12 @@ void __init sched_init(void) #endif /* CONFIG_CGROUP_SCHED */ =20 for_each_possible_cpu(i) { - struct rq *rq; + struct rq *rq =3D cpu_rq(i); + /* init per-rq debounce tracking */ + rq->yield_deboost_last_src_pid =3D -1; + rq->yield_deboost_last_dst_pid =3D -1; + rq->yield_deboost_last_pair_time_ns =3D 0; =20 - rq =3D cpu_rq(i); raw_spin_lock_init(&rq->__lock); rq->nr_running =3D 0; rq->calc_load_active =3D 0; diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 02e16b70a790..905f303af752 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -508,6 +508,9 @@ static __init int sched_init_debug(void) debugfs_create_file("tunable_scaling", 0644, debugfs_sched, NULL, &sched_= scaling_fops); debugfs_create_u32("migration_cost_ns", 0644, debugfs_sched, &sysctl_sche= d_migration_cost); debugfs_create_u32("nr_migrate", 0644, debugfs_sched, &sysctl_sched_nr_mi= grate); + debugfs_create_u32("sched_vcpu_debooster_enabled", 0644, debugfs_sched, + &sysctl_sched_vcpu_debooster_enabled); + =20 sched_domains_mutex_lock(); update_sched_domain_debugfs(); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5b752324270b..5b7fcc86ccff 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -81,6 +81,11 @@ static unsigned int normalized_sysctl_sched_base_slice = =3D 700000ULL; =20 __read_mostly unsigned int sysctl_sched_migration_cost =3D 500000UL; =20 +/* + * vCPU debooster sysctl control + */ +unsigned int sysctl_sched_vcpu_debooster_enabled __read_mostly =3D 1; + static int __init setup_sched_thermal_decay_shift(char *str) { pr_warn("Ignoring the deprecated 
sched_thermal_decay_shift=3D option\n"); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index adfb6e3409d7..e9b4be024f89 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1292,6 +1292,13 @@ struct rq { unsigned int push_busy; struct cpu_stop_work push_work; =20 + /* vCPU debooster rate-limit */ + u64 yield_deboost_last_time_ns; + /* per-rq debounce state to avoid cross-CPU races */ + pid_t yield_deboost_last_src_pid; + pid_t yield_deboost_last_dst_pid; + u64 yield_deboost_last_pair_time_ns; + #ifdef CONFIG_SCHED_CORE /* per rq */ struct rq *core; @@ -2816,6 +2823,8 @@ extern int sysctl_resched_latency_warn_once; =20 extern unsigned int sysctl_sched_tunable_scaling; =20 +extern unsigned int sysctl_sched_vcpu_debooster_enabled; + extern unsigned int sysctl_numa_balancing_scan_delay; extern unsigned int sysctl_numa_balancing_scan_period_min; extern unsigned int sysctl_numa_balancing_scan_period_max; --=20 2.43.0 From nobody Sat Feb 7 18:42:25 2026 Received: from mail-pf1-f179.google.com (mail-pf1-f179.google.com [209.85.210.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 116962868A9 for ; Mon, 10 Nov 2025 03:32:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762745568; cv=none; b=lEjzSQGwHgVuQi+Ywq62MuvfK57ir2CpYad8IOMtqB9MTci6UQIs34650ZZVpQ4RgiYfIcgJvG26YE/9l+dI9OzunlwJzS7kylonqHRRoehPt182AgjaL43NaNhiv4B+YL78WzfCxV+46ptn83Vt51O1fozZOx5NPwsOgxYmWCw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762745568; c=relaxed/simple; bh=wbPUX9n8s+drIAX87sNapw/D2lFZMgm0X/R+CGmmVxQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=UxLzAYIMa5+KdwDuC/r9VWG8HPV23x6C+yAocfVfZtcoWpJU6vOeN+WYtu5auWgMEbm/mO67SKRlz8X00B/czgKbKxX/5hYgNv+VuDFI5sObYvGyE6/gDgyLeGdLiGm7fG+eu3P8ZEeYNp9bka6YzBcHOVzE3i2v4vEKvma0eAI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FPQTIbZA; arc=none smtp.client-ip=209.85.210.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="FPQTIbZA" Received: by mail-pf1-f179.google.com with SMTP id d2e1a72fcca58-7aa9be9f03aso2130084b3a.2 for ; Sun, 09 Nov 2025 19:32:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1762745566; x=1763350366; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=njeTN44gIUPeOIH83iq5dF5dVqD8fgi+Xi9oci3U0Kg=; b=FPQTIbZAU1KO4n4qkSJRdtbpnVcVh4X9XvM+TfIrdyedQg+cXHzXzio/JZSoehhzLl pA/P2XXEYUVburC1LW70vu/7t6XLBQlr75RL45YAmu6oGQGAu5lAgqpTp5b/0uWg7a30 MDMxF6wozDPnmFXmMwACXnq2WrSK+bGbRO5OWeXo1Mj7eq0Wlz8T5gj2VuHWDBObsIfC RN3Ap3YvIT3UokXZe9eU0+9DvLKLpoqgHZDbs0KT6H0Q6o+wEpzdrf3EpDrGXWw2MdXv 3REACQG68ndVt9HuJRE0jeYSbB5HzfVnpp4FTN4HvFgccOgi8tLVBoi/MpWZAU5BmTlp GraQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762745566; x=1763350366; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=njeTN44gIUPeOIH83iq5dF5dVqD8fgi+Xi9oci3U0Kg=; b=jr0kGV0oJfVTPSxAi+wlOE4bj4Ka+maIzajRVv2giBaDI1MPn2kdV4eAHXXpbWTDCh 
5QihmddbADBJ8NefgE3vLZPuaVfkdgGCmEpI30WRxTkenrY2a7MpaOLUW2JhOocPyhTD iLpcE4GKxuR5y8TVrvNtJkm8kEFRA1ZUTtiF8LuGiAt6FPcwFZVMwWyJ+MY2YG8sG02d FarEaWKNg6pg7eGZ3KWRcQumzzfT5BQfoXqWMgCckn4bbH1SDuBwyyzTWvqgjW4mXynr CgEWaYhOL/l+FJ6JDQ39c71VYnXCyhyaePHVd+lGZ4YJVylDr6IK0305I1xKnYikbgvX JtEw== X-Forwarded-Encrypted: i=1; AJvYcCUV72zy5UJ8Egnl9vAueO1hsqBTYDRNkUbofmd8ENwHWRud32gTeqWscodxe3VtN62WwBlGCLb9bhrHSjg=@vger.kernel.org X-Gm-Message-State: AOJu0YwZLODlvYx8UfhSMPYePLIVepAN9f1IJ2EGovW0tKne4ffJ2G5w YHNxDtbe1PMCOTkUdM1SrEWeRpGsMgFmEwJM+2ivxIyGHwJ7obJ1yqDUKmoCrUl8psFVew== X-Gm-Gg: ASbGncvDbe+r4Owi1Jc8wDegVXTefk4/G5jBJ7lDiBeYIam0utAnlipzlNizn78rIbh 4EMwpvLYddJn5F+Iy2W9EOQEl2B0X49j9fEDtkY7urFAzgu37qDkpHYuMbRd66fBPOFhQr5PIH1 GnIz6NHQp4Jglh9ZqpgjQ/cy4TfYPG68OeIuyEXWw0yStX/4SvYPV+WyTFFwZpGyd9atcBvWnC2 NqBCJs0u72ZuDuV7q6zMU8qS0l2uWxYHsB42v/8fOK0dwOjHoQO6PhgblI28xiJR0SOteBfzCR0 tw7BOXtfcBZdPpAwVn3g97wbs6rh03rGl/xxEVcqFzKWd3O4RhfpKlK8jEBeAO5KWCq3PRyWbca uWyMMoQ+3A0g/2kJsYhnApIBpggGZz91hbxlXcymeWk2rx2ArHGnupAZaU9v2zIx/kjXg4Bp2hA == X-Google-Smtp-Source: AGHT+IGgWj+riHl/0AxPM6/yf+3pf97HB4xt0VxRxXs/RO4msxXfu/kysFJz978Pl0ItdGF8SGUj7A== X-Received: by 2002:a05:6a20:3d82:b0:342:a7cd:9214 with SMTP id adf61e73a8af0-353a20d6c44mr9577178637.23.1762745566383; Sun, 09 Nov 2025 19:32:46 -0800 (PST) Received: from wanpengli.. 
([124.93.80.37]) by smtp.googlemail.com with ESMTPSA id 41be03b00d2f7-ba900fa571esm10913877a12.26.2025.11.09.19.32.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 09 Nov 2025 19:32:46 -0800 (PST) From: Wanpeng Li To: Peter Zijlstra , Ingo Molnar , Thomas Gleixner , Paolo Bonzini , Sean Christopherson Cc: Steven Rostedt , Vincent Guittot , Juri Lelli , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li Subject: [PATCH 02/10] sched/fair: Add rate-limiting and validation helpers Date: Mon, 10 Nov 2025 11:32:23 +0800 Message-ID: <20251110033232.12538-3-kernellwp@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com> References: <20251110033232.12538-1-kernellwp@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Wanpeng Li From: Wanpeng Li Implement core safety mechanisms for yield deboost operations. Add yield_deboost_rate_limit() for high-frequency gating to prevent excessive overhead on compute-intensive workloads. Use 6ms threshold with lockless READ_ONCE/WRITE_ONCE to minimize cache line contention while providing effective rate limiting. Add yield_deboost_validate_tasks() for comprehensive validation ensuring feature is enabled via sysctl, both tasks are valid and distinct, both belong to fair_sched_class, entities are on the same runqueue, and tasks are runnable. The rate limiter prevents pathological high-frequency cases while validation ensures only appropriate task pairs proceed. Both functions are static and will be integrated in subsequent patches. 
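For illustration only, the gating logic described above can be modelled in userspace as follows. This is a minimal sketch, not the kernel code: struct demo_rq, DEMO_LIMIT_NS and demo_rate_limit() are hypothetical names, plain loads and stores stand in for READ_ONCE/WRITE_ONCE, and the 6 ms value is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-rq timestamp; not the kernel's struct rq. */
struct demo_rq {
	uint64_t last_time_ns;	/* 0 means "no yield seen yet" */
};

/* 6 ms expressed in nanoseconds (6000 us * 1000 ns/us). */
#define DEMO_LIMIT_NS (6000ULL * 1000ULL)

/* Returns true when the caller should skip the deboost for this yield. */
static bool demo_rate_limit(struct demo_rq *rq, uint64_t now_ns)
{
	uint64_t last = rq->last_time_ns;
	bool limited = false;

	/* The very first call (last == 0) is never limited. */
	if (last)
		limited = (now_ns - last) <= DEMO_LIMIT_NS;

	/* The timestamp is refreshed on every call, limited or not. */
	rq->last_time_ns = now_ns;
	return limited;
}
```

One consequence worth noting in this model: because the timestamp is refreshed even when the call is limited, a steady stream of yields spaced 6 ms apart or less stays limited until a gap longer than 6 ms occurs.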
Signed-off-by: Wanpeng Li --- kernel/sched/fair.c | 68 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5b7fcc86ccff..a7dc21c2dbdb 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8990,6 +8990,74 @@ static void put_prev_task_fair(struct rq *rq, struct= task_struct *prev, struct t } } =20 +/* + * High-frequency yield gating to reduce overhead on compute-intensive wor= kloads. + * Returns true if the yield should be skipped due to frequency limits. + * + * Optimized: single threshold with READ_ONCE/WRITE_ONCE, refresh timestam= p on every call. + */ +static bool yield_deboost_rate_limit(struct rq *rq, u64 now_ns) +{ + u64 last =3D READ_ONCE(rq->yield_deboost_last_time_ns); + bool limited =3D false; + + if (last) { + u64 delta =3D now_ns - last; + limited =3D (delta <=3D 6000ULL * NSEC_PER_USEC); + } + + WRITE_ONCE(rq->yield_deboost_last_time_ns, now_ns); + return limited; +} + +/* + * Validate tasks and basic parameters for yield deboost operation. + * Performs comprehensive safety checks including feature enablement, + * NULL pointer validation, task state verification, and same-rq requireme= nt. + * Returns false if any validation fails, + * ensuring only safe and meaningful yield operations proceed. 
+ */ +static bool __maybe_unused yield_deboost_validate_tasks(struct rq *rq, str= uct task_struct *p_target, + struct task_struct **p_yielding_out, + struct sched_entity **se_y_out, + struct sched_entity **se_t_out) +{ + struct task_struct *p_yielding; + struct sched_entity *se_y, *se_t; + u64 now_ns; + + if (!sysctl_sched_vcpu_debooster_enabled) + return false; + + if (!rq || !p_target) + return false; + + now_ns =3D rq->clock; + + if (yield_deboost_rate_limit(rq, now_ns)) + return false; + + p_yielding =3D rq->curr; + if (!p_yielding || p_yielding =3D=3D p_target || + p_target->sched_class !=3D &fair_sched_class || + p_yielding->sched_class !=3D &fair_sched_class) + return false; + + se_y =3D &p_yielding->se; + se_t =3D &p_target->se; + + if (!se_t || !se_y || !se_t->on_rq || !se_y->on_rq) + return false; + + if (task_rq(p_yielding) !=3D rq || task_rq(p_target) !=3D rq) + return false; + + *p_yielding_out =3D p_yielding; + *se_y_out =3D se_y; + *se_t_out =3D se_t; + return true; +} + /* * sched_yield() is very simple */ --=20 2.43.0 From nobody Sat Feb 7 18:42:25 2026 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EBA99287506 for ; Mon, 10 Nov 2025 03:32:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762745572; cv=none; b=ONDLBwwnqXt8WjWlcE2TG3NxN7q2agmeTlh0g22T4XGS+EyPwmpwKk8M16tR8Jn9679WlpZg3UHnfW+mOWqTgUjuHWG2nBNy/sbYdGR1g7AkZe241iT5QBiMbOmx0+FtoS8ZCXQmuZP8e11ISiwBy1t3aY/uAUTtWl24aHoE2hg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762745572; c=relaxed/simple; bh=/Sbi2RsDbKFxh18qy4P7m0nH9/yJnCUGEiCikFyDTJk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=pKoQKqE9sJIII8QeDPSL7gdc8KvRkuOkHdme3soeiOJqgqKsh70Zi8tDiOk0Bel814MZ+YL9zisx6yGFez11PBQrXKC0XvW5xRTkZ7iCKcqqK3Ao8V0ODDZIKgGpsf6vYlE3zIyGlLJswC5lbwIRPK7agitOyJOoGaYY8xOQMEc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Z8dnngHt; arc=none smtp.client-ip=209.85.214.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Z8dnngHt" Received: by mail-pl1-f171.google.com with SMTP id d9443c01a7336-297eca3d0a3so15513375ad.0 for ; Sun, 09 Nov 2025 19:32:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1762745570; x=1763350370; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=mJ2T78O7RQhMqXnI5opkr+3j7nLw8GADo232iLhnq+M=; b=Z8dnngHt/b5aMUY9WnaF+GYgDuxVtosMTTkzSJ+9CRgjsgwtAjHtJ56ZJDVfM7Pb01 X2FPrOqHu+midWrOugGwvT3a2VcHqXSNi+YWkYmiAGgrAUKnDO9UqCrNb3sYViOlYGju KPuj/1p6cSUp+Z8AOxC1DyJXV4UJsj0+Ur+lxtmbV/mjEtMk8iOd+ZBCR1px9als/aao F0RjxJh7oerqXtNS8wGyQXTeRqLPBJsYZmbHahE/n4u8AGK7+uY/GpHz4eilqihvOAdC KNGtYwkonic8LMn45PTiBVhyrVYMbXKxqY7xn4p3FhJNDs8eJ56f1nFq7VipI5sDn10E gK3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762745570; x=1763350370; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=mJ2T78O7RQhMqXnI5opkr+3j7nLw8GADo232iLhnq+M=; b=dcZ3p9vbXU7Twebg+iUhsTEy6Q/oozv4hW+jQ/acEnLSGq3Oo2YUNjNkgocitteIJl 
W7x2K8y96AO+5BGCn/Qxvai/cY2nSCd6solKh0aP/N3H47JmAMQuwYqGlwCXxs9luSKd 8OAieJvzhufwR+3wyGPxcRALuu26gBCDAGO2mGU9VqXozOf47gnHaqIzfahl4CYBVkpK whlVmDJMg31toO+ULqAvI3YWpUCtCo6v5gykqcDyY9sZPHSX+IerSBqfgeQwVrPz6r23 RrJYHqrvJD/gNIrXmWKWL2WXGy1owISaeLCC3BzKT8ax/57kH50mwt7vcsiFQzXC6Jb9 4PrQ== X-Forwarded-Encrypted: i=1; AJvYcCWlnBDZnOThN8H9L63bCQyT4cZg709mCgU/jblR15BWiIi9H+YtVd4pyuqAXL58gqNle/LVvbZKS91on4A=@vger.kernel.org X-Gm-Message-State: AOJu0YxuXrtxTxbD9FM39TImRUbFk6jmZZkirOAY3c73z+yO0h+EgJAq Qhei2YqVY0ousIxcTyyRy2VCoPf/Cf/4cxWTFtiVAU+zzH9/vTBj+iQLQHoMxUa9l0rwgg== X-Gm-Gg: ASbGncs+FMS8AhQWF22nxcJZJ0wg/VwwVH0guSj7OYh7HWNsdN5Fsq4jl1Lpcxycr9C BYsPHA7YsaTIUEBCEETEgjNJPXBQ2Dbn77ikol4m4PO9UjJBd77WHOHLMbKPfs9cD3877WY5WKB 4e4LoMtm3qPV+q4s4ve52a8U02wXAQD9XrSfbSpErnjrIx7YA2SCvlOvweotn5aaW6niTHrLt/f meFtv+MPxPN/GYzFs5094CPl1oDIVvtc4NgBCEeTcMAG++Dhwy0aoMwQq+9adgA1oeIDFf95uPH 1Vv8IoMix+elRyfzX/48kE1/tI6HwBfV5qvdRs6O96sUcO4kfxVHfmwu7RTSE+5f4b9vPQ80E51 fa41L7QjUfcuhyuIJUvx8hpw4H48vol99TI22I4mUxQQlewK58Lm2Nt+z+rR/WiByaZSQPB52ky O47oYt4CA0 X-Google-Smtp-Source: AGHT+IH8IURFUXv0dHkJ6ZHGGTBAvnY0vbnOOUIKPpXB78BdN5c4T+GwBYMwLuqJLnGxvgOdporgtA== X-Received: by 2002:a17:903:f8c:b0:26c:2e56:ec27 with SMTP id d9443c01a7336-297e5627d67mr89334695ad.19.1762745570152; Sun, 09 Nov 2025 19:32:50 -0800 (PST) Received: from wanpengli.. 
([124.93.80.37]) by smtp.googlemail.com with ESMTPSA id 41be03b00d2f7-ba900fa571esm10913877a12.26.2025.11.09.19.32.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 09 Nov 2025 19:32:49 -0800 (PST) From: Wanpeng Li To: Peter Zijlstra , Ingo Molnar , Thomas Gleixner , Paolo Bonzini , Sean Christopherson Cc: Steven Rostedt , Vincent Guittot , Juri Lelli , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li Subject: [PATCH 03/10] sched/fair: Add cgroup LCA finder for hierarchical yield Date: Mon, 10 Nov 2025 11:32:24 +0800 Message-ID: <20251110033232.12538-4-kernellwp@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com> References: <20251110033232.12538-1-kernellwp@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Wanpeng Li From: Wanpeng Li Implement yield_deboost_find_lca() to locate the lowest common ancestor (LCA) in the cgroup hierarchy for EEVDF-aware yield operations. The LCA represents the appropriate hierarchy level where vruntime adjustments should be applied to ensure fairness is maintained across cgroup boundaries. This is critical for virtualization workloads where vCPUs may be organized in nested cgroups. For CONFIG_FAIR_GROUP_SCHED, walk up both entity hierarchies by aligning depths, then ascend together until a common cfs_rq is found. For flat hierarchy, verify both entities share the same cfs_rq. Validate that meaningful contention exists (nr_queued > 1) and ensure the yielding entity has non-zero slice for safe penalty calculation. The function operates under rq->lock protection. This static helper will be integrated in subsequent patches. 
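For illustration only, the two-phase walk described above (align depths, then ascend in lockstep) can be sketched in userspace. This is a simplified model under stated assumptions, not the kernel code: struct demo_se and demo_find_lca() are hypothetical names, and the opaque 'queue' pointer stands in for cfs_rq_of(se). The nr_queued and slice checks are left out.

```c
#include <stddef.h>

/* Hypothetical, simplified entity; 'queue' stands in for cfs_rq_of(se). */
struct demo_se {
	int depth;			/* 0 for entities on the root queue */
	struct demo_se *parent;		/* NULL at the top level */
	const void *queue;		/* queue this entity is enqueued on */
};

/*
 * Returns the first common queue, or NULL if none is found.
 * On success, *a and *b are updated to the entities at that level.
 */
static const void *demo_find_lca(struct demo_se **a, struct demo_se **b)
{
	struct demo_se *x = *a, *y = *b;

	/* Phase 1: lift the deeper entity until both depths match. */
	while (x && y && x->depth != y->depth) {
		if (x->depth > y->depth)
			x = x->parent;
		else
			y = y->parent;
	}

	/* Phase 2: ascend together until both sit on the same queue. */
	while (x && y) {
		if (x->queue == y->queue) {
			*a = x;
			*b = y;
			return x->queue;
		}
		x = x->parent;
		y = y->parent;
	}
	return NULL;
}
```

In this model, two tasks in sibling groups resolve to their group entities on the shared parent queue, while a task and a more deeply nested task in the same group resolve to the task itself and the nested task's group entity.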
Signed-off-by: Wanpeng Li --- kernel/sched/fair.c | 60 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index a7dc21c2dbdb..740c002b8f1c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9058,6 +9058,66 @@ static bool __maybe_unused yield_deboost_validate_ta= sks(struct rq *rq, struct ta return true; } =20 +/* + * Find the lowest common ancestor (LCA) in the cgroup hierarchy for EEVDF. + * We walk up both entity hierarchies under rq->lock protection. + * Task migration requires task_rq_lock, ensuring parent chains remain sta= ble. + * We locate the first common cfs_rq where both entities coexist, represen= ting + * the appropriate level for vruntime adjustments and EEVDF field updates + * (deadline, vlag) to maintain scheduler consistency. + */ +static bool __maybe_unused yield_deboost_find_lca(struct sched_entity *se_= y, struct sched_entity *se_t, + struct sched_entity **se_y_lca_out, + struct sched_entity **se_t_lca_out, + struct cfs_rq **cfs_rq_common_out) +{ + struct sched_entity *se_y_lca, *se_t_lca; + struct cfs_rq *cfs_rq_common; + +#ifdef CONFIG_FAIR_GROUP_SCHED + se_t_lca =3D se_t; + se_y_lca =3D se_y; + + while (se_t_lca && se_y_lca && se_t_lca->depth !=3D se_y_lca->depth) { + if (se_t_lca->depth > se_y_lca->depth) + se_t_lca =3D se_t_lca->parent; + else + se_y_lca =3D se_y_lca->parent; + } + + while (se_t_lca && se_y_lca) { + if (cfs_rq_of(se_t_lca) =3D=3D cfs_rq_of(se_y_lca)) { + cfs_rq_common =3D cfs_rq_of(se_t_lca); + goto found_lca; + } + se_t_lca =3D se_t_lca->parent; + se_y_lca =3D se_y_lca->parent; + } + return false; +#else + if (cfs_rq_of(se_y) !=3D cfs_rq_of(se_t)) + return false; + cfs_rq_common =3D cfs_rq_of(se_y); + se_y_lca =3D se_y; + se_t_lca =3D se_t; +#endif + +found_lca: + if (!se_y_lca || !se_t_lca) + return false; + + if (cfs_rq_common->nr_queued <=3D 1) + return false; + + if (!se_y_lca->slice) + return false; + + *se_y_lca_out =3D 
se_y_lca; + *se_t_lca_out =3D se_t_lca; + *cfs_rq_common_out =3D cfs_rq_common; + return true; +} + /* * sched_yield() is very simple */ --=20 2.43.0 From nobody Sat Feb 7 18:42:25 2026 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B46B1288513 for ; Mon, 10 Nov 2025 03:32:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762745576; cv=none; b=PLwOZfhevJuOuJNMYuWDmw92qRAn48yCqtcd5wBFAdpoWV8ykqIiW7+6I9bNY4zvvUM9kBs1BxKoz24rPy3mHxGTbGDgaBsfLx18umtdFG7PlKNJTuVk883wRWrVlR2faSmYB4hlIe0s71kAtDjMr4vXxX6U4tS1E8NQJN88sxU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762745576; c=relaxed/simple; bh=0DhqCZEAUwk5GW0YZMqACGIGSxBV+gPljFPnDh14bNQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=c/suqKsozCLQSfDO0mzCQPvjP/ASjf+BATPTLT7yfSFWuQOyBgHPCbExaWLKdzGzQ+zN0/ns84t0EZP1O9s/fsS9ZYUqKrrrmxWH0OSONVElXhsloFekbWO8RTfYc89znCx27mXsG/HEz9K4pbACAi1Q3AVF4cGswClSGFA8OXY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=MELO5OJP; arc=none smtp.client-ip=209.85.214.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MELO5OJP" Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-29812589890so7334855ad.3 for ; Sun, 09 Nov 2025 19:32:54 -0800 (PST) DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1762745574; x=1763350374; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=mPX8tZ0O6TTAOa2ZKkEwpS16B7GgfDVYswzaVPR09nc=; b=MELO5OJPOkHUDrAW/N82XYbs//lVRGq3kiGOw6E77r7YgtQgA+fcL2d3P1ODrTmMYA JXVRANrTvcwpR1kWRPuWSZ1FCn7YU/fE2QlyMe7fa6CTlBn2qd0uipatl6qN98HHS6mR dv5Al01BBolAFn96NZUC1NE4Jcs7Jh5VKMXCkhsszRFruwrLW40YBtabsMW3ftuIxS50 MNCOQengD5e1FMAguuScpBFWW+phDjufPcyb4jz/0f1WSpAB+CaoMafOKbLgH6gej5tM onq+pfmZho5iQ6TFDQq/wgs+fEaUnyeS/sj3+2bEjDzNM1+SenNkDSlf+lceEQS8hgcZ o87A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762745574; x=1763350374; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=mPX8tZ0O6TTAOa2ZKkEwpS16B7GgfDVYswzaVPR09nc=; b=mh/6X6euazljjGThDNBZmMwPEdCc+qOsQKGPqGhQrkliStRC3gZCTyARP9nQjmoZC/ 2Lgk3obwDxth3lYziFUevEkU9yu2716XZpgqF33lrU3eNUo204O7Oktm73iXvr9giFCF FRo12ZO9DFbJWiV7nXSmxGf3PnA1Z1N8kX6j/kamkAJvRw032ZbpAexosuMzasrmDe0K 2F/Cxk3eOBOyYGyDiMm9zMWH2zr+MdWVvs/UeSjxYh4H3o+JpiYlsekCmTBhEuQn3Ijz Mf3mYdw2dAudbyp6gUfeERwQ7xwFOj/sSA7KHQZQ7JMchS4V/IoagkF8z6CD8NTgbUzG +JjA== X-Forwarded-Encrypted: i=1; AJvYcCU06Y0Az0fjsYB4OrfKeYk9t4ugDQLqUl2KPJTqJb/RHtIp5G3PsE4R7qKA1naTXb4r7sSxlBAR26GEBRw=@vger.kernel.org X-Gm-Message-State: AOJu0Yyhv8fyBbV5oovyAMBA3TOuMnmoCGzUwwIu7F0Ze4xhHpD8CvfA ESbfefzpEsarOjiZadd/GRjrtkom7CI4iuDMA8VhSw8neNmlcN01Hu8j X-Gm-Gg: ASbGncu6y3u8vuDcgWd2Qq+p4DVwkPxavgQ0r8LYdT8VXC4aporUznL7ZYF77ckLnqG cejRT6wcCCNNLr3MAd8fJ61+cL0KUudEzKzvMIhkafmBat6RndoJN6l607AJgifvLCfFxpm1yQU hhvWaeWP4LWHJ46TKdIF9PVoT/lqN5f+PyW9PKgeZ3QsRqU2lxDMtTko3p5U8wXPTuAYCe9CwLc pQQZiDCdKdxoJbnnMHft5UL8lKvY0PYViaj4y1Xz18PAaOChHQcBOE9FT9SxTiR+9yEb9qMrVhH HYvdVpAf/nwDKYQuYacLnczNZpncQzOdwSyTbaQRAZsV0DIJ44AGspX7w9g9InQMNI2SMbzXQgl 
Dwzi1YILfW4edOn3bhMztWa8hZCvQYzY+cUmvDs0rEusF9jH8e+HucpOJVVLJ+nGrH8DHRutFE7 omfWd9AFWM X-Google-Smtp-Source: AGHT+IGjidzUsSplxn0P3HZuGz53BOScmTeAp5wKz3ixJpJAL/2ddgSpIa9esU2unCmxQgSvrv47wg== X-Received: by 2002:a17:903:2c03:b0:295:5a15:63db with SMTP id d9443c01a7336-297e570bf2emr92689815ad.61.1762745573950; Sun, 09 Nov 2025 19:32:53 -0800 (PST) Received: from wanpengli.. ([124.93.80.37]) by smtp.googlemail.com with ESMTPSA id 41be03b00d2f7-ba900fa571esm10913877a12.26.2025.11.09.19.32.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 09 Nov 2025 19:32:53 -0800 (PST) From: Wanpeng Li To: Peter Zijlstra , Ingo Molnar , Thomas Gleixner , Paolo Bonzini , Sean Christopherson Cc: Steven Rostedt , Vincent Guittot , Juri Lelli , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li Subject: [PATCH 04/10] sched/fair: Add penalty calculation and application logic Date: Mon, 10 Nov 2025 11:32:25 +0800 Message-ID: <20251110033232.12538-5-kernellwp@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com> References: <20251110033232.12538-1-kernellwp@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Wanpeng Li From: Wanpeng Li Implement core penalty calculation and application mechanisms for yield deboost operations. Add yield_deboost_apply_debounce() for reverse-pair debouncing to prevent ping-pong behavior. When A=E2=86=92B then B=E2=86=92A occurs within= ~600us, downscale the penalty. Add yield_deboost_calculate_penalty() to calculate vruntime penalty based on the fairness gap (vruntime delta between yielding and target tasks), scheduling granularity with safety floor for abnormal values, and queue-size-based caps (2 tasks: 6.0=C3=97gran, 3: 4.0=C3=97, 4-6: 2.5= =C3=97, 7-8: 2.0=C3=97, 9-12: 1.5=C3=97, >12: 1.0=C3=97). 
Apply special handling fo= r zero gap with refined multipliers and 10% boost weighting on positive gaps. Add yield_deboost_apply_penalty() to apply the penalty with overflow protection and update EEVDF fields (deadline, vlag) and min_vruntime. The penalty is tuned to provide meaningful preference while avoiding starvation, scales with queue depth, and prevents oscillation through debouncing. These static functions will be integrated in the next patch. Signed-off-by: Wanpeng Li --- kernel/sched/fair.c | 153 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 153 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 740c002b8f1c..4bad324f3662 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9118,6 +9118,159 @@ static bool __maybe_unused yield_deboost_find_lca(s= truct sched_entity *se_y, str return true; } =20 +/* + * Apply debounce for reverse pair within ~600us to reduce ping-pong. + * Downscales penalty to max(need, gran) when the previous pair was target= ->source, + * and updates per-rq debounce tracking fields to avoid cross-CPU races. 
+ */
+static u64 yield_deboost_apply_debounce(struct rq *rq, struct sched_entity *se_t,
+					u64 penalty, u64 need, u64 gran)
+{
+	u64 now_ns = rq->clock;
+	struct task_struct *p_yielding = rq->curr;
+	struct task_struct *p_target = task_of(se_t);
+
+	if (p_yielding && p_target) {
+		pid_t src_pid = p_yielding->pid;
+		pid_t dst_pid = p_target->pid;
+		pid_t last_src = rq->yield_deboost_last_src_pid;
+		pid_t last_dst = rq->yield_deboost_last_dst_pid;
+		u64 last_ns = rq->yield_deboost_last_pair_time_ns;
+
+		if (last_src == dst_pid && last_dst == src_pid &&
+		    (now_ns - last_ns) <= (600ULL * NSEC_PER_USEC)) {
+			u64 alt = need;
+			if (alt < gran)
+				alt = gran;
+			if (penalty > alt)
+				penalty = alt;
+		}
+
+		/* Update per-rq tracking */
+		rq->yield_deboost_last_src_pid = src_pid;
+		rq->yield_deboost_last_dst_pid = dst_pid;
+		rq->yield_deboost_last_pair_time_ns = now_ns;
+	}
+
+	return penalty;
+}
+
+/*
+ * Calculate penalty with debounce logic for EEVDF yield deboost.
+ * Computes vruntime penalty based on fairness gap (need) plus granularity,
+ * applies queue-size-based caps to prevent excessive penalties in small queues,
+ * and implements reverse-pair debounce (~600us) to reduce ping-pong effects.
+ * Returns the penalty clamped between gran and the queue-size cap.
+ */
+static u64 __maybe_unused yield_deboost_calculate_penalty(struct rq *rq, struct sched_entity *se_y_lca,
+				    struct sched_entity *se_t_lca, struct sched_entity *se_t,
+				    int nr_queued)
+{
+	u64 gran, need, penalty, maxp;
+	u64 gran_floor;
+	u64 weighted_need, base;
+
+	gran = calc_delta_fair(sysctl_sched_base_slice, se_y_lca);
+	/* Low-bound safeguard for gran when slice is abnormally small */
+	gran_floor = calc_delta_fair(sysctl_sched_base_slice >> 1, se_y_lca);
+	if (gran < gran_floor)
+		gran = gran_floor;
+
+	need = 0;
+	if (se_t_lca->vruntime > se_y_lca->vruntime)
+		need = se_t_lca->vruntime - se_y_lca->vruntime;
+
+	/* Apply 10% boost to need when positive (weighted_need = need * 1.10) */
+	penalty = gran;
+	if (need) {
+		/* weighted_need = need + 10% */
+		weighted_need = need + need / 10;
+		/* clamp to avoid overflow when adding to gran (still capped later) */
+		if (weighted_need > U64_MAX - penalty)
+			weighted_need = U64_MAX - penalty;
+		penalty += weighted_need;
+	}
+
+	/* Apply debounce via helper to avoid ping-pong */
+	penalty = yield_deboost_apply_debounce(rq, se_t, penalty, need, gran);
+
+	/* Upper bound (cap): slightly more aggressive for mid-size queues */
+	if (nr_queued == 2)
+		maxp = gran * 6;		/* Strongest push for 2-task ping-pong */
+	else if (nr_queued == 3)
+		maxp = gran * 4;		/* 4.0 * gran */
+	else if (nr_queued <= 6)
+		maxp = (gran * 5) / 2;		/* 2.5 * gran */
+	else if (nr_queued <= 8)
+		maxp = gran * 2;		/* 2.0 * gran */
+	else if (nr_queued <= 12)
+		maxp = (gran * 3) / 2;		/* 1.5 * gran */
+	else
+		maxp = gran;			/* 1.0 * gran */
+
+	if (penalty < gran)
+		penalty = gran;
+	if (penalty > maxp)
+		penalty = maxp;
+
+	/* If no need, apply refined baseline push (low risk + mid risk combined). */
+	if (need == 0) {
+		/*
+		 * Baseline multiplier for need==0:
+		 *   2    -> 1.00 * gran
+		 *   3    -> 0.9375 * gran
+		 *   4–6  -> 0.625 * gran
+		 *   7–8  -> 0.50 * gran
+		 *   9–12 -> 0.375 * gran
+		 *   >12  -> 0.25 * gran
+		 */
+		base = gran;
+		if (nr_queued == 3)
+			base = (gran * 15) / 16;	/* 0.9375 */
+		else if (nr_queued >= 4 && nr_queued <= 6)
+			base = (gran * 5) / 8;		/* 0.625 */
+		else if (nr_queued >= 7 && nr_queued <= 8)
+			base = gran / 2;		/* 0.5 */
+		else if (nr_queued >= 9 && nr_queued <= 12)
+			base = (gran * 3) / 8;		/* 0.375 */
+		else if (nr_queued > 12)
+			base = gran / 4;		/* 0.25 */
+
+		if (penalty < base)
+			penalty = base;
+	}
+
+	return penalty;
+}
+
+/*
+ * Apply penalty and update EEVDF fields for scheduler consistency.
+ * Safely applies vruntime penalty with overflow protection, then updates
+ * EEVDF-specific fields (deadline, vlag) and cfs_rq min_vruntime to maintain
+ * scheduler state consistency. Returns without changing the entity if the
+ * penalty cannot be safely applied.
+ */
+static void __maybe_unused yield_deboost_apply_penalty(struct rq *rq, struct sched_entity *se_y_lca,
+				    struct cfs_rq *cfs_rq_common, u64 penalty)
+{
+	u64 new_vruntime;
+
+	/* Overflow protection */
+	if (se_y_lca->vruntime > (U64_MAX - penalty))
+		return;
+
+	new_vruntime = se_y_lca->vruntime + penalty;
+
+	/* Validity check */
+	if (new_vruntime <= se_y_lca->vruntime)
+		return;
+
+	se_y_lca->vruntime = new_vruntime;
+	se_y_lca->deadline = se_y_lca->vruntime + calc_delta_fair(se_y_lca->slice, se_y_lca);
+	se_y_lca->vlag = avg_vruntime(cfs_rq_common) - se_y_lca->vruntime;
+	update_min_vruntime(cfs_rq_common);
+}
+
 /*
  * sched_yield() is very simple
  */
-- 
2.43.0

From nobody Sat Feb 7 18:42:25 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH 05/10] sched/fair: Wire up yield deboost in yield_to_task_fair()
Date: Mon, 10 Nov 2025 11:32:26 +0800
Message-ID: <20251110033232.12538-6-kernellwp@gmail.com>
In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com>
References: <20251110033232.12538-1-kernellwp@gmail.com>

From: Wanpeng Li

Integrate the yield deboost mechanism into yield_to_task_fair() to improve
yield_to() effectiveness for virtualization workloads.

Add yield_to_deboost() as the main entry point that validates tasks, finds
cgroup LCA, updates rq clock and accounting, calculates penalty, and
applies EEVDF field adjustments.

The integration point after set_next_buddy() and before yield_task_fair()
works in concert with the existing buddy mechanism: set_next_buddy()
provides immediate preference, yield_to_deboost() applies bounded vruntime
penalty for sustained advantage, and yield_task_fair() completes the
standard yield path.

This is particularly beneficial for vCPU workloads where lock holder
detection triggers yield_to(), the holder needs sustained preference to
make progress, vCPUs may be organized in nested cgroups, high-frequency
yields require rate limiting, and ping-pong patterns need debouncing.

Operation occurs under rq->lock with bounded penalties. The feature can be
disabled at runtime via
/sys/kernel/debug/sched/sched_vcpu_debooster_enabled.

Dbench workload in a virtualized environment (16 pCPUs host, 16 vCPUs per
VM running dbench-16 benchmark) shows consistent gains:

  2 VMs: +14.4% throughput
  3 VMs:  +9.8% throughput
  4 VMs:  +6.7% throughput

Performance gains stem from more effective yield_to() behavior, enabling
lock holders to make faster progress and reducing contention overhead in
overcommitted scenarios.

Signed-off-by: Wanpeng Li
---
 kernel/sched/fair.c | 58 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4bad324f3662..619af60b7ce6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9017,7 +9017,7 @@ static bool yield_deboost_rate_limit(struct rq *rq, u64 now_ns)
  * Returns false with appropriate debug logging if any validation fails,
  * ensuring only safe and meaningful yield operations proceed.
  */
-static bool __maybe_unused yield_deboost_validate_tasks(struct rq *rq, struct task_struct *p_target,
+static bool yield_deboost_validate_tasks(struct rq *rq, struct task_struct *p_target,
 					  struct task_struct **p_yielding_out,
 					  struct sched_entity **se_y_out,
 					  struct sched_entity **se_t_out)
@@ -9066,7 +9066,7 @@ static bool __maybe_unused yield_deboost_validate_tasks(struct rq *rq, struct ta
  * the appropriate level for vruntime adjustments and EEVDF field updates
  * (deadline, vlag) to maintain scheduler consistency.
  */
-static bool __maybe_unused yield_deboost_find_lca(struct sched_entity *se_y, struct sched_entity *se_t,
+static bool yield_deboost_find_lca(struct sched_entity *se_y, struct sched_entity *se_t,
 				    struct sched_entity **se_y_lca_out,
 				    struct sched_entity **se_t_lca_out,
 				    struct cfs_rq **cfs_rq_common_out)
@@ -9162,7 +9162,7 @@ static u64 yield_deboost_apply_debounce(struct rq *rq, struct sched_entity *se_t
  * and implements reverse-pair debounce (~600us) to reduce ping-pong effects.
  * Returns the penalty clamped between gran and the queue-size cap.
  */
-static u64 __maybe_unused yield_deboost_calculate_penalty(struct rq *rq, struct sched_entity *se_y_lca,
+static u64 yield_deboost_calculate_penalty(struct rq *rq, struct sched_entity *se_y_lca,
 				    struct sched_entity *se_t_lca, struct sched_entity *se_t,
 				    int nr_queued)
 {
@@ -9250,7 +9250,7 @@ static u64 __maybe_unused yield_deboost_calculate_penalty(struct rq *rq, struct
  * scheduler state consistency. Returns without changing the entity if the
  * penalty cannot be safely applied.
  */
-static void __maybe_unused yield_deboost_apply_penalty(struct rq *rq, struct sched_entity *se_y_lca,
+static void yield_deboost_apply_penalty(struct rq *rq, struct sched_entity *se_y_lca,
 				    struct cfs_rq *cfs_rq_common, u64 penalty)
 {
 	u64 new_vruntime;
@@ -9303,6 +9303,52 @@ static void yield_task_fair(struct rq *rq)
 	se->deadline += calc_delta_fair(se->slice, se);
 }
 
+/*
+ * yield_to_deboost - deboost the yielding task to favor the target on the same rq
+ * @rq: runqueue containing both tasks; rq->lock must be held
+ * @p_target: task to favor in scheduling
+ *
+ * Cooperates with yield_to_task_fair(): buddy provides immediate preference;
+ * this routine applies a bounded vruntime penalty at the cgroup LCA so the
+ * target keeps advantage beyond the buddy effect. EEVDF fields are updated
+ * to keep scheduler state consistent.
+ *
+ * Only operates on tasks resident on the same rq; throttled hierarchies are
+ * rejected early. Penalty is bounded by granularity and queue-size caps.
+ *
+ * Intended primarily for virtualization workloads where a yielding vCPU
+ * should defer to a target vCPU within the same runqueue.
+ * Does not change runnable order directly; complements buddy selection with
+ * a bounded fairness adjustment.
+ */
+static void yield_to_deboost(struct rq *rq, struct task_struct *p_target)
+{
+	struct task_struct *p_yielding;
+	struct sched_entity *se_y, *se_t, *se_y_lca, *se_t_lca;
+	struct cfs_rq *cfs_rq_common;
+	u64 penalty;
+
+	/* Step 1: validate tasks and inputs */
+	if (!yield_deboost_validate_tasks(rq, p_target, &p_yielding, &se_y, &se_t))
+		return;
+
+	/* Step 2: find LCA in cgroup hierarchy */
+	if (!yield_deboost_find_lca(se_y, se_t, &se_y_lca, &se_t_lca, &cfs_rq_common))
+		return;
+
+	/* Step 3: update clock and current accounting */
+	update_rq_clock(rq);
+	if (se_y_lca != cfs_rq_common->curr)
+		update_curr(cfs_rq_common);
+
+	/* Step 4: calculate penalty (caps + debounce) */
+	penalty = yield_deboost_calculate_penalty(rq, se_y_lca, se_t_lca, se_t,
+						  cfs_rq_common->nr_queued);
+
+	/* Step 5: apply penalty and update EEVDF fields */
+	yield_deboost_apply_penalty(rq, se_y_lca, cfs_rq_common, penalty);
+}
+
 static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
@@ -9314,6 +9360,10 @@ static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
 	/* Tell the scheduler that we'd really like se to run next. */
 	set_next_buddy(se);
 
+	/* Apply deboost under rq lock. */
+	yield_to_deboost(rq, p);
+
+	/* Complete the standard yield path. */
 	yield_task_fair(rq);
 
 	return true;
-- 
2.43.0

From nobody Sat Feb 7 18:42:25 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH 06/10] KVM: Fix last_boosted_vcpu index assignment bug
Date: Mon, 10 Nov 2025 11:32:27 +0800
Message-ID: <20251110033232.12538-7-kernellwp@gmail.com>
In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com>
References: <20251110033232.12538-1-kernellwp@gmail.com>

From: Wanpeng Li

In kvm_vcpu_on_spin(), the loop counter 'i' is incorrectly written to
last_boosted_vcpu instead of the actual vCPU index 'idx'. This causes
last_boosted_vcpu to store the loop iteration count rather than the vCPU
index, leading to incorrect round-robin behavior in subsequent directed
yield operations.

Fix this by using 'idx' instead of 'i' in the assignment.

Signed-off-by: Wanpeng Li
Reviewed-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b7a0ae2a7b20..cde1eddbaa91 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4026,7 +4026,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 
 		yielded = kvm_vcpu_yield_to(vcpu);
 		if (yielded > 0) {
-			WRITE_ONCE(kvm->last_boosted_vcpu, i);
+			WRITE_ONCE(kvm->last_boosted_vcpu, idx);
 			break;
 		} else if (yielded < 0 && !--try) {
 			break;
-- 
2.43.0

From nobody Sat Feb 7 18:42:25 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH 07/10] KVM: x86: Add IPI tracking infrastructure
Date: Mon, 10 Nov 2025 11:32:28 +0800
Message-ID: <20251110033232.12538-8-kernellwp@gmail.com>
In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com>
References: <20251110033232.12538-1-kernellwp@gmail.com>

From: Wanpeng Li

Introduce IPI tracking infrastructure for directed yield optimization.

Add per-vCPU IPI tracking context in kvm_vcpu_arch with
last_ipi_sender/receiver to track IPI communication pairs, pending_ipi
flag to indicate awaiting IPI response, and ipi_time_ns monotonic
timestamp for recency validation.

Add module parameters ipi_tracking_enabled (global toggle, default true)
and ipi_window_ns (recency window, default 50ms).

Add core helper functions: kvm_track_ipi_communication() to record
sender/receiver pairs, kvm_vcpu_is_ipi_receiver() to validate recent IPI
relationship, and kvm_vcpu_clear/reset_ipi_context() for lifecycle
management.

Use lockless READ_ONCE/WRITE_ONCE for minimal overhead. The short time
window prevents stale IPI information from affecting throughput workloads.

The infrastructure is inert until integrated with interrupt delivery in
subsequent patches.

Signed-off-by: Wanpeng Li
---
 arch/x86/include/asm/kvm_host.h |  8 ++++
 arch/x86/kvm/lapic.c            | 65 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              |  6 +++
 arch/x86/kvm/x86.h              |  4 ++
 include/linux/kvm_host.h        |  1 +
 virt/kvm/kvm_main.c             |  5 +++
 6 files changed, 89 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48598d017d6f..b5bdc115ff45 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1052,6 +1052,14 @@ struct kvm_vcpu_arch {
 	int pending_external_vector;
 	int highest_stale_pending_ioapic_eoi;
 
+	/* IPI tracking for directed yield (x86 only) */
+	struct {
+		int last_ipi_sender;	/* vCPU index of last IPI sender */
+		int last_ipi_receiver;	/* vCPU index of last IPI receiver */
+		bool pending_ipi;	/* Pending IPI response */
+		u64 ipi_time_ns;	/* Monotonic ns when IPI was sent */
+	} ipi_context;
+
 	/* be preempted when it's in kernel-mode(cpl=0) */
 	bool preempted_in_kernel;
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0ae7f913d782..98ec2b18b02c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -75,6 +75,12 @@ module_param(lapic_timer_advance, bool, 0444);
 /* step-by-step approximation to mitigate fluctuation */
 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
 
+/* IPI tracking window and runtime toggle (runtime-adjustable) */
+static bool ipi_tracking_enabled = true;
+static unsigned long ipi_window_ns = 50 * NSEC_PER_MSEC;
+module_param(ipi_tracking_enabled, bool, 0644);
+module_param(ipi_window_ns, ulong, 0644);
+
 static bool __read_mostly vector_hashing_enabled = true;
 module_param_named(vector_hashing, vector_hashing_enabled, bool, 0444);
 
@@ -1113,6 +1119,65 @@ static int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }
 
+/*
+ * Track IPI communication for directed yield when a unique receiver exists.
+ * This only writes sender/receiver context and timestamp; ignores self-IPI.
+ */
+void kvm_track_ipi_communication(struct kvm_vcpu *sender, struct kvm_vcpu *receiver)
+{
+	if (!sender || !receiver || sender == receiver)
+		return;
+	if (unlikely(!READ_ONCE(ipi_tracking_enabled)))
+		return;
+
+	WRITE_ONCE(sender->arch.ipi_context.last_ipi_receiver, receiver->vcpu_idx);
+	WRITE_ONCE(sender->arch.ipi_context.pending_ipi, true);
+	WRITE_ONCE(sender->arch.ipi_context.ipi_time_ns, ktime_get_mono_fast_ns());
+
+	WRITE_ONCE(receiver->arch.ipi_context.last_ipi_sender, sender->vcpu_idx);
+}
+
+/*
+ * Check if 'receiver' is the recent IPI target of 'sender'.
+ *
+ * Rationale:
+ * - Use a short window to avoid stale IPI inflating boost priority
+ *   on throughput-sensitive workloads.
+ */
+bool kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender, struct kvm_vcpu *receiver)
+{
+	u64 then, now;
+
+	if (unlikely(!READ_ONCE(ipi_tracking_enabled)))
+		return false;
+
+	then = READ_ONCE(sender->arch.ipi_context.ipi_time_ns);
+	now = ktime_get_mono_fast_ns();
+	if (READ_ONCE(sender->arch.ipi_context.pending_ipi) &&
+	    READ_ONCE(sender->arch.ipi_context.last_ipi_receiver) ==
+	    receiver->vcpu_idx &&
+	    now - then <= ipi_window_ns)
+		return true;
+
+	return false;
+}
+
+void kvm_vcpu_clear_ipi_context(struct kvm_vcpu *vcpu)
+{
+	WRITE_ONCE(vcpu->arch.ipi_context.pending_ipi, false);
+	WRITE_ONCE(vcpu->arch.ipi_context.last_ipi_sender, -1);
+	WRITE_ONCE(vcpu->arch.ipi_context.last_ipi_receiver, -1);
+}
+
+/*
+ * Reset helper: clear ipi_context and zero ipi_time for hard reset paths.
+ */
+void kvm_vcpu_reset_ipi_context(struct kvm_vcpu *vcpu)
+{
+	kvm_vcpu_clear_ipi_context(vcpu);
+	WRITE_ONCE(vcpu->arch.ipi_context.ipi_time_ns, 0);
+}
+
 /* Return true if the interrupt can be handled by using *bitmap as index mask
  * for valid destinations in *dst array.
  * Return false if kvm_apic_map_get_dest_lapic did nothing useful.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4b5d2d09634..649e016c131f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12708,6 +12708,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		goto free_guest_fpu;
 
 	kvm_xen_init_vcpu(vcpu);
+	/* Initialize IPI tracking */
+	kvm_vcpu_reset_ipi_context(vcpu);
 	vcpu_load(vcpu);
 	kvm_vcpu_after_set_cpuid(vcpu);
 	kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
@@ -12781,6 +12783,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_mmu_destroy(vcpu);
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	free_page((unsigned long)vcpu->arch.pio_data);
+	/* Clear IPI tracking context */
+	kvm_vcpu_reset_ipi_context(vcpu);
 	kvfree(vcpu->arch.cpuid_entries);
 }
 
@@ -12846,6 +12850,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	kvm_leave_nested(vcpu);
 
 	kvm_lapic_reset(vcpu, init_event);
+	/* Clear IPI tracking context on reset */
+	kvm_vcpu_clear_ipi_context(vcpu);
 
 	WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu));
 	vcpu->arch.hflags = 0;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index f3dc77f006f9..86a10c653eac 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -451,6 +451,10 @@ fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
 fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
 fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
 fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
+void kvm_track_ipi_communication(struct kvm_vcpu *sender,
+				 struct kvm_vcpu *receiver);
+void kvm_vcpu_clear_ipi_context(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reset_ipi_context(struct kvm_vcpu *vcpu);
 
 extern struct kvm_caps kvm_caps;
 extern struct kvm_host_values kvm_host;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5bd76cf394fa..5ae8327fdf21 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1532,6 +1532,7 @@ static inline void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 }
 #endif
 
+bool kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender, struct kvm_vcpu *receiver);
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cde1eddbaa91..495e769c7ddf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3963,6 +3963,11 @@ bool __weak kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
 	return false;
 }
 
+bool __weak kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender, struct kvm_vcpu *receiver)
+{
+	return false;
+}
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 {
 	int nr_vcpus, start, i, idx, yielded;
-- 
2.43.0

From nobody Sat Feb 7 18:42:25 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH 08/10] KVM: x86/lapic: Integrate IPI tracking with interrupt delivery
Date: Mon, 10 Nov 2025 11:32:29 +0800
Message-ID: <20251110033232.12538-9-kernellwp@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com>
References: <20251110033232.12538-1-kernellwp@gmail.com>

From: Wanpeng Li

Integrate IPI tracking with LAPIC interrupt delivery and EOI handling.

Hook into kvm_irq_delivery_to_apic() after destination resolution to
record sender/receiver pairs when the interrupt is LAPIC-originated,
APIC_DM_FIXED mode, with exactly one destination vCPU. Use counting
for efficient single-destination detection.

Add kvm_clear_ipi_on_eoi() called from both EOI paths to ensure
complete IPI context cleanup:

1. apic_set_eoi(): Software-emulated EOI path (traditional/non-APICv)
2. kvm_apic_set_eoi_accelerated(): Hardware-accelerated EOI path
   (APICv/AVIC)

Without dual-path cleanup, APICv/AVIC-enabled guests would retain
stale IPI state, causing directed yield to rely on obsolete
sender/receiver information and potentially boosting the wrong vCPU.
Both paths must call kvm_clear_ipi_on_eoi() to maintain consistency
across different virtual interrupt delivery modes.

The cleanup implements two-stage logic to avoid premature clearing:
unconditionally clear the receiver's IPI context, and conditionally
clear the sender's pending flag only when the sender exists,
last_ipi_receiver matches, and the IPI is recent. This prevents
unrelated EOIs from disrupting valid IPI tracking state.

Use lockless accessors for minimal overhead. The tracking only
activates for unicast fixed IPIs where directed yield provides value.

Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/lapic.c | 107 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 103 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 98ec2b18b02c..d38e64691b78 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1178,6 +1178,47 @@ void kvm_vcpu_reset_ipi_context(struct kvm_vcpu *vcpu)
 	WRITE_ONCE(vcpu->arch.ipi_context.ipi_time_ns, 0);
 }
 
+/*
+ * Clear IPI context on EOI at receiver side; clear sender's pending
+ * only when matches and is fresh.
+ *
+ * This function implements precise cleanup to avoid stale IPI boosts:
+ * 1) Always clear the receiver's IPI context (unconditional cleanup)
+ * 2) Conditionally clear the sender's pending flag only when:
+ *    - The sender vCPU still exists and is valid
+ *    - The sender's last_ipi_receiver matches this receiver
+ *    - The IPI was sent recently (within ~window)
+ */
+static void kvm_clear_ipi_on_eoi(struct kvm_lapic *apic)
+{
+	struct kvm_vcpu *receiver;
+	int sender_idx;
+	u64 then, now;
+
+	if (unlikely(!READ_ONCE(ipi_tracking_enabled)))
+		return;
+
+	receiver = apic->vcpu;
+	sender_idx = READ_ONCE(receiver->arch.ipi_context.last_ipi_sender);
+
+	/* Step 1: Always clear receiver's IPI context */
+	kvm_vcpu_clear_ipi_context(receiver);
+
+	/* Step 2: Conditionally clear sender's pending flag */
+	if (sender_idx >= 0) {
+		struct kvm_vcpu *sender = kvm_get_vcpu(receiver->kvm, sender_idx);
+
+		if (sender &&
+		    READ_ONCE(sender->arch.ipi_context.last_ipi_receiver) ==
+		    receiver->vcpu_idx) {
+			then = READ_ONCE(sender->arch.ipi_context.ipi_time_ns);
+			now = ktime_get_mono_fast_ns();
+			if (now - then <= ipi_window_ns)
+				WRITE_ONCE(sender->arch.ipi_context.pending_ipi, false);
+		}
+	}
+}
+
 /* Return true if the interrupt can be handled by using *bitmap as index mask
  * for valid destinations in *dst array.
  * Return false if kvm_apic_map_get_dest_lapic did nothing useful.
@@ -1259,6 +1300,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 	struct kvm_lapic **dst = NULL;
 	int i;
 	bool ret;
+	/* Count actual delivered targets to identify a unique recipient. */
+	int targets = 0;
+	int delivered = 0;
+	struct kvm_vcpu *unique = NULL;
 
 	*r = -1;
 
@@ -1280,8 +1325,26 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		for_each_set_bit(i, &bitmap, 16) {
 			if (!dst[i])
 				continue;
-			*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
+			delivered = kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
+			*r += delivered;
+			/* Fast path may still fan out; count delivered targets. */
+			if (delivered > 0) {
+				targets++;
+				unique = dst[i]->vcpu;
+			}
 		}
+
+		/*
+		 * Record unique recipient for IPI-aware boost:
+		 * only for LAPIC-originated APIC_DM_FIXED without
+		 * shorthand, and when exactly one recipient was
+		 * delivered; ignore self-IPI.
+		 */
+		if (src &&
+		    irq->delivery_mode == APIC_DM_FIXED &&
+		    irq->shorthand == APIC_DEST_NOSHORT &&
+		    targets == 1 && unique && unique != src->vcpu)
+			kvm_track_ipi_communication(src->vcpu, unique);
 	}
 
 	rcu_read_unlock();
@@ -1366,6 +1429,13 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	struct kvm_vcpu *vcpu, *lowest = NULL;
 	unsigned long i, dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
 	unsigned int dest_vcpus = 0;
+	/*
+	 * Count actual delivered targets to identify a unique recipient
+	 * for IPI tracking in the slow path.
+	 */
+	int targets = 0;
+	int delivered = 0;
+	struct kvm_vcpu *unique = NULL;
 
 	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
 		return r;
@@ -1389,7 +1459,13 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		if (!kvm_lowest_prio_delivery(irq)) {
 			if (r < 0)
 				r = 0;
-			r += kvm_apic_set_irq(vcpu, irq, dest_map);
+			delivered = kvm_apic_set_irq(vcpu, irq, dest_map);
+			r += delivered;
+			/* Slow path can deliver to multiple vCPUs; count them. */
+			if (delivered > 0) {
+				targets++;
+				unique = vcpu;
+			}
 		} else if (kvm_apic_sw_enabled(vcpu->arch.apic)) {
 			if (!vector_hashing_enabled) {
 				if (!lowest)
@@ -1410,8 +1486,28 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		lowest = kvm_get_vcpu(kvm, idx);
 	}
 
-	if (lowest)
-		r = kvm_apic_set_irq(lowest, irq, dest_map);
+	if (lowest) {
+		delivered = kvm_apic_set_irq(lowest, irq, dest_map);
+		r = delivered;
+		/*
+		 * Lowest-priority / vector-hashing paths ultimately deliver to
+		 * a single vCPU.
+		 */
+		if (delivered > 0) {
+			targets = 1;
+			unique = lowest;
+		}
+	}
+
+	/*
+	 * Record unique recipient for IPI-aware boost only for LAPIC-
+	 * originated APIC_DM_FIXED without shorthand, and when exactly
+	 * one recipient was delivered; ignore self-IPI.
+	 */
+	if (src && irq->delivery_mode == APIC_DM_FIXED &&
+	    irq->shorthand == APIC_DEST_NOSHORT &&
+	    targets == 1 && unique && unique != src->vcpu)
+		kvm_track_ipi_communication(src->vcpu, unique);
 
 	return r;
 }
@@ -1632,6 +1728,7 @@ void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
 	trace_kvm_eoi(apic, vector);
 
 	kvm_ioapic_send_eoi(apic, vector);
+	kvm_clear_ipi_on_eoi(apic);
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_set_eoi_accelerated);
@@ -2424,6 +2521,8 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	case APIC_EOI:
 		apic_set_eoi(apic);
+		/* Precise cleanup for IPI-aware boost */
+		kvm_clear_ipi_on_eoi(apic);
 		break;
 
 	case APIC_LDR:
-- 
2.43.0

From nobody Sat Feb 7 18:42:25 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH 09/10] KVM: Implement IPI-aware directed yield candidate selection
Date: Mon, 10 Nov 2025 11:32:30 +0800
Message-ID: <20251110033232.12538-10-kernellwp@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com>
References: <20251110033232.12538-1-kernellwp@gmail.com>

From: Wanpeng Li

Integrate IPI tracking with directed yield to improve scheduling when
vCPUs spin waiting for IPI responses.

Implement priority-based candidate selection in kvm_vcpu_on_spin()
with three tiers: Priority 1 uses kvm_vcpu_is_ipi_receiver() to
identify confirmed IPI targets within the recency window, addressing
lock holders spinning on IPI acknowledgment. Priority 2 leverages
existing kvm_arch_dy_has_pending_interrupt() for compatibility with
arch-specific fast paths. Priority 3 falls back to conventional
preemption-based logic when yield_to_kernel_mode is requested,
providing a safety net for non-IPI scenarios.

Add kvm_vcpu_is_good_yield_candidate() helper to consolidate these
checks, preventing over-aggressive boosting while enabling targeted
optimization when IPI patterns are detected.
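
For readers without a kernel tree at hand, the three tiers described
above can be sketched as a standalone userspace function. This is an
illustration only and not part of the patch: struct sketch_vcpu, its
fields, and the sketch_* helpers are hypothetical stand-ins for
struct kvm_vcpu and the real KVM helpers, and the clock is passed in
as a parameter instead of being read via ktime_get_mono_fast_ns().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_vcpu; not the kernel layout. */
struct sketch_vcpu {
	int idx;
	bool preempted;             /* scheduled out while runnable */
	bool preempted_in_kernel;   /* was in guest kernel mode when preempted */
	bool has_pending_interrupt; /* arch-specific fast hint */
	bool pending_ipi;           /* sender side: IPI not yet acknowledged */
	int last_ipi_receiver;      /* sender side: last unicast target, -1 if none */
	uint64_t ipi_time_ns;       /* sender side: when that IPI was sent */
};

/* Tier 1: is 'receiver' the recent unicast IPI target of 'sender'? */
static bool sketch_is_ipi_receiver(const struct sketch_vcpu *sender,
				   const struct sketch_vcpu *receiver,
				   uint64_t now_ns, uint64_t window_ns)
{
	return sender->pending_ipi &&
	       sender->last_ipi_receiver == receiver->idx &&
	       now_ns - sender->ipi_time_ns <= window_ns;
}

static bool sketch_good_yield_candidate(const struct sketch_vcpu *me,
					const struct sketch_vcpu *vcpu,
					bool yield_to_kernel_mode,
					uint64_t now_ns, uint64_t window_ns)
{
	assert(me && vcpu);

	/* Priority 1: confirmed IPI receiver inside the recency window. */
	if (sketch_is_ipi_receiver(me, vcpu, now_ns, window_ns))
		return true;

	/* Priority 2: arch-specific pending-interrupt hint. */
	if (vcpu->has_pending_interrupt)
		return true;

	/* Priority 3: preempted gate, stricter when kernel mode is requested. */
	if (!vcpu->preempted)
		return false;
	if (yield_to_kernel_mode && !vcpu->preempted_in_kernel)
		return false;

	return true;
}
```

Note that in this sketch, as in the helper added by the diff, a target
that matches neither of the first two tiers must at least have been
preempted, whether or not yield_to_kernel_mode is set.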

Performance testing (16 pCPUs host, 16 vCPUs/VM):

Dedup (simlarge):
  2 VMs: +47.1% throughput
  3 VMs: +28.1% throughput
  4 VMs: +1.7% throughput

VIPS (simlarge):
  2 VMs: +26.2% throughput
  3 VMs: +12.7% throughput
  4 VMs: +6.0% throughput

Gains stem from effective directed yield when vCPUs spin on IPI
delivery, reducing synchronization overhead. The improvement is most
pronounced at moderate overcommit (2-3 VMs) where contention reduction
outweighs context switching cost.

Signed-off-by: Wanpeng Li
---
 virt/kvm/kvm_main.c | 52 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 495e769c7ddf..9cf44b6b396d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3968,6 +3968,47 @@ bool __weak kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender, struct kvm_vcpu *r
 	return false;
 }
 
+/*
+ * IPI-aware candidate selection for directed yield
+ *
+ * Priority order:
+ *  1) Confirmed IPI receiver of 'me' within a short window (always boost)
+ *  2) Arch-provided fast pending interrupt (user-mode boost)
+ *  3) Kernel-mode yield: preempted-in-kernel vCPU (traditional boost)
+ *  4) Otherwise, be conservative
+ */
+static bool kvm_vcpu_is_good_yield_candidate(struct kvm_vcpu *me, struct kvm_vcpu *vcpu,
+					     bool yield_to_kernel_mode)
+{
+	/* Priority 1: recently targeted IPI receiver */
+	if (kvm_vcpu_is_ipi_receiver(me, vcpu))
+		return true;
+
+	/* Priority 2: fast pending-interrupt hint (arch-specific). */
+	if (kvm_arch_dy_has_pending_interrupt(vcpu))
+		return true;
+
+	/*
+	 * Minimal preempted gate for remaining cases:
+	 * - If the target is neither a confirmed IPI receiver nor has a fast
+	 *   pending interrupt, require that the target has been preempted.
+	 * - If yielding to kernel mode is requested, additionally require
+	 *   that the target was preempted while in kernel mode.
+	 *
+	 * This avoids expanding the candidate set too aggressively and helps
+	 * prevent overboost in workloads where the IPI context is not
+	 * involved.
+	 */
+	if (!READ_ONCE(vcpu->preempted))
+		return false;
+
+	if (yield_to_kernel_mode &&
+	    !kvm_arch_vcpu_preempted_in_kernel(vcpu))
+		return false;
+
+	return true;
+}
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 {
 	int nr_vcpus, start, i, idx, yielded;
@@ -4015,15 +4056,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
 			continue;
 
-		/*
-		 * Treat the target vCPU as being in-kernel if it has a pending
-		 * interrupt, as the vCPU trying to yield may be spinning
-		 * waiting on IPI delivery, i.e. the target vCPU is in-kernel
-		 * for the purposes of directed yield.
-		 */
-		if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
-		    !kvm_arch_dy_has_pending_interrupt(vcpu) &&
-		    !kvm_arch_vcpu_preempted_in_kernel(vcpu))
+		/* IPI-aware candidate selection */
+		if (!kvm_vcpu_is_good_yield_candidate(me, vcpu, yield_to_kernel_mode))
 			continue;
 
 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
-- 
2.43.0

From nobody Sat Feb 7 18:42:25 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH 10/10] KVM: Relaxed boost as safety net
Date: Mon, 10 Nov 2025 11:39:54 +0800
Message-ID: <20251110033954.13524-1-kernellwp@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com>
References: <20251110033232.12538-1-kernellwp@gmail.com>

From: Wanpeng Li

Add a minimal two-round fallback mechanism in kvm_vcpu_on_spin() to
avoid pathological stalls when the first round finds no eligible
target.

Round 1 applies strict IPI-aware candidate selection (existing
behavior). Round 2 provides a relaxed scan gated only by preempted
state as a safety net, addressing cases where IPI context is missed or
the runnable set is transient.

The second round is controlled by module parameter
enable_relaxed_boost (bool, 0644, default on) to allow easy
disablement by distributions if needed.

Introduce the enable_relaxed_boost parameter, add a first_round flag,
retry label, and reset of yielded counter. Gate the IPI-aware check in
round 1 and use preempted-only gating in round 2. Keep churn minimal
by reusing the same scan logic while preserving all existing
heuristics, tracing, and bookkeeping.
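
The control flow of the two rounds can be sketched in isolation as
follows. This is a simplified userspace illustration and not the patch
itself: struct sketch_vcpu and sketch_pick_target() are hypothetical,
the round-1 check is collapsed into a single precomputed flag, and
selecting a target stands in for a successful kvm_vcpu_yield_to()
(the real loop also keeps the eligibility checks and the retry
budget, which are omitted here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU; not the kernel's struct kvm_vcpu. */
struct sketch_vcpu {
	bool strict_candidate; /* would pass the round-1 IPI-aware check */
	bool preempted;        /* passes the round-2 preempted-only gate */
};

/*
 * Return the index of the vCPU chosen as yield target, or -1 if none.
 * *rounds reports how many scans were performed (1 or 2).
 */
static int sketch_pick_target(const struct sketch_vcpu *vcpus, size_t n,
			      bool enable_relaxed_boost, int *rounds)
{
	bool first_round = true;
	size_t i;

	assert(vcpus && rounds);
	*rounds = 0;
retry:
	(*rounds)++;
	for (i = 0; i < n; i++) {
		/* Round 1: strict IPI-aware selection. */
		if (first_round && !vcpus[i].strict_candidate)
			continue;
		/* Round 2: relaxed, gated only by preempted state. */
		if (!first_round && !vcpus[i].preempted)
			continue;
		return (int)i;
	}

	/* Fall back to the relaxed scan once, and only if enabled. */
	if (enable_relaxed_boost && first_round) {
		first_round = false;
		goto retry;
	}
	return -1;
}
```

With the fallback disabled the function performs exactly one scan, so
the strict round-1 behavior is preserved unchanged.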

Signed-off-by: Wanpeng Li
---
 virt/kvm/kvm_main.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9cf44b6b396d..b03be8d9ae4c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -101,6 +101,9 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_shrink);
 static bool allow_unsafe_mappings;
 module_param(allow_unsafe_mappings, bool, 0444);
 
+static bool enable_relaxed_boost = true;
+module_param(enable_relaxed_boost, bool, 0644);
+
 /*
  * Ordering of locks:
  *
@@ -4015,6 +4018,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 	struct kvm *kvm = me->kvm;
 	struct kvm_vcpu *vcpu;
 	int try = 3;
+	bool first_round = true;
 
 	nr_vcpus = atomic_read(&kvm->online_vcpus);
 	if (nr_vcpus < 2)
@@ -4025,6 +4029,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 
 	kvm_vcpu_set_in_spin_loop(me, true);
 
+retry:
+	yielded = 0;
+
 	/*
 	 * The current vCPU ("me") is spinning in kernel mode, i.e. is likely
 	 * waiting for a resource to become available. Attempt to yield to a
@@ -4057,7 +4064,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 			continue;
 
 		/* IPI-aware candidate selection */
-		if (!kvm_vcpu_is_good_yield_candidate(me, vcpu, yield_to_kernel_mode))
+		if (first_round &&
+		    !kvm_vcpu_is_good_yield_candidate(me, vcpu, yield_to_kernel_mode))
+			continue;
+
+		/* Minimal preempted gate for second round */
+		if (!first_round && !READ_ONCE(vcpu->preempted))
 			continue;
 
 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
@@ -4071,6 +4083,16 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 			break;
 		}
 	}
+
+	/*
+	 * Second round: relaxed boost as safety net, with preempted gate.
+	 * Only execute when enabled and when the first round yielded nothing.
+	 */
+	if (enable_relaxed_boost && first_round && yielded <= 0) {
+		first_round = false;
+		goto retry;
+	}
+
 	kvm_vcpu_set_in_spin_loop(me, false);
 
 	/* Ensure vcpu is not eligible during next spinloop */
-- 
2.43.0