From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: K Prateek Nayak, Christian Borntraeger, Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH v2 1/9] sched: Add vCPU debooster infrastructure
Date: Fri, 19 Dec 2025 11:53:25 +0800
Message-ID: <20251219035334.39790-2-kernellwp@gmail.com>
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>

From: Wanpeng Li

Introduce the foundational infrastructure for the vCPU debooster
mechanism, which aims to make yield_to() more effective in
virtualization workloads.

Add per-rq tracking fields for rate limiting (yield_deboost_last_time_ns)
and for debouncing (yield_deboost_last_src_pid,
yield_deboost_last_dst_pid, yield_deboost_last_pair_time_ns). Introduce
the global knob sysctl_sched_vcpu_debooster_enabled for runtime control,
enabled by default, expose it in debugfs as vcpu_debooster_enabled, and
initialize the per-rq state in sched_init().

The infrastructure is inert at this stage: no deboost logic is
implemented yet, so it can be verified independently that existing
behavior remains unchanged.
v1 -> v2:
- Rename debugfs entry from sched_vcpu_debooster_enabled to
  vcpu_debooster_enabled for consistency with other sched debugfs entries
- Add explicit initialization of yield_deboost_last_time_ns to 0 in
  sched_init() for clarity
- Improve comments to follow kernel documentation style

Signed-off-by: Wanpeng Li
---
 kernel/sched/core.c  |  9 +++++++--
 kernel/sched/debug.c |  2 ++
 kernel/sched/fair.c  |  7 +++++++
 kernel/sched/sched.h | 12 ++++++++++++
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41ba0be16911..9f0936b9c1c9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8606,9 +8606,14 @@ void __init sched_init(void)
 #endif /* CONFIG_CGROUP_SCHED */
 
 	for_each_possible_cpu(i) {
-		struct rq *rq;
+		struct rq *rq = cpu_rq(i);
+
+		/* Initialize vCPU debooster per-rq state */
+		rq->yield_deboost_last_time_ns = 0;
+		rq->yield_deboost_last_src_pid = -1;
+		rq->yield_deboost_last_dst_pid = -1;
+		rq->yield_deboost_last_pair_time_ns = 0;
 
-		rq = cpu_rq(i);
 		raw_spin_lock_init(&rq->__lock);
 		rq->nr_running = 0;
 		rq->calc_load_active = 0;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 41caa22e0680..13e67617549d 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -508,6 +508,8 @@ static __init int sched_init_debug(void)
 	debugfs_create_file("tunable_scaling", 0644, debugfs_sched, NULL, &sched_scaling_fops);
 	debugfs_create_u32("migration_cost_ns", 0644, debugfs_sched, &sysctl_sched_migration_cost);
 	debugfs_create_u32("nr_migrate", 0644, debugfs_sched, &sysctl_sched_nr_migrate);
+	debugfs_create_u32("vcpu_debooster_enabled", 0644, debugfs_sched,
+			   &sysctl_sched_vcpu_debooster_enabled);
 
 	sched_domains_mutex_lock();
 	update_sched_domain_debugfs();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da46c3164537..87c30db2c853 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -81,6 +81,13 @@ static unsigned int normalized_sysctl_sched_base_slice	= 700000ULL;
 
 __read_mostly unsigned int sysctl_sched_migration_cost	= 500000UL;
 
+/*
+ * vCPU debooster: runtime toggle for yield_to() vruntime penalty mechanism.
+ * When enabled (default), yield_to() applies bounded vruntime penalties to
+ * improve lock holder scheduling in virtualized environments.
+ */
+unsigned int sysctl_sched_vcpu_debooster_enabled __read_mostly = 1;
+
 static int __init setup_sched_thermal_decay_shift(char *str)
 {
 	pr_warn("Ignoring the deprecated sched_thermal_decay_shift= option\n");
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d30cca6870f5..b7aa0d35c793 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1294,6 +1294,16 @@ struct rq {
 	unsigned int		push_busy;
 	struct cpu_stop_work	push_work;
 
+	/*
+	 * vCPU debooster: per-rq state for yield_to() optimization.
+	 * Used to rate-limit and debounce vruntime penalties applied
+	 * when a vCPU yields to a lock holder.
+	 */
+	u64			yield_deboost_last_time_ns;
+	pid_t			yield_deboost_last_src_pid;
+	pid_t			yield_deboost_last_dst_pid;
+	u64			yield_deboost_last_pair_time_ns;
+
 #ifdef CONFIG_SCHED_CORE
 	/* per rq */
 	struct rq		*core;
@@ -2958,6 +2968,8 @@ extern int sysctl_resched_latency_warn_once;
 
 extern unsigned int sysctl_sched_tunable_scaling;
 
+extern unsigned int sysctl_sched_vcpu_debooster_enabled;
+
 extern unsigned int sysctl_numa_balancing_scan_delay;
 extern unsigned int sysctl_numa_balancing_scan_period_min;
 extern unsigned int sysctl_numa_balancing_scan_period_max;
--
2.43.0
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: K Prateek Nayak, Christian Borntraeger, Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH v2 2/9] sched/fair: Add rate-limiting and validation helpers
Date: Fri, 19 Dec 2025 11:53:26 +0800
Message-ID: <20251219035334.39790-3-kernellwp@gmail.com>
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>

From: Wanpeng Li

Implement the core safety mechanisms for yield deboost operations.

Add yield_deboost_rate_limit(), which lets at most one deboost attempt
per runqueue through in any 6ms window, to avoid excessive overhead on
compute-intensive workloads that yield at high frequency. The 6ms
threshold trades responsiveness against overhead.

Add yield_deboost_validate_tasks(), which checks that the feature is
enabled, that both tasks are valid and distinct, that the target
belongs to fair_sched_class (the yielding task's class is already
guaranteed by the caller), that the target is on the same runqueue, and
that both tasks are queued (se.on_rq).

The rate limiter filters out pathological high-frequency cases, while
validation ensures that only appropriate task pairs proceed. Both
functions are static and will be wired up in subsequent patches.
v1 -> v2:
- Remove unnecessary READ_ONCE/WRITE_ONCE for per-rq fields accessed
  under rq->lock
- Change rq->clock to rq_clock(rq) helper for consistency
- Change yield_deboost_rate_limit() signature from (rq, now_ns) to (rq),
  obtaining time internally via rq_clock()
- Remove redundant sched_class check for p_yielding (already implied by
  rq->donor being fair)
- Simplify task_rq check to only verify p_target
- Change rq->curr to rq->donor for correct EEVDF donor tracking
- Move sysctl_sched_vcpu_debooster_enabled and NULL checks to caller
  (yield_to_deboost) for early exit before update_rq_clock()
- Simplify function signature by returning p_yielding directly instead
  of using output pointer parameters
- Add documentation explaining the 6ms rate limit threshold

Signed-off-by: Wanpeng Li
---
 kernel/sched/fair.c | 62 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 87c30db2c853..2f327882bf4d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9040,6 +9040,68 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, struct t
 	}
 }
 
+/*
+ * Rate-limit yield deboost operations to prevent excessive overhead.
+ * Returns true if the operation should be skipped due to rate limiting.
+ *
+ * The 6ms threshold balances responsiveness with overhead reduction:
+ * - Short enough to allow timely yield boosting for lock contention
+ * - Long enough to prevent pathological high-frequency penalty application
+ *
+ * Called under rq->lock, so direct field access is safe.
+ */
+static bool yield_deboost_rate_limit(struct rq *rq)
+{
+	u64 now = rq_clock(rq);
+	u64 last = rq->yield_deboost_last_time_ns;
+
+	if (last && (now - last) <= 6 * NSEC_PER_MSEC)
+		return true;
+
+	rq->yield_deboost_last_time_ns = now;
+	return false;
+}
+
+/*
+ * Validate tasks for yield deboost operation.
+ * Returns the yielding task on success, NULL on validation failure.
+ *
+ * Checks: feature enabled, valid target, same runqueue, target is fair class,
+ * both on_rq. Called under rq->lock.
+ *
+ * Note: p_yielding (rq->donor) is guaranteed to be fair class by the caller
+ * (yield_to_task_fair is only called when curr->sched_class == p->sched_class).
+ */
+static struct task_struct __maybe_unused *
+yield_deboost_validate_tasks(struct rq *rq, struct task_struct *p_target)
+{
+	struct task_struct *p_yielding;
+
+	if (!sysctl_sched_vcpu_debooster_enabled)
+		return NULL;
+
+	if (!p_target)
+		return NULL;
+
+	if (yield_deboost_rate_limit(rq))
+		return NULL;
+
+	p_yielding = rq->donor;
+	if (!p_yielding || p_yielding == p_target)
+		return NULL;
+
+	if (p_target->sched_class != &fair_sched_class)
+		return NULL;
+
+	if (task_rq(p_target) != rq)
+		return NULL;
+
+	if (!p_target->se.on_rq || !p_yielding->se.on_rq)
+		return NULL;
+
+	return p_yielding;
+}
+
 /*
  * sched_yield() is very simple
  */
--
2.43.0
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: K Prateek Nayak, Christian Borntraeger, Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH v2 3/9] sched/fair: Add cgroup LCA finder for hierarchical yield
Date: Fri, 19 Dec 2025 11:53:27 +0800
Message-ID: <20251219035334.39790-4-kernellwp@gmail.com>
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>

From: Wanpeng Li

Implement yield_deboost_find_lca() to locate the lowest common ancestor
(LCA) in the cgroup hierarchy for EEVDF-aware yield operations. The LCA
is the hierarchy level at which vruntime adjustments must be applied so
that fairness is preserved across cgroup boundaries. This matters for
virtualization workloads, where vCPUs may be organized in nested
cgroups.

Key aspects:
- With CONFIG_FAIR_GROUP_SCHED: walk up both entity hierarchies, first
  aligning their depths and then ascending together until a common
  cfs_rq is found
- With a flat hierarchy: simply verify that both entities share the
  same cfs_rq
- Require meaningful contention (h_nr_queued > 1)
- Ensure the yielding entity has a non-zero slice for safe penalty
  calculation

The function runs under rq->lock. It is a static helper that will be
wired up in subsequent patches.
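To illustrate the depth-alignment walk, here is a toy model of find_matching_se() on a hypothetical entity type. struct toy_se, its integer cfs_rq ids and toy_find_matching_se() are illustrative assumptions, not kernel types; the loop structure follows the description above (align depths, then ascend together until both entities sit on the same cfs_rq). The h_nr_queued > 1 contention check that yield_deboost_find_lca() performs afterwards is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy group-scheduling entity (hypothetical, not a kernel type):
 * 'cfs_rq' is the id of the queue the entity sits on, 'parent' is the
 * group entity one level up, and root-level entities have depth 0.
 */
struct toy_se {
	int depth;
	int cfs_rq;
	struct toy_se *parent;	/* NULL at the root level */
};

/*
 * Align both entities to the same depth, then walk up in lockstep
 * until they are queued on the same cfs_rq. In this model every
 * root-level entity sits on cfs_rq 0, so the walk stops at the root
 * level at the latest.
 */
static void toy_find_matching_se(struct toy_se **se, struct toy_se **pse)
{
	assert(*se != NULL && *pse != NULL);

	while ((*se)->depth > (*pse)->depth)
		*se = (*se)->parent;
	while ((*pse)->depth > (*se)->depth)
		*pse = (*pse)->parent;

	while ((*se)->cfs_rq != (*pse)->cfs_rq) {
		*se = (*se)->parent;
		*pse = (*pse)->parent;
	}
}
```

With a yielding task two levels deep under group A and a target one level deep under group B, the walk stops at the group entities of A and B on the root cfs_rq, which is the level where a penalty would have to be applied. For two tasks inside the same group, it stops as soon as both sit on that group's cfs_rq.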
v1 -> v2:
- Change nr_queued to h_nr_queued for accurate hierarchical task counting
  that includes tasks in child cgroups
- Improve comments to clarify the LCA algorithm

Signed-off-by: Wanpeng Li
---
 kernel/sched/fair.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f327882bf4d..39dbdd222687 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9102,6 +9102,36 @@ yield_deboost_validate_tasks(struct rq *rq, struct task_struct *p_target)
 	return p_yielding;
 }
 
+/*
+ * Find the lowest common ancestor (LCA) in the cgroup hierarchy.
+ * Uses find_matching_se() to locate sibling entities at the same level,
+ * then returns their common cfs_rq for vruntime adjustments.
+ *
+ * Returns true if a valid LCA with meaningful contention (h_nr_queued > 1)
+ * is found, storing the LCA entities and common cfs_rq in output parameters.
+ */
+static bool __maybe_unused
+yield_deboost_find_lca(struct sched_entity *se_y, struct sched_entity *se_t,
+		       struct sched_entity **se_y_lca_out,
+		       struct sched_entity **se_t_lca_out,
+		       struct cfs_rq **cfs_rq_out)
+{
+	struct sched_entity *se_y_lca = se_y;
+	struct sched_entity *se_t_lca = se_t;
+	struct cfs_rq *cfs_rq;
+
+	find_matching_se(&se_y_lca, &se_t_lca);
+
+	cfs_rq = cfs_rq_of(se_y_lca);
+	if (cfs_rq->h_nr_queued <= 1)
+		return false;
+
+	*se_y_lca_out = se_y_lca;
+	*se_t_lca_out = se_t_lca;
+	*cfs_rq_out = cfs_rq;
+	return true;
+}
+
 /*
  * sched_yield() is very simple
  */
--
2.43.0
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: K Prateek Nayak, Christian Borntraeger, Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH v2 4/9] sched/fair: Add penalty calculation and application logic
Date: Fri, 19 Dec 2025 11:53:28 +0800
Message-ID: <20251219035334.39790-5-kernellwp@gmail.com>
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>

From: Wanpeng Li

Implement the core penalty calculation and application mechanisms for
yield deboost operations.

yield_deboost_apply_debounce(): reverse-pair debouncing to prevent
ping-pong. When A yields to B and B then yields back to A within
600us, the penalty is capped at max(need, gran).

yield_deboost_calculate_penalty(): compute the vruntime penalty from:
- the fairness gap (vruntime delta between the yielding and target
  tasks)
- the scheduling granularity, derived from the yielding entity's weight
- queue-size-based caps, as multiples of the granularity: 2 tasks: 6.0x,
  3: 4.0x, 4-6: 2.5x, 7-8: 2.0x, 9-12: 1.5x, >12: 1.0x
- special handling for a zero gap, with refined multipliers
- a 10% weighting on positive gaps (alpha = 1.10)

yield_deboost_apply_penalty(): apply the calculated penalty to the
EEVDF state, updating vruntime and deadline together.

The penalty gives the target a sustained scheduling preference beyond
the transient buddy hint, which is critical for boosting lock holders
in virtualized environments.
v1 -> v2:
- Change nr_queued to h_nr_queued for accurate hierarchical task counting
  in penalty cap calculation
- Remove vlag assignment as it will be recalculated on dequeue/enqueue and
  modifying it for an on-rq entity is incorrect
- Remove update_min_vruntime() call: in EEVDF the yielding entity is always
  cfs_rq->curr (dequeued from RB-tree), so modifying its vruntime does not
  affect min_vruntime calculation
- Remove unnecessary gran_floor safeguard (calc_delta_fair already handles
  edge cases correctly)
- Change rq->curr to rq->donor for correct EEVDF donor tracking
- Simplify debounce function signature

Signed-off-by: Wanpeng Li
---
 kernel/sched/fair.c | 155 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 39dbdd222687..8738cfc3109c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9132,6 +9132,161 @@ yield_deboost_find_lca(struct sched_entity *se_y, struct sched_entity *se_t,
 	return true;
 }
 
+/*
+ * Apply debounce for reverse yield pairs to reduce ping-pong effects.
+ * When A yields to B, then B yields back to A within ~600us, downscale
+ * the penalty to prevent oscillation.
+ *
+ * The 600us threshold is chosen to be:
+ * - Long enough to catch rapid back-and-forth yields
+ * - Short enough to not affect legitimate sequential yields
+ *
+ * Returns the (possibly reduced) penalty value.
+ */
+static u64 yield_deboost_apply_debounce(struct rq *rq, struct task_struct *p_target,
+					u64 penalty, u64 need, u64 gran)
+{
+	u64 now = rq_clock(rq);
+	struct task_struct *p_yielding = rq->donor;
+	pid_t src_pid, dst_pid;
+	pid_t last_src, last_dst;
+	u64 last_ns;
+
+	if (!p_yielding || !p_target)
+		return penalty;
+
+	src_pid = p_yielding->pid;
+	dst_pid = p_target->pid;
+	last_src = rq->yield_deboost_last_src_pid;
+	last_dst = rq->yield_deboost_last_dst_pid;
+	last_ns = rq->yield_deboost_last_pair_time_ns;
+
+	/* Detect reverse pair: previous was target->source */
+	if (last_src == dst_pid && last_dst == src_pid &&
+	    (now - last_ns) <= 600 * NSEC_PER_USEC) {
+		u64 alt = max(need, gran);
+
+		if (penalty > alt)
+			penalty = alt;
+	}
+
+	/* Update tracking state */
+	rq->yield_deboost_last_src_pid = src_pid;
+	rq->yield_deboost_last_dst_pid = dst_pid;
+	rq->yield_deboost_last_pair_time_ns = now;
+
+	return penalty;
+}
+
+/*
+ * Calculate vruntime penalty for yield deboost.
+ *
+ * The penalty is based on:
+ * - Fairness gap: vruntime difference between yielding and target tasks
+ * - Scheduling granularity: base unit for penalty calculation
+ * - Queue size: adaptive caps to prevent starvation in larger queues
+ *
+ * Queue-size-based caps (multiplier of granularity):
+ *   2 tasks: 6.0x - Strongest push for 2-task ping-pong scenarios
+ *   3 tasks: 4.0x
+ *   4-6:     2.5x
+ *   7-8:     2.0x
+ *   9-12:    1.5x
+ *   >12:     1.0x - Minimal push to avoid starvation
+ *
+ * Returns the calculated penalty value.
+ */
+static u64 __maybe_unused
+yield_deboost_calculate_penalty(struct rq *rq, struct sched_entity *se_y_lca,
+				struct sched_entity *se_t_lca,
+				struct task_struct *p_target, int h_nr_queued)
+{
+	u64 gran, need, penalty, maxp;
+	u64 weighted_need, base;
+
+	gran = calc_delta_fair(sysctl_sched_base_slice, se_y_lca);
+
+	/* Calculate fairness gap */
+	need = 0;
+	if (se_t_lca->vruntime > se_y_lca->vruntime)
+		need = se_t_lca->vruntime - se_y_lca->vruntime;
+
+	/* Base penalty is granularity plus 110% of fairness gap */
+	penalty = gran;
+	if (need) {
+		weighted_need = need + need / 10;
+		if (weighted_need > U64_MAX - penalty)
+			weighted_need = U64_MAX - penalty;
+		penalty += weighted_need;
+	}
+
+	/* Apply debounce to reduce ping-pong */
+	penalty = yield_deboost_apply_debounce(rq, p_target, penalty, need, gran);
+
+	/* Queue-size-based upper bound */
+	if (h_nr_queued == 2)
+		maxp = gran * 6;
+	else if (h_nr_queued == 3)
+		maxp = gran * 4;
+	else if (h_nr_queued <= 6)
+		maxp = (gran * 5) / 2;
+	else if (h_nr_queued <= 8)
+		maxp = gran * 2;
+	else if (h_nr_queued <= 12)
+		maxp = (gran * 3) / 2;
+	else
+		maxp = gran;
+
+	penalty = clamp(penalty, gran, maxp);
+
+	/* Baseline push when no fairness gap exists */
+	if (need == 0) {
+		if (h_nr_queued == 3)
+			base = (gran * 15) / 16;
+		else if (h_nr_queued >= 4 && h_nr_queued <= 6)
+			base = (gran * 5) / 8;
+		else if (h_nr_queued >= 7 && h_nr_queued <= 8)
+			base = gran / 2;
+		else if (h_nr_queued >= 9 && h_nr_queued <= 12)
+			base = (gran * 3) / 8;
+		else if (h_nr_queued > 12)
+			base = gran / 4;
+		else
+			base = gran;
+
+		if (penalty < base)
+			penalty = base;
+	}
+
+	return penalty;
+}
+
+/*
+ * Apply vruntime penalty and update EEVDF fields for consistency.
+ * Updates vruntime and deadline; vlag is not modified as it will be
+ * recalculated when the entity is dequeued/enqueued.
+ *
+ * Caller must call update_curr(cfs_rq) before invoking this function
+ * to ensure accounting is up-to-date before modifying vruntime.
+ */
+static void __maybe_unused
+yield_deboost_apply_penalty(struct sched_entity *se_y_lca,
+			    struct cfs_rq *cfs_rq, u64 penalty)
+{
+	u64 new_vruntime;
+
+	/* Overflow protection */
+	if (se_y_lca->vruntime > U64_MAX - penalty)
+		return;
+
+	new_vruntime = se_y_lca->vruntime + penalty;
+	if (new_vruntime <= se_y_lca->vruntime)
+		return;
+
+	se_y_lca->vruntime = new_vruntime;
+	se_y_lca->deadline = new_vruntime + calc_delta_fair(se_y_lca->slice, se_y_lca);
+}
+
 /*
  * sched_yield() is very simple
  */
-- 
2.43.0

From nobody Sun Feb 8 00:13:59 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini,
 Sean Christopherson
Cc: K Prateek Nayak, Christian Borntraeger, Steven Rostedt,
 Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH v2 5/9] sched/fair: Wire up yield deboost in yield_to_task_fair()
Date: Fri, 19 Dec 2025 11:53:29 +0800
Message-ID: <20251219035334.39790-6-kernellwp@gmail.com>
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>
References: <20251219035334.39790-1-kernellwp@gmail.com>

From: Wanpeng Li

Integrate yield_to_deboost() into yield_to_task_fair() to
activate the vCPU debooster mechanism.

The integration works in concert with the existing buddy mechanism:
set_next_buddy() provides immediate preference, yield_to_deboost()
applies a bounded vruntime penalty based on the fairness gap, and
yield_task_fair() completes the standard yield path, including the EEVDF
forfeit operation.

Note: yield_to_deboost() must be called BEFORE yield_task_fair() because
v6.19+ kernels perform the forfeit (se->vruntime = se->deadline) in
yield_task_fair(). If deboost ran after the forfeit, the fairness gap
calculation would see the already-inflated vruntime, resulting in need=0
and only the baseline penalty being applied.

Performance testing (16-pCPU host, 16 vCPUs per VM):

Dbench, 16 clients per VM:
  2 VMs: +14.4% throughput
  3 VMs: +9.8% throughput
  4 VMs: +6.7% throughput

The gains stem from sustained lock holder preference, which reduces
ping-pong between yielding vCPUs and lock holders. They are most
pronounced at moderate overcommit, where the reduction in contention
outweighs the context switch cost.

v1 -> v2:
- Move sysctl_sched_vcpu_debooster_enabled check to yield_to_deboost()
  entry point for early exit before update_rq_clock()
- Restore conditional update_curr() check (se_y_lca != cfs_rq->curr) to
  avoid unnecessary accounting updates
- Keep yield_task_fair() unchanged (no for_each_sched_entity loop) to
  avoid double-penalizing the yielding task
- Move yield_to_deboost() BEFORE yield_task_fair() to preserve fairness
  gap calculation (v6.19+ forfeit would otherwise inflate vruntime before
  penalty calculation)
- Improve function documentation

Signed-off-by: Wanpeng Li
---
 kernel/sched/fair.c | 67 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8738cfc3109c..9e0991f0c618 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9066,23 +9066,19 @@ static bool yield_deboost_rate_limit(struct rq *rq)
  * Validate tasks for yield deboost operation.
  * Returns the yielding task on success, NULL on validation failure.
  *
- * Checks: feature enabled, valid target, same runqueue, target is fair class,
- * both on_rq. Called under rq->lock.
+ * Checks: valid target, same runqueue, target is fair class,
+ * both on_rq, rate limiting. Called under rq->lock.
  *
  * Note: p_yielding (rq->donor) is guaranteed to be fair class by the caller
  * (yield_to_task_fair is only called when curr->sched_class == p->sched_class).
+ * Note: sysctl_sched_vcpu_debooster_enabled is checked by caller before
+ * update_rq_clock() to avoid unnecessary clock updates.
  */
 static struct task_struct __maybe_unused *
 yield_deboost_validate_tasks(struct rq *rq, struct task_struct *p_target)
 {
 	struct task_struct *p_yielding;
 
-	if (!sysctl_sched_vcpu_debooster_enabled)
-		return NULL;
-
-	if (!p_target)
-		return NULL;
-
 	if (yield_deboost_rate_limit(rq))
 		return NULL;
 
@@ -9287,6 +9283,57 @@ yield_deboost_apply_penalty(struct sched_entity *se_y_lca,
 	se_y_lca->deadline = new_vruntime + calc_delta_fair(se_y_lca->slice, se_y_lca);
 }
 
+/*
+ * yield_to_deboost - Apply vruntime penalty to favor the target task
+ * @rq: runqueue containing both tasks (rq->lock must be held)
+ * @p_target: task to favor in scheduling
+ *
+ * Cooperates with yield_to_task_fair(): set_next_buddy() provides immediate
+ * preference; this routine applies a bounded vruntime penalty at the cgroup
+ * LCA so the target maintains scheduling advantage beyond the buddy effect.
+ *
+ * Only operates on tasks resident on the same rq. Penalty is bounded by
+ * granularity and queue-size caps to prevent starvation.
+ */
+static void yield_to_deboost(struct rq *rq, struct task_struct *p_target)
+{
+	struct task_struct *p_yielding;
+	struct sched_entity *se_y, *se_t, *se_y_lca, *se_t_lca;
+	struct cfs_rq *cfs_rq_common;
+	u64 penalty;
+
+	/* Quick validation before updating clock */
+	if (!sysctl_sched_vcpu_debooster_enabled)
+		return;
+
+	if (!p_target)
+		return;
+
+	/* Update clock - rate limiting and debounce use rq_clock() */
+	update_rq_clock(rq);
+
+	/* Full validation including rate limiting */
+	p_yielding = yield_deboost_validate_tasks(rq, p_target);
+	if (!p_yielding)
+		return;
+
+	se_y = &p_yielding->se;
+	se_t = &p_target->se;
+
+	/* Find LCA in cgroup hierarchy */
+	if (!yield_deboost_find_lca(se_y, se_t, &se_y_lca, &se_t_lca, &cfs_rq_common))
+		return;
+
+	/* Update current accounting before modifying vruntime */
+	if (se_y_lca != cfs_rq_common->curr)
+		update_curr(cfs_rq_common);
+
+	/* Calculate and apply penalty */
+	penalty = yield_deboost_calculate_penalty(rq, se_y_lca, se_t_lca,
+						  p_target, cfs_rq_common->h_nr_queued);
+	yield_deboost_apply_penalty(se_y_lca, cfs_rq_common, penalty);
+}
+
 /*
  * sched_yield() is very simple
  */
@@ -9341,6 +9388,10 @@ static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
 	/* Tell the scheduler that we'd really like se to run next. */
 	set_next_buddy(se);
 
+	/* Apply deboost BEFORE forfeit to preserve fairness gap calculation */
+	yield_to_deboost(rq, p);
+
+	/* Complete the standard yield path (includes forfeit in v6.19+) */
 	yield_task_fair(rq);
 
 	return true;
-- 
2.43.0

From nobody Sun Feb 8 00:13:59 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini,
 Sean Christopherson
Cc: K Prateek Nayak, Christian Borntraeger, Steven Rostedt,
 Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH v2 6/9] KVM: x86: Add IPI tracking infrastructure
Date: Fri, 19 Dec 2025 11:53:30 +0800
Message-ID: <20251219035334.39790-7-kernellwp@gmail.com>
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>
References: <20251219035334.39790-1-kernellwp@gmail.com>

From: Wanpeng Li

Add foundational infrastructure for tracking IPI sender/receiver
relationships to improve directed yield candidate selection.
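The recency test behind this tracking can be modelled in plain C to check how the pending flag, receiver index and 50ms window interact. This is a hypothetical userspace sketch, not the KVM code: struct vcpu_model, track_ipi(), is_ipi_receiver() and clear_ipi_ctx() are illustrative names, time is passed in explicitly instead of being read from ktime_get_mono_fast_ns(), and the ipi_tracking_enabled switch and the READ_ONCE()/WRITE_ONCE() accessors are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU ipi_context. */
struct ipi_ctx {
	int last_ipi_sender;
	int last_ipi_receiver;
	bool pending_ipi;
	uint64_t ipi_time_ns;
};

struct vcpu_model {
	int vcpu_idx;
	struct ipi_ctx ctx;
};

/* Default recency window described for ipi_window_ns: 50ms. */
static uint64_t ipi_window_ns = 50ULL * 1000 * 1000;

/* Record a unicast IPI from sender to receiver at time now (ns). */
static void track_ipi(struct vcpu_model *sender, struct vcpu_model *receiver,
		      uint64_t now)
{
	if (!sender || !receiver || sender == receiver)
		return;		/* self-IPIs are ignored */
	sender->ctx.last_ipi_receiver = receiver->vcpu_idx;
	sender->ctx.pending_ipi = true;
	sender->ctx.ipi_time_ns = now;
	receiver->ctx.last_ipi_sender = sender->vcpu_idx;
}

/* True if receiver is sender's pending IPI target and the IPI is recent. */
static bool is_ipi_receiver(const struct vcpu_model *sender,
			    const struct vcpu_model *receiver, uint64_t now)
{
	return sender->ctx.pending_ipi &&
	       sender->ctx.last_ipi_receiver == receiver->vcpu_idx &&
	       now - sender->ctx.ipi_time_ns <= ipi_window_ns;
}

/* Drop the pending flag and both indices, like the clear helper. */
static void clear_ipi_ctx(struct vcpu_model *v)
{
	v->ctx.pending_ipi = false;
	v->ctx.last_ipi_sender = -1;
	v->ctx.last_ipi_receiver = -1;
}
```

An IPI from vCPU 0 to vCPU 1 makes vCPU 1 a preferred yield candidate for vCPU 0 for up to 50ms, or until the context is cleared; the relation is directional, so vCPU 0 does not become a candidate for vCPU 1.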
Introduce a per-vCPU ipi_context structure containing:
- last_ipi_receiver: vCPU index that received the last IPI from this vCPU
- last_ipi_sender: vCPU index that sent the last IPI to this vCPU
- pending_ipi: flag indicating an unacknowledged IPI
- ipi_time_ns: timestamp of the last IPI send

Add module parameters for runtime control:
- ipi_tracking_enabled (default: true): master switch for IPI tracking
- ipi_window_ns (default: 50ms): recency window for IPI validity

Implement helper functions:
- kvm_track_ipi_communication(): record a sender/receiver pair
- kvm_vcpu_is_ipi_receiver(): determine if a vCPU is a recent IPI target
- kvm_vcpu_clear_ipi_context() / kvm_vcpu_reset_ipi_context(): clear the
  context on vCPU reset and on vCPU creation/destruction

The infrastructure is inert until integrated with interrupt delivery in
subsequent patches.

v1 -> v2:
- Improve documentation for module parameters explaining the 50ms window
  rationale
- Add kvm_vcpu_is_ipi_receiver() declaration to x86.h header
- Add weak function annotation comment in kvm_host.h

Signed-off-by: Wanpeng Li
---
 arch/x86/include/asm/kvm_host.h | 12 ++++++
 arch/x86/kvm/lapic.c            | 76 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              |  3 ++
 arch/x86/kvm/x86.h              |  8 ++++
 include/linux/kvm_host.h        |  3 ++
 virt/kvm/kvm_main.c             |  6 +++
 6 files changed, 108 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5a3bfa293e8b..2464c310f0a2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1052,6 +1052,18 @@ struct kvm_vcpu_arch {
 	int pending_external_vector;
 	int highest_stale_pending_ioapic_eoi;
 
+	/*
+	 * IPI tracking for directed yield optimization.
+	 * Records sender/receiver relationships when IPIs are delivered
+	 * to enable IPI-aware vCPU scheduling decisions.
+	 */
+	struct {
+		int last_ipi_sender;	/* vCPU index of last IPI sender */
+		int last_ipi_receiver;	/* vCPU index of last IPI receiver */
+		bool pending_ipi;	/* Awaiting IPI response */
+		u64 ipi_time_ns;	/* Timestamp when IPI was sent */
+	} ipi_context;
+
 	/* be preempted when it's in kernel-mode(cpl=0) */
 	bool preempted_in_kernel;
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1597dd0b0cc6..23f247a3b127 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -75,6 +75,19 @@ module_param(lapic_timer_advance, bool, 0444);
 /* step-by-step approximation to mitigate fluctuation */
 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
 
+/*
+ * IPI tracking for directed yield optimization.
+ * - ipi_tracking_enabled: global toggle (default on)
+ * - ipi_window_ns: recency window for IPI validity (default 50ms)
+ * The 50ms window is chosen to be long enough to capture IPI response
+ * patterns while short enough to avoid stale information affecting
+ * scheduling decisions in throughput-sensitive workloads.
+ */
+static bool ipi_tracking_enabled = true;
+static unsigned long ipi_window_ns = 50 * NSEC_PER_MSEC;
+module_param(ipi_tracking_enabled, bool, 0644);
+module_param(ipi_window_ns, ulong, 0644);
+
 static bool __read_mostly vector_hashing_enabled = true;
 module_param_named(vector_hashing, vector_hashing_enabled, bool, 0444);
 
@@ -1113,6 +1126,69 @@ static int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }
 
+/*
+ * Track IPI communication for directed yield optimization.
+ * Records sender/receiver relationship when a unicast IPI is delivered.
+ * Only tracks when a unique receiver exists; ignores self-IPI.
+ */
+void kvm_track_ipi_communication(struct kvm_vcpu *sender, struct kvm_vcpu *receiver)
+{
+	if (!sender || !receiver || sender == receiver)
+		return;
+	if (unlikely(!READ_ONCE(ipi_tracking_enabled)))
+		return;
+
+	WRITE_ONCE(sender->arch.ipi_context.last_ipi_receiver, receiver->vcpu_idx);
+	WRITE_ONCE(sender->arch.ipi_context.pending_ipi, true);
+	WRITE_ONCE(sender->arch.ipi_context.ipi_time_ns, ktime_get_mono_fast_ns());
+
+	WRITE_ONCE(receiver->arch.ipi_context.last_ipi_sender, sender->vcpu_idx);
+}
+
+/*
+ * Check if 'receiver' is the recent IPI target of 'sender'.
+ *
+ * Rationale:
+ * - Use a short window to avoid stale IPI inflating boost priority
+ *   on throughput-sensitive workloads.
+ */
+bool kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender, struct kvm_vcpu *receiver)
+{
+	u64 then, now;
+
+	if (unlikely(!READ_ONCE(ipi_tracking_enabled)))
+		return false;
+
+	then = READ_ONCE(sender->arch.ipi_context.ipi_time_ns);
+	now = ktime_get_mono_fast_ns();
+	if (READ_ONCE(sender->arch.ipi_context.pending_ipi) &&
+	    READ_ONCE(sender->arch.ipi_context.last_ipi_receiver) ==
+	    receiver->vcpu_idx &&
+	    now - then <= ipi_window_ns)
+		return true;
+
+	return false;
+}
+
+/*
+ * Clear IPI context for a vCPU (e.g., on EOI or reset).
+ */
+void kvm_vcpu_clear_ipi_context(struct kvm_vcpu *vcpu)
+{
+	WRITE_ONCE(vcpu->arch.ipi_context.pending_ipi, false);
+	WRITE_ONCE(vcpu->arch.ipi_context.last_ipi_sender, -1);
+	WRITE_ONCE(vcpu->arch.ipi_context.last_ipi_receiver, -1);
+}
+
+/*
+ * Reset IPI context completely (e.g., on vCPU creation/destruction).
+ */
+void kvm_vcpu_reset_ipi_context(struct kvm_vcpu *vcpu)
+{
+	kvm_vcpu_clear_ipi_context(vcpu);
+	WRITE_ONCE(vcpu->arch.ipi_context.ipi_time_ns, 0);
+}
+
 /* Return true if the interrupt can be handled by using *bitmap as index mask
  * for valid destinations in *dst array.
  * Return false if kvm_apic_map_get_dest_lapic did nothing useful.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c6d899d53dd..d4c401ef04ca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12728,6 +12728,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		goto free_guest_fpu;
 
 	kvm_xen_init_vcpu(vcpu);
+	kvm_vcpu_reset_ipi_context(vcpu);
 	vcpu_load(vcpu);
 	kvm_vcpu_after_set_cpuid(vcpu);
 	kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
@@ -12795,6 +12796,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_mmu_destroy(vcpu);
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	free_page((unsigned long)vcpu->arch.pio_data);
+	kvm_vcpu_reset_ipi_context(vcpu);
 	kvfree(vcpu->arch.cpuid_entries);
 }
 
@@ -12871,6 +12873,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	kvm_leave_nested(vcpu);
 
 	kvm_lapic_reset(vcpu, init_event);
+	kvm_vcpu_clear_ipi_context(vcpu);
 
 	WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu));
 	vcpu->arch.hflags = 0;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index fdab0ad49098..cfc24fb207e0 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -466,6 +466,14 @@ fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
 fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
 fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
 
+/* IPI tracking helpers for directed yield */
+void kvm_track_ipi_communication(struct kvm_vcpu *sender,
+				 struct kvm_vcpu *receiver);
+bool kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender,
+			      struct kvm_vcpu *receiver);
+void kvm_vcpu_clear_ipi_context(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reset_ipi_context(struct kvm_vcpu *vcpu);
+
 extern struct kvm_caps kvm_caps;
 extern struct kvm_host_values kvm_host;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d93f75b05ae2..f42315d341b3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1535,6 +1535,9 @@ static inline void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode);
 
+/* Weak function, overridden by arch/x86/kvm for IPI-aware directed yield */
+bool kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender, struct kvm_vcpu *receiver);
+
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages);
 void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5fcd401a5897..ff771a872c6d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3964,6 +3964,12 @@ bool __weak kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
 	return false;
 }
 
+bool __weak kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender,
+				     struct kvm_vcpu *receiver)
+{
+	return false;
+}
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 {
 	int nr_vcpus, start, i, idx, yielded;
-- 
2.43.0

From nobody Sun Feb 8 00:13:59 2026
P6M026Y1/l+TNBB8tzzPRnnCoM+xWpNGGyLYBAeCEM2QfoC1rF7QeveZZxDy6PLbY5wj Fc/0s0IIS+Vj+O9YXjwmI24+ttiXQRe65CPumvU2/vZdqUFR1oxa7hDEq6d0K59QwCm3 XEGnWrxpkWHjaRW99Tp9/tOqL90UV7VjaspvMVAhNmpEYY+mh74306UsPmZ8AQpCZiY8 PrSov2H+uY/IX9ByQiAABPkZZ4G7+NVjXgG6aEPt6Vmd2paOtvJzDZN0kdQeKWpulBOr kY5A== X-Forwarded-Encrypted: i=1; AJvYcCXiNdO9KtSAg+CXjw69DNZCaBZChmfUuhX8ucpWGh6R043mahQMWl+Vt0WMqHEGi+FLFL3nmqJEY6rlYkk=@vger.kernel.org X-Gm-Message-State: AOJu0YwzffYXLRJUVekKBP3zTtv4hj2LWwIou3cn04VQNALDZI36vQ/b eGScP4A04eEqZ9XF/gm4cAjAi8LErEOqBhrilEFMh6kuMQ8QXWrvRCzG X-Gm-Gg: AY/fxX7mz3XGlMSUEN7NE53KRhBDB7vTKJ0b/soLr0Vok2gUJpntx1gJZqe6WuB2yko qzajTqewUPC7nCRungqQnV7PTlp25xcqutFX49Bi6/VoQ+wz8onLDsprdo0Mtw2QMM6Y5mH9WpS IWDXY220Ii99Q/TEAoc1U/KF+VFEtFgKTaasEltl/VUDEkQCoxfgKbBra7/VLqkvoKV69ClOX/e 46Gh87eu1SSto9HdZuImdvyCIyYK7aFw3Lk8XaRT0Y++Fb911J6Zqi881KA04+FK8KnJBOICsbf /yI10SceiWVAXuk2wKT/0yo8nCPik8aiH9wUmS3/B0hpa5nIrLUHZN3Axw49xvW07WXTs/VqV1g RIjM89jbDYOy68TtxigWZk8VOlO3w6kJQrLNret0iOoEqy7gjzT4AYPwCPKSOu4JbPdfAJ/0rhN MR9ntnp4HGyA== X-Google-Smtp-Source: AGHT+IH7i1wMUpIAvo5+XIESo2FqbxEFz6LLhGsiwcOV+fLp/Up+b2cbDa//nLVFrcxtfnqkDIXK1g== X-Received: by 2002:a17:902:d488:b0:2a0:8f6f:1a12 with SMTP id d9443c01a7336-2a2f222ac10mr15671525ad.17.1766116446401; Thu, 18 Dec 2025 19:54:06 -0800 (PST) Received: from wanpengli.. 
([175.170.92.22]) by smtp.googlemail.com with ESMTPSA id d9443c01a7336-2a2f3d4d36esm7368135ad.63.2025.12.18.19.54.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Dec 2025 19:54:06 -0800 (PST) From: Wanpeng Li To: Peter Zijlstra , Ingo Molnar , Thomas Gleixner , Paolo Bonzini , Sean Christopherson Cc: K Prateek Nayak , Christian Borntraeger , Steven Rostedt , Vincent Guittot , Juri Lelli , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li Subject: [PATCH v2 7/9] KVM: x86/lapic: Integrate IPI tracking with interrupt delivery Date: Fri, 19 Dec 2025 11:53:31 +0800 Message-ID: <20251219035334.39790-8-kernellwp@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com> References: <20251219035334.39790-1-kernellwp@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Wanpeng Li Hook IPI tracking into the LAPIC interrupt delivery path to capture sender/receiver relationships for directed yield optimization. Implement kvm_ipi_track_send() called from kvm_irq_delivery_to_apic() when a unicast fixed IPI is detected (exactly one destination). Record sender vCPU index, receiver vCPU index, and timestamp using lockless WRITE_ONCE for minimal overhead. Implement kvm_ipi_track_eoi() called from kvm_apic_set_eoi_accelerated() and handle_apic_eoi() to clear IPI context when interrupts are acknowledged. Use two-stage clearing: 1. Unconditionally clear the receiver's context (it processed the IPI) 2. Conditionally clear sender's pending flag only when the sender exists, last_ipi_receiver matches, and the IPI is recent Use lockless accessors for minimal overhead. The tracking only activates for unicast fixed IPIs where directed yield provides value. 
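The unicast filter and the two-stage EOI cleanup described above can be modeled in a few lines of user-space C. This is a minimal sketch, not the KVM implementation: the ctx array, NR_VCPUS, ipi_ctx_reset(), ipi_track_send(), ipi_clear_on_eoi(), and the 50 us ipi_window_ns value are hypothetical stand-ins, vCPUs are plain indices, and the LAPIC-origin, APIC_DM_FIXED, and no-shorthand checks are assumed to have already passed. Only the field names follow the ipi_context fields used in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VCPUS 4 /* hypothetical VM size for this sketch */

/* Hypothetical per-vCPU state, named after the ipi_context fields in the diff. */
struct ipi_ctx {
	int last_ipi_sender;   /* -1: no IPI outstanding for this receiver */
	int last_ipi_receiver; /* -1: this vCPU has not sent a tracked IPI */
	bool pending_ipi;      /* sender is waiting for its IPI to be handled */
	uint64_t ipi_time_ns;  /* when the sender's last tracked IPI was sent */
};

static struct ipi_ctx ctx[NR_VCPUS];
static const uint64_t ipi_window_ns = 50000; /* assumed 50 us window */

static void ipi_ctx_reset(void)
{
	for (int i = 0; i < NR_VCPUS; i++)
		ctx[i] = (struct ipi_ctx){ .last_ipi_sender = -1,
					   .last_ipi_receiver = -1 };
}

/*
 * delivered[i] > 0 means vCPU i accepted the interrupt. Track the IPI only
 * when exactly one vCPU accepted it and that vCPU is not the sender.
 */
static bool ipi_track_send(int sender, const int delivered[NR_VCPUS],
			   uint64_t now)
{
	int targets = 0, unique = -1;

	for (int i = 0; i < NR_VCPUS; i++) {
		if (delivered[i] > 0) {
			targets++;
			unique = i;
		}
	}
	if (targets != 1 || unique == sender)
		return false;

	ctx[sender].last_ipi_receiver = unique;
	ctx[sender].pending_ipi = true;
	ctx[sender].ipi_time_ns = now;
	ctx[unique].last_ipi_sender = sender;
	return true;
}

/* Two-stage cleanup when @receiver signals EOI at time @now. */
static void ipi_clear_on_eoi(int receiver, uint64_t now)
{
	int sender;

	assert(receiver >= 0 && receiver < NR_VCPUS);
	sender = ctx[receiver].last_ipi_sender;

	/* Stage 1: the receiver handled the interrupt, so always forget it. */
	ctx[receiver].last_ipi_sender = -1;

	/* Stage 2: clear the sender's flag only if it still targets us and is recent. */
	if (sender >= 0 && ctx[sender].last_ipi_receiver == receiver &&
	    now - ctx[sender].ipi_time_ns <= ipi_window_ns)
		ctx[sender].pending_ipi = false;
}
```

In this model, the last_ipi_receiver check in stage 2 means an EOI for an older IPI cannot clear the pending flag that the sender set for a newer IPI to a different vCPU, and an EOI arriving outside the window leaves the sender's flag untouched.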
Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/lapic.c | 90 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 86 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 23f247a3b127..d4fb6f49390b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1270,6 +1270,9 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 	struct kvm_lapic **dst = NULL;
 	int i;
 	bool ret;
+	int targets = 0;
+	int delivered;
+	struct kvm_vcpu *unique = NULL;
 
 	*r = -1;
 
@@ -1291,8 +1294,22 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		for_each_set_bit(i, &bitmap, 16) {
 			if (!dst[i])
 				continue;
-			*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
+			delivered = kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
+			*r += delivered;
+			if (delivered > 0) {
+				targets++;
+				unique = dst[i]->vcpu;
+			}
 		}
+
+		/*
+		 * Track IPI for directed yield: only for LAPIC-originated
+		 * APIC_DM_FIXED without shorthand, with exactly one recipient.
+		 */
+		if (src && irq->delivery_mode == APIC_DM_FIXED &&
+		    irq->shorthand == APIC_DEST_NOSHORT &&
+		    targets == 1 && unique && unique != src->vcpu)
+			kvm_track_ipi_communication(src->vcpu, unique);
 	}
 
 	rcu_read_unlock();
@@ -1377,6 +1394,9 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	struct kvm_vcpu *vcpu, *lowest = NULL;
 	unsigned long i, dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
 	unsigned int dest_vcpus = 0;
+	int targets = 0;
+	int delivered;
+	struct kvm_vcpu *unique = NULL;
 
 	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
 		return r;
@@ -1400,7 +1420,12 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		if (!kvm_lowest_prio_delivery(irq)) {
 			if (r < 0)
 				r = 0;
-			r += kvm_apic_set_irq(vcpu, irq, dest_map);
+			delivered = kvm_apic_set_irq(vcpu, irq, dest_map);
+			r += delivered;
+			if (delivered > 0) {
+				targets++;
+				unique = vcpu;
+			}
 		} else if (kvm_apic_sw_enabled(vcpu->arch.apic)) {
 			if (!vector_hashing_enabled) {
 				if (!lowest)
@@ -1421,8 +1446,23 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		lowest = kvm_get_vcpu(kvm, idx);
 	}
 
-	if (lowest)
-		r = kvm_apic_set_irq(lowest, irq, dest_map);
+	if (lowest) {
+		delivered = kvm_apic_set_irq(lowest, irq, dest_map);
+		r = delivered;
+		if (delivered > 0) {
+			targets = 1;
+			unique = lowest;
+		}
+	}
+
+	/*
+	 * Track IPI for directed yield: only for LAPIC-originated
+	 * APIC_DM_FIXED without shorthand, with exactly one recipient.
+	 */
+	if (src && irq->delivery_mode == APIC_DM_FIXED &&
+	    irq->shorthand == APIC_DEST_NOSHORT &&
+	    targets == 1 && unique && unique != src->vcpu)
+		kvm_track_ipi_communication(src->vcpu, unique);
 
 	return r;
 }
@@ -1608,6 +1648,45 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 #endif
 }
 
+/*
+ * Clear IPI context on EOI to prevent stale boost decisions.
+ *
+ * Two-stage cleanup:
+ * 1. Always clear receiver's IPI context (it processed the interrupt)
+ * 2. Conditionally clear sender's pending flag only when:
+ *    - Sender vCPU exists and is valid
+ *    - Sender's last_ipi_receiver matches this receiver
+ *    - IPI was sent recently (within window)
+ */
+static void kvm_clear_ipi_on_eoi(struct kvm_lapic *apic)
+{
+	struct kvm_vcpu *receiver = apic->vcpu;
+	int sender_idx;
+	u64 then, now;
+
+	if (unlikely(!READ_ONCE(ipi_tracking_enabled)))
+		return;
+
+	sender_idx = READ_ONCE(receiver->arch.ipi_context.last_ipi_sender);
+
+	/* Step 1: Always clear receiver's IPI context */
+	kvm_vcpu_clear_ipi_context(receiver);
+
+	/* Step 2: Conditionally clear sender's pending flag */
+	if (sender_idx >= 0) {
+		struct kvm_vcpu *sender = kvm_get_vcpu(receiver->kvm, sender_idx);
+
+		if (sender &&
+		    READ_ONCE(sender->arch.ipi_context.last_ipi_receiver) ==
+		    receiver->vcpu_idx) {
+			then = READ_ONCE(sender->arch.ipi_context.ipi_time_ns);
+			now = ktime_get_mono_fast_ns();
+			if (now - then <= ipi_window_ns)
+				WRITE_ONCE(sender->arch.ipi_context.pending_ipi, false);
+		}
+	}
+}
+
 static int apic_set_eoi(struct kvm_lapic *apic)
 {
 	int vector = apic_find_highest_isr(apic);
@@ -1643,6 +1722,7 @@ void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
 	trace_kvm_eoi(apic, vector);
 
 	kvm_ioapic_send_eoi(apic, vector);
+	kvm_clear_ipi_on_eoi(apic);
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_set_eoi_accelerated);
@@ -2453,6 +2533,8 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	case APIC_EOI:
 		apic_set_eoi(apic);
+		/* Precise cleanup for IPI-aware boost */
+		kvm_clear_ipi_on_eoi(apic);
 		break;
 
 	case APIC_LDR:
-- 
2.43.0

From nobody Sun Feb 8 00:13:59 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: K Prateek Nayak, Christian Borntraeger, Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH v2 8/9] KVM: Implement IPI-aware directed yield candidate selection
Date: Fri, 19 Dec 2025 11:53:32 +0800
Message-ID: <20251219035334.39790-9-kernellwp@gmail.com>
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>
References: <20251219035334.39790-1-kernellwp@gmail.com>

Integrate IPI tracking with directed yield to improve scheduling when
vCPUs spin waiting for IPI responses.

Implement priority-based candidate selection in kvm_vcpu_on_spin()
with three tiers:

Priority 1: Use kvm_vcpu_is_ipi_receiver() to identify confirmed IPI
targets within the recency window, addressing lock holders spinning
on IPI acknowledgment.

Priority 2: Use the existing kvm_arch_dy_has_pending_interrupt() for
compatibility with arch-specific fast paths.

Priority 3: Otherwise, fall back to conventional preemption-based
logic: require that the target was preempted and, when
yield_to_kernel_mode is requested, that it was preempted in kernel
mode. This provides a safety net for non-IPI scenarios.

Add a kvm_vcpu_is_good_yield_candidate() helper to consolidate these
checks, preventing over-aggressive boosting while enabling targeted
optimization when IPI patterns are detected.
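The tier ordering can be sketched as a pure function over a snapshot of the per-vCPU facts that kvm_vcpu_is_good_yield_candidate() consults. This is an illustrative model only: struct vcpu_state, enum yield_reason, classify_candidate(), is_good_yield_candidate(), and legacy_passes_filter() are hypothetical names, and each boolean stands in for the KVM helper named in its comment. legacy_passes_filter() models the condition this patch removes from kvm_vcpu_on_spin().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the real helpers would report for a vCPU. */
struct vcpu_state {
	bool is_ipi_receiver;       /* kvm_vcpu_is_ipi_receiver(me, vcpu) */
	bool has_pending_interrupt; /* kvm_arch_dy_has_pending_interrupt(vcpu) */
	bool preempted;             /* READ_ONCE(vcpu->preempted) */
	bool preempted_in_kernel;   /* kvm_arch_vcpu_preempted_in_kernel(vcpu) */
};

enum yield_reason {
	YIELD_SKIP = 0,
	YIELD_IPI_RECEIVER, /* priority 1 */
	YIELD_PENDING_IRQ,  /* priority 2 */
	YIELD_PREEMPTED,    /* priority 3 */
};

/* Evaluate the tiers in order; the first one that matches wins. */
static enum yield_reason classify_candidate(const struct vcpu_state *v,
					    bool yield_to_kernel_mode)
{
	if (v->is_ipi_receiver)
		return YIELD_IPI_RECEIVER;
	if (v->has_pending_interrupt)
		return YIELD_PENDING_IRQ;
	/* Priority 3 gate: must be preempted, and in kernel if requested. */
	if (!v->preempted)
		return YIELD_SKIP;
	if (yield_to_kernel_mode && !v->preempted_in_kernel)
		return YIELD_SKIP;
	return YIELD_PREEMPTED;
}

static bool is_good_yield_candidate(const struct vcpu_state *v,
				    bool yield_to_kernel_mode)
{
	return classify_candidate(v, yield_to_kernel_mode) != YIELD_SKIP;
}

/* Models the removed check, which skipped only preempted user-mode vCPUs. */
static bool legacy_passes_filter(const struct vcpu_state *v,
				 bool yield_to_kernel_mode)
{
	return !(v->preempted && yield_to_kernel_mode &&
		 !v->has_pending_interrupt && !v->preempted_in_kernel);
}
```

Under this model, a running (not preempted) vCPU with neither an IPI nor a pending-interrupt hint is rejected by the new helper, whereas the removed check only ever filtered out preempted vCPUs.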
Performance testing (16 pCPUs host, 16 vCPUs/VM):

Dedup (simlarge):
  2 VMs: +47.1% throughput
  3 VMs: +28.1% throughput
  4 VMs:  +1.7% throughput

VIPS (simlarge):
  2 VMs: +26.2% throughput
  3 VMs: +12.7% throughput
  4 VMs:  +6.0% throughput

Gains stem from effective directed yield when vCPUs spin on IPI
delivery, reducing synchronization overhead. The improvement is most
pronounced at moderate overcommit (2-3 VMs), where contention
reduction outweighs context switching cost.

Signed-off-by: Wanpeng Li
---
 virt/kvm/kvm_main.c | 46 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff771a872c6d..45ede950314b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3970,6 +3970,41 @@ bool __weak kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender,
 	return false;
 }
 
+/*
+ * IPI-aware candidate selection for directed yield.
+ *
+ * Priority order:
+ * 1) Confirmed IPI receiver of 'me' within recency window (always boost)
+ * 2) Arch-provided fast pending interrupt hint (user-mode boost)
+ * 3) Kernel-mode yield: preempted-in-kernel vCPU (traditional boost)
+ * 4) Otherwise, be conservative and skip
+ */
+static bool kvm_vcpu_is_good_yield_candidate(struct kvm_vcpu *me,
+					     struct kvm_vcpu *vcpu,
+					     bool yield_to_kernel_mode)
+{
+	/* Priority 1: recently targeted IPI receiver */
+	if (kvm_vcpu_is_ipi_receiver(me, vcpu))
+		return true;
+
+	/* Priority 2: fast pending-interrupt hint (arch-specific) */
+	if (kvm_arch_dy_has_pending_interrupt(vcpu))
+		return true;
+
+	/*
+	 * Minimal preempted gate for remaining cases:
+	 * Require that the target has been preempted, and if yielding to
+	 * kernel mode, additionally require preempted-in-kernel.
+	 */
+	if (!READ_ONCE(vcpu->preempted))
+		return false;
+
+	if (yield_to_kernel_mode && !kvm_arch_vcpu_preempted_in_kernel(vcpu))
+		return false;
+
+	return true;
+}
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 {
 	int nr_vcpus, start, i, idx, yielded;
@@ -4017,15 +4052,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
 			continue;
 
-		/*
-		 * Treat the target vCPU as being in-kernel if it has a pending
-		 * interrupt, as the vCPU trying to yield may be spinning
-		 * waiting on IPI delivery, i.e. the target vCPU is in-kernel
-		 * for the purposes of directed yield.
-		 */
-		if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
-		    !kvm_arch_dy_has_pending_interrupt(vcpu) &&
-		    !kvm_arch_vcpu_preempted_in_kernel(vcpu))
+		/* IPI-aware candidate selection */
+		if (!kvm_vcpu_is_good_yield_candidate(me, vcpu, yield_to_kernel_mode))
 			continue;
 
 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
-- 
2.43.0

From nobody Sun Feb 8 00:13:59 2026
From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: K Prateek Nayak, Christian Borntraeger, Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH v2 9/9] KVM: Relaxed boost as safety net
Date: Fri, 19 Dec 2025 11:53:33 +0800
Message-ID: <20251219035334.39790-10-kernellwp@gmail.com>
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>
References: <20251219035334.39790-1-kernellwp@gmail.com>

Add a minimal two-round fallback mechanism in kvm_vcpu_on_spin() to
avoid pathological stalls when the first round finds no eligible
target.

Round 1 applies the strict IPI-aware candidate selection (existing
behavior). Round 2 provides a relaxed scan, gated only by the target's
preempted state, as a safety net for cases where the IPI context is
missed or the runnable set is transient.

The second round is controlled by the module parameter
enable_relaxed_boost (bool, 0644, default on) so that distributions can
easily disable it if needed.

Introduce the enable_relaxed_boost parameter, a first_round flag, a
retry label, and a reset of the yielded counter. Gate the IPI-aware
check in round 1 and use preempted-only gating in round 2. Keep churn
minimal by reusing the same scan logic while preserving all existing
heuristics, tracing, and bookkeeping.
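The two-round scan can be sketched in isolation as follows. This is a simplified model under stated assumptions: struct vcpu_model, MAX_VCPUS, and pick_yield_target() are hypothetical; strict_ok stands in for kvm_vcpu_is_good_yield_candidate(), and yield_ok stands in for kvm_vcpu_yield_to() returning a positive value. The try counter and the other eligibility checks in kvm_vcpu_on_spin() are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VCPUS 8 /* hypothetical bound for this sketch */

struct vcpu_model {
	bool strict_ok; /* stands in for kvm_vcpu_is_good_yield_candidate() */
	bool preempted; /* stands in for READ_ONCE(vcpu->preempted) */
	bool yield_ok;  /* stands in for kvm_vcpu_yield_to(vcpu) > 0 */
};

/*
 * Return the index of the vCPU yielded to, or -1 if none. Round 1 uses the
 * strict filter; if it yields to nobody and relaxed boost is enabled,
 * round 2 rescans and only requires the target to be preempted.
 */
static int pick_yield_target(const struct vcpu_model *v, int nr, int me,
			     int start, bool enable_relaxed_boost)
{
	bool first_round = true;

	assert(nr > 0 && nr <= MAX_VCPUS);
retry:
	for (int i = 0; i < nr; i++) {
		int idx = (start + i) % nr;

		if (idx == me)
			continue;
		if (first_round && !v[idx].strict_ok)
			continue;
		if (!first_round && !v[idx].preempted)
			continue;
		if (v[idx].yield_ok)
			return idx; /* a successful yield ends the scan */
	}

	if (enable_relaxed_boost && first_round) {
		first_round = false;
		goto retry;
	}
	return -1;
}
```

When round 1 yields successfully, the relaxed round is never reached; the relaxed round runs only after round 1 yielded to no one, which matches the yielded <= 0 condition in the patch.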
Signed-off-by: Wanpeng Li
---
 virt/kvm/kvm_main.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 45ede950314b..662a907a79e1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_shrink);
 static bool allow_unsafe_mappings;
 module_param(allow_unsafe_mappings, bool, 0444);
 
+static bool enable_relaxed_boost = true;
+module_param(enable_relaxed_boost, bool, 0644);
+
 /*
  * Ordering of locks:
  *
@@ -4011,6 +4014,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 	struct kvm *kvm = me->kvm;
 	struct kvm_vcpu *vcpu;
 	int try = 3;
+	bool first_round = true;
 
 	nr_vcpus = atomic_read(&kvm->online_vcpus);
 	if (nr_vcpus < 2)
@@ -4021,6 +4025,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 
 	kvm_vcpu_set_in_spin_loop(me, true);
 
+retry:
+	yielded = 0;
+
 	/*
 	 * The current vCPU ("me") is spinning in kernel mode, i.e. is likely
 	 * waiting for a resource to become available. Attempt to yield to a
@@ -4052,8 +4059,13 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
 			continue;
 
-		/* IPI-aware candidate selection */
-		if (!kvm_vcpu_is_good_yield_candidate(me, vcpu, yield_to_kernel_mode))
+		/* IPI-aware candidate selection in first round */
+		if (first_round &&
+		    !kvm_vcpu_is_good_yield_candidate(me, vcpu, yield_to_kernel_mode))
+			continue;
+
+		/* Minimal preempted gate for second round */
+		if (!first_round && !READ_ONCE(vcpu->preempted))
 			continue;
 
 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
@@ -4067,6 +4079,16 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 			break;
 		}
 	}
+
+	/*
+	 * Second round: relaxed boost as safety net, with preempted gate.
+	 * Only execute when enabled and when the first round yielded nothing.
+	 */
+	if (enable_relaxed_boost && first_round && yielded <= 0) {
+		first_round = false;
+		goto retry;
+	}
+
 	kvm_vcpu_set_in_spin_loop(me, false);
 
 	/* Ensure vcpu is not eligible during next spinloop */
-- 
2.43.0