From: Jialin Wang
To: tj@kernel.org, josef@toxicpanda.com, axboe@kernel.dk
Cc: lianux.mm@gmail.com, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Jialin Wang
Subject: [RFC PATCH] blk-iocost: introduce 'linear-max' cost model for cloud disk
Date: Fri, 13 Feb 2026 15:38:29 +0800
Message-ID: <20260213073829.182168-1-wjl.linux@gmail.com>
X-Mailer: git-send-email 2.52.0

In public cloud environments, block devices usually enforce performance
limits based on two independent token buckets: IOPS and BPS. The device
is throttled when either the IOPS limit or the BPS limit is reached.

To effectively manage "noisy neighbor" problems, we configure iocost
model parameters (or vrate max) to approximately 95% of the cloud
provider's provisioned limits. The goal is to strictly avoid hitting the
storage backend's hard BPS/IOPS limits. By saturating the virtual budget
before the physical limit, iocost engages throttling first.

Unlike the indiscriminate throttling applied by cloud storage backends,
iocost selectively penalizes low-weight cgroups or heavy-traffic
perpetrators. Consequently, IO-latency-sensitive critical workloads
remain entirely unaffected by the congestion. Extensive testing has
verified that this approach yields excellent isolation results.

However, the existing 'linear' cost model leads to significant
performance loss in this specific configuration due to its additive
nature.

Using tools/cgroup/iocost_coef_gen.py, we measured the following
performance data on a typical cloud disk:

  8:16 rbps=173471131 rseqiops=3566 rrandiops=3566 wbps=173333269 wseqiops=3566 wrandiops=3559

Dividing BPS by IOPS (using the write coefficients, 173333269 / 3566)
yields approximately 48607 bytes. When running fio with bs=48607, we
observed a 50% drop in throughput compared to running without iocost
enabled.

The reason is that the current 'linear' model calculates cost as:

  Cost = BaseCost + (Pages * PerPageCost)

Expanding the internal variables relative to IOPS and BPS, this is
effectively:

  Cost = VTIME_PER_SEC * ((1 / IOPS - 4096 / BPS) + size / BPS)

When the I/O size is such that the IOPS cost component roughly equals
the BPS cost component (as in the bs=48607 case above), the linear model
sums them up. Since cloud disks throttle based on *either* IOPS *or* BPS
(whichever is exhausted first), summing them effectively doubles the
calculated cost. This causes iocost to drain virtual time twice as fast
as necessary, throttling the device to 50% utilization.

To solve this, this patch introduces a new 'linear-max' cost model.
Instead of adding the components, it takes the maximum:

  Cost = VTIME_PER_SEC * max(1 / IOPS, size / BPS)

Which translates to:

  Cost = max(BaseCost + PerPageCost, Pages * PerPageCost)

This formula correctly models the dual-bucket behavior of cloud disks.
It ensures that for any block size, the calculated cost aligns with the
actual bottleneck (IOPS or BPS). This allows the system to reach close
to the provisioned BPS/IOPS limits without premature throttling, while
still maintaining the latency protection benefits of iocost.
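
The two formulas above can be checked numerically. The following is a
minimal userspace sketch, not the kernel code: it works in seconds of
device time per IO (VTIME_PER_SEC factored out) and uses floating point
instead of the kernel's integer coefficients. The function names
cost_linear() and cost_linear_max() and the PAGE_BYTES constant are
hypothetical and exist only for this illustration.

```c
/* Illustrative only: models the two cost formulas from the commit
 * message in seconds of device time per IO. Names are hypothetical. */
#define PAGE_BYTES 4096.0

/* Additive 'linear' model: (1/IOPS - 4096/BPS) + size/BPS */
double cost_linear(double bps, double iops, double size)
{
	double base = 1.0 / iops - PAGE_BYTES / bps;	/* BaseCost */

	if (base < 0.0)		/* assumption: never charge a negative base */
		base = 0.0;
	return base + size / bps;
}

/* Proposed 'linear-max' model: max(1/IOPS, size/BPS) */
double cost_linear_max(double bps, double iops, double size)
{
	double per_io = 1.0 / iops;	/* IOPS bucket */
	double per_bytes = size / bps;	/* BPS bucket */

	return per_io > per_bytes ? per_io : per_bytes;
}
```

With the write coefficients quoted above (bps=173333269, iops=3566) and
size=48607, cost_linear() comes out at roughly 1.9 times
cost_linear_max(), which is in line with the observed throughput drop
of about 50%.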

Signed-off-by: Jialin Wang
---
 block/blk-iocost.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index ef543d163d46..ead478d8e5bc 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -445,6 +445,7 @@ struct ioc {
 	int				autop_idx;
 	bool				user_qos_params:1;
 	bool				user_cost_model:1;
+	bool				cost_model_linear_max:1;
 };
 
 struct iocg_pcpu_stat {
@@ -2565,7 +2566,12 @@ static void calc_vtime_cost_builtin(struct bio *bio, struct ioc_gq *iocg,
 			cost += coef_seqio;
 		}
 	}
-	cost += pages * coef_page;
+
+	if (ioc->cost_model_linear_max)
+		cost = max(cost + coef_page, pages * coef_page);
+	else
+		cost += pages * coef_page;
+
 out:
 	*costp = cost;
 }
@@ -3368,10 +3374,11 @@ static u64 ioc_cost_model_prfill(struct seq_file *sf,
 		return 0;
 
 	spin_lock(&ioc->lock);
-	seq_printf(sf, "%s ctrl=%s model=linear "
+	seq_printf(sf, "%s ctrl=%s model=%s "
 		   "rbps=%llu rseqiops=%llu rrandiops=%llu "
 		   "wbps=%llu wseqiops=%llu wrandiops=%llu\n",
 		   dname, ioc->user_cost_model ? "user" : "auto",
+		   ioc->cost_model_linear_max ? "linear-max" : "linear",
 		   u[I_LCOEF_RBPS], u[I_LCOEF_RSEQIOPS], u[I_LCOEF_RRANDIOPS],
 		   u[I_LCOEF_WBPS], u[I_LCOEF_WSEQIOPS], u[I_LCOEF_WRANDIOPS]);
 	spin_unlock(&ioc->lock);
@@ -3412,6 +3419,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 	struct ioc *ioc;
 	u64 u[NR_I_LCOEFS];
 	bool user;
+	bool linear_max;
 	char *body, *p;
 	int ret;
 
@@ -3442,6 +3450,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 	spin_lock_irq(&ioc->lock);
 	memcpy(u, ioc->params.i_lcoefs, sizeof(u));
 	user = ioc->user_cost_model;
+	linear_max = ioc->cost_model_linear_max;
 
 	while ((p = strsep(&body, " \t\n"))) {
 		substring_t args[MAX_OPT_ARGS];
@@ -3464,7 +3473,11 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 			continue;
 		case COST_MODEL:
 			match_strlcpy(buf, &args[0], sizeof(buf));
-			if (strcmp(buf, "linear"))
+			if (!strcmp(buf, "linear"))
+				linear_max = false;
+			else if (!strcmp(buf, "linear-max"))
+				linear_max = true;
+			else
 				goto einval;
 			continue;
 		}
@@ -3481,8 +3494,10 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 	if (user) {
 		memcpy(ioc->params.i_lcoefs, u, sizeof(u));
 		ioc->user_cost_model = true;
+		ioc->cost_model_linear_max = linear_max;
 	} else {
 		ioc->user_cost_model = false;
+		ioc->cost_model_linear_max = false;
 	}
 	ioc_refresh_params(ioc, true);
 	spin_unlock_irq(&ioc->lock);
-- 
2.52.0