From: Can Guo <quic_cang@quicinc.com>
With OPP V2 enabled, devfreq can scale clocks amongst multiple frequency
plans. However, the gear speed is only toggled between min and max during
clock scaling. Enable multi-level gear scaling by mapping clock frequencies
to gear speeds, so that when devfreq scales clock frequencies we can put
the UFS link at the appropriate gear speeds accordingly.
Signed-off-by: Can Guo <quic_cang@quicinc.com>
Co-developed-by: Ziqi Chen <quic_ziqichen@quicinc.com>
Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com>
---
v1 -> v2:
Rename the label "do_pmc" to "config_pwr_mode".
---
drivers/ufs/core/ufshcd.c | 46 ++++++++++++++++++++++++++++++---------
1 file changed, 36 insertions(+), 10 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 8d295cc827cc..e0fc198328a5 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -1308,16 +1308,28 @@ static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,
/**
* ufshcd_scale_gear - scale up/down UFS gear
* @hba: per adapter instance
+ * @target_gear: target gear to scale to
* @scale_up: True for scaling up gear and false for scaling down
*
* Return: 0 for success; -EBUSY if scaling can't happen at this time;
* non-zero for any other errors.
*/
-static int ufshcd_scale_gear(struct ufs_hba *hba, bool scale_up)
+static int ufshcd_scale_gear(struct ufs_hba *hba, u32 target_gear, bool scale_up)
{
int ret = 0;
struct ufs_pa_layer_attr new_pwr_info;
+ if (target_gear) {
+ memcpy(&new_pwr_info, &hba->pwr_info,
+ sizeof(struct ufs_pa_layer_attr));
+
+ new_pwr_info.gear_tx = target_gear;
+ new_pwr_info.gear_rx = target_gear;
+
+ goto config_pwr_mode;
+ }
+
+ /* Legacy gear scaling, in case vops_freq_to_gear_speed() is not implemented */
if (scale_up) {
memcpy(&new_pwr_info, &hba->clk_scaling.saved_pwr_info,
sizeof(struct ufs_pa_layer_attr));
@@ -1338,6 +1350,7 @@ static int ufshcd_scale_gear(struct ufs_hba *hba, bool scale_up)
}
}
+config_pwr_mode:
/* check if the power mode needs to be changed or not? */
ret = ufshcd_config_pwr_mode(hba, &new_pwr_info);
if (ret)
@@ -1408,15 +1421,19 @@ static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba, int err, bool sc
static int ufshcd_devfreq_scale(struct ufs_hba *hba, unsigned long freq,
bool scale_up)
{
+ u32 old_gear = hba->pwr_info.gear_rx;
+ u32 new_gear = 0;
int ret = 0;
+ ufshcd_vops_freq_to_gear_speed(hba, freq, &new_gear);
+
ret = ufshcd_clock_scaling_prepare(hba, 1 * USEC_PER_SEC);
if (ret)
return ret;
/* scale down the gear before scaling down clocks */
if (!scale_up) {
- ret = ufshcd_scale_gear(hba, false);
+ ret = ufshcd_scale_gear(hba, new_gear, false);
if (ret)
goto out_unprepare;
}
@@ -1424,13 +1441,13 @@ static int ufshcd_devfreq_scale(struct ufs_hba *hba, unsigned long freq,
ret = ufshcd_scale_clks(hba, freq, scale_up);
if (ret) {
if (!scale_up)
- ufshcd_scale_gear(hba, true);
+ ufshcd_scale_gear(hba, old_gear, true);
goto out_unprepare;
}
/* scale up the gear after scaling up clocks */
if (scale_up) {
- ret = ufshcd_scale_gear(hba, true);
+ ret = ufshcd_scale_gear(hba, new_gear, true);
if (ret) {
ufshcd_scale_clks(hba, hba->devfreq->previous_freq,
false);
@@ -1723,6 +1740,8 @@ static ssize_t ufshcd_clkscale_enable_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
{
struct ufs_hba *hba = dev_get_drvdata(dev);
+ struct ufs_clk_info *clki;
+ unsigned long freq;
u32 value;
int err = 0;
@@ -1746,14 +1765,21 @@ static ssize_t ufshcd_clkscale_enable_store(struct device *dev,
if (value) {
ufshcd_resume_clkscaling(hba);
- } else {
- ufshcd_suspend_clkscaling(hba);
- err = ufshcd_devfreq_scale(hba, ULONG_MAX, true);
- if (err)
- dev_err(hba->dev, "%s: failed to scale clocks up %d\n",
- __func__, err);
+ goto out_rel;
}
+ clki = list_first_entry(&hba->clk_list_head, struct ufs_clk_info, list);
+ freq = clki->max_freq;
+
+ ufshcd_suspend_clkscaling(hba);
+ err = ufshcd_devfreq_scale(hba, freq, true);
+ if (err)
+ dev_err(hba->dev, "%s: failed to scale clocks up %d\n",
+ __func__, err);
+ else
+ hba->clk_scaling.target_freq = freq;
+
+out_rel:
ufshcd_release(hba);
ufshcd_rpm_put_sync(hba);
out:
--
2.34.1
On 1/22/25 2:02 AM, Ziqi Chen wrote:
> +	if (target_gear) {
> +		memcpy(&new_pwr_info, &hba->pwr_info,
> +		       sizeof(struct ufs_pa_layer_attr));

Why memcpy() instead of an assignment? The advantage of an assignment is
that the compiler can perform type checking.

Thanks,

Bart.
On 1/23/2025 2:32 AM, Bart Van Assche wrote:
> On 1/22/25 2:02 AM, Ziqi Chen wrote:
>> +	if (target_gear) {
>> +		memcpy(&new_pwr_info, &hba->pwr_info,
>> +		       sizeof(struct ufs_pa_layer_attr));
>
> Why memcpy() instead of an assignment? The advantage of an assignment is
> that the compiler can perform type checking.
>
> Thanks,
>
> Bart.

Hi Bart,

We use memcpy() here because memcpy() can be faster than direct
assignment. We are not worried about safety because both sides are the
same struct, "ufs_pa_layer_attr", so the number of bytes and the member
types are guaranteed to match.

-Ziqi
On 1/22/25 11:41 PM, Ziqi Chen wrote:
> We use memcpy() here is due to memcpy() can be faster than direct
> assignment. We don't worry about safety because they are same struct
> "ufs_pa_layer_attr" so that we can ensure the accuracy of number of
> bytes and member type.

The memcpy() call we are discussing is not in the hot path so it doesn't
have to be hyper-optimized. Making the compiler perform type checking is
more important in this code path than micro-optimizing the code.

Additionally, please do not try to be smarter than the compiler.
Compilers are able to convert struct assignments into a memcpy() call if
there are good reasons to assume that the memcpy() call will be faster.

Given the small size of struct ufs_pa_layer_attr (7 * 4 = 28 bytes),
memberwise assignment probably is faster than a memcpy() call. The trunk
version of gcc (ARM64) translates a memberwise assignment of struct
ufs_pa_layer_attr into the following four assembler instructions (x0 and
x1 point to struct ufs_pa_layer_attr instances, q30 and q31 are 128-bit
registers):

	ldr	q30, [x1]
	ldr	q31, [x1, 12]
	str	q30, [x0]
	str	q31, [x0, 12]

Thanks,

Bart.
On 1/24/2025 2:02 AM, Bart Van Assche wrote:
> On 1/22/25 11:41 PM, Ziqi Chen wrote:
>> We use memcpy() here is due to memcpy() can be faster than direct
>> assignment. We don't worry about safety because they are same struct
>> "ufs_pa_layer_attr" so that we can ensure the accuracy of number of
>> bytes and member type.
>
> The memcpy() call we are discussing is not in the hot path so it doesn't
> have to be hyper-optimized. Making the compiler perform type checking is
> more important in this code path than micro-optimizing the code.
>
> Additionally, please do not try to be smarter than the compiler.
> Compilers are able to convert struct assignments into a memcpy() call if
> there are good reasons to assume that the memcpy() call will be faster.
>
> Given the small size of struct ufs_pa_layer_attr (7 * 4 = 28 bytes),
> memberwise assignment probably is faster than a memcpy() call. The trunk
> version of gcc (ARM64) translates a memberwise assignment of struct
> ufs_pa_layer_attr into the following four assembler instructions (x0 and
> x1 point to struct ufs_pa_layer_attr instances, q30 and q31 are 128-bit
> registers):
>
> 	ldr	q30, [x1]
> 	ldr	q31, [x1, 12]
> 	str	q30, [x0]
> 	str	q31, [x0, 12]
>
> Thanks,
>
> Bart.

Sure, let me try it and test it. If it works fine, I will update it in
the next version.

-Ziqi