This patch series enables Multi-Circular Queue (MCQ) support for the UFS
host controller on Qualcomm SM8650 and SM8750 platforms. MCQ is a modern
queuing model that improves performance and scalability by allowing
multiple hardware queues.

Although MCQ support has been present in the UFS driver for several years,
this is the first time it is being enabled via Device Tree for these
platforms.

Patch 1 updates the device tree bindings to allow the additional register
regions and reg-names required for MCQ operation.

Patches 2 and 3 update the device trees for SM8650 and SM8750 respectively
to enable MCQ by adding the necessary register mappings and MSI parent.

Tested on internal hardware for both platforms.

Palash Kambar (1):
  arm64: dts: qcom: sm8750: Enable MCQ support for UFS controller

Ram Kumar Dwivedi (2):
  dt-bindings: ufs: qcom: Add MCQ support to reg and reg-names
  arm64: dts: qcom: sm8650: Enable MCQ support for UFS controller

 .../devicetree/bindings/ufs/qcom,ufs.yaml | 21 ++++++++++++-------
 arch/arm64/boot/dts/qcom/sm8650.dtsi      |  9 +++++++-
 arch/arm64/boot/dts/qcom/sm8750.dtsi      | 10 +++++++--
 3 files changed, 29 insertions(+), 11 deletions(-)

-- 
2.50.1
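
For a sense of the shape of the DT change, here is a minimal sketch of an
MCQ-enabled UFS node, assuming the usual Qualcomm layout. The unit address,
region sizes, and the "std"/"mcq" reg-names below are illustrative
assumptions, not the literal contents of these patches:

    /* Minimal sketch; addresses, sizes and reg-names are assumptions. */
    ufs_mem_hc: ufs@1d84000 {
        compatible = "qcom,sm8650-ufshc", "qcom,ufshc", "jedec,ufs-2.0";

        /* First region: the standard UFSHCI registers; second region:
         * the additional MCQ registers allowed by the binding update. */
        reg = <0x0 0x01d84000 0x0 0x3000>,
              <0x0 0x01d88000 0x0 0x8000>;
        reg-names = "std", "mcq";

        /* With MCQ, per-queue completion signals are delivered as MSIs,
         * hence the MSI parent from the cover letter (assumed here to be
         * the GIC ITS). */
        msi-parent = <&gic_its>;

        /* clocks, phys, regulators, interconnects as before */
    };

Roughly speaking, once the extra register region and the MSI parent are
present, the UFS driver can map the MCQ registers at probe time and use the
multi-queue path instead of the single legacy doorbell queue.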
Hi,

On 30/07/2025 10:22, Ram Kumar Dwivedi wrote:
> This patch series enables Multi-Circular Queue (MCQ) support for the UFS
> host controller on Qualcomm SM8650 and SM8750 platforms. MCQ is a modern
> queuing model that improves performance and scalability by allowing
> multiple hardware queues.
> 
> Although MCQ support has been present in the UFS driver for several years,
> this is the first time it is being enabled via Device Tree for these
> platforms.
> 
> Patch 1 updates the device tree bindings to allow the additional register
> regions and reg-names required for MCQ operation.
> 
> Patches 2 and 3 update the device trees for SM8650 and SM8750 respectively
> to enable MCQ by adding the necessary register mappings and MSI parent.
> 
> Tested on internal hardware for both platforms.
> 
> Palash Kambar (1):
>   arm64: dts: qcom: sm8750: Enable MCQ support for UFS controller
> 
> Ram Kumar Dwivedi (2):
>   dt-bindings: ufs: qcom: Add MCQ support to reg and reg-names
>   arm64: dts: qcom: sm8650: Enable MCQ support for UFS controller
> 
>  .../devicetree/bindings/ufs/qcom,ufs.yaml | 21 ++++++++++++-------
>  arch/arm64/boot/dts/qcom/sm8650.dtsi      |  9 +++++++-
>  arch/arm64/boot/dts/qcom/sm8750.dtsi      | 10 +++++++--
>  3 files changed, 29 insertions(+), 11 deletions(-)
> 

I ran some tests on the SM8650-QRD, and it works, so please add my:
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD

I ran some fio tests comparing v6.15, v6.16 (with threaded irqs), and
next + mcq support; here's the analysis of the results:

Significant Performance Gains in Write Operations with Multiple Jobs:
The "mcq" change shows a substantial improvement in both IOPS and
bandwidth for write operations with 8 jobs.

Moderate Improvement in Single-Job Operations (Read and Write):
For single-job operations (read and write), the "mcq" change generally
leads to positive, albeit less dramatic, improvements in IOPS and
bandwidth.

Slight Decrease in Read Operations with Multiple Jobs:
Interestingly, for read operations with 8 jobs, there's a slight
decrease in both IOPS and bandwidth with the "mcq" kernel.

The raw results are:

Board: sm8650-qrd

read / 1 job        v6.15        v6.16     next+mcq
iops (min)       3,996.00     5,921.60     4,661.20
iops (max)       4,772.80     6,491.20     5,027.60
iops (avg)       4,526.25     6,295.31     4,979.81
cpu % usr            4.62         2.96         5.68
cpu % sys           21.45        17.88        25.58
bw (MB/s)           18.54        25.78        20.40

read / 8 jobs       v6.15        v6.16     next+mcq
iops (min)      51,867.60    51,575.40    56,818.40
iops (max)      67,513.60    64,456.40    65,379.60
iops (avg)      64,314.80    62,136.76    63,016.07
cpu % usr            3.98         3.72         3.85
cpu % sys           16.70        17.16        14.87
bw (MB/s)          263.60       254.40       258.20

write / 1 job       v6.15        v6.16     next+mcq
iops (min)       5,654.80     8,060.00     7,117.20
iops (max)       6,720.40     8,852.00     7,706.80
iops (avg)       6,576.91     8,579.81     7,459.97
cpu % usr            7.48         3.79         6.73
cpu % sys           41.09        23.27        30.66
bw (MB/s)           26.96        35.16        30.56

write / 8 jobs      v6.15        v6.16     next+mcq
iops (min)      84,687.80    95,043.40   114,054.00
iops (max)     107,620.80   113,572.00   164,526.00
iops (avg)      97,910.86   105,927.38   149,071.43
cpu % usr            5.43         4.38         2.88
cpu % sys           21.73        20.29        16.09
bw (MB/s)          400.80       433.80       610.40

The test suite is:

for rw in read write ; do
    echo "rw: ${rw}"
    for jobs in 1 8 ; do
        echo "jobs: ${jobs}"
        for it in $(seq 1 5) ; do
            fio --name=rand${rw} --rw=rand${rw} \
                --ioengine=libaio --direct=1 \
                --bs=4k --numjobs=${jobs} --size=32m \
                --runtime=30 --time_based --end_fsync=1 \
                --group_reporting --filename=/dev/disk/by-partlabel/super \
                | grep -E '(iops|sys=|READ:|WRITE:)'
            sleep 5
        done
    done
done

Thanks,
Neil
On Thu, Jul 31, 2025 at 10:50:21AM GMT, neil.armstrong@linaro.org wrote:
> Hi,
> 
> On 30/07/2025 10:22, Ram Kumar Dwivedi wrote:
> > This patch series enables Multi-Circular Queue (MCQ) support for the UFS
> > host controller on Qualcomm SM8650 and SM8750 platforms. MCQ is a modern
> > queuing model that improves performance and scalability by allowing
> > multiple hardware queues.
> > 
> > Although MCQ support has been present in the UFS driver for several years,
> > this is the first time it is being enabled via Device Tree for these
> > platforms.
> > 
> > Patch 1 updates the device tree bindings to allow the additional register
> > regions and reg-names required for MCQ operation.
> > 
> > Patches 2 and 3 update the device trees for SM8650 and SM8750 respectively
> > to enable MCQ by adding the necessary register mappings and MSI parent.
> > 
> > Tested on internal hardware for both platforms.
> > 
> > Palash Kambar (1):
> >   arm64: dts: qcom: sm8750: Enable MCQ support for UFS controller
> > 
> > Ram Kumar Dwivedi (2):
> >   dt-bindings: ufs: qcom: Add MCQ support to reg and reg-names
> >   arm64: dts: qcom: sm8650: Enable MCQ support for UFS controller
> > 
> >  .../devicetree/bindings/ufs/qcom,ufs.yaml | 21 ++++++++++++-------
> >  arch/arm64/boot/dts/qcom/sm8650.dtsi      |  9 +++++++-
> >  arch/arm64/boot/dts/qcom/sm8750.dtsi      | 10 +++++++--
> >  3 files changed, 29 insertions(+), 11 deletions(-)
> > 
> 
> I ran some tests on the SM8650-QRD, and it works, so please add my:
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
> 

Thanks Neil for testing it out!

> I ran some fio tests comparing v6.15, v6.16 (with threaded irqs), and
> next + mcq support; here's the analysis of the results:
> 
> Significant Performance Gains in Write Operations with Multiple Jobs:
> The "mcq" change shows a substantial improvement in both IOPS and
> bandwidth for write operations with 8 jobs.
> 
> Moderate Improvement in Single-Job Operations (Read and Write):
> For single-job operations (read and write), the "mcq" change generally
> leads to positive, albeit less dramatic, improvements in IOPS and
> bandwidth.
> 
> Slight Decrease in Read Operations with Multiple Jobs:
> Interestingly, for read operations with 8 jobs, there's a slight
> decrease in both IOPS and bandwidth with the "mcq" kernel.
> 
> The raw results are:
> 
> Board: sm8650-qrd
> 
> read / 1 job        v6.15        v6.16     next+mcq
> iops (min)       3,996.00     5,921.60     4,661.20
> iops (max)       4,772.80     6,491.20     5,027.60
> iops (avg)       4,526.25     6,295.31     4,979.81
> cpu % usr            4.62         2.96         5.68
> cpu % sys           21.45        17.88        25.58
> bw (MB/s)           18.54        25.78        20.40
> 

It is interesting to note the % of CPU time spent with MCQ in the 1-job
case. Looks like it is spending more time here. I'm wondering if it is
the ESI (Event Specific Interrupt) limitation/overhead.
- Mani

> read / 8 jobs       v6.15        v6.16     next+mcq
> iops (min)      51,867.60    51,575.40    56,818.40
> iops (max)      67,513.60    64,456.40    65,379.60
> iops (avg)      64,314.80    62,136.76    63,016.07
> cpu % usr            3.98         3.72         3.85
> cpu % sys           16.70        17.16        14.87
> bw (MB/s)          263.60       254.40       258.20
> 
> write / 1 job       v6.15        v6.16     next+mcq
> iops (min)       5,654.80     8,060.00     7,117.20
> iops (max)       6,720.40     8,852.00     7,706.80
> iops (avg)       6,576.91     8,579.81     7,459.97
> cpu % usr            7.48         3.79         6.73
> cpu % sys           41.09        23.27        30.66
> bw (MB/s)           26.96        35.16        30.56
> 
> write / 8 jobs      v6.15        v6.16     next+mcq
> iops (min)      84,687.80    95,043.40   114,054.00
> iops (max)     107,620.80   113,572.00   164,526.00
> iops (avg)      97,910.86   105,927.38   149,071.43
> cpu % usr            5.43         4.38         2.88
> cpu % sys           21.73        20.29        16.09
> bw (MB/s)          400.80       433.80       610.40
> 
> The test suite is:
> 
> for rw in read write ; do
>     echo "rw: ${rw}"
>     for jobs in 1 8 ; do
>         echo "jobs: ${jobs}"
>         for it in $(seq 1 5) ; do
>             fio --name=rand${rw} --rw=rand${rw} \
>                 --ioengine=libaio --direct=1 \
>                 --bs=4k --numjobs=${jobs} --size=32m \
>                 --runtime=30 --time_based --end_fsync=1 \
>                 --group_reporting --filename=/dev/disk/by-partlabel/super \
>                 | grep -E '(iops|sys=|READ:|WRITE:)'
>             sleep 5
>         done
>     done
> done
> 
> Thanks,
> Neil

-- 
மணிவண்ணன் சதாசிவம்