[RFC PATCH -next 16/16] mm/damon/core: handle quota->esz overflow issues

Quanmin Yan posted 16 patches 1 month, 3 weeks ago
There is a newer version of this series
Posted by Quanmin Yan 1 month, 3 weeks ago
In the original quota enforcement implementation, the throughput
calculation multiplies total_charged_sz by 1000000 for time unit
conversion, making it highly prone to overflow on 32-bit systems:

damos_set_effective_quota
  if (quota->total_charged_ns)
    throughput = quota->total_charged_sz * 1000000 /
		quota->total_charged_ns;
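
As a rough illustration of the wraparound (a userspace sketch, not kernel
code), the functions below model the expression above with uint32_t standing
in for a 32-bit unsigned long. The names throughput_32bit and
throughput_64bit are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models the original expression on a 32-bit kernel.
 * Assumption: uint32_t stands in for a 32-bit unsigned long.
 */
static uint32_t throughput_32bit(uint32_t charged_sz, uint32_t charged_ns)
{
	assert(charged_ns != 0);
	/* The product wraps modulo 2^32 before the division happens. */
	return charged_sz * UINT32_C(1000000) / charged_ns;
}

/* Same expression evaluated in 64 bits, used here only as a reference. */
static uint64_t throughput_64bit(uint64_t charged_sz, uint64_t charged_ns)
{
	assert(charged_ns != 0);
	return charged_sz * UINT64_C(1000000) / charged_ns;
}
```

With total_charged_ns = 1000000, throughput_32bit(4294, 1000000) still
returns 4294, but throughput_32bit(4295, 1000000) returns 0 because
4295 * 1000000 wraps past 2^32. The 64-bit reference returns 4295.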

Requiring total_charged_sz to be less than 4GB/1000000 (about 4294
bytes) is unreasonable. Additionally, when overflow occurs and causes
quota->esz to become extremely small, the subsequent damos_apply_scheme
logic permanently sets sz to 0 and the quota stops updating, ultimately
leading to complete functional failure:

damos_apply_scheme
  if (quota->esz && quota->charged_sz + sz > quota->esz)
    sz = ALIGN_DOWN(quota->esz - quota->charged_sz, DAMON_MIN_REGION);
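
To see why a tiny esz pins sz at 0, here is a minimal userspace sketch of the
clamp above. It assumes a 4096-byte DAMON_MIN_REGION; MIN_REGION,
ALIGN_DOWN_POW2 and clamp_to_quota are hypothetical stand-ins, not kernel
names.

```c
#include <assert.h>

/* Assumption: stand-in for DAMON_MIN_REGION with a 4096-byte page. */
#define MIN_REGION 4096UL

/* Round x down to a multiple of a, where a is a power of two. */
#define ALIGN_DOWN_POW2(x, a) ((x) & ~((a) - 1))

/* Hypothetical helper mirroring the size clamp shown above. */
static unsigned long clamp_to_quota(unsigned long esz,
				    unsigned long charged_sz,
				    unsigned long sz)
{
	/* Sketch-only guard against unsigned underflow in esz - charged_sz. */
	assert(!esz || charged_sz <= esz);

	if (esz && charged_sz + sz > esz)
		sz = ALIGN_DOWN_POW2(esz - charged_sz, MIN_REGION);
	return sz;
}
```

clamp_to_quota(100, 0, 8192) returns 0, since the 100 remaining bytes round
down to zero regions, while clamp_to_quota(1UL << 20, 0, 8192) leaves sz at
8192.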

Change the total charged stats to the unsigned long long data type to
reduce the overflow risk, and reset them when an overflow would still
occur.

Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
---
 include/linux/damon.h |  4 ++--
 mm/damon/core.c       | 18 ++++++++++++------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index d85850cf06c5..45aab331dfb7 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -247,8 +247,8 @@ struct damos_quota {
 
 /* private: */
 	/* For throughput estimation */
-	unsigned long total_charged_sz;
-	unsigned long total_charged_ns;
+	unsigned long long total_charged_sz;
+	unsigned long long total_charged_ns;
 
 	/* For charging the quota */
 	unsigned long charged_sz;
diff --git a/mm/damon/core.c b/mm/damon/core.c
index bc764f9dc5c5..5e05fdd91c12 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -15,6 +15,7 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/string_choices.h>
+#include <linux/math64.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/damon.h>
@@ -2059,8 +2060,8 @@ static unsigned long damos_quota_score(struct damos_quota *quota)
  */
 static void damos_set_effective_quota(struct damos_quota *quota)
 {
-	unsigned long throughput;
-	unsigned long esz = ULONG_MAX;
+	unsigned long long throughput;
+	unsigned long long esz = ULLONG_MAX;
 
 	if (!quota->ms && list_empty(&quota->goals)) {
 		quota->esz = quota->sz;
@@ -2077,11 +2078,16 @@ static void damos_set_effective_quota(struct damos_quota *quota)
 	}
 
 	if (quota->ms) {
-		if (quota->total_charged_ns)
-			throughput = quota->total_charged_sz * 1000000 /
-				quota->total_charged_ns;
-		else
+		if (quota->total_charged_ns &&
+			likely(quota->total_charged_sz < ULLONG_MAX / 1000000)) {
+			throughput = div64_u64(quota->total_charged_sz * 1000000,
+					quota->total_charged_ns);
+		} else {
 			throughput = PAGE_SIZE * 1024;
+			/* Reset the variable when an overflow occurs */
+			quota->total_charged_ns = 0;
+			quota->total_charged_sz = 0;
+		}
 		esz = min(throughput * quota->ms, esz);
 	}
 
-- 
2.34.1
Re: [RFC PATCH -next 16/16] mm/damon/core: handle quota->esz overflow issues
Posted by SeongJae Park 1 month, 3 weeks ago
On Wed, 13 Aug 2025 13:07:06 +0800 Quanmin Yan <yanquanmin1@huawei.com> wrote:

> In the original quota enforcement implementation, the throughput
> calculation multiplies total_charged_sz by 1000000 for time unit
> conversion, making it highly prone to overflow on 32-bit systems:
> 
> damos_set_effective_quota
>   if (quota->total_charged_ns)
>     throughput = quota->total_charged_sz * 1000000 /
> 		quota->total_charged_ns;
> 
> Requiring total_charged_sz to be less than 4GB/1000000 (about 4294
> bytes) is unreasonable. Additionally, when overflow occurs and causes
> quota->esz to become extremely small, the subsequent damos_apply_scheme
> logic permanently sets sz to 0 and the quota stops updating, ultimately
> leading to complete functional failure:
> 
> damos_apply_scheme
>   if (quota->esz && quota->charged_sz + sz > quota->esz)
>     sz = ALIGN_DOWN(quota->esz - quota->charged_sz, DAMON_MIN_REGION);
> 
> Change the total charged stats to the unsigned long long data type to
> reduce the overflow risk, and reset them when an overflow would still
> occur.

Thank you for finding this issue!  I don't want to change the data type if
possible, though.  Could replacing the easily-overflowing throughput
calculation with mult_frac() fix the issue?
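
For reference, the kernel's mult_frac(x, n, d) computes x * n / d by
splitting x into a quotient and a remainder with respect to d, so the full
product x * n is never formed. A minimal userspace sketch of that split
follows, with uint32_t standing in for a 32-bit unsigned long. mult_frac_u32
is a hypothetical name, and the argument order used in the note below
(total_charged_sz, 1000000, total_charged_ns) is an assumption rather than
something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace function using the same quotient/remainder
 * split as the kernel's mult_frac(x, n, d).
 */
static uint32_t mult_frac_u32(uint32_t x, uint32_t n, uint32_t d)
{
	assert(d != 0);

	uint32_t q = x / d;	/* whole multiples of d contained in x */
	uint32_t r = x % d;	/* leftover part of x, always below d */

	/* q * n is the exact part; r * n can still wrap when r is large. */
	return q * n + r * n / d;
}
```

mult_frac_u32(8192, 1000000, 1000) returns 8192000, where the direct 32-bit
product would wrap. The remainder term can still wrap when the remainder is
large: mult_frac_u32(8192, 1000000, 1000000) returns 3897 rather than 8192,
so the split narrows the overflow window without closing it.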


Thanks,
SJ

[...]
Re: [RFC PATCH -next 16/16] mm/damon/core: handle quota->esz overflow issues
Posted by Quanmin Yan 1 month, 2 weeks ago
Hi SJ,

On 2025/8/14 1:15, SeongJae Park wrote:
> On Wed, 13 Aug 2025 13:07:06 +0800 Quanmin Yan <yanquanmin1@huawei.com> wrote:
>
>> In the original quota enforcement implementation, the throughput
>> calculation multiplies total_charged_sz by 1000000 for time unit
>> conversion, making it highly prone to overflow on 32-bit systems:
>>
>> damos_set_effective_quota
>>    if (quota->total_charged_ns)
>>      throughput = quota->total_charged_sz * 1000000 /
>> 		quota->total_charged_ns;
>>
>> Requiring total_charged_sz to be less than 4GB/1000000 (about 4294
>> bytes) is unreasonable. Additionally, when overflow occurs and causes
>> quota->esz to become extremely small, the subsequent damos_apply_scheme
>> logic permanently sets sz to 0 and the quota stops updating, ultimately
>> leading to complete functional failure:
>>
>> damos_apply_scheme
>>    if (quota->esz && quota->charged_sz + sz > quota->esz)
>>      sz = ALIGN_DOWN(quota->esz - quota->charged_sz, DAMON_MIN_REGION);
>>
>> Change the total charged stats to the unsigned long long data type to
>> reduce the overflow risk, and reset them when an overflow would still
>> occur.
> Thank you for finding this issue!  I don't want to change the data type if
> possible, though.  Could replacing the easily-overflowing throughput
> calculation with mult_frac() fix the issue?

Thank you for your guidance; mult_frac() does work here. The relevant changes
have been included in patch #12 of the v2 series[1].

[1] https://lore.kernel.org/all/20250820080623.3799131-13-yanquanmin1@huawei.com/

Thanks,
Quanmin Yan