HugeTLB usage is a metric that can provide utility for monitors hoping to get more insight into the memory usage patterns in cgroups. It also helps identify if large folios are being distributed efficiently across workloads, so that tasks that can take most advantage of reduced TLB misses are prioritized. While cgroupv2's hugeTLB controller does report this value, some users who wish to track hugeTLB usage might not want to take on the additional overhead or the features of the controller just to use the metric. This patch introduces hugeTLB usage in the memcg stats, mirroring the value in the hugeTLB controller and offering a more fine-grained cgroup-level breakdown of the value in /proc/meminfo. Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Joshua Hahn (1): Adding hugeTLB counters to memory controller include/linux/memcontrol.h | 3 +++ mm/hugetlb.c | 5 +++++ mm/memcontrol.c | 6 ++++++ 3 files changed, 14 insertions(+) -- 2.43.5
On Thu 17-10-24 09:04:37, Joshua Hahn wrote: > HugeTLB usage is a metric that can provide utility for monitors hoping > to get more insight into the memory usage patterns in cgroups. It also > helps identify if large folios are being distributed efficiently across > workloads, so that tasks that can take most advantage of reduced TLB > misses are prioritized. > > While cgroupv2's hugeTLB controller does report this value, some users > who wish to track hugeTLB usage might not want to take on the additional > overhead or the features of the controller just to use the metric. > This patch introduces hugeTLB usage in the memcg stats, mirroring the > value in the hugeTLB controller and offering a more fine-grained > cgroup-level breakdown of the value in /proc/meminfo. This seems really confusing because memcg controller is not responsible for the hugetlb memory. Could you be more specific why enabling hugetlb controller is not really desirable when the actual per-group tracking is needed? > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> > > Joshua Hahn (1): > Adding hugeTLB counters to memory controller > > include/linux/memcontrol.h | 3 +++ > mm/hugetlb.c | 5 +++++ > mm/memcontrol.c | 6 ++++++ > 3 files changed, 14 insertions(+) > > -- > 2.43.5 -- Michal Hocko SUSE Labs
On Fri, Oct 18, 2024 at 12:12:00PM +0200, Michal Hocko wrote: > On Thu 17-10-24 09:04:37, Joshua Hahn wrote: > > HugeTLB usage is a metric that can provide utility for monitors hoping > > to get more insight into the memory usage patterns in cgroups. It also > > helps identify if large folios are being distributed efficiently across > > workloads, so that tasks that can take most advantage of reduced TLB > > misses are prioritized. > > > > While cgroupv2's hugeTLB controller does report this value, some users > > who wish to track hugeTLB usage might not want to take on the additional > > overhead or the features of the controller just to use the metric. > > This patch introduces hugeTLB usage in the memcg stats, mirroring the > > value in the hugeTLB controller and offering a more fine-grained > > cgroup-level breakdown of the value in /proc/meminfo. > > This seems really confusing because memcg controller is not responsible > for the hugetlb memory. Could you be more specific why enabling hugetlb > controller is not really desirable when the actual per-group tracking is > needed? We have competition over memory, but not specifically over hugetlb. The maximum hugetlb footprint of jobs is known in advance, and we configure hugetlb_cma= accordingly. There are no static boot time hugetlb reservations, and there is no opportunistic use of hugetlb from jobs or other parts of the system. So we don't need control over the hugetlb pool, and no limit enforcement on hugetlb specifically. However, memory overall is overcommitted between job and system management. If the main job is using hugetlb, we need that to show up in memory.current and be taken into account for memory.high and memory.low enforcement. It's the old memory fungibility argument: if you use hugetlb, it should reduce the budget for cache/anon. Nhat's recent patch to charge hugetlb to memcg accomplishes that. 
However, we now have potentially a sizable portion of memory in memory.current that doesn't show up in memory.stat. Joshua's patch addresses that, so userspace can understand its memory footprint. I hope that makes sense.
On Fri 18-10-24 08:31:22, Johannes Weiner wrote: > On Fri, Oct 18, 2024 at 12:12:00PM +0200, Michal Hocko wrote: > > On Thu 17-10-24 09:04:37, Joshua Hahn wrote: > > > HugeTLB usage is a metric that can provide utility for monitors hoping > > > to get more insight into the memory usage patterns in cgroups. It also > > > helps identify if large folios are being distributed efficiently across > > > workloads, so that tasks that can take most advantage of reduced TLB > > > misses are prioritized. > > > > > > While cgroupv2's hugeTLB controller does report this value, some users > > > who wish to track hugeTLB usage might not want to take on the additional > > > overhead or the features of the controller just to use the metric. > > > This patch introduces hugeTLB usage in the memcg stats, mirroring the > > > value in the hugeTLB controller and offering a more fine-grained > > > cgroup-level breakdown of the value in /proc/meminfo. > > > > This seems really confusing because memcg controller is not responsible > > for the hugetlb memory. Could you be more specific why enabling hugetlb > > controller is not really desirable when the actual per-group tracking is > > needed? > > We have competition over memory, but not specifically over hugetlb. > > The maximum hugetlb footprint of jobs is known in advance, and we > configure hugetlb_cma= accordingly. There are no static boot time > hugetlb reservations, and there is no opportunistic use of hugetlb > from jobs or other parts of the system. So we don't need control over > the hugetlb pool, and no limit enforcement on hugetlb specifically. > > However, memory overall is overcommitted between job and system > management. If the main job is using hugetlb, we need that to show up > in memory.current and be taken into account for memory.high and > memory.low enforcement. It's the old memory fungibility argument: if > you use hugetlb, it should reduce the budget for cache/anon. 
> > Nhat's recent patch to charge hugetlb to memcg accomplishes that. > > However, we now have potentially a sizable portion of memory in > memory.current that doesn't show up in memory.stat. Joshua's patch > addresses that, so userspace can understand its memory footprint. > > I hope that makes sense. Looking at 8cba9576df60 ("hugetlb: memcg: account hugetlb-backed memory in memory controller") describes this limitation * Hugetlb pages utilized while this option is not selected will not be tracked by the memory controller (even if cgroup v2 is remounted later on). and it would be great to have an explanation why the lack of tracking has proven problematic. Also the above doesn't really explain why those who care cannot really enabled hugetlb controller to gain the consumption information. Also what happens if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled. Should we report potentially misleading data? -- Michal Hocko SUSE Labs
On Fri, Oct 18, 2024 at 03:42:13PM +0200, Michal Hocko wrote: > On Fri 18-10-24 08:31:22, Johannes Weiner wrote: > > On Fri, Oct 18, 2024 at 12:12:00PM +0200, Michal Hocko wrote: > > > On Thu 17-10-24 09:04:37, Joshua Hahn wrote: > > > > HugeTLB usage is a metric that can provide utility for monitors hoping > > > > to get more insight into the memory usage patterns in cgroups. It also > > > > helps identify if large folios are being distributed efficiently across > > > > workloads, so that tasks that can take most advantage of reduced TLB > > > > misses are prioritized. > > > > > > > > While cgroupv2's hugeTLB controller does report this value, some users > > > > who wish to track hugeTLB usage might not want to take on the additional > > > > overhead or the features of the controller just to use the metric. > > > > This patch introduces hugeTLB usage in the memcg stats, mirroring the > > > > value in the hugeTLB controller and offering a more fine-grained > > > > cgroup-level breakdown of the value in /proc/meminfo. > > > > > > This seems really confusing because memcg controller is not responsible > > > for the hugetlb memory. Could you be more specific why enabling hugetlb > > > controller is not really desirable when the actual per-group tracking is > > > needed? > > > > We have competition over memory, but not specifically over hugetlb. > > > > The maximum hugetlb footprint of jobs is known in advance, and we > > configure hugetlb_cma= accordingly. There are no static boot time > > hugetlb reservations, and there is no opportunistic use of hugetlb > > from jobs or other parts of the system. So we don't need control over > > the hugetlb pool, and no limit enforcement on hugetlb specifically. > > > > However, memory overall is overcommitted between job and system > > management. If the main job is using hugetlb, we need that to show up > > in memory.current and be taken into account for memory.high and > > memory.low enforcement. 
It's the old memory fungibility argument: if > > you use hugetlb, it should reduce the budget for cache/anon. > > > > Nhat's recent patch to charge hugetlb to memcg accomplishes that. > > > > However, we now have potentially a sizable portion of memory in > > memory.current that doesn't show up in memory.stat. Joshua's patch > > addresses that, so userspace can understand its memory footprint. > > > > I hope that makes sense. > > Looking at 8cba9576df60 ("hugetlb: memcg: account hugetlb-backed memory > in memory controller") describes this limitation > > * Hugetlb pages utilized while this option is not selected will not > be tracked by the memory controller (even if cgroup v2 is remounted > later on). > > and it would be great to have an explanation why the lack of tracking > has proven problematic. Yes, I agree it would be good to outline this in the changelog. The argument being that memory.stat breaks down the consumers that are charged to memory.current. hugetlb is (can be) charged, but is not broken out. This is a significant gap in the memcg stats picture. > Also the above doesn't really explain why those who care cannot > really enabled hugetlb controller to gain the consumption > information. Well, I have explained why we don't need it at least. Enabling almost a thousand lines of basically abandoned code, compared to the few lines in this patch, doesn't strike me as reasonable. That said, I don't think the hugetlb controller is relevant. With hugetlb being part of memory.current (for arguments that are already settled), it needs to be itemized in memory.stat. It's a gap in the memory controller in any case. > Also what happens if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled. > Should we report potentially misleading data? Good point. The stat item tracking should follow the same rules as charging, such that memory.current and memory.stat are always in sync. A stat helper that mirrors the mem_cgroup_hugetlb_try_charge() checks would make sense to me. 
E.g. lruvec_stat_mod_hugetlb_folio().
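To make the invariant concrete — the stat item should only move when the charge path would have moved memory.current — here is a minimal userspace sketch of the gating Johannes describes. All names here (the flag variable, the counter, the helper) are stand-ins for illustration, not the kernel API; the real helper would mirror the mount-flag checks that mem_cgroup_hugetlb_try_charge() performs and operate on folios.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING mount flag. */
static bool hugetlb_accounting_enabled;

/* Stand-in for the proposed "hugetlb" line in memory.stat (in pages). */
static long memcg_hugetlb_stat;

/*
 * Sketch of the suggested helper: bail out when hugetlb memcg
 * accounting is not mounted, so a page that was never charged to
 * memory.current can never appear in memory.stat.
 */
static void lruvec_stat_mod_hugetlb_folio(long nr_pages)
{
        if (!hugetlb_accounting_enabled)
                return; /* not charged -> keep it out of the stats too */
        memcg_hugetlb_stat += nr_pages;
}
```

With this shape, memory.current and memory.stat stay in sync by construction in both configurations, which is the property Michal asks about below.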
On Fri, Oct 18, 2024 at 03:42:13PM GMT, Michal Hocko wrote: > On Fri 18-10-24 08:31:22, Johannes Weiner wrote: > > On Fri, Oct 18, 2024 at 12:12:00PM +0200, Michal Hocko wrote: > > > On Thu 17-10-24 09:04:37, Joshua Hahn wrote: > > > > HugeTLB usage is a metric that can provide utility for monitors hoping > > > > to get more insight into the memory usage patterns in cgroups. It also > > > > helps identify if large folios are being distributed efficiently across > > > > workloads, so that tasks that can take most advantage of reduced TLB > > > > misses are prioritized. > > > > > > > > While cgroupv2's hugeTLB controller does report this value, some users > > > > who wish to track hugeTLB usage might not want to take on the additional > > > > overhead or the features of the controller just to use the metric. > > > > This patch introduces hugeTLB usage in the memcg stats, mirroring the > > > > value in the hugeTLB controller and offering a more fine-grained > > > > cgroup-level breakdown of the value in /proc/meminfo. > > > > > > This seems really confusing because memcg controller is not responsible > > > for the hugetlb memory. Could you be more specific why enabling hugetlb > > > controller is not really desirable when the actual per-group tracking is > > > needed? > > > > We have competition over memory, but not specifically over hugetlb. > > > > The maximum hugetlb footprint of jobs is known in advance, and we > > configure hugetlb_cma= accordingly. There are no static boot time > > hugetlb reservations, and there is no opportunistic use of hugetlb > > from jobs or other parts of the system. So we don't need control over > > the hugetlb pool, and no limit enforcement on hugetlb specifically. > > > > However, memory overall is overcommitted between job and system > > management. If the main job is using hugetlb, we need that to show up > > in memory.current and be taken into account for memory.high and > > memory.low enforcement. 
It's the old memory fungibility argument: if > > you use hugetlb, it should reduce the budget for cache/anon. > > > > Nhat's recent patch to charge hugetlb to memcg accomplishes that. > > > > However, we now have potentially a sizable portion of memory in > > memory.current that doesn't show up in memory.stat. Joshua's patch > > addresses that, so userspace can understand its memory footprint. > > > > I hope that makes sense. > > Looking at 8cba9576df60 ("hugetlb: memcg: account hugetlb-backed memory > in memory controller") describes this limitation > > * Hugetlb pages utilized while this option is not selected will not > be tracked by the memory controller (even if cgroup v2 is remounted > later on). > > and it would be great to have an explanation why the lack of tracking > has proven problematic. Also the above doesn't really explain why those > who care cannot really enabled hugetlb controller to gain the > consumption information. Let me give my take on this. The reason is the ease and convenience of seeing what is happening when I see an unexpectedly large memory.current value. Logically I would look at memory.stat to make sense of it. Without this I have to remember that the user might have the hugetlb memcg accounting option enabled and have to use the hugetlb cgroup to find the answer. If they have not enabled the hugetlb cgroup then I am in the dark. > > Also what happens if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled. > Should we report potentially misleading data? I think with what Johannes has suggested (to use lruvec_stat_mod_folio), the metric will only get updated when hugetlb memcg accounting is enabled and zero otherwise.
On Fri, Oct 18, 2024 at 2:11 PM Shakeel Butt <shakeel.butt@linux.dev> wrote: > On Fri, Oct 18, 2024 at 03:42:13PM GMT, Michal Hocko wrote: > > On Fri 18-10-24 08:31:22, Johannes Weiner wrote: > > > On Fri, Oct 18, 2024 at 12:12:00PM +0200, Michal Hocko wrote: > > > > On Thu 17-10-24 09:04:37, Joshua Hahn wrote: > > > > > HugeTLB usage is a metric that can provide utility for monitors hoping > > > > > to get more insight into the memory usage patterns in cgroups. It also > > > > > helps identify if large folios are being distributed efficiently across > > > > > workloads, so that tasks that can take most advantage of reduced TLB > > > > > misses are prioritized. > > > > > > > > This seems really confusing because memcg controller is not responsible > > > > for the hugetlb memory. Could you be more specific why enabling hugetlb > > > > controller is not really desirable when the actual per-group tracking is > > > > needed? > > > > > > However, we now have potentially a sizable portion of memory in > > > memory.current that doesn't show up in memory.stat. Joshua's patch > > > addresses that, so userspace can understand its memory footprint. > > > > > > I hope that makes sense. > > > > and it would be great to have an explanation why the lack of tracking > > has proven problematic. Also the above doesn't really explain why those > > who care cannot really enabled hugetlb controller to gain the > > consumption information. > > Let me give my take on this. The reason is the ease and convenience to > see what is happening when I see unexpectedly large memory.current > value. Logically I would look at memory.stat to make sense of it. > Without this I have to remember that the user might have hugetlb memcg > accounting option enabled and they are use hugetlb cgroup to find the > answer. If they have not enabled hugetlb cgroup then I am in dark. > > > > > Also what happens if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled. > > Should we report potentially misleading data? 
> > I think with what Johannes has suggested (to use lruvec_stat_mod_folio), > the metric will only get updated when hugetlb memcg accounting is > enabled and zero otherwise. Hi Michal, Johannes, and Shakeel, Thank you all for taking the time to review my patch. I was writing my response when Shakeel responded, and I think it includes an important point. I am sending out this message in the hopes that I can gather insight on what direction would make the most sense for everyone. Michal -- You brought up several points in your response, so I'll do my best to answer them below. 1. Why is the lack of hugeTLB tracking problematic? The biggest problem that I see is that if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is enabled, then there is a discrepancy between what is reported in memory.stat and memory.current, as Johannes explained in his response above. As Shakeel expanded as well, it is just convenient to have the value explicitly there, so users don't have to go through and remember where hugeTLB pages might be used and where they might not be used. Aside from consistency between the two files, we can see benefits in observability. There are many reasons userspace might be interested in understanding the hugeTLB footprint of cgroups. To name a few, system administrators might want to verify that hugeTLB usage is distributed as expected across tasks: i.e. memory-intensive tasks are using more hugeTLB pages than tasks that don't consume a lot of memory, or are seen to fault frequently. Note that this is separate from wanting to inspect the distribution for limiting purposes (in that case, it makes sense to use the controller). 2. Why can't you enable the hugeTLB controller, if tracking is so important?
When I check memory.current, I expect to be able to explain the usage by looking at memory.stat and trying to understand the breakdown, not by going into the various hugetlb controller files to check how/if the memory is accounted for. But even if we are okay with this, I think it might be overkill to enable the hugeTLB controller for the convenience of being able to inspect the hugeTLB usage for cgroups. This is especially true in workloads where we can predict what usage patterns will be like, and we do not need to enforce specific limits on hugeTLB usage. 3. What if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled? This is a great point. The way the patch is currently implemented, it would still do the accounting to memory.stat, even if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled. This would give us the reverse problem, where hugeTLB usage that is reported in the statistics is no longer accounted for in memory.current... I think it makes sense to show hugeTLB statistics in memory.stat only if hugeTLB is accounted for in memory.current as well (i.e. check if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is enabled before doing the accounting, or move the accounting from hugeTLB alloc/free --> hugeTLB charge/uncharge, which should only happen if hugeTLBs are accounted for in memory.current). What do you think? Joshua
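Joshua's second option above — updating the stat at charge/uncharge time rather than at alloc/free time — can be sketched in the same spirit. Everything below is a hypothetical userspace illustration (the names and the page-granularity counters are stand-ins, not the kernel's charge API); the point is that when the stat is modified inside the charge path itself, memory.stat can never disagree with memory.current, whichever way the mount flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel state -- all names hypothetical. */
static bool hugetlb_accounting_enabled; /* CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING */
static long memory_current;             /* pages charged to the memcg */
static long memcg_hugetlb_stat;         /* "hugetlb" line in memory.stat */

/* Charge path: the stat moves together with the charge, never separately. */
static int hugetlb_memcg_charge(long nr_pages)
{
        if (!hugetlb_accounting_enabled)
                return 0; /* not accounting: neither counter moves */
        memory_current += nr_pages;
        memcg_hugetlb_stat += nr_pages;
        return 0;
}

/* Uncharge path mirrors the charge path symmetrically. */
static void hugetlb_memcg_uncharge(long nr_pages)
{
        if (!hugetlb_accounting_enabled)
                return;
        memory_current -= nr_pages;
        memcg_hugetlb_stat -= nr_pages;
}
```

This is the property Michal asks for in his reply: the stat is both accounted and visible only when the feature is enabled, with no extra work done otherwise.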
On Fri 18-10-24 14:38:48, Joshua Hahn wrote: > On Fri, Oct 18, 2024 at 2:11 PM Shakeel Butt <shakeel.butt@linux.dev> wrote: > > On Fri, Oct 18, 2024 at 03:42:13PM GMT, Michal Hocko wrote: [...] > > > and it would be great to have an explanation why the lack of tracking > > > has proven problematic. Also the above doesn't really explain why those > > > who care cannot really enabled hugetlb controller to gain the > > > consumption information. > > > > Let me give my take on this. The reason is the ease and convenience to > > see what is happening when I see unexpectedly large memory.current > > value. Logically I would look at memory.stat to make sense of it. > > Without this I have to remember that the user might have hugetlb memcg > > accounting option enabled and they are use hugetlb cgroup to find the > > answer. If they have not enabled hugetlb cgroup then I am in dark. Yes, I thought that this was an acceptable limitation of the accounting. If that is not the case then it is really preferable to mention reasons in the changelog. The reasoning was that hugetlb controller which would be a natural source of that information is not really great because of an overhead which hasn't really been evaluated - hence my questioning. [...] > Aside from consistency between the two files, we can see benefits in > observability. There are many reasons userspace might be intersted in > understanding the hugeTLB footprint of cgroups. To name a few, system > administrators might want to verify that hugeTLB usage is distributed as > expected across tasks: i.e. memory-intensive tasks are using more hugeTLB > pages than tasks that don't consume a lot of memory, or is seen to fault > frequently. Note that this is separate from wanting to inspect the > distribution for limiting purposes (in that case, it makes sense to use > the controller) Please add this information into the changelog. > 2. Why can't you enable the hugeTLB controller, if tracking is so important? 
> > By turning on the hugeTLB controller, we gain all of the observability > that I mentioned above; users can just check the respective hugetlb files. > However, the discrepancy between memory.stat and memory.current is still > there. When I check memory.current, I expect to be able to explain the usage > by looking at memory.stat and trying to understand the breakdown, not by going > into the various hugetlb controller files to check how/if the memory is > accounted for. Well, as mentioned in the previous response this has been an acceptable limitation of the hugetlb accounting. It is fair to reconsider this based on existing experience but that should be a part of the changelog. > But even if we are okay with this, I think it might be overkill to > enable the hugeTLB controller for the convenience of being able to inspect > the hugeTLB usage for cgroups. This is especially true in workloads where > we can predict what usage patterns will be like, and we do not need to enforce > specific limits on hugeTLB usage. I am sorry but I do not understand the overkill part of the argument. Is there any runtime or configuration cost that is visible? > 3. What if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled? > > This is a great point. The way the patch is currently implemented, it > should still do the accounting to memory.stat, even if > CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled. This would give us the reverse > problem where hugeTLB usage that is reported in the statistics are no longer > accounted for in current... Exactly! And this is a problem. > I think it makes sense to show hugeTLB statistics in memory.stat only if > hugeTLB is accounted for in memory.current as well (i.e. check if > CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is enabled before doing the accounting, > or move the accounting from hugeTLB alloc/free --> hugeTLB charge/uncharge, > which should only happen if hugeTLBs are accounted for in memory.current). > > What do you think? 
Yes, not only shown but also accounted only if the feature is enabled, because we do not want to introduce any (even tiny) overhead for a feature that is not enabled. TL;DR 1) you need to make the stats accounting aligned with the existing charge accounting. 2) make the stat visible only when the feature is enabled 3) work more on the justification - i.e. the changelog part - and give us a better insight why a) the hugetlb cgroup is seen as a bad choice and b) why the original limitation hasn't proven good since the feature was introduced. Makes sense? -- Michal Hocko SUSE Labs
> On Mon, Oct 21, 2024 at 3:15 AM Michal Hocko <mhocko@suse.com> wrote: > > On Fri 18-10-24 14:38:48, Joshua Hahn wrote: > > But even if we are okay with this, I think it might be overkill to > > enable the hugeTLB controller for the convenience of being able to inspect > > the hugeTLB usage for cgroups. This is especially true in workloads where > > we can predict what usage patterns will be like, and we do not need to enforce > > specific limits on hugeTLB usage. > > I am sorry but I do not understand the overkill part of the argument. > Is there any runtime or configuration cost that is visible? I think an argument could be made that any amount of incremental overhead is unnecessary. With that said however, I think a bigger reason why this is overkill is that a user who wishes to use the hugeTLB counter (which this patch achieves in ~10 lines) should not have to enable a ~1000 line feature, as Johannes suggested. A diligent user will have to spend time learning how the hugeTLB controller works and figuring out the settings that will basically make the controller do no enforcing (and basically, the same as if the feature was not enabled). A not-so-diligent user will not spend the time to make sure that the configs make sense, and may run into unexpected & unwanted hugeTLB behavior [1]. > TL;DR > 1) you need to make the stats accounting aligned with the existing > charge accounting. > 2) make the stat visible only when feature is enabled > 3) work more on the justification - i.e. changelog part and give us a > better insight why a) hugetlb cgroup is seen is a bad choice and b) why > the original limitation hasn't proven good since the feature was > introduced. > > Makes sense? > -- > Michal Hocko > SUSE Labs Hi Michal, Thank you for your input. Yes -- this makes sense to me. I apologize, as it seems that I definitely left a lot to be assumed & inferred, based on my original patch changelog. 
In my next version of this patch, I am planning to add the changes that have been suggested by Johannes, write code with the try_charge cleanup that Shakeel suggested in mind, and change the behavior to make sense only when hugeTLB accounting is enabled, as you suggested as well. I'll also update the changelog & commit message and add any information that will hopefully make future reviewers aware of the motivation for this patch. Please let me know if you have any remaining concerns with the implementation or motivation, and I will be happy to incorporate your ideas into the next version as well. Joshua [1] Of course, this argument isn't really generalizable to *all* features, we can't make a separate config for every small feature that a user might want to enable with the smallest granularity. However, given that the existing solution of enabling the hugeTLB controller is an imperfect solution that still leaves a discrepancy between memory.stat and memory.current when hugeTLB accounting is enabled, I think it is reasonable to isolate this feature.
On Mon 21-10-24 10:51:43, Joshua Hahn wrote: > > On Mon, Oct 21, 2024 at 3:15 AM Michal Hocko <mhocko@suse.com> wrote: > > > On Fri 18-10-24 14:38:48, Joshua Hahn wrote: > > > But even if we are okay with this, I think it might be overkill to > > > enable the hugeTLB controller for the convenience of being able to inspect > > > the hugeTLB usage for cgroups. This is especially true in workloads where > > > we can predict what usage patterns will be like, and we do not need to enforce > > > specific limits on hugeTLB usage. > > > > I am sorry but I do not understand the overkill part of the argument. > > Is there any runtime or configuration cost that is visible? > > I think an argument could be made that any amount of incremental overhead > is unnecessary. With that said however, I think a bigger reason why this is > overkill is that a user who wishes to use the hugeTLB counter (which this > patch achieves in ~10 lines) should not have to enable a ~1000 line feature, > as Johannes suggested. > > A diligent user will have to spend time learning how the hugeTLB controller > works and figuring out the settings that will basically make the controller > do no enforcing (and basically, the same as if the feature was not enabled). > A not-so-diligent user will not spend the time to make sure that the configs > make sense, and may run into unexpected & unwanted hugeTLB behavior [1]. Heh, a lazy user would just enable the controller and hope for the best. And it would actually work out of the box, because there is no limit imposed by default, so the only downside is a theoretical overhead due to charging. Anyway, I get the point and I guess it is fair to say the half-baked memcg accounting is not optimal because it only provides half-baked insight, and you aim to fix that. This is a fair intention and justification.
I have to say I really disliked this extension to the memcg when it was proposed but it was claimed this was good enough for people who know what they are doing. > > TL;DR > > 1) you need to make the stats accounting aligned with the existing > > charge accounting. > > 2) make the stat visible only when feature is enabled > > 3) work more on the justification - i.e. changelog part and give us a > > better insight why a) hugetlb cgroup is seen is a bad choice and b) why > > the original limitation hasn't proven good since the feature was > > introduced. > > > > Makes sense? > > -- > > Michal Hocko > > SUSE Labs > > Hi Michal, > > Thank you for your input. Yes -- this makes sense to me. I apologize, as it > seems that I definitely left a lot to be assumed & inferred, based on my > original patch changelog. > > In my next version of this patch, I am planning to add the changes that have > been suggested by Johannes, write code with the try_charge cleanup that > Shakeel suggested in mind, and change the behavior to make sense only when > hugeTLB accounting is enabled, as you suggested as well. I'll also update > the changelog & commit message and add any information that will hopefully > make future reviewers aware of the motivation for this patch. Thanks a lot! > Please let me know if you have any remaining concerns with the implementation > or motivation, and I will be happy to incorporate your ideas into the next > version as well. I think clarification and fixing the reporting is good enough. This still won't make the hugetlb sneaking into memcg more likeable to me but nothing that would force me awake during nights ;) Thanks! -- Michal Hocko SUSE Labs