nr_irqs is required for debugging the kernel, and needs to be
accessible for kmemdump into vmcoreinfo.
Signed-off-by: Eugen Hristev <eugen.hristev@linaro.org>
---
kernel/irq/irqdesc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index db714d3014b5..6c3c8c4687fd 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -139,7 +139,7 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node,
desc_smp_init(desc, node, affinity);
}
-static unsigned int nr_irqs = NR_IRQS;
+unsigned int nr_irqs = NR_IRQS;
/**
* irq_get_nr_irqs() - Number of interrupts supported by the system.
--
2.43.0
On Fri, Sep 12 2025 at 18:08, Eugen Hristev wrote: > nr_irqs is required for debugging the kernel, and needs to be > accessible for kmemdump into vmcoreinfo. That's a patently bad idea. Care to grep how many instances of 'nr_irqs' variables are in the kernel? That name is way too generic to be made global. Thanks, tglx
On Tue, Sep 16 2025 at 23:10, Thomas Gleixner wrote: > On Fri, Sep 12 2025 at 18:08, Eugen Hristev wrote: >> nr_irqs is required for debugging the kernel, and needs to be >> accessible for kmemdump into vmcoreinfo. > > That's a patently bad idea. > > Care to grep how many instances of 'nr_irqs' variables are in the > kernel? > > That name is way too generic to be made global. Aside of that there is _ZERO_ justification to expose variables globaly, which have been made file local with a lot of effort in the past. I pointed you to a solution for that and just because David does not like it means that it's acceptable to fiddle in subsystems and expose their carefully localized variables. Thanks tglx
On 9/17/25 00:16, Thomas Gleixner wrote: > On Tue, Sep 16 2025 at 23:10, Thomas Gleixner wrote: >> On Fri, Sep 12 2025 at 18:08, Eugen Hristev wrote: >>> nr_irqs is required for debugging the kernel, and needs to be >>> accessible for kmemdump into vmcoreinfo. >> >> That's a patently bad idea. >> >> Care to grep how many instances of 'nr_irqs' variables are in the >> kernel? >> >> That name is way too generic to be made global. > > Aside of that there is _ZERO_ justification to expose variables globaly, > which have been made file local with a lot of effort in the past. > > I pointed you to a solution for that and just because David does not > like it means that it's acceptable to fiddle in subsystems and expose > their carefully localized variables. > I agree. I explained the solution to David. He wanted to un-static everything. I disagreed. I implemented your idea in the v2 of the patch series. Did you have a look on how it turned out ? Perhaps I can improve on that and make it more acceptable. Eugen > Thanks > > tglx
On 17.09.25 07:43, Eugen Hristev wrote: > > > On 9/17/25 00:16, Thomas Gleixner wrote: >> On Tue, Sep 16 2025 at 23:10, Thomas Gleixner wrote: >>> On Fri, Sep 12 2025 at 18:08, Eugen Hristev wrote: >>>> nr_irqs is required for debugging the kernel, and needs to be >>>> accessible for kmemdump into vmcoreinfo. >>> >>> That's a patently bad idea. >>> >>> Care to grep how many instances of 'nr_irqs' variables are in the >>> kernel? >>> >>> That name is way too generic to be made global. >> >> Aside of that there is _ZERO_ justification to expose variables globaly, >> which have been made file local with a lot of effort in the past. >> >> I pointed you to a solution for that and just because David does not >> like it means that it's acceptable to fiddle in subsystems and expose >> their carefully localized variables. It would have been great if we could have had that discussion in the previous thread. I didn't like what I saw in v2. In particular, having subsystems fiddle with kmemdump specifics. I prefer if we can find a way to not have subsystems to that. >> > > I agree. I explained the solution to David. He wanted to un-static > everything. I disagreed. Some other subsystem wants to have access to this information. I agree that exposing these variables as r/w globally is not ideal. I raised the alternative of exposing areas or other information through simple helper functions that kmemdump can just use to compose whatever it needs to compose. Do we really need that .section thingy? -- Cheers David / dhildenb
On Wed, Sep 17 2025 at 09:16, David Hildenbrand wrote: > On 17.09.25 07:43, Eugen Hristev wrote: >> On 9/17/25 00:16, Thomas Gleixner wrote: >>> I pointed you to a solution for that and just because David does not >>> like it means that it's acceptable to fiddle in subsystems and expose >>> their carefully localized variables. > > It would have been great if we could have had that discussion in the > previous thread. Sorry. I was busy with other stuff and did not pay attention to that discussion. > Some other subsystem wants to have access to this information. I agree > that exposing these variables as r/w globally is not ideal. It's a nono in this case. We had bugs (long ago) where people fiddled with this stuff (I assume accidentally for my mental sanity sake) and caused really nasty to debug issues. C is a horrible language to encapsulate stuff properly as we all know. > I raised the alternative of exposing areas or other information through > simple helper functions that kmemdump can just use to compose whatever > it needs to compose. > > Do we really need that .section thingy? The section thing is simple and straight forward as it just puts the annotated stuff into the section along with size and id and I definitely find that more palatable, than sprinkling random functions all over the place to register stuff. Sure, you can achieve the same thing with an accessor function. In case of nr_irqs there is already one: irq_get_nr_irqs(), but for places which do not expose the information already for real functional reasons adding such helpers just for this coredump muck is really worse than having a clearly descriptive and obvious annotation which results in the section build. The charm of sections is that they don't neither extra code nor stubs or ifdeffery when a certain subsystem is disabled and therefore no information available. I'm not insisting on sections, but having a table of 2k instead of hundred functions, stubs and whatever is definitely a win to me. Thanks, tglx
On 17.09.25 16:10, Thomas Gleixner wrote: > On Wed, Sep 17 2025 at 09:16, David Hildenbrand wrote: >> On 17.09.25 07:43, Eugen Hristev wrote: >>> On 9/17/25 00:16, Thomas Gleixner wrote: >>>> I pointed you to a solution for that and just because David does not >>>> like it means that it's acceptable to fiddle in subsystems and expose >>>> their carefully localized variables. >> >> It would have been great if we could have had that discussion in the >> previous thread. > > Sorry. I was busy with other stuff and did not pay attention to that > discussion. I understand, I'm busy with too much stuff such that sometimes it might be good to interrupt me earlier: "David, nooo, you're all wrong" > >> Some other subsystem wants to have access to this information. I agree >> that exposing these variables as r/w globally is not ideal. > > It's a nono in this case. We had bugs (long ago) where people fiddled > with this stuff (I assume accidentally for my mental sanity sake) and > caused really nasty to debug issues. C is a horrible language to > encapsulate stuff properly as we all know. Yeah, there is this ACCESS_PRIVATE stuff but it only works with structs and relies on sparse IIRC. > >> I raised the alternative of exposing areas or other information through >> simple helper functions that kmemdump can just use to compose whatever >> it needs to compose. >> >> Do we really need that .section thingy? > > The section thing is simple and straight forward as it just puts the > annotated stuff into the section along with size and id and I definitely > find that more palatable, than sprinkling random functions all over the > place to register stuff. > > Sure, you can achieve the same thing with an accessor function. In case > of nr_irqs there is already one: irq_get_nr_irqs(), but for places which Right, the challenge really is that we want the memory range covered by that address, otherwise it would be easy. > do not expose the information already for real functional reasons adding > such helpers just for this coredump muck is really worse than having a > clearly descriptive and obvious annotation which results in the section > build. Yeah, I'm mostly unhappy about the "#include <linux/kmemdump.h>" stuff. Guess it would all feel less "kmemdump" specific if we would just have a generic way to tag/describe certain physical memory areas and kmemdump would simply make use of that. For example, wondering if it could come in handy to have an ordinary vmcoreinfo header contain this information as well? Case in point, right now we do in crash_save_vmcoreinfo_init() VMCOREINFO_SYMBOL_ARRAY(mem_section); VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS); VMCOREINFO_STRUCT_SIZE(mem_section); And in kmemdump code we do kmemdump_register_id(KMEMDUMP_ID_COREIMAGE_mem_section, (void *)&mem_section, sizeof(mem_section)); I guess both cases actually describe roughly the same information: An area with a given name. Note 1: Wondering if sizeof(mem_section) is actually correct in the kmemdump case Note 2: Wondering if kmemdump would also want the struct size, not just the area length. (memblock alloc wrappers are a separate discussion) > > The charm of sections is that they don't neither extra code nor stubs or > ifdeffery when a certain subsystem is disabled and therefore no > information available. Extra code is a very good point. > > I'm not insisting on sections, but having a table of 2k instead of > hundred functions, stubs and whatever is definitely a win to me. So far it looks like it's not that many, but of course, the question would be how it evolves. -- Cheers David / dhildenb
On 9/17/25 17:46, David Hildenbrand wrote: > On 17.09.25 16:10, Thomas Gleixner wrote: >> On Wed, Sep 17 2025 at 09:16, David Hildenbrand wrote: >>> On 17.09.25 07:43, Eugen Hristev wrote: >>>> On 9/17/25 00:16, Thomas Gleixner wrote: >>>>> I pointed you to a solution for that and just because David does not >>>>> like it means that it's acceptable to fiddle in subsystems and expose >>>>> their carefully localized variables. >>> >>> It would have been great if we could have had that discussion in the >>> previous thread. >> >> Sorry. I was busy with other stuff and did not pay attention to that >> discussion. > > I understand, I'm busy with too much stuff such that sometimes it might > be good to interrupt me earlier: "David, nooo, you're all wrong" > >> >>> Some other subsystem wants to have access to this information. I agree >>> that exposing these variables as r/w globally is not ideal. >> >> It's a nono in this case. We had bugs (long ago) where people fiddled >> with this stuff (I assume accidentally for my mental sanity sake) and >> caused really nasty to debug issues. C is a horrible language to >> encapsulate stuff properly as we all know. > > Yeah, there is this ACCESS_PRIVATE stuff but it only works with structs > and relies on sparse IIRC. > >> >>> I raised the alternative of exposing areas or other information through >>> simple helper functions that kmemdump can just use to compose whatever >>> it needs to compose. >>> >>> Do we really need that .section thingy? >> >> The section thing is simple and straight forward as it just puts the >> annotated stuff into the section along with size and id and I definitely >> find that more palatable, than sprinkling random functions all over the >> place to register stuff. >> >> Sure, you can achieve the same thing with an accessor function. In case >> of nr_irqs there is already one: irq_get_nr_irqs(), but for places which > > Right, the challenge really is that we want the memory range covered by > that address, otherwise it would be easy. > >> do not expose the information already for real functional reasons adding >> such helpers just for this coredump muck is really worse than having a >> clearly descriptive and obvious annotation which results in the section >> build. > > Yeah, I'm mostly unhappy about the "#include <linux/kmemdump.h>" stuff. > > Guess it would all feel less "kmemdump" specific if we would just have a > generic way to tag/describe certain physical memory areas and kmemdump > would simply make use of that. The idea was to make "kmemdump" exactly this generic way to tag/describe the memory. If we would call it differently , simply dump , would it be better ? e.g. include linux/dump.h and then DUMP(var, size) ? could we call it maybe MARK ? or TAG ? TAG_MEM(area, size) this would go to a separate section called .tagged_memory. Then anyone can walk through the section and collect the data. I am just coming up with ideas here. Could it be even part of mm.h instead of having a new header perhaps ? Then we won't need to include one more. > > For example, wondering if it could come in handy to have an ordinary > vmcoreinfo header contain this information as well? > > Case in point, right now we do in crash_save_vmcoreinfo_init() > > VMCOREINFO_SYMBOL_ARRAY(mem_section); > VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS); > VMCOREINFO_STRUCT_SIZE(mem_section); > > And in kmemdump code we do > > kmemdump_register_id(KMEMDUMP_ID_COREIMAGE_mem_section, > (void *)&mem_section, sizeof(mem_section)); > > I guess both cases actually describe roughly the same information: An > area with a given name. > > Note 1: Wondering if sizeof(mem_section) is actually correct in the > kmemdump case > > Note 2: Wondering if kmemdump would also want the struct size, not just > the area length. For kmemdump, right now, debugging without vmlinux symbols is rather impossible, so we have all that information from vmlinux. > > (memblock alloc wrappers are a separate discussion) > >> >> The charm of sections is that they don't neither extra code nor stubs or >> ifdeffery when a certain subsystem is disabled and therefore no >> information available. > > Extra code is a very good point. > >> >> I'm not insisting on sections, but having a table of 2k instead of >> hundred functions, stubs and whatever is definitely a win to me. > > So far it looks like it's not that many, but of course, the question > would be how it evolves. >
On 17.09.25 17:02, Eugen Hristev wrote: > > > On 9/17/25 17:46, David Hildenbrand wrote: >> On 17.09.25 16:10, Thomas Gleixner wrote: >>> On Wed, Sep 17 2025 at 09:16, David Hildenbrand wrote: >>>> On 17.09.25 07:43, Eugen Hristev wrote: >>>>> On 9/17/25 00:16, Thomas Gleixner wrote: >>>>>> I pointed you to a solution for that and just because David does not >>>>>> like it means that it's acceptable to fiddle in subsystems and expose >>>>>> their carefully localized variables. >>>> >>>> It would have been great if we could have had that discussion in the >>>> previous thread. >>> >>> Sorry. I was busy with other stuff and did not pay attention to that >>> discussion. >> >> I understand, I'm busy with too much stuff such that sometimes it might >> be good to interrupt me earlier: "David, nooo, you're all wrong" >> >>> >>>> Some other subsystem wants to have access to this information. I agree >>>> that exposing these variables as r/w globally is not ideal. >>> >>> It's a nono in this case. We had bugs (long ago) where people fiddled >>> with this stuff (I assume accidentally for my mental sanity sake) and >>> caused really nasty to debug issues. C is a horrible language to >>> encapsulate stuff properly as we all know. >> >> Yeah, there is this ACCESS_PRIVATE stuff but it only works with structs >> and relies on sparse IIRC. >> >>> >>>> I raised the alternative of exposing areas or other information through >>>> simple helper functions that kmemdump can just use to compose whatever >>>> it needs to compose. >>>> >>>> Do we really need that .section thingy? >>> >>> The section thing is simple and straight forward as it just puts the >>> annotated stuff into the section along with size and id and I definitely >>> find that more palatable, than sprinkling random functions all over the >>> place to register stuff. >>> >>> Sure, you can achieve the same thing with an accessor function. In case >>> of nr_irqs there is already one: irq_get_nr_irqs(), but for places which >> >> Right, the challenge really is that we want the memory range covered by >> that address, otherwise it would be easy. >> >>> do not expose the information already for real functional reasons adding >>> such helpers just for this coredump muck is really worse than having a >>> clearly descriptive and obvious annotation which results in the section >>> build. >> >> Yeah, I'm mostly unhappy about the "#include <linux/kmemdump.h>" stuff. >> >> Guess it would all feel less "kmemdump" specific if we would just have a >> generic way to tag/describe certain physical memory areas and kmemdump >> would simply make use of that. > > The idea was to make "kmemdump" exactly this generic way to tag/describe > the memory. That's probably where I got lost, after reading the cover letter assuming that this is primarily to program kmemdump backends, which I understood to just special hw/firmware areas, whereby kinfo acts as a filter. > If we would call it differently , simply dump , would it be better ? > e.g. include linux/dump.h > and then DUMP(var, size) ? > > could we call it maybe MARK ? or TAG ? > TAG_MEM(area, size) I'm wondering whether there could be any other user for this kind of information. Like R/O access in a debug kernel to these areas, exporting the ranges/names + easy read access to content through debugfs or something. Guess that partially falls under the "dump" category. Including that information in a vmcore info would probably allow to quickly extract some information even without the debug symbols around (I run into that every now and then). > > this would go to a separate section called .tagged_memory. > Maybe just "tagged_memory.h" or sth. like that? I'm bad at naming, so I would let others make better suggestions. > Then anyone can walk through the section and collect the data. > > I am just coming up with ideas here. > Could it be even part of mm.h instead of having a new header perhaps ? > Then we won't need to include one more. I don't really have something against a new include, just not one that sounded like a very specific subsystem, not something more generic. -- Cheers David / dhildenb
On Wed, Sep 17 2025 at 17:18, David Hildenbrand wrote: > On 17.09.25 17:02, Eugen Hristev wrote: >> On 9/17/25 17:46, David Hildenbrand wrote: >>> On 17.09.25 16:10, Thomas Gleixner wrote: >>>> Sorry. I was busy with other stuff and did not pay attention to that >>>> discussion. >>> >>> I understand, I'm busy with too much stuff such that sometimes it might >>> be good to interrupt me earlier: "David, nooo, you're all wrong" I know that feeling. >> The idea was to make "kmemdump" exactly this generic way to tag/describe >> the memory. > > That's probably where I got lost, after reading the cover letter > assuming that this is primarily to program kmemdump backends, which I > understood to just special hw/firmware areas, whereby kinfo acts as a > filter. > >> If we would call it differently , simply dump , would it be better ? >> e.g. include linux/dump.h >> and then DUMP(var, size) ? >> >> could we call it maybe MARK ? or TAG ? >> TAG_MEM(area, size) > > I'm wondering whether there could be any other user for this kind of > information. > > Like R/O access in a debug kernel to these areas, exporting the > ranges/names + easy read access to content through debugfs or something. > > Guess that partially falls under the "dump" category. I'd rather call it inspection. > Including that information in a vmcore info would probably allow to > quickly extract some information even without the debug symbols around > (I run into that every now and then). Correct. >> this would go to a separate section called .tagged_memory. That'd be confusing vs. actual memory tags, no? > Maybe just "tagged_memory.h" or sth. like that? I'm bad at naming, so I > would let others make better suggestions. inspect.h :) I'm going to use 'inspect' as prefix for the thoughts below, but that's obviously subject to s/inspect/$BETTERNAME/g :) >> Then anyone can walk through the section and collect the data. >> >> I am just coming up with ideas here. >> Could it be even part of mm.h instead of having a new header perhaps ? >> Then we won't need to include one more. > > I don't really have something against a new include, just not one that > sounded like a very specific subsystem, not something more generic. Right. We really don't want to have five different mechanisms for five infrastructures which all allow to inspect kernel memory (life or dead) in one way or the other. The difference between them is mostly: - Which subset of the information they expose for inspection - The actual exposure mechanism: crash dump, firmware storage, run-time snapshots in a filesystem, .... Having one shared core infrastructure to expose data to those mechanisms makes everyones life simpler. That obviously needs to collect the superset of data, but that's just a bit more memory consumed. That's arguably significantly smaller than supporting a zoo of mechanisms to register data for different infrastructures. I'm quite sure that at least a substantial amount of the required information can be collected at compile time in special section tables. The rest can be collected in runtime tables, which have the same format as the compile time section tables to avoid separate parsers. Let me just float some ideas here, how that might look like. It might be completely inpractical, but then it might be at least fodder for thoughts. As this is specific for the compiled kernel version you can define an extensible struct format for the table. struct inspect_entry { unsigned long properties; unsigned int type; unsigned int id; const char name[$MAX_NAME_LEN]; unsigned long address; unsigned long length; .... }; @type refers either to a table with type information, which describes the struct in some way or just generate a detached compile time description. @id a unique id created at compile time or via registration at runtime. Might not be required @name: Name of the memory region. That might go into a separate table which is referenced by @id, but that's up for debate. @address: @length: obvious :) ... Whatever a particular consumer might require @properties: A "bitfield", which allows to mark this entry as (in)valid for a particular consumer. That obviously requires to modify these properties when the requirements of a consumer change, new consumers arrive or new producers are added, but I think it's easier to do that at the producer side than maintaining filters on all consumer ends forever. Though I might be wrong as usual. IOW this needs some thoughts. :) The interesting engineering challenge with such a scheme is to come up with a annotation mechanism which is extensible. Runtime is trivial as it just needs to fill in the new field in the datastructure and all other runtime users have that zero initialized automatically, if you get the mechanism correct in the first place. Think in templates :) Compile time is a bit more effort, but that should be solvable with key/value pairs. Don't even waste a thought about creating the final tables and sections in macro magic. All the annotation macros have to do is to emit the pairs in a structured way into discardable sections. Those section are then converted in post processing into the actual section table formats and added to the kernel image. Not a spectacular new concept. The kernel build does this already today. Just keep the compile time annotation macro magic simple and stupid. It can waste 10k per entry at compile time and then let postprocessing worry about downsizing and consolidation. Nothing to see here :) Hope that helps. Thanks, tglx
> >>> this would go to a separate section called .tagged_memory. > > That'd be confusing vs. actual memory tags, no? Yeah, I came to the conclusion just after an upstream call we just had about that topic (bi-weekly MM alignment session). I'm open for any suggestions that make it more generic. My first instinct was "named memory regions". > >> Maybe just "tagged_memory.h" or sth. like that? I'm bad at naming, so I >> would let others make better suggestions. > > inspect.h :) > > I'm going to use 'inspect' as prefix for the thoughts below, but that's > obviously subject to s/inspect/$BETTERNAME/g :) > >>> Then anyone can walk through the section and collect the data. >>> >>> I am just coming up with ideas here. >>> Could it be even part of mm.h instead of having a new header perhaps ? >>> Then we won't need to include one more. >> >> I don't really have something against a new include, just not one that >> sounded like a very specific subsystem, not something more generic. > > Right. We really don't want to have five different mechanisms for five > infrastructures which all allow to inspect kernel memory (life or > dead) in one way or the other. The difference between them is mostly: > > - Which subset of the information they expose for inspection > > - The actual exposure mechanism: crash dump, firmware storage, > run-time snapshots in a filesystem, .... > > Having one shared core infrastructure to expose data to those mechanisms > makes everyones life simpler. > > That obviously needs to collect the superset of data, but that's just a > bit more memory consumed. That's arguably significantly smaller than > supporting a zoo of mechanisms to register data for different > infrastructures. > > I'm quite sure that at least a substantial amount of the required > information can be collected at compile time in special section > tables. The rest can be collected in runtime tables, which have the same > format as the compile time section tables to avoid separate parsers. > > Let me just float some ideas here, how that might look like. It might be > completely inpractical, but then it might be at least fodder for > thoughts. Thanks a bunch for writing all that down! > > As this is specific for the compiled kernel version you can define an > extensible struct format for the table. > > struct inspect_entry { > unsigned long properties; > unsigned int type; > unsigned int id; > const char name[$MAX_NAME_LEN]; > unsigned long address; > unsigned long length; > .... > }; > > @type > refers either to a table with type information, which describes > the struct in some way or just generate a detached compile time > description. > > @id > a unique id created at compile time or via registration at > runtime. Might not be required We discussed that maybe one would want some kind of a "class" description. For example we might have to register one pgdat area per node. Giving each one a unique name might be impractical / unreasonable. Still, someone would want to select / filter out all entries of the same "class". Just a thought. > > @name: > Name of the memory region. That might go into a separate table > which is referenced by @id, but that's up for debate. Jup. > > @address: > @length: > obvious :) > > ... > Whatever a particular consumer might require > > @properties: > > A "bitfield", which allows to mark this entry as (in)valid for a > particular consumer. > > That obviously requires to modify these properties when the > requirements of a consumer change, new consumers arrive or new > producers are added, but I think it's easier to do that at the > producer side than maintaining filters on all consumer ends > forever. Question would be if that is not up to a consumer to decide ("allowlist" / filter) by class or id, stored elsewhere. > > Though I might be wrong as usual. IOW this needs some thoughts. :) > > The interesting engineering challenge with such a scheme is to come up > with a annotation mechanism which is extensible. > > Runtime is trivial as it just needs to fill in the new field in the > datastructure and all other runtime users have that zero > initialized automatically, if you get the mechanism correct in the > first place. Think in templates :) > > Compile time is a bit more effort, but that should be solvable with > key/value pairs. > > Don't even waste a thought about creating the final tables and > sections in macro magic. All the annotation macros have to do is to > emit the pairs in a structured way into discardable sections. > > Those section are then converted in post processing into the actual > section table formats and added to the kernel image. Not a > spectacular new concept. The kernel build does this already today. > > Just keep the compile time annotation macro magic simple and > stupid. It can waste 10k per entry at compile time and then let > postprocessing worry about downsizing and consolidation. Nothing to > see here :) Sounds interesting! -- Cheers David / dhildenb
On Wed, Sep 17 2025 at 21:03, David Hildenbrand wrote: >> As this is specific for the compiled kernel version you can define an >> extensible struct format for the table. >> >> struct inspect_entry { >> unsigned long properties; >> unsigned int type; >> unsigned int id; >> const char name[$MAX_NAME_LEN]; >> unsigned long address; >> unsigned long length; >> .... >> }; >> >> @type >> refers either to a table with type information, which describes >> the struct in some way or just generate a detached compile time >> description. >> >> @id >> a unique id created at compile time or via registration at >> runtime. Might not be required > > We discussed that maybe one would want some kind of a "class" > description. For example we might have to register one pgdat area per > node. Giving each one a unique name might be impractical / unreasonable. > > Still, someone would want to select / filter out all entries of the same > "class". > > Just a thought. Right. As I said this was mostly a insta brain dump to start a discussion. Seems it worked :) >> @properties: >> >> A "bitfield", which allows to mark this entry as (in)valid for a >> particular consumer. >> >> That obviously requires to modify these properties when the >> requirements of a consumer change, new consumers arrive or new >> producers are added, but I think it's easier to do that at the >> producer side than maintaining filters on all consumer ends >> forever. > > Question would be if that is not up to a consumer to decide ("allowlist" > / filter) by class or id, stored elsewhere. Yes, I looked at it the wrong way round. We should leave the filtering to the consumers. If you use allow lists, then a newly introduced class won't be automatically exposed everywhere. Thanks, tglx
On 9/18/25 11:23, Thomas Gleixner wrote: > On Wed, Sep 17 2025 at 21:03, David Hildenbrand wrote: >>> As this is specific for the compiled kernel version you can define an >>> extensible struct format for the table. >>> >>> struct inspect_entry { >>> unsigned long properties; >>> unsigned int type; >>> unsigned int id; >>> const char name[$MAX_NAME_LEN]; >>> unsigned long address; >>> unsigned long length; >>> .... >>> }; >>> >>> @type >>> refers either to a table with type information, which describes >>> the struct in some way or just generate a detached compile time >>> description. >>> >>> @id >>> a unique id created at compile time or via registration at >>> runtime. Might not be required >> >> We discussed that maybe one would want some kind of a "class" >> description. For example we might have to register one pgdat area per >> node. Giving each one a unique name might be impractical / unreasonable. >> >> Still, someone would want to select / filter out all entries of the same >> "class". >> >> Just a thought. > > Right. As I said this was mostly a insta brain dump to start a > discussion. Seems it worked :) > >>> @properties: >>> >>> A "bitfield", which allows to mark this entry as (in)valid for a >>> particular consumer. >>> >>> That obviously requires to modify these properties when the >>> requirements of a consumer change, new consumers arrive or new >>> producers are added, but I think it's easier to do that at the >>> producer side than maintaining filters on all consumer ends >>> forever. >> >> Question would be if that is not up to a consumer to decide ("allowlist" >> / filter) by class or id, stored elsewhere. > > Yes, I looked at it the wrong way round. We should leave the filtering > to the consumers. If you use allow lists, then a newly introduced class > won't be automatically exposed everywhere. > > Thanks, > > tglx So, one direction to follow from this discussion is to have the inspection entry and inspection table for all these entries. Now, one burning question open for debate, is, should this reside into mm ? mm/inspect.h would have to define the inspection entry struct, and some macros to help everyone add an inspection entry. E.g. INSPECTION_ENTRY(my ptr, my size); and this would be used all over the kernel wherever folks want to register something. Now the second part is, where to keep all the inspection drivers ? Would it make sense to have mm/inspection/inspection_helpers.h which would keep the table start/end, some macros to traverse the tables, and this would be included by the inspection drivers. inspection drivers would then probe via any mechanism, and tap into the inspection table. I am thinking that my model with a single backend can be enhanced by allowing any inspection driver to access it. And further on, each inspection driver would register a notifier to be called when an entry is being created or not. This would mean N possible drivers connected to the table at the same time. ( if that would make sense...) Would it make sense for pstore to have an inspection driver that would be connected here to get different kinds of stuff ? Would it make sense to have some debugfs driver that would just expose to user space different regions ? Perhaps something similar with /proc/kcore but not the whole kernel memory rather only the exposed inspection entries. Now, for the dynamic memory, e.g. memblock_alloc and friends , would it be interesting to have a flag e.g. MEMBLOCK_INSPECT, that would be used when calling it, and in the background, this would request an inspection_entry being created ? Or it makes more sense to call some function like inspect_register as a different call directly at the allocation point ? Feel free to throw your opinion at each of the above. Thanks for helping out !
On 18.09.25 15:53, Eugen Hristev wrote: > > > On 9/18/25 11:23, Thomas Gleixner wrote: >> On Wed, Sep 17 2025 at 21:03, David Hildenbrand wrote: >>>> As this is specific for the compiled kernel version you can define an >>>> extensible struct format for the table. >>>> >>>> struct inspect_entry { >>>> unsigned long properties; >>>> unsigned int type; >>>> unsigned int id; >>>> const char name[$MAX_NAME_LEN]; >>>> unsigned long address; >>>> unsigned long length; >>>> .... >>>> }; >>>> >>>> @type >>>> refers either to a table with type information, which describes >>>> the struct in some way or just generate a detached compile time >>>> description. >>>> >>>> @id >>>> a unique id created at compile time or via registration at >>>> runtime. Might not be required >>> >>> We discussed that maybe one would want some kind of a "class" >>> description. For example we might have to register one pgdat area per >>> node. Giving each one a unique name might be impractical / unreasonable. >>> >>> Still, someone would want to select / filter out all entries of the same >>> "class". >>> >>> Just a thought. >> >> Right. As I said this was mostly a insta brain dump to start a >> discussion. Seems it worked :) >> >>>> @properties: >>>> >>>> A "bitfield", which allows to mark this entry as (in)valid for a >>>> particular consumer. >>>> >>>> That obviously requires to modify these properties when the >>>> requirements of a consumer change, new consumers arrive or new >>>> producers are added, but I think it's easier to do that at the >>>> producer side than maintaining filters on all consumer ends >>>> forever. >>> >>> Question would be if that is not up to a consumer to decide ("allowlist" >>> / filter) by class or id, stored elsewhere. >> >> Yes, I looked at it the wrong way round. We should leave the filtering >> to the consumers. If you use allow lists, then a newly introduced class >> won't be automatically exposed everywhere. >> >> Thanks, >> >> tglx > > > So, one direction to follow from this discussion is to have the > inspection entry and inspection table for all these entries. > Now, one burning question open for debate, is, should this reside into mm ? > mm/inspect.h would have to define the inspection entry struct, and some > macros to help everyone add an inspection entry. > E.g. INSPECTION_ENTRY(my ptr, my size); > and this would be used all over the kernel wherever folks want to > register something. If we're moving this to kernel/ or similar I'd suggest to not call this only "inspect" but something that somehow contains the term "mem". "mem-inspect.h" ? > Now the second part is, where to keep all the inspection drivers ? > Would it make sense to have mm/inspection/inspection_helpers.h which > would keep the table start/end, some macros to traverse the tables, and > this would be included by the inspection drivers. > inspection drivers would then probe via any mechanism, and tap into the > inspection table. Good question. I think some examples of alternatives might help to driver that discussion. > I am thinking that my model with a single backend can be enhanced by > allowing any inspection driver to access it. And further on, each > inspection driver would register a notifier to be called when an entry > is being created or not. This would mean N possible drivers connected to > the table at the same time. ( if that would make sense...) Yeah, I think some notifier mechanism is what we want. > Would it make sense for pstore to have an inspection driver that would > be connected here to get different kinds of stuff ? Something for the pstore folks to answer :) > Would it make sense to have some debugfs driver that would just expose > to user space different regions ? Perhaps something similar with > /proc/kcore but not the whole kernel memory rather only the exposed > inspection entries. Definetly, this is what I previously mentioned. Maybe we would only indicate region metadata and actual access to regions would simply happen through /proc/kcore if someone wants to dump data from user space. > Now, for the dynamic memory, e.g. memblock_alloc and friends , > would it be interesting to have a flag e.g. MEMBLOCK_INSPECT, that would > be used when calling it, and in the background, this would request an > inspection_entry being created ? Or it makes more sense to call some > function like inspect_register as a different call directly at the > allocation point ? We'd probably want some interface to define the metadata (name/class/whatever), a simple flag likely will not do, right? -- Cheers David / dhildenb
On 9/18/25 6:53 AM, Eugen Hristev wrote: > > > So, one direction to follow from this discussion is to have the > inspection entry and inspection table for all these entries. > Now, one burning question open for debate, is, should this reside into mm ? > mm/inspect.h would have to define the inspection entry struct, and some > macros to help everyone add an inspection entry. > E.g. INSPECTION_ENTRY(my ptr, my size); > and this would be used all over the kernel wherever folks want to > register something. > Now the second part is, where to keep all the inspection drivers ? > Would it make sense to have mm/inspection/inspection_helpers.h which > would keep the table start/end, some macros to traverse the tables, and > this would be included by the inspection drivers. > inspection drivers would then probe via any mechanism, and tap into the > inspection table. Surely someone wants to inspect more than mm/ variables. I prefer kernel/inspect/ etc. > I am thinking that my model with a single backend can be enhanced by > allowing any inspection driver to access it. And further on, each > inspection driver would register a notifier to be called when an entry > is being created or not. This would mean N possible drivers connected to > the table at the same time. ( if that would make sense...) > Would it make sense for pstore to have an inspection driver that would > be connected here to get different kinds of stuff ? > Would it make sense to have some debugfs driver that would just expose > to user space different regions ? Perhaps something similar with > /proc/kcore but not the whole kernel memory rather only the exposed > inspection entries. > Now, for the dynamic memory, e.g. memblock_alloc and friends , > would it be interesting to have a flag e.g. MEMBLOCK_INSPECT, that would > be used when calling it, and in the background, this would request an > inspection_entry being created ? Or it makes more sense to call some > function like inspect_register as a different call directly at the > allocation point ? > > Feel free to throw your opinion at each of the above. > Thanks for helping out ! In general I like the way that this is going. Thanks to all of you for this discussion. -- ~Randy
On 9/17/25 18:18, David Hildenbrand wrote: > On 17.09.25 17:02, Eugen Hristev wrote: >> >> >> On 9/17/25 17:46, David Hildenbrand wrote: >>> On 17.09.25 16:10, Thomas Gleixner wrote: >>>> On Wed, Sep 17 2025 at 09:16, David Hildenbrand wrote: >>>>> On 17.09.25 07:43, Eugen Hristev wrote: >>>>>> On 9/17/25 00:16, Thomas Gleixner wrote: >>>>>>> I pointed you to a solution for that and just because David does not >>>>>>> like it means that it's acceptable to fiddle in subsystems and expose >>>>>>> their carefully localized variables. >>>>> >>>>> It would have been great if we could have had that discussion in the >>>>> previous thread. >>>> >>>> Sorry. I was busy with other stuff and did not pay attention to that >>>> discussion. >>> >>> I understand, I'm busy with too much stuff such that sometimes it might >>> be good to interrupt me earlier: "David, nooo, you're all wrong" >>> >>>> >>>>> Some other subsystem wants to have access to this information. I agree >>>>> that exposing these variables as r/w globally is not ideal. >>>> >>>> It's a nono in this case. We had bugs (long ago) where people fiddled >>>> with this stuff (I assume accidentally for my mental sanity sake) and >>>> caused really nasty to debug issues. C is a horrible language to >>>> encapsulate stuff properly as we all know. >>> >>> Yeah, there is this ACCESS_PRIVATE stuff but it only works with structs >>> and relies on sparse IIRC. >>> >>>> >>>>> I raised the alternative of exposing areas or other information through >>>>> simple helper functions that kmemdump can just use to compose whatever >>>>> it needs to compose. >>>>> >>>>> Do we really need that .section thingy? >>>> >>>> The section thing is simple and straight forward as it just puts the >>>> annotated stuff into the section along with size and id and I definitely >>>> find that more palatable, than sprinkling random functions all over the >>>> place to register stuff. >>>> >>>> Sure, you can achieve the same thing with an accessor function. In case >>>> of nr_irqs there is already one: irq_get_nr_irqs(), but for places which >>> >>> Right, the challenge really is that we want the memory range covered by >>> that address, otherwise it would be easy. >>> >>>> do not expose the information already for real functional reasons adding >>>> such helpers just for this coredump muck is really worse than having a >>>> clearly descriptive and obvious annotation which results in the section >>>> build. >>> >>> Yeah, I'm mostly unhappy about the "#include <linux/kmemdump.h>" stuff. >>> >>> Guess it would all feel less "kmemdump" specific if we would just have a >>> generic way to tag/describe certain physical memory areas and kmemdump >>> would simply make use of that. >> >> The idea was to make "kmemdump" exactly this generic way to tag/describe >> the memory. > > That's probably where I got lost, after reading the cover letter > assuming that this is primarily to program kmemdump backends, which I > understood to just special hw/firmware areas, whereby kinfo acts as a > filter. If there is a mechanism to tag all this memory, or regions, into a specific section, what we would do with it next ? It would have a purpose to be parsed and reused by different drivers, that would be able to actually use it. So there has a to be some kind of middleman, that holds onto this list of regions, manages it (unique id, add/remove), and allows certain drivers to use it. Now it would be interesting to have different kind of drivers connect to it (or backends how I called them). One of these programs an internal table for the firmware to use. Another , writes information into a dedicated reserved-memory for the bootloader to use on the next soft reboot (memory preserved). I called this middleman kmemdump. But it can be named differently, and it can reside in different places in the kernel. But what I would like to avoid is to just tag all this memory and have any kind of driver connect to the table. That works, but it's quite loose on having control over the table. E.g. no kmemdump, tag all the memory to sections, and have specific drivers (that would reside where?) walk it. > >> If we would call it differently , simply dump , would it be better ? >> e.g. include linux/dump.h >> and then DUMP(var, size) ? >> >> could we call it maybe MARK ? or TAG ? >> TAG_MEM(area, size) > > I'm wondering whether there could be any other user for this kind of > information. > > Like R/O access in a debug kernel to these areas, exporting the > ranges/names + easy read access to content through debugfs or something. One idea I had to to have a jtag script read the table , parse it, and know where some information resides. Another idea is to use Uboot in case of persistent memory across reboot, and Uboot can read all the sections and assemble a ready-to-download coredump. (sure this doesn't work in all cases) What can be done in case of hypervisor is to implement there a routine that would read it, in case the OS is non-responsive, or even in the secure monitor. Another suggestion I had from someone was to use a pure software default backend in which to just keep the regions stored, and it could be accessed through userspace or read by a crash analyzer. > > Guess that partially falls under the "dump" category. > > Including that information in a vmcore info would probably allow to > quickly extract some information even without the debug symbols around > (I run into that every now and then). > >> >> this would go to a separate section called .tagged_memory. >> > > Maybe just "tagged_memory.h" or sth. like that? I'm bad at naming, so I > would let others make better suggestions. > >> Then anyone can walk through the section and collect the data. >> >> I am just coming up with ideas here. >> Could it be even part of mm.h instead of having a new header perhaps ? >> Then we won't need to include one more. > > I don't really have something against a new include, just not one that > sounded like a very specific subsystem, not something more generic. >
On 17.09.25 17:32, Eugen Hristev wrote: > > > On 9/17/25 18:18, David Hildenbrand wrote: >> On 17.09.25 17:02, Eugen Hristev wrote: >>> >>> >>> On 9/17/25 17:46, David Hildenbrand wrote: >>>> On 17.09.25 16:10, Thomas Gleixner wrote: >>>>> On Wed, Sep 17 2025 at 09:16, David Hildenbrand wrote: >>>>>> On 17.09.25 07:43, Eugen Hristev wrote: >>>>>>> On 9/17/25 00:16, Thomas Gleixner wrote: >>>>>>>> I pointed you to a solution for that and just because David does not >>>>>>>> like it means that it's acceptable to fiddle in subsystems and expose >>>>>>>> their carefully localized variables. >>>>>> >>>>>> It would have been great if we could have had that discussion in the >>>>>> previous thread. >>>>> >>>>> Sorry. I was busy with other stuff and did not pay attention to that >>>>> discussion. >>>> >>>> I understand, I'm busy with too much stuff such that sometimes it might >>>> be good to interrupt me earlier: "David, nooo, you're all wrong" >>>> >>>>> >>>>>> Some other subsystem wants to have access to this information. I agree >>>>>> that exposing these variables as r/w globally is not ideal. >>>>> >>>>> It's a nono in this case. We had bugs (long ago) where people fiddled >>>>> with this stuff (I assume accidentally for my mental sanity sake) and >>>>> caused really nasty to debug issues. C is a horrible language to >>>>> encapsulate stuff properly as we all know. >>>> >>>> Yeah, there is this ACCESS_PRIVATE stuff but it only works with structs >>>> and relies on sparse IIRC. >>>> >>>>> >>>>>> I raised the alternative of exposing areas or other information through >>>>>> simple helper functions that kmemdump can just use to compose whatever >>>>>> it needs to compose. >>>>>> >>>>>> Do we really need that .section thingy? >>>>> >>>>> The section thing is simple and straight forward as it just puts the >>>>> annotated stuff into the section along with size and id and I definitely >>>>> find that more palatable, than sprinkling random functions all over the >>>>> place to register stuff. >>>>> >>>>> Sure, you can achieve the same thing with an accessor function. In case >>>>> of nr_irqs there is already one: irq_get_nr_irqs(), but for places which >>>> >>>> Right, the challenge really is that we want the memory range covered by >>>> that address, otherwise it would be easy. >>>> >>>>> do not expose the information already for real functional reasons adding >>>>> such helpers just for this coredump muck is really worse than having a >>>>> clearly descriptive and obvious annotation which results in the section >>>>> build. >>>> >>>> Yeah, I'm mostly unhappy about the "#include <linux/kmemdump.h>" stuff. >>>> >>>> Guess it would all feel less "kmemdump" specific if we would just have a >>>> generic way to tag/describe certain physical memory areas and kmemdump >>>> would simply make use of that. >>> >>> The idea was to make "kmemdump" exactly this generic way to tag/describe >>> the memory. >> >> That's probably where I got lost, after reading the cover letter >> assuming that this is primarily to program kmemdump backends, which I >> understood to just special hw/firmware areas, whereby kinfo acts as a >> filter. > > If there is a mechanism to tag all this memory, or regions, into a > specific section, what we would do with it next ? > It would have a purpose to be parsed and reused by different drivers, > that would be able to actually use it. > So there has a to be some kind of middleman, that holds onto this list > of regions, manages it (unique id, add/remove), and allows certain > drivers to use it. Right, just someone that maintains the list and possibly allows traversing the list and possibly getting notifications on add/remove. > Now it would be interesting to have different kind of drivers connect to > it (or backends how I called them). > One of these programs an internal table for the firmware to use. > Another , writes information into a dedicated reserved-memory for the > bootloader to use on the next soft reboot (memory preserved). > I called this middleman kmemdump. But it can be named differently, and > it can reside in different places in the kernel. > But what I would like to avoid is to just tag all this memory and have > any kind of driver connect to the table. That works, but it's quite > loose on having control over the table. E.g. no kmemdump, tag all the > memory to sections, and have specific drivers (that would reside where?) > walk it. Yeah, you want just some simple "registry" with traversal+notification. > >> >>> If we would call it differently , simply dump , would it be better ? >>> e.g. include linux/dump.h >>> and then DUMP(var, size) ? >>> >>> could we call it maybe MARK ? or TAG ? >>> TAG_MEM(area, size) Just because I thought about it again, "named memory" could be an alternative to "tagged memory". -- Cheers David / dhildenb
On 9/17/25 17:10, Thomas Gleixner wrote: > On Wed, Sep 17 2025 at 09:16, David Hildenbrand wrote: >> On 17.09.25 07:43, Eugen Hristev wrote: >>> On 9/17/25 00:16, Thomas Gleixner wrote: >>>> I pointed you to a solution for that and just because David does not >>>> like it means that it's acceptable to fiddle in subsystems and expose >>>> their carefully localized variables. >> >> It would have been great if we could have had that discussion in the >> previous thread. > > Sorry. I was busy with other stuff and did not pay attention to that > discussion. > >> Some other subsystem wants to have access to this information. I agree >> that exposing these variables as r/w globally is not ideal. > > It's a nono in this case. We had bugs (long ago) where people fiddled > with this stuff (I assume accidentally for my mental sanity sake) and > caused really nasty to debug issues. C is a horrible language to > encapsulate stuff properly as we all know. > >> I raised the alternative of exposing areas or other information through >> simple helper functions that kmemdump can just use to compose whatever >> it needs to compose. >> >> Do we really need that .section thingy? > > The section thing is simple and straight forward as it just puts the > annotated stuff into the section along with size and id and I definitely > find that more palatable, than sprinkling random functions all over the > place to register stuff. +1 from my side. > > Sure, you can achieve the same thing with an accessor function. In case > of nr_irqs there is already one: irq_get_nr_irqs(), but for places which Not really. I cannot use this accessory function because it returns the <value> of nr_irqs. To have this working with a debug tool, I need to dump the actual memory where nr_irqs reside. This is because any debug tool will not call any function or code, rather look in the dump where is the variable to find its value. And nr_irqs is not in the coredump image if it's not registered itself into kmemdump. So to make it work, the accessory would have to return a pointer to nr_irqs. Which is wrong. Returning a pointer to a static, outside of the subsystem, is not right from my point of view. > do not expose the information already for real functional reasons adding > such helpers just for this coredump muck is really worse than having a > clearly descriptive and obvious annotation which results in the section > build. > > The charm of sections is that they don't neither extra code nor stubs or > ifdeffery when a certain subsystem is disabled and therefore no > information available. > > I'm not insisting on sections, but having a table of 2k instead of > hundred functions, stubs and whatever is definitely a win to me. > > Thanks, > > tglx
© 2016 - 2025 Red Hat, Inc.