The zap_huge_pmd() function is overly complicated; clean it up and also add
an assert in the case that we encounter a buggy PMD entry that doesn't
match expectations.

This is motivated by a bug discovered [0] where the PMD entry was none of:

* A non-DAX, PFN or mixed map.
* The huge zero folio
* A present PMD entry
* A softleaf entry

in zap_huge_pmd(), but due to the bug we managed to reach this code.

It is useful to explicitly call this out rather than have an arbitrary NULL
pointer dereference happen, which also improves understanding of what's
going on.

[0]: https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/

v2:
* Added tags, thanks everybody!
* Fixed issue with returning false on the bug case potentially looping
  forever, as per Baolin.
* Fixed further issue in the bug path in 5/8 with a double pte unlock.
* Added patch to use vm_normal_folio_pmd(), as per David.

v1: https://lore.kernel.org/all/cover.1773865827.git.ljs@kernel.org/

Lorenzo Stoakes (Oracle) (9):
  mm/huge_memory: simplify vma_is_special_huge()
  mm/huge: avoid big else branch in zap_huge_pmd()
  mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  mm/huge_memory: add a common exit path to zap_huge_pmd()
  mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  mm/huge_memory: deduplicate zap deposited table call
  mm/huge_memory: deduplicate zap_huge_pmd() further by tracking state
  mm/huge_memory: have zap_huge_pmd() use vm_normal_folio_pmd()

 include/linux/huge_mm.h |   8 +--
 include/linux/mm.h      |  16 -----
 mm/huge_memory.c        | 141 +++++++++++++++++++++++----------------
 3 files changed, 85 insertions(+), 80 deletions(-)

--
2.53.
On Thu, 19 Mar 2026 13:00:06 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> The zap_huge_pmd() function is overly complicated, clean it up and also add
> an assert in the case that we encounter a buggy PMD entry that doesn't
> match expectations.
>
> This is motivated by a bug discovered [0] where the PMD entry was none of:
>
> * A non-DAX, PFN or mixed map.
> * The huge zero folio
> * A present PMD entry
> * A softleaf entry
>
> in zap_huge_pmd(), but due to the bug we managed to reach this code.
>
> It is useful to explicitly call this out rather than have an arbitrary NULL
> pointer dereference happen, which also improves understanding of what's
> going on.
>
> [0]: https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/

AI review has questions, which I assume you've seen:
https://sashiko.dev/#/patchset/cover.1773924928.git.ljs%40kernel.org

This isn't going well from a workflow POV. I merge stuff (this was v2),
then half a day later a bunch of potential issues are identified.

If these reviews are useful (they seem to be, enough) then I guess I'll
need to further increase the lag between seeing-it and merging-it. But if
there's a 2-day lag before I get onto a series and I'm the first to look
at Sashiko, then that won't help.

So it needs to be something like:

- series is posted
- 24 hours pass
- submitter takes a look at the AI review, maybe prepares a new series
- 24 hours pass
- rinse, repeat
- it gets merged, hopefully with some Reviewed-by's

Not unreasonable, but it requires that the submitter be made aware of
Sashiko's comments. At present that's via me being tiresome.

Anyway, early days. I'm thinking that an emailed reply-to-all from
Sashiko will help. Much hinges on how useful submitters find these
questions to be - something which I'm paying close attention to...
Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 19 Mar 2026 13:00:06 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
>> The zap_huge_pmd() function is overly complicated, clean it up and also add
>> an assert in the case that we encounter a buggy PMD entry that doesn't
>> match expectations.
>>
>> This is motivated by a bug discovered [0] where the PMD entry was none of:
>>
>> * A non-DAX, PFN or mixed map.
>> * The huge zero folio
>> * A present PMD entry
>> * A softleaf entry
>>
>> in zap_huge_pmd(), but due to the bug we managed to reach this code.
>>
>> It is useful to explicitly call this out rather than have an arbitrary NULL
>> pointer dereference happen, which also improves understanding of what's
>> going on.
>>
>> [0]: https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
>
> AI review has questions, which I assume you've seen:
> https://sashiko.dev/#/patchset/cover.1773924928.git.ljs%40kernel.org
>
> This isn't going well from a workflow POV. I merge stuff (this was v2),
> then half a day later a bunch of potential issues are identified.
>
> If these reviews are useful (they seem to be, enough) then I guess I'll
> need to further increase the lag between seeing-it and merging-it. But
> if there's a 2-day lag before I get onto a series and I'm the first to
> look at Sashiko, then that won't help.
>
> So it needs to be something like:
>
> - series is posted
> - 24 hours pass
> - submitter takes a look at the AI review, maybe prepares a new series
> - 24 hours pass
> - rinse, repeat
> - it gets merged, hopefully with some Reviewed-by's
>
> Not unreasonable, but it requires that the submitter be made aware of
> Sashiko's comments. At present that's via me being tiresome.
>
> Anyway, early days. I'm thinking that an emailed reply-to-all from
> Sashiko will help. Much hinges on how useful submitters find these
> questions to be - something which I'm paying close attention to...
For bpf, Alexei suggested always sending the review to the author and
cc'ing the bpf mailing list, but ignoring maintainers and other mailing
lists like lkml to minimize the traffic. It sounds like a good trade-off
to me.

If there are concerns about polluting the mm mailing list, maybe something
like a new list like "mm-new" / "mm-early" might work? Idk what's the best
thing to do here, just throwing out some ideas.

Likely next week I'll be able to send reviews over email, and I can send
them to whoever we think is appropriate.

Thanks!
On Fri, 20 Mar 2026 20:21:04 -0700 Roman Gushchin <roman.gushchin@linux.dev> wrote:

> > Anyway, early days. I'm thinking that an emailed reply-to-all from
> > Sashiko will help. Much hinges on how useful submitters find these
> > questions to be - something which I'm paying close attention to...
>
> For bpf, Alexei suggested always sending the review to the author and
> cc'ing the bpf mailing list, but ignoring maintainers and other mailing
> lists like lkml to minimize the traffic. It sounds like a good trade-off
> to me.

I'd like to see them. But I'm figuring it out now - I just let the
patchset chill until Sashiko has looked at it.

> If there are concerns about polluting the mm mailing list, maybe
> something like a new list like "mm-new" / "mm-early" might work?
> Idk what's the best thing to do here, just throwing out some ideas.

Yes, a dedicated list might be the way to go.

> Likely next week I'll be able to send reviews over email
> and I can send them to whoever we think is appropriate.

Cool.

A lot of patchsets are "failed to apply". What is Sashiko trying to apply
MM patches to? It would take some smarts to apply the v2 patchset when v1
is presently in mm.git?
On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> A lot of patchsets are "failed to apply". What is Sashiko trying to
> apply MM patches to? It would take some smarts to apply the v2
> patchset when v1 is presently in mm.git?

?

The way things are going at present, I'm just not going to apply a series
which Sashiko "failed to apply". And that's cool, I'll just wait for a
version which Sashiko was able to apply. And then not apply unless all
Sashiko questions are resolved or convincingly refuted.

Question please: if Sashiko finds an "issue" in v3 and then v4 comes out
with changelog words which justify the questionable alteration, can
Sashiko parse that changelog justification and think "OK, never mind"?
On Sat, Mar 21, 2026 at 05:15:30PM -0700, Andrew Morton wrote:
> On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > A lot of patchsets are "failed to apply". What is Sashiko trying to
> > apply MM patches to? It would take some smarts to apply the v2
> > patchset when v1 is presently in mm.git?
>
> ?
>
> The way things are going at present, I'm just not going to apply a

50% noise vs. signal?... maybe wait until we're in the 9x%'s?

> series which Sashiko "failed to apply". And that's cool, I'll just
> wait for a version which Sashiko was able to apply. And then not
> apply unless all Sashiko questions are resolved or convincingly refuted.

Andrew, for crying out loud. Please don't do this.

2 of the 3 series I respun on Friday, working a 13-hour day to do so,
don't apply to Sashiko, but do apply to the mm tree.

I haven't the _faintest clue_ how we are supposed to factor a 3rd-party
experimental website applying or not applying series into our work??

And 'not apply unless all Sashiko questions are resolved or convincingly
refuted' is seriously concerning.

The workload is already insane; now you're expecting us to answer every
bit of nonsense Sashiko hallucinates or misunderstands also?

I say that with no disrespect to Roman or his efforts, but as discussed
at length, it is not ready for prime time yet.

It's clear that Sashiko is not correctly handling applies, and produces a
lot of noise. Predicating taking series on this is absurd.

Thanks, Lorenzo
"Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:

> On Sat, Mar 21, 2026 at 05:15:30PM -0700, Andrew Morton wrote:
>> On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> > A lot of patchsets are "failed to apply". What is Sashiko trying to
>> > apply MM patches to? It would take some smarts to apply the v2
>> > patchset when v1 is presently in mm.git?
>>
>> ?
>>
>> The way things are going at present, I'm just not going to apply a
>
> 50% noise vs. signal?... maybe wait until we're in the 9x%'s?
>
>> series which Sashiko "failed to apply". And that's cool, I'll just
>> wait for a version which Sashiko was able to apply. And then not
>> apply unless all Sashiko questions are resolved or convincingly refuted.
>
> Andrew, for crying out loud. Please don't do this.
>
> 2 of the 3 series I respun on Friday, working a 13-hour day to do so,
> don't apply to Sashiko, but do apply to the mm tree.

I'll look into that.

> I haven't the _faintest clue_ how we are supposed to factor a 3rd-party
> experimental website applying or not applying series into our work??
>
> And 'not apply unless all Sashiko questions are resolved or convincingly
> refuted' is seriously concerning.
>
> The workload is already insane; now you're expecting us to answer every
> bit of nonsense Sashiko hallucinates or misunderstands also?
>
> I say that with no disrespect to Roman or his efforts, but as discussed
> at length, it is not ready for prime time yet.
>
> It's clear that Sashiko is not correctly handling applies, and produces a
> lot of noise. Predicating taking series on this is absurd.

Not trying to pretend that Sashiko is perfect in any way, I think a good
mental exercise is to put down our expectations of how the "perfect"
system would work. The more I work on it, the more I realize it's far
from binary correct/incorrect.
In fact, the same applies to humans: I'm sure every one of us has at some
point had the feeling that someone is too picky and is just annoying us by
finding small nits. At the same time, some of these people are extremely
useful to the community for finding and fixing a lot of issues. In the
end, we argue all the time about questions/issues raised by human
reviewers.

Like, do we prefer a system which finds more real bugs at the cost of
being more noisy, or do we prefer a system which misses more but, if it
points at a bug, it's certainly real? I'm sure you're tempted to prefer
the latter, but imagine a hypothetical system which finds _all_ bugs but
has some false positive rate, e.g. 20%. I think it's pretty attractive.

Also, a lot of raised issues are real but subjectively are not worth our
time. But this is extremely subjective! It depends on the personal level
of perfectionism, amount of time available, the state of the code before,
further plans, etc. etc. For example, syzkaller usually has O(100s) of
open bugs, which are 100% real, but are not always high-priority work.

I think that asking to address 100% of issues raised by any LLM is not
reasonable (especially because its output might be different each time you
run it with the same input), but I also think it's reasonable to address
critical & high severity concerns. And I'm happy to tweak Sashiko to be
more conservative here, but I think it should be based on some specific
examples or data, not purely subjective.

tl;dr I increasingly realize the importance of the social context for
providing good reviews, and it can't be easily derived from the code. What
is acceptable in one subsystem is considered a bad practice in another. I
guess the only way to get a system we all find acceptable (and we still
might not like it, who likes being pointed at their bugs?) is to
collectively codify our expectations in prompts on a per-subsystem basis.

Thanks!
On Mon, Mar 23, 2026 at 06:08:27PM -0700, Roman Gushchin wrote:
> "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:
>
> > On Sat, Mar 21, 2026 at 05:15:30PM -0700, Andrew Morton wrote:
> >> On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> >>
> >> > A lot of patchsets are "failed to apply". What is Sashiko trying to
> >> > apply MM patches to? It would take some smarts to apply the v2
> >> > patchset when v1 is presently in mm.git?
> >>
> >> ?
> >>
> >> The way things are going at present, I'm just not going to apply a
> >
> > 50% noise vs. signal?... maybe wait until we're in the 9x%'s?
> >
> >> series which Sashiko "failed to apply". And that's cool, I'll just
> >> wait for a version which Sashiko was able to apply. And then not
> >> apply unless all Sashiko questions are resolved or convincingly refuted.
> >
> > Andrew, for crying out loud. Please don't do this.
> >
> > 2 of the 3 series I respun on Friday, working a 13-hour day to do so,
> > don't apply to Sashiko, but do apply to the mm tree.
>
> I'll look into that.

Thanks.

> > I haven't the _faintest clue_ how we are supposed to factor a 3rd-party
> > experimental website applying or not applying series into our work??
> >
> > And 'not apply unless all Sashiko questions are resolved or convincingly
> > refuted' is seriously concerning.
> >
> > The workload is already insane; now you're expecting us to answer every
> > bit of nonsense Sashiko hallucinates or misunderstands also?
> >
> > I say that with no disrespect to Roman or his efforts, but as discussed
> > at length, it is not ready for prime time yet.
> >
> > It's clear that Sashiko is not correctly handling applies, and produces a
> > lot of noise. Predicating taking series on this is absurd.
>
> Not trying to pretend that Sashiko is perfect in any way, I think a good
> mental exercise is to put down our expectations of how the "perfect"
> system would work.
> The more I work on it, the more I realize it's far from

Throughout this discussion I have been making practical points. Nobody
expects perfection.

I am simply saying that unilaterally demanding, out of the blue and
without any community input or input from those doing review, that every
single point sashiko raises is responded to, AND requiring that somehow
all series apply, is not good.

BTW, I don't want to make you the scapegoat for complaints about mm
process here :) so I am being careful not to criticise, as I realise that
when people are frustrated with tooling, even if it's _totally
irrelevant_ to you as the maker of the tool, they will instinctively want
to blame you.

I refuse to fall into this trap ;)

> binary correct/incorrect. In fact, the same applies to humans: I'm sure
> every one of us has at some point had the feeling that someone is too
> picky and is just annoying us by finding small nits. At the same time,
> some of these people are extremely useful to the community for finding
> and fixing a lot of issues. In the end, we argue all the time about
> questions/issues raised by human reviewers.

Yes, except human reviewers generally evolve over time to be pretty high
signal if they remain consistent; that is at least how it is in mm. Even
if you think points are trivial.

Sashiko is hallucinating, it is raising irrelevant points that have
nothing to do with the series, and it's creating responses that require
serious time to decode.

I have not encountered review in mm that is even anywhere near the ~50%
hit rate, with the rest potentially violently wrong/wildly irrelevant,
that sashiko generates.

There's an asymmetry too - sashiko can just keep on generating this stuff
indefinitely (well, limited by tokens of course :), and potentially
generate serious useless work for submitters and reviewers.

We _have_ to take that into account when it comes to the review process.

Again, this is nothing to do with the tooling, which I'm grateful for;
again, it's to do with mm process.
And sadly you've been dragged into a debate on this which you are
ultimately more or less orthogonal to :)

> Like, do we prefer a system which finds more real bugs at the cost of
> being more noisy, or do we prefer a system which misses more but, if it
> points at a bug, it's certainly real? I'm sure you're tempted to prefer
> the latter, but imagine a hypothetical system which finds _all_ bugs but
> has some false positive rate, e.g. 20%. I think it's pretty attractive.

I think we are very far from that right now. The issue is how it is
_now_, not in some imagined future.

And it's easy to pontificate about all this, but in the end it's the
sub-maintainers in mm who will have to eventually figure out whether a
series is ok or not, and have to decide stuff people might do based on
hallucinations/irrelevant points etc.

Right now this is going to result in _more work_ for us, and already it
feels like in mm the sub-maintainers are the reason things function
reasonably, but we don't seem to be having our voices heard here.

> Also, a lot of raised issues are real but subjectively are not worth our
> time. But this is extremely subjective! It depends on the personal level
> of perfectionism, amount of time available, the state of the code
> before, further plans, etc. etc. For example, syzkaller usually has
> O(100s) of open bugs, which are 100% real, but are not always
> high-priority work.

I don't think it's anywhere near as subjective as you say, and I think
that's easy to hand-wave.

One issue here is - trust. There are people in the community we trust, to
whom we assign M: and R: entries in MAINTAINERS.

Trust on taste, judgement etc.

Now sashiko is essentially proposed to be given the same trust despite
absolutely not deserving it.

What I propose, as I did in the other sub-thread here, is to use it as a
_tool_ to _help_ sub-maintainers do their job.

Not for it to become a new trusted gatekeeper, out of the blue and
unilaterally, while adding to our workload.
> I think that asking to address 100% of issues raised by any LLM is not
> reasonable (especially because its output might be different each time

Really, again with respect, and trying to dodge the 'blame the tool
maker' thing :) that's something of a strawman; nobody is saying they
require that.

I think >~50% signal is a reasonable ask though.

> you run it with the same input), but I also think it's reasonable to
> address critical & high severity concerns. And I'm happy to tweak

Right, but with respect, you're not an mm maintainer who has to deal with
the resultant fallout :)

> Sashiko to be more conservative here, but I think it should be based on
> some specific examples or data, not purely subjective.

Well, you can't both say all review is highly subjective and
simultaneously ask for objective feedback :)

I have provided detailed feedback on a specific example elsewhere, and
I'm telling you as an experienced mm maintainer that the hit rate is ~50%
in my experience so far.

I'm happy to feed back more, but it's again a time and workload thing -
the default here shouldn't be that mm is just taking sashiko input as
read and we have to jump on everything to explicitly say it's
right/wrong.

Ideally we'd have some way of feeding back on the website, even if it's
as simple as a tick/cross as to which points you are actually accepting
or not. That'd be great I think!

That could be useful as well to Andrew, who could see that in action.

User-login-wise, you could have some system where somebody could send a
mail from the account that is being reviewed to get a login or something?

> tl;dr I increasingly realize the importance of the social context for
> providing good reviews, and it can't be easily derived from the code.

Yes, for sure.

> What is acceptable in one subsystem is considered a bad practice in
> another. I guess the only way to get a system we all find acceptable
> (and we still might not like it, who likes being pointed at their bugs?)
> is to collectively codify our expectations in prompts on a per-subsystem
> basis.

Well, not only that; we need to figure out, per subsystem, what our
process will be.

Again, the contentiousness here is not around your tooling but really
around the unilateral announcement that we're just going to block on
sashiko now.

And on that I am pushing back with detailed points, as per the rest of
the thread.

> Thanks!

Cheers, Lorenzo
"Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:

> On Mon, Mar 23, 2026 at 06:08:27PM -0700, Roman Gushchin wrote:
>> "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:
>>
>> > On Sat, Mar 21, 2026 at 05:15:30PM -0700, Andrew Morton wrote:
>> >> On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>> >>
>> >> > A lot of patchsets are "failed to apply". What is Sashiko trying to
>> >> > apply MM patches to? It would take some smarts to apply the v2
>> >> > patchset when v1 is presently in mm.git?
>> >>
>> >> ?
>> >>
>> >> The way things are going at present, I'm just not going to apply a
>> >
>> > 50% noise vs. signal?... maybe wait until we're in the 9x%'s?
>> >
>> >> series which Sashiko "failed to apply". And that's cool, I'll just
>> >> wait for a version which Sashiko was able to apply. And then not
>> >> apply unless all Sashiko questions are resolved or convincingly refuted.
>> >
>> > Andrew, for crying out loud. Please don't do this.
>> >
>> > 2 of the 3 series I respun on Friday, working a 13-hour day to do so,
>> > don't apply to Sashiko, but do apply to the mm tree.
>>
>> I'll look into that.
>
> Thanks.
>
>> > I haven't the _faintest clue_ how we are supposed to factor a 3rd-party
>> > experimental website applying or not applying series into our work??
>> >
>> > And 'not apply unless all Sashiko questions are resolved or convincingly
>> > refuted' is seriously concerning.
>> >
>> > The workload is already insane; now you're expecting us to answer every
>> > bit of nonsense Sashiko hallucinates or misunderstands also?
>> >
>> > I say that with no disrespect to Roman or his efforts, but as discussed
>> > at length, it is not ready for prime time yet.
>> >
>> > It's clear that Sashiko is not correctly handling applies, and produces a
>> > lot of noise.
>>
>> Not trying to pretend that Sashiko is perfect in any way, I think a good
>> mental exercise is to put down our expectations of how the "perfect"
>> system would work. The more I work on it, the more I realize it's far from
>
> Throughout this discussion I have been making practical points. Nobody
> expects perfection.
>
> I am simply saying that unilaterally demanding, out of the blue and
> without any community input or input from those doing review, that every
> single point sashiko raises is responded to, AND requiring that somehow
> all series apply, is not good.

I never suggested this, and explicitly wrote so below (but it looks like
I wasn't clear enough and you are arguing with this statement).

> BTW, I don't want to make you the scapegoat for complaints about mm
> process here :) so I am being careful not to criticise, as I realise that
> when people are frustrated with tooling, even if it's _totally
> irrelevant_ to you as the maker of the tool, they will instinctively want
> to blame you.
>
> I refuse to fall into this trap ;)

Agree. Let's separate the mm process from everything else here, otherwise
it quickly becomes too messy.

>> binary correct/incorrect. In fact, the same applies to humans: I'm sure
>> every one of us has at some point had the feeling that someone is too
>> picky and is just annoying us by finding small nits. At the same time,
>> some of these people are extremely useful to the community for finding
>> and fixing a lot of issues. In the end, we argue all the time about
>> questions/issues raised by human reviewers.
>
> Yes, except human reviewers generally evolve over time to be pretty high
> signal if they remain consistent; that is at least how it is in mm. Even
> if you think points are trivial.
>
> Sashiko is hallucinating, it is raising irrelevant points that have
> nothing to do with the series, and it's creating responses that require
> serious time to decode.
>
> I have not encountered review in mm that is even anywhere near the ~50%
> hit rate, with the rest potentially violently wrong/wildly irrelevant,
> that sashiko generates.
>
> There's an asymmetry too - sashiko can just keep on generating this stuff
> indefinitely (well, limited by tokens of course :), and potentially
> generate serious useless work for submitters and reviewers.
>
> We _have_ to take that into account when it comes to the review process.
>
> Again, this is nothing to do with the tooling, which I'm grateful for;
> again, it's to do with mm process. And sadly you've been dragged into a
> debate on this which you are ultimately more or less orthogonal to :)
>
>> Like, do we prefer a system which finds more real bugs at the cost of
>> being more noisy, or do we prefer a system which misses more but, if it
>> points at a bug, it's certainly real? I'm sure you're tempted to prefer
>> the latter, but imagine a hypothetical system which finds _all_ bugs but
>> has some false positive rate, e.g. 20%. I think it's pretty attractive.
>
> I think we are very far from that right now. The issue is how it is
> _now_, not in some imagined future.
>
> And it's easy to pontificate about all this, but in the end it's the
> sub-maintainers in mm who will have to eventually figure out whether a
> series is ok or not, and have to decide stuff people might do based on
> hallucinations/irrelevant points etc.
>
> Right now this is going to result in _more work_ for us, and already it
> feels like in mm the sub-maintainers are the reason things function
> reasonably, but we don't seem to be having our voices heard here.
>
>> Also, a lot of raised issues are real but subjectively are not worth our
>> time. But this is extremely subjective! It depends on the personal level
>> of perfectionism, amount of time available, the state of the code
>> before, further plans, etc. etc. For example, syzkaller usually has
>> O(100s) of open bugs, which are 100% real, but are not always
>> high-priority work.
>
> I don't think it's anywhere near as subjective as you say, and I think
> that's easy to hand-wave.
>
> One issue here is - trust. There are people in the community we trust, to
> whom we assign M: and R: entries in MAINTAINERS.
>
> Trust on taste, judgement etc.
>
> Now sashiko is essentially proposed to be given the same trust despite
> absolutely not deserving it.

I don't remember anyone ever saying this; at least I definitely did not.
I think Sashiko can be really useful in finding mechanical bugs, so that
_eventually_ maintainers can spend most of their cycles thinking about
the direction and high-level ideas rather than checking whether all gotos
in error-handling paths are correct.

> What I propose, as I did in the other sub-thread here, is to use it as a
> _tool_ to _help_ sub-maintainers do their job.
>
> Not for it to become a new trusted gatekeeper, out of the blue and
> unilaterally, while adding to our workload.
>
>> I think that asking to address 100% of issues raised by any LLM is not
>> reasonable (especially because its output might be different each time
>
> Really, again with respect, and trying to dodge the 'blame the tool
> maker' thing :) that's something of a strawman; nobody is saying they
> require that.
>
> I think >~50% signal is a reasonable ask though.

I think you misinterpreted me.

>> you run it with the same input), but I also think it's reasonable to
>> address critical & high severity concerns. And I'm happy to tweak
>
> Right, but with respect, you're not an mm maintainer who has to deal with
> the resultant fallout :)

I am, btw :)

>> Sashiko to be more conservative here, but I think it should be based on
>> some specific examples or data, not purely subjective.
>
> Well, you can't both say all review is highly subjective and
> simultaneously ask for objective feedback :)
>
> I have provided detailed feedback on a specific example elsewhere, and
> I'm telling you as an experienced mm maintainer that the hit rate is ~50%
> in my experience so far.
>
> I'm happy to feed back more, but it's again a time and workload thing -
> the default here shouldn't be that mm is just taking sashiko input as
> read and we have to jump on everything to explicitly say it's
> right/wrong.
>
> Ideally we'd have some way of feeding back on the website, even if it's
> as simple as a tick/cross as to which points you are actually accepting
> or not. That'd be great I think!
>
> That could be useful as well to Andrew, who could see that in action.
>
> User-login-wise, you could have some system where somebody could send a
> mail from the account that is being reviewed to get a login or something?

This is an option. We have to agree (at least on a per-subsystem basis)
what the best option is here. For me as the Sashiko developer it doesn't
really matter which way I get the signal - I need the signal.

Thanks
On Tue, Mar 24, 2026 at 08:24:44AM -0700, Roman Gushchin wrote:
> "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:
>
> > On Mon, Mar 23, 2026 at 06:08:27PM -0700, Roman Gushchin wrote:
> >> "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:
> >>
> >> > On Sat, Mar 21, 2026 at 05:15:30PM -0700, Andrew Morton wrote:
> >> >> On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> >> >>
> >> >> > A lot of patchsets are "failed to apply". What is Sashiko trying to
> >> >> > apply MM patches to? It would take some smarts to apply the v2
> >> >> > patchset when v1 is presently in mm.git?
> >> >>
> >> >> ?
> >> >>
> >> >> The way things are going at present, I'm just not going to apply a
> >> >
> >> > 50% noise vs. signal?... maybe wait until we're in the 9x%'s?
> >> >
> >> >> series which Sashiko "failed to apply". And that's cool, I'll just
> >> >> wait for a version which Sashiko was able to apply. And then not
> >> >> apply unless all Sashiko questions are resolved or convincingly refuted.
> >> >
> >> > Andrew, for crying out loud. Please don't do this.
> >> >
> >> > 2 of the 3 series I respun on Friday, working a 13-hour day to do so,
> >> > don't apply to Sashiko, but do apply to the mm tree.
> >>
> >> I'll look into that.
> >
> > Thanks.
> >
> >> > I haven't the _faintest clue_ how we are supposed to factor a 3rd-party
> >> > experimental website applying or not applying series into our work??
> >> >
> >> > And 'not apply unless all Sashiko questions are resolved or convincingly
> >> > refuted' is seriously concerning.
> >> >
> >> > The workload is already insane; now you're expecting us to answer every
> >> > bit of nonsense Sashiko hallucinates or misunderstands also?
> >> >
> >> > I say that with no disrespect to Roman or his efforts, but as discussed
> >> > at length, it is not ready for prime time yet.
> >> >
> >> > It's clear that Sashiko is not correctly handling applies, and produces a
> >> > lot of noise.
Predicating taking series on this is absurd. > >> > >> Not trying to pretend that Sashiko is perfect in any way, I think a good > >> mental exercise is to put down our expectation of how the "perfect" system > >> would work. The more I work on it, the more I realize it's far from > > > > Throughout this discussion I have been making practical points. Nobody > > expects perfection. > > > > I am simply saying unilaterally demanding that every single point sashiko > > raises is responded to out of the blue without any community input or input > > from those doing review AND requiring somehow series all apply is not > > good. > > I never suggested this and explicitly wrote it below (but looks like I > wasn't clear enough and you argue with this statement). Yeah, Andrew has proposed this, nothing to do with you! > > > > > BTW, I don't want to make you the scapegoat for complaints about mm process > > here :) so I am being careful not to criticise, as I realise when people > > are frustrated with tooling even if _totally irrelevant_ to you as the > > maker of the tool, will instinctively want to blame you. > > > > I refuse to fall into this trap ;) > > Agree. Let's separate the mm process from everything else here, > otherwise it quickly becomes too messy. Yup :) > > > > >> binary correct/incorrect. In fact, the same applies to humans: I'm sure > >> every one of us had once this feeling that someone is too picky and just > >> annoying us with finding small nits. At the same time some of these > >> people are extremely useful for the community to find and fix a lot of > >> issues. In the end, we do argue all the time about questions/issues > >> raised by human reviewers. > > > > Yes except human reviewers generally evolve over time to be pretty high > > signal if they remain consistent, that is at least how it is in mm. Even if > > you think points are trivial.
> > > > Sashiko is hallucinating, it is raising irrelevant points that have nothing > > to do with the series, it's creating responses that require serious time to > > decode. > > > > I have not encountered review in mm that is even anywhere near the ~50% hit > > rate that sashiko generates, with the rest potentially violently > > wrong/wildly irrelevant. > > > > There's an asymmetry too - sashiko can just keep on generating this stuff > > indefinitely (well, limited by tokens of course :), and potentially > > generate serious useless work for submitters and reviewers. > > > > We _have_ to take that into account when it comes to the review process. > > > > Again, this is nothing to do with the tooling, for which I'm grateful, again > > it's to do with mm process. And sadly you've been dragged into a debate on > > this which you are ultimately more or less orthogonal to :) > > > >> > >> Like do we prefer a system, which finds more real bugs at the cost of being > >> more noisy or we prefer a system which misses more but if it points at > >> the bug, it's certainly real? I'm sure you're tempted to prefer the latter, > >> but imagine a hypothetical system which finds _all_ bugs, but has some false > >> positive rate, e.g. 20%. I think it's pretty attractive. > > > > I think we are very far from that right now. The issue is how it is _now_ > > not in some imagined future. > > > > And it's easy to pontificate about all this, but in the end it's the > > sub-maintainers in mm who will have to eventually figure out whether a > > series is ok or not, and have to decide stuff people might do based on > > hallucinations/irrelevant points etc. > > > > Right now this is going to result in _more work_ for us, and already it > > feels like in mm the sub-maintainers are the reason things function > > reasonably, but we don't seem to be having our voices heard here. > > > >> > >> Also a lot of raised issues are real, but subjectively are not worth our > >> time. But this is extremely subjective!
Depends on the personal level > >> of perfectionism, amount of time available, the state of code before, > >> further plans, etc etc. For example, syzkaller has usually o(100's) open > >> bugs, which are 100% real, but are not always high priority work. > > > > I don't think it's anywhere near as subjective as you say, and I think > > that's easy to hand wave. > > > > One issue here is - trust. There are people in the community we trust to > > whom we assign M: and R: entries in MAINTAINERS. > > > > Trust on taste, judgement etc. > > > > Now sashiko is essentially proposed to be given the same trust despite > > absolutely not deserving it. > > I don't remember anyone ever saying this, at least I definitely did not. Andrew has said that every single point sashiko raises needs to be addressed or patches will not be taken, that's again a separate process issue. > > I think Sashiko can be really useful in finding mechanical bugs, so that > _eventually_ maintainers can spend most of their cycles thinking about > the direction and high-level ideas rather than checking if all gotos in > error handling paths are correct. > > > > > What I propose, as I did in the other sub-thread here, is to use it as a > > _tool_ to _help_ sub-maintainers do their job. > > > > Not for it to become a new trusted gatekeeper out of the blue and > > unilaterally while adding to our workload. > > > >> > >> I think that asking to address 100% issues raised by any LLM is not > >> reasonable (especially because its output might be different each time > > > > Really, again with respect and trying to dodge the 'blame the tool maker' > > thing :) that's something of a strawman, nobody is saying they require > > that. > > > > I think >~50% signal is a reasonable ask though. > > I think you misinterpreted me. Right, but this is broadly the hit rate I've experienced.
It's not a criticism, just saying that from an RoI point of view, I'd want to see that be higher before putting in _stringent_ requirements as to having to address points. > > > > >> you run it with the same input), but I also think it's reasonable to > >> address critical & high severity concerns. And I'm happy to tweak > > > > Right, but with respect you're not an mm maintainer who has to deal with > > the resultant fallout :) > > I am btw :) Oh damn I am so sorry! That is me being a scatterbrain and not some strange kind of insult or something :P I promise! I was thinking of you with your sashiko hat on :) The point of saying that was to emphasise the process side of things, and it being separate of course. > > > > >> Sashiko to be more conservative here, but I think it should be based on > >> some specific examples or data, not purely subjective. > > > > Well you can't both say all review is highly subjective and simultaneously > > ask for objective feedback :) > > > > I have provided detailed feedback on a specific example elsewhere, and I'm > > telling you as an experienced mm maintainer that the hit rate is ~50% in my > > experience so far. > > > > I'm happy to feedback more, but it's again a time and workload thing - the > > default here shouldn't be that mm is just taking sashiko input as read and > > we have to jump on everything to explicitly say it's right/wrong. > > > > Ideally we'd have some way of feeding back on the website, even if it's as > > simple as a tick/cross as to what points you are actually accepting or > > not. That'd be great I think! > > > > That could be useful as well to Andrew who could see that in action. > > > > User login wise you could have some system where somebody could send a mail > > from the account that is being reviewed to get a login or something?
For me as Sashiko developer it doesn't > really matter which way I get the signal - I need the signal. Right, but from a workflow point of view, it's not really workable to have to respond to every input in any kind of detail. So to me something super simple like tick/cross on responses would be great. > > Thanks Cheers, Lorenzo
On Mon, Mar 23, 2026 at 11:31:29AM +0000, Lorenzo Stoakes (Oracle) wrote: > On Sat, Mar 21, 2026 at 05:15:30PM -0700, Andrew Morton wrote: > > On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > A lot of patchsets are "failed to apply". What is Sashiko trying to > > > apply MM patches to? It would take some smarts to apply the v2 > > > patchset when v1 is presently in mm.git? > > > > ? > > > > The way things are going at present, I'm just not going to apply a > > 50% noise vs. signal?... maybe wait until we're in the 9x'%s? > > > series which Sashiko "failed to apply". And that's cool, I'll just > > wait for a version which Sashiko was able to apply. And then not > > apply unless all Sashiko questions are resolved or convincingly refuted. > > Andrew, for crying out loud. Please don't do this. > > 2 of the 3 series I respan on Friday, working a 13 hour day to do so, don't > apply to Sashiko, but do apply to the mm tree. > > I haven't the _faintest clue_ how we are supposed to factor a 3rd party > experimental website applying or not applying series into our work?? > > And 'not apply unless all Sashiko questions are resolved or convincingly > refuted.' is seriously concerning. FWIW I wholeheartedly agree. I don't understand how we don't require proper M: or R: reviews on patches before merging, but now out of the blue require the magic AI LLM thingy to review it before it's merged. Like, sure, sashiko can be useful, and is better than nothing. But unless sashiko is better than the maintainers, it should be kept as optional. Seriously, I can't wrap my head around the difference in treatment in "human maintainers, experts in the code, aren't required to review a patch" vs "make the fscking AI happy or it's not going anywhere". It's almost insulting. -- Pedro
On Mon, 23 Mar 2026 12:34:31 +0000 Pedro Falcato <pfalcato@suse.de> wrote: > On Mon, Mar 23, 2026 at 11:31:29AM +0000, Lorenzo Stoakes (Oracle) wrote: > > On Sat, Mar 21, 2026 at 05:15:30PM -0700, Andrew Morton wrote: > > > On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > A lot of patchsets are "failed to apply". What is Sashiko trying to > > > > apply MM patches to? It would take some smarts to apply the v2 > > > > patchset when v1 is presently in mm.git? > > > > > > ? > > > > > > The way things are going at present, I'm just not going to apply a > > > > 50% noise vs. signal?... maybe wait until we're in the 9x'%s? > > > > > series which Sashiko "failed to apply". And that's cool, I'll just > > > wait for a version which Sashiko was able to apply. And then not > > > apply unless all Sashiko questions are resolved or convincingly refuted. > > > > Andrew, for crying out loud. Please don't do this. > > > > 2 of the 3 series I respan on Friday, working a 13 hour day to do so, don't > > apply to Sashiko, but do apply to the mm tree. > > > > I haven't the _faintest clue_ how we are supposed to factor a 3rd party > > experimental website applying or not applying series into our work?? > > > > And 'not apply unless all Sashiko questions are resolved or convincingly > > refuted.' is seriously concerning. > > FWIW I wholeheartedly agree. I don't understand how we don't require proper > M: or R: reviews on patches before merging I wish people would stop making this claim, without substantiation. I've looked (deeply) at the data, which is equally available to us all. Has anyone else? After weeding out a few special cases (especially DAMON) (this time also maple_tree), the amount of such unreviewed material which enters mm-stable and mainline is very very low. > Like, sure, sashiko can be useful, and is better than nothing. But unless > sashiko is better than the maintainers, it should be kept as optional. 
Rule #1 is, surely, "don't add bugs". This thing finds bugs. If its hit rate is 50% then that's plenty high enough to justify people spending time to go through and check its output. > Seriously, I can't wrap my head around the difference in treatment in > "human maintainers, experts in the code, aren't required to review a patch" Speaking of insulting. > vs "make the fscking AI happy or it's not going anywhere". It's almost > insulting. Look, I know people are busy. If checking these reports slows us down and we end up merging less code and less buggy code then that's a good tradeoff. Also, gimme a break. Like everyone else I'm still trying to wrap my head how best to incorporate this new tool into our development processes.
On Mon, Mar 23, 2026 at 02:36:04PM -0700, Andrew Morton wrote: > On Mon, 23 Mar 2026 12:34:31 +0000 Pedro Falcato <pfalcato@suse.de> wrote: > > > > FWIW I wholeheartedly agree. I don't understand how we don't require proper > > M: or R: reviews on patches before merging > > I wish people would stop making this claim, without substantiation. > I've looked (deeply) at the data, which is equally available to us all. > Has anyone else? > > After weeding out a few special cases (especially DAMON) (this time > also maple_tree), the amount of such unreviewed material which enters > mm-stable and mainline is very very low.

Here's a breakout of MM commit tags (with DAMON excluded) since 6.10:

------------------------------------------------------------------------------
Release   Total   Reviewed-by   Acked-by only   No review   DAMON excl
------------------------------------------------------------------------------
v6.10     318     206 (65%)      36 (11%)        76 (24%)    10
v6.11     270     131 (49%)      72 (27%)        67 (25%)    17
v6.12     333     161 (48%)      65 (20%)       107 (32%)    18
v6.13     180      94 (52%)      29 (16%)        57 (32%)     8
v6.14     217     103 (47%)      40 (18%)        74 (34%)    30
v6.15     289     129 (45%)      45 (16%)       115 (40%)    43
v6.16     198     126 (64%)      44 (22%)        28 (14%)    16
v6.17     245     181 (74%)      41 (17%)        23 (9%)     53
v6.18     205     150 (73%)      28 (14%)        27 (13%)    34
v6.19     228     165 (72%)      33 (14%)        30 (13%)    64
------------------------------------------------------------------------------

There's indeed a sharp reduction in the amount of unreviewed material that gets merged since v6.15, i.e. after the last LSF/MM when we updated the process and nominated people as sub-maintainers and reviewers for different parts of MM. This very much confirms that splitting up the MM entry and letting people step up as sub-maintainers pays off.

But we are still at double digits for percentage of commits without Reviewed-by tags despite the effort people (especially David and Lorenzo) are putting into review. I wouldn't say that even 9% is "very very low".
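As an aside, per-release tag counts like the table above can be derived mechanically from commit trailers. Below is a minimal, illustrative shell sketch - the "@@COMMIT@@" separator, the release range, and the mm/ path filter are my assumptions, not necessarily the exact method used to produce the numbers here:

```shell
#!/bin/sh
# Hypothetical sketch of counting review tags per release. A commit with
# Reviewed-by counts as reviewed; one with only Acked-by counts as
# "acked only"; one with neither counts as "no review".

count_tags() {
	# Reads a stream where "@@COMMIT@@" lines separate commit bodies,
	# and classifies each commit by the trailers it contains.
	awk '
		/^@@COMMIT@@$/ { if (seen) finish(); seen = 1; rb = ab = 0; next }
		/^Reviewed-by:/ { rb = 1 }
		/^Acked-by:/    { ab = 1 }
		function finish() {
			total++
			if (rb) reviewed++
			else if (ab) acked_only++
			else none++
		}
		END {
			if (seen) finish()
			printf "total=%d reviewed=%d acked_only=%d none=%d\n",
			       total, reviewed, acked_only, none
		}'
}

# Against a real tree one might run something like (illustrative):
#   git log --format='@@COMMIT@@%n%b' v6.18..v6.19 -- mm/ | count_tags

# Self-contained demo on fabricated commit bodies:
printf '%s\n' \
	'@@COMMIT@@' 'Reviewed-by: A <a@x>' \
	'@@COMMIT@@' 'Acked-by: B <b@x>' \
	'@@COMMIT@@' 'Some body text' \
	| count_tags
# -> total=3 reviewed=1 acked_only=1 none=1
```

Any real accounting would also need to exclude DAMON, handle merge commits, and decide how to treat author-supplied tags, which is presumably where the small differences between the tables in this thread come from.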
> > Like, sure, sashiko can be useful, and is better than nothing. But unless > > sashiko is better than the maintainers, it should be kept as optional. > > Rule #1 is, surely, "don't add bugs". This thing finds bugs. If its > hit rate is 50% then that's plenty high enough to justify people > spending time to go through and check its output. > > > Seriously, I can't wrap my head around the difference in treatment in > > "human maintainers, experts in the code, aren't required to review a patch" > > Speaking of insulting. > > > vs "make the fscking AI happy or it's not going anywhere". It's almost > > insulting. > > Look, I know people are busy. If checking these reports slows us down > and we end up merging less code and less buggy code then that's a good > tradeoff. If you think this is a good trade-off, then slowing down to wait for human review so we merge less buggy or less maintainable code is a good trade-off too. While LLMs can detect potential bugs, they are not capable of identifying potential maintainability issues. > Also, gimme a break. Like everyone else I'm still trying to wrap my > head how best to incorporate this new tool into our development > processes. It would be nice if we had a more formal description of our development process in Documentation/process/maintainer-mm.rst and then we can add a few sentences about how to incorporate this tool into the process when we figure this out. Right now our process is tribal knowledge; having "Rule #1" and a few others written down would help everyone who participates in MM development. -- Sincerely yours, Mike.
On Tue, Mar 24, 2026 at 09:58:12AM +0200, Mike Rapoport wrote: > On Mon, Mar 23, 2026 at 02:36:04PM -0700, Andrew Morton wrote: > > On Mon, 23 Mar 2026 12:34:31 +0000 Pedro Falcato <pfalcato@suse.de> wrote: > > > > > > FWIW I wholeheartedly agree. I don't understand how we don't require proper > > > M: or R: reviews on patches before merging > > > > I wish people would stop making this claim, without substantiation. > > I've looked (deeply) at the data, which is equally available to us all. > > Has anyone else? > > > > After weeding out a few special cases (especially DAMON) (this time > > also maple_tree), the amount of such unreviewed material which enters > > mm-stable and mainline is very very low. > > Here's a breakout of MM commit tags (with DAMON excluded) since 6.10:
>
> ------------------------------------------------------------------------------
> Release   Total   Reviewed-by   Acked-by only   No review   DAMON excl
> ------------------------------------------------------------------------------
> v6.10     318     206 (65%)      36 (11%)        76 (24%)    10
> v6.11     270     131 (49%)      72 (27%)        67 (25%)    17
> v6.12     333     161 (48%)      65 (20%)       107 (32%)    18
> v6.13     180      94 (52%)      29 (16%)        57 (32%)     8
> v6.14     217     103 (47%)      40 (18%)        74 (34%)    30
> v6.15     289     129 (45%)      45 (16%)       115 (40%)    43
> v6.16     198     126 (64%)      44 (22%)        28 (14%)    16
> v6.17     245     181 (74%)      41 (17%)        23 (9%)     53
> v6.18     205     150 (73%)      28 (14%)        27 (13%)    34
> v6.19     228     165 (72%)      33 (14%)        30 (13%)    64
> ------------------------------------------------------------------------------

Thanks Mike, I've gone a bit deeper, classifying based on the _actually_ requested requirement of sub-maintainer R-b or A-b (not all reviews are equal), and since sub-M's were in place ~6.15.
I exclude DAMON from everything, which seems pretty arbitrary, but for the sake of being generous: (getting some slightly different total numbers, likely due to mildly varying filters)

------------------------------------------------------------------------------
Release        Total   Sub-M signoff       No sub-M signoff
------------------------------------------------------------------------------
v6.15          289      136/289 (47.1%)    153/289 (52.9%)
v6.16          198      147/198 (74.2%)     51/198 (25.8%)
v6.17          245      201/245 (82.0%)     44/245 (18.0%)
v6.18          206      155/206 (75.2%)     51/206 (24.8%)
v6.19          232      181/232 (78.0%)     51/232 (22.0%)
v7.0 (so far)  188      135/188 (71.8%)     53/188 (28.2%)
v6.15..v7.0    1358     955/1358 (70.3%)   403/1358 (29.7%)
------------------------------------------------------------------------------

Now if we consider series _sent_ by sub-M's as being reviewed by default:

------------------------------------------------------------------------------
Release        Total   Sub-M signoff       No sub-M signoff
------------------------------------------------------------------------------
v6.15          289      204/289 (70.6%)     85/289 (29.4%)
v6.16          198      163/198 (82.3%)     35/198 (17.7%)
v6.17          245      212/245 (86.5%)     33/245 (13.5%)
v6.18          206      176/206 (85.4%)     30/206 (14.6%)
v6.19          232      200/232 (86.2%)     32/232 (13.8%)
v7.0 (so far)  188      174/188 (92.6%)     14/188 ( 7.4%)
v6.15..v7.0    1358    1129/1358 (83.1%)   229/1358 (16.9%)
------------------------------------------------------------------------------

So 'the amount of such unreviewed material which enters mm-stable and mainline is very very low' is clearly untrue. In aggregate there were 229 patches merged (and by that I mean to Linus's tree), or 16.9%, without sub-M review or sub-M S-o-b. I seem to recall you claiming there were only one or two series/patches that landed like this for the past year or two, or something like this? None of the data reflects that. Clearly there is still work to be done and clearly there are still patches being sent that are not getting sub-M signoff.
It _is_ improving, but I fear that a lot of that is because of us sub-M's burning ourselves out. Let's look at that. Rather than limiting to mm commits, let's expand and just go with commits which you were the committer for from 6.15 onward to make life easier: Of those, there were 3,339 commits, and 2,284 had at least one A-b or R-b (68.4% review rate). Looking at commits actually A-b/R-b from 6.15 on and taking those in 3 digits or more:

-----------------------------------------
Author              R-b/A-b
-----------------------------------------
David Hildenbrand   484/2284 (21.2%)
Lorenzo Stoakes     356/2284 (15.6%)
Vlastimil Babka     276/2284 (12.1%)
Zi Yan              213/2284 ( 9.3%)
Mike Rapoport       193/2284 ( 8.5%)
SJ Park             174/2284 ( 7.6%)
Liam Howlett        128/2284 ( 5.6%)
Shakeel Butt        115/2284 ( 5.0%)
Oscar Salvador      111/2284 ( 4.9%)
-----------------------------------------

(Keep in mind I reduced my review sharply for a couple of months during this period due to burnout/objecting to mm review policy.) Do you think that maybe some of the people listed here should be consulted about these kinds of decisions at all? Do you notice that the people listed above (apart from Zi, who is exceptional overall anyway :) are sub-M's? The data overwhelmingly backs the fact that the sub-M/R changes have radically improved review in mm. This is something you have pushed back on, so I gently suggest that you should be a little more accepting of the facts the data lays bare here, please. > > There's indeed sharp reduction in amount of unreviewed material that gets > merged since v6.15, i.e. after the last LSF/MM when we updated the process > and nominated people as sub-maintainers and reviewers for different parts > of MM. This very much confirms that splitting up the MM entry and letting > people to step up as sub-maintaners pays off. Yes, that's obviously evident in all the data - I felt it had a huge impact and it's great to see the data demonstrate that!
Andrew - hopefully that helps give some basis for the role of sub-maintainers and reviewers in mm, I know you have expressed in the past (on more than one occasion) that you feel these roles are meaningless as you are able to subjectively interpret reviews - the data clearly shows otherwise. As a man of data, I ask you to take this into account please. And as you are showing you are more than happy to wait for review when AI does it, I genuinely do not understand why you would not accept this sub-M signoff rule at this stage. > > But we are still at double digits for percentage of commits without > Reviewed-by tags despite the effort people (especially David and Lorenzo) > are putting into review. I wouldn't say that even 9% is "very very low". Yes, far from it. > > > > Like, sure, sashiko can be useful, and is better than nothing. But unless > > > sashiko is better than the maintainers, it should be kept as optional. > > > > Rule #1 is, surely, "don't add bugs". This thing finds bugs. If its > > hit rate is 50% then that's plenty high enough to justify people > > spending time to go through and check its output. > > > > > Seriously, I can't wrap my head around the difference in treatment in > > > "human maintainers, experts in the code, aren't required to review a patch" > > > > Speaking of insulting. Honestly I think unilaterally instituting radical changes to review in MM without even bothering to consult those who do the actual review-work, and responding to push back either by ignoring or dismissal isn't hugely respectful. I also feel you are not being quite fair to Pedro here, especially when the data bears out his claims. (I refer you back to the above data.) > > > > > vs "make the fscking AI happy or it's not going anywhere". It's almost > > > insulting. > > > > Look, I know people are busy. If checking these reports slows us down > > and we end up merging less code and less buggy code then that's a good > > tradeoff. 
I mean you're literally ignoring the people who are doing all the review work here and then saying you're fine with adding more work for them (it's clear reviewers will have to account for Sashiko feedback in a regime where that's a hard requirement for merge), as well as to submitters too obviously. So I honestly don't think you do know that, since you are ignoring push-back from the people who are doing the work who are demonstrably VERY busy. > > If you think this is a good trade-off, then slowing down to wait for human > review so we merge up less buggy or less maintainable code is a good > trade-off too. > > While LLMs can detect potential bugs, they are not capable to identify > potential maintainability issues. Yes precisely. > > > Also, gimme a break. Like everyone else I'm still trying to wrap my > > head how best to incorporate this new tool into our development > > processes. > > It would be nice if we had a more formal description of our development > process in Documentation/process/maintainer-mm.rst and then we can add a > few sentences about how to incorporate this tool into the process when we > figure this out. I mean we've been waiting for this for a while :) I actually think at this stage it'd be better for those actually-doing-the-work of review to be writing these documents. But then they won't match what's actually happening, of course. > > Right now our process is a tribal knowledge, having "Rule #1" and a few > others written down would help everyone who participates in MM development. Rule #1 presumably 'don't introduce bugs' has so many caveats in it it's almost meaningless. For instance, as a silly example but one that makes the point - if reviewers were required to do two rounds of review, the second with much more scrutiny after having tagged the first - this would ABSOLUTELY find more bugs. But it'd double the time or more taken to do review. 
It's like saying 'reduce speed limits to save lives' - invariably you will if you do, but there are other considerations. A 5mph limit nationally might have other knock-on effects :) I'd say this requires _discussion_ with those _actually doing the work_ that keeps mm moving and stable, i.e. review. Plus review comprises more than finding bugs - in fact that's almost secondary to ensuring changes are _architecturally_ valid and that we're not causing user interface issues, style and code quality problems, etc. All things that AI frankly sucks at (at least for now). This new approach, taken out of the blue and without community discussion, also FLATLY contradicts mm process thus far - Andrew has repeatedly argued that 'perfectly good series' get 'held up' by review, and he really wants to avoid that. And thus has rejected the reasonable requests, whose requirement is now borne out by statistical evidence, for sub-M signoff. He's even intimated in the past that stable patches don't require proper review. Now AI is being instituted as a trusted gatekeeper and is immediately given full veto power. I don't think documenting this kind of decision making is helpful, but absolutely process docs are needed, were promised, and have not emerged. > > -- > Sincerely yours, > Mike. Hopefully the data helps paint the picture here. Thanks, Lorenzo
On Mon, Mar 23, 2026 at 02:36:04PM -0700, Andrew Morton wrote: > On Mon, 23 Mar 2026 12:34:31 +0000 Pedro Falcato <pfalcato@suse.de> wrote: > > > On Mon, Mar 23, 2026 at 11:31:29AM +0000, Lorenzo Stoakes (Oracle) wrote: > > > On Sat, Mar 21, 2026 at 05:15:30PM -0700, Andrew Morton wrote: > > > > On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > A lot of patchsets are "failed to apply". What is Sashiko trying to > > > > > apply MM patches to? It would take some smarts to apply the v2 > > > > > patchset when v1 is presently in mm.git? > > > > > > > > ? > > > > > > > > The way things are going at present, I'm just not going to apply a > > > > > > 50% noise vs. signal?... maybe wait until we're in the 9x'%s? > > > > > > > series which Sashiko "failed to apply". And that's cool, I'll just > > > > wait for a version which Sashiko was able to apply. And then not > > > > apply unless all Sashiko questions are resolved or convincingly refuted. > > > > > > Andrew, for crying out loud. Please don't do this. > > > > > > 2 of the 3 series I respan on Friday, working a 13 hour day to do so, don't > > > apply to Sashiko, but do apply to the mm tree. > > > > > > I haven't the _faintest clue_ how we are supposed to factor a 3rd party > > > experimental website applying or not applying series into our work?? > > > > > > And 'not apply unless all Sashiko questions are resolved or convincingly > > > refuted.' is seriously concerning. > > > > FWIW I wholeheartedly agree. I don't understand how we don't require proper > > M: or R: reviews on patches before merging > > I wish people would stop making this claim, without substantiation. > I've looked (deeply) at the data, which is equally available to us all. > Has anyone else? > > After weeding out a few special cases (especially DAMON) (this time > also maple_tree), the amount of such unreviewed material which enters > mm-stable and mainline is very very low. That is not what I said. 
I said "we don't require proper M: or R: reviews on patches before merging". Which as far as I know is still true when it comes to the process. If I have this wrong, then I'm not the only one. The fact that the end result is still high quality is a result of your work (diligently tracking down review states; yes, i've seen your quilt series file and its annotations) and every single one involved in the review process. This is not however codified into the process. (note: the fact that DAMON and maple tree both lack reviews from !authors just shows there is a very low bus factor at stake. we should fix this...) > > > Like, sure, sashiko can be useful, and is better than nothing. But unless > > sashiko is better than the maintainers, it should be kept as optional. > > Rule #1 is, surely, "don't add bugs". This thing finds bugs. If its > hit rate is 50% then that's plenty high enough to justify people > spending time to go through and check its output. I agree. But I don't think it's flawless enough to become mandatory. > > > Seriously, I can't wrap my head around the difference in treatment in > > "human maintainers, experts in the code, aren't required to review a patch" > > Speaking of insulting. Then I sincerely apologize. I see how I was brash. I did not mean to insult. > > > vs "make the fscking AI happy or it's not going anywhere". It's almost > > insulting. > > Look, I know people are busy. If checking these reports slows us down > and we end up merging less code and less buggy code then that's a good > tradeoff. Sure. But I'm thinking about the human factor - I simply don't think either contributors or maintainers will be particularly less stressed with the introduction of obligatory AI reviews. Maintainers are still hardpressed to review (as is their function), and contributors need to go through the tool's output and figure out what's relevant (and _true_) or what's not. 
IF we were able to codify the MM process like in (https://docs.kernel.org/process/maintainer-netdev.html), with things like:

- NO patch is getting in without being 1) written by a maintainer or 2) getting Rb's and Acks from M's and R's
- Ideally both, but maple and DAMON need special casing for now, I guess.
- NO -next content is being accepted during the merge window. straight to /dev/null.
- review state for each patch is <here>

it would already be a huge, palpable win for everyone involved. Some of these have been asked for and discussed by people that are much more load-bearing in MM than I am, for longer than I've been around. And would make more of a difference than making sashiko (which is not reliable, experimental software, etc) load-bearing. > > Also, gimme a break. Like everyone else I'm still trying to wrap my > head how best to incorporate this new tool into our development > processes. I understand. Ideally, sashiko would be a tool that maintainers and reviewers (and submitters) could use to help find problems. I don't think having you check every AI review scales. But I also don't think we should be treating LLM output as if it were a normal review from an expert. -- Pedro
On Mon, 23 Mar 2026 23:27:35 +0000 Pedro Falcato <pfalcato@suse.de> wrote:

> > also maple_tree), the amount of such unreviewed material which enters
> > mm-stable and mainline is very very low.
>
> That is not what I said. I said "we don't require proper M: or R: reviews
> on patches before merging". Which as far as I know is still true when it
> comes to the process. If I have this wrong, then I'm not the only one.

People never define what they mean by "merged". I define it as "added
to mm-stable". Things that are in mm-unstable are unstable! They're
subject to alteration or removal.

I pipeline things, a lot. The main benefit of this is that material
sometimes gets *weeks* of additional testing which it would not
otherwise have received. Also there are integration benefits -
inter-tree as well as intra-tree.

If something is getting close to mm-stable and doesn't appear
sufficiently reviewed then I'll send out bleats, and if those don't
work, it gets deferred or dropped. And, btw, it's really bad to remove
material late in the cycle - that means moving an untested code
combination into mm-stable, which adds risk. For this reason I do ask
that M:aintainers and R:eviewers be attentive to material which is in
mm-unstable and tell me as early as possible if I should defer or drop
it.

It's a mistake that we've never defined the roles and responsibilities
of maintainers and reviewers. If we were to define their
responsibilities, I'd place "take care of what's in mm.git" high on the
list.

> The fact that the end result is still high quality is a result of your work
> (diligently tracking down review states; yes, I've seen your quilt series file
> and its annotations) and every single one involved in the review process.
> This is not however codified into the process.

Yeah. Mea culpa.

> (note: the fact that DAMON and maple tree both lack reviews from !authors
> just shows there is a very low bus factor at stake. we should fix this...)

Agree. It would be a long haul for someone to effectively pick up
something like mapletree.

> > > Like, sure, sashiko can be useful, and is better than nothing. But unless
> > > sashiko is better than the maintainers, it should be kept as optional.
> >
> > Rule #1 is, surely, "don't add bugs". This thing finds bugs. If its
> > hit rate is 50% then that's plenty high enough to justify people
> > spending time to go through and check its output.
>
> I agree. But I don't think it's flawless enough to become mandatory.

Well, I looked at some numbers. Data!

Searched for linux-mm emails which had from:akpm, message-body contains
"sashiko".

22 emails received replies from authors indicating that alterations
were needed.

2 emails received replies from authors indicating that no alterations
were needed.

1 email received a reply from author in which I wasn't able to decide
either way.

A few more replies said "no alteration, but we need to change other
code".

10-15ish have yet to receive replies.

That's a really high hit rate! How can we possibly not use this, if we
care about Rule #1?

> Sure. But I'm thinking about the human factor - I simply don't think either
> contributors or maintainers will be particularly less stressed with the
> introduction of obligatory AI reviews. Maintainers are still hard-pressed
> to review (as is their function), and contributors need to go through the
> tool's output and figure out what's relevant (and _true_) or what's not.

Yeah, it's a matter of figuring this out as we go along. It will be so
much better if/when people are able to use sashiko privately. But heck,
people forget to run checkpatch ;)

> IF we were able to codify the MM process like in (https://docs.kernel.org/process/maintainer-netdev.html),
> with things like:
> - NO patch is getting in without being 1) written by a maintainer or 2) getting Rb's and Acks from M's and R's

Sure. Where "in" means mm-stable.

> - Ideally both, but maple and DAMON need special casing for now, I guess.

We do get quite a lot of patches from sole maintainers.

> - NO -next content is being accepted during the merge window. straight to /dev/null.

For sure. Well. I usually park these things to take a look at after
we're all merged up, but it's usually all stale by then.

> - review state for each patch is <here>

I generate that now, with the occasional "mm.git review status" emails.
I could run it daily and add it to mm.git or something, but this
doesn't seem to have generated much interest.

> I understand. Ideally, sashiko would be a tool that maintainers and
> reviewers (and submitters) could use to help find problems. I don't think
> having you check every AI review scales. But I also don't think we should be
> treating LLM output as if it were a normal review from an expert.

Sure. But that hit rate is so high!
On Mon, Mar 23, 2026 at 05:05:37PM -0700, Andrew Morton wrote:

> Well, I looked at some numbers. Data!
>
> Searched for linux-mm emails which had from:akpm, message-body contains
> "sashiko".
>
> 22 emails received replies from authors indicating that alterations
> were needed.
>
> 2 emails received replies from authors indicating that no alterations
> were needed
>
> 1 email received a reply from author in which I wasn't able to decide
> either way.
>
> A few more replies said "no alteration, but we need to change other
> code".
>
> 10-15ish have yet to receive replies.
>
> That's a really high hit rate! How can we possibly not use this, if we
> care about Rule #1?

Really this data doesn't support that. If we're generous and say 10
with no replies, that's 22/35 or ~63% _where sashiko was correct in AT
LEAST ONE individual observation_. That is not indicative of a good
signal-to-noise ratio. Do you not think we can do better?

Roughly in my experience, around ~50% of sashiko INDIVIDUAL REPORTS
(i.e. individual comments made line-by-line) have validity.

Roman has said that the strategy he takes, partly for sensible token
usage, partly to avoid throwing out the baby with the bath water, at
this time leads to more noise. And as models improve this is likely to
improve too. This is no criticism of him, I am grateful for this
tooling. The issue is with mm process.

This adds quite a burden to reviewers: having to deal with _every
single thing_ reported. Which is what you unilaterally seemed to say
was now a requirement, to which I object.

There's further problems here:

1. What if a new engineer comes along and sashiko hallucinates a bunch
   of stuff and they respin + respin to match it, and now reviewers
   have to tell them to stop?

2. What if sashiko directly contradicts a human reviewer/maintainer?

3. Are you going to quietly just not take series, and people find out
   in the merge window/when you gather up mm-stable in one of the many
   batches, because they didn't respond to a hallucination?

4. AI often generates new 'thoughts' just from being run for a 2nd
   time, so do we hold series in perpetual flux trying to figure out if
   the latest set are valid?

5. Often the reported 'issues' are so complicated it requires human
   expertise to figure out if they're relevant, thereby increasing the
   already over-strained maintenance workload.

And again, I come back to you requiring sashiko to be able to apply a
series, based on unknown criteria, probably not correctly applying
fix-patches etc. - there is no sensible way for a series author to
fulfill that requirement.

Really we need input from _those doing the actual review_ on how mm
review works. Let me make workable suggestions:

1. Defer this to sub-maintainers. We have the expertise and experience
   to make judgment calls on this.

2. Don't make this silly series-applies demand. It's impossible to
   adhere to.

3. Don't require that every sashiko point be responded to.

4. Sub-maintainers use it as a tool - and only really consider
   critical/high bugs as being potentially important, and only if they
   can determine that the points made are valid AND, importantly, only
   if doing so doesn't take all that much time.

Personally I am _already_ using sashiko as part of review for people to
some degree. I see that as being the more useful means of using it.

Treat it as the experiment it is, rather than reflexively deciding to
demand all points get responded to.

> > Sure. But I'm thinking about the human factor - I simply don't think either
> > contributors or maintainers will be particularly less stressed with the
> > introduction of obligatory AI reviews. Maintainers are still hard-pressed
> > to review (as is their function), and contributors need to go through the
> > tool's output and figure out what's relevant (and _true_) or what's not.
>
> Yeah, it's a matter of figuring this out as we go along. It will be so
> much better if/when people are able to use sashiko privately. But
> heck, people forget to run checkpatch ;)

But we're not 'figuring it out', you're not discussing anything with
sub-maintainers or the community, you're unilaterally telling people
they HAVE to respond to everything sashiko says or you WON'T TAKE the
patch.

And also (you ignored my reply on this and replied to Pedro instead).
So where's the figuring out, exactly?

> > IF we were able to codify the MM process like in (https://docs.kernel.org/process/maintainer-netdev.html),
> > with things like:
> > - NO patch is getting in without being 1) written by a maintainer or 2) getting Rb's and Acks from M's and R's
>
> Sure. Where "in" means mm-stable.

I'm not sure anybody said otherwise??

> > - Ideally both, but maple and DAMON need special casing for now, I guess.
>
> We do get quite a lot of patches from sole maintainers.
>
> > - NO -next content is being accepted during the merge window. straight to /dev/null.
>
> For sure. Well. I usually park these things to take a look at after we're all
> merged up, but it's usually all stale by then.
>
> > - review state for each patch is <here>
>
> I generate that now, with the occasional "mm.git review status" emails.
> I could run it daily and add it to mm.git or something, but this
> doesn't seem to have generated much interest.
>
> > I understand. Ideally, sashiko would be a tool that maintainers and
> > reviewers (and submitters) could use to help find problems. I don't think
> > having you check every AI review scales. But I also don't think we should be
> > treating LLM output as if it were a normal review from an expert.
>
> Sure. But that hit rate is so high!

Addressed above. Disagree. Please listen to the people doing the actual
review in mm.

Thanks, Lorenzo
Andrew Morton <akpm@linux-foundation.org> writes:

> On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> A lot of patchsets are "failed to apply". What is Sashiko trying to
>> apply MM patches to? It would take some smarts to apply the v2
>> patchset when v1 is presently in mm.git?
>
> ?

It's displayed in the Baseline section for every patchset.

For mm patchsets, if the base commit is not specified it's mm-new, then
mm-unstable, then mm-stable, then linux-next/HEAD and then linus/HEAD
(and now I think that it should not only show HEAD, but the actual
sha).

I don't yet have support for the "previous version is applied, let's
revert it and try the new one" case. Something to add later.

> The way things are going at present, I'm just not going to apply a
> series which Sashiko "failed to apply". And that's cool, I'll just
> wait for a version which Sashiko was able to apply. And then not
> apply unless all Sashiko questions are resolved or convincingly refuted.
>
> Question please: if Sashiko finds an "issue" in v3 and then v4 comes
> out with changelog words which justifies the questionable alteration, can
> Sashiko parse that changelog justification and think "OK, never mind"?

Yes, I'm planning to add it. Sashiko will have access to previous
versions of the patchset and the whole discussion thread and take it
into account.

Thanks!
On Sat, Mar 21, 2026 at 07:12:13PM -0700, Roman Gushchin wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> >> A lot of patchsets are "failed to apply". What is Sashiko trying to
> >> apply MM patches to? It would take some smarts to apply the v2
> >> patchset when v1 is presently in mm.git?
> >
> > ?
>
> It's displayed in the Baseline section for every patchset.
>
> For mm patchsets if the base commit is not specified it's mm-new then
> mm-unstable then mm-stable then linux-next/HEAD and then linus/HEAD
> (and now I think that it should not only show HEAD, but the actual sha).
>
> I don't yet have support for the "previous version is applied, let's
> revert it and try the new one" case. Something to add later.
>
> > The way things are going at present, I'm just not going to apply a
> > series which Sashiko "failed to apply". And that's cool, I'll just
> > wait for a version which Sashiko was able to apply. And then not
> > apply unless all Sashiko questions are resolved or convincingly refuted.
> >
> > Question please: if Sashiko finds an "issue" in v3 and then v4 comes
> > out with changelog words which justifies the questionable alteration, can
> > Sashiko parse that changelog justification and think "OK, never mind"?
>
> Yes, I'm planning to add it. Sashiko will have access to previous
> versions of the patchset and the whole discussion thread and take it
> into account.

Hmm, this question presupposes that we should have to respond somehow
to Sashiko feedback, but with ~50% signal vs. noise (my experience so
far) that's just not sensible, and a painful addition to an already
overstrained workload.

For instance
https://sashiko.dev/#/patchset/cover.1774029655.git.ljs%40kernel.org is
full of pretty useless stuff, including a silly hallucination
(VM_WARN_ON_ONCE() cannot be used as a conditional; it's defined as
(void)WARN_ON_ONCE() when CONFIG_DEBUG_VM is enabled).

I don't want to have to explain why exactly I'm ignoring certain things
each time.

Until the noise vs. signal is better, I really don't want Sashiko to
block anything or necessitate responses, which is why I'm very
reluctant to see it send emails other than privately directly to the
author perhaps.

> Thanks!

Thanks, Lorenzo
On 3/23/26 12:19, Lorenzo Stoakes (Oracle) wrote:

> On Sat, Mar 21, 2026 at 07:12:13PM -0700, Roman Gushchin wrote:
>> Andrew Morton <akpm@linux-foundation.org> writes:
>>
>>> ?
>>
>> It's displayed in the Baseline section for every patchset.
>>
>> For mm patchsets if the base commit is not specified it's mm-new then
>> mm-unstable then mm-stable then linux-next/HEAD and then linus/HEAD
>> (and now I think that it should not only show HEAD, but the actual sha).
>>
>> I don't yet have support for the "previous version is applied, let's
>> revert it and try the new one" case. Something to add later.
>>
>>> The way things are going at present, I'm just not going to apply a
>>> series which Sashiko "failed to apply". And that's cool, I'll just
>>> wait for a version which Sashiko was able to apply. And then not
>>> apply unless all Sashiko questions are resolved or convincingly refuted.
>>>
>>> Question please: if Sashiko finds an "issue" in v3 and then v4 comes
>>> out with changelog words which justifies the questionable alteration, can
>>> Sashiko parse that changelog justification and think "OK, never mind"?
>>
>> Yes, I'm planning to add it. Sashiko will have access to previous
>> versions of the patchset and the whole discussion thread and take it
>> into account.
>
> Hmm, this question presupposes that we should have to respond somehow to
> Sashiko feedback, but with ~50% signal vs. noise (my experience so far)
> that's just not sensible, and a painful addition to already overstrained
> workload.
>
> For instance
> https://sashiko.dev/#/patchset/cover.1774029655.git.ljs%40kernel.org is
> full of pretty useless stuff, including a silly hallucination
> (VM_WARN_ON_ONCE() cannot be used as a conditional; it's defined as
> (void)WARN_ON_ONCE() when CONFIG_DEBUG_VM is enabled).
>
> I don't want to have to explain why exactly I'm ignoring certain things
> each time.
>
> Until the noise vs. signal is better, I really don't want Sashiko to block
> anything or necessitate responses, which is why I'm very reluctant to see it
> send emails other than privately directly to the author perhaps.

100% agreed. It's a pain to dig through the AI output to find something
useful. Fortunately there is some useful stuff in there every now and
then.

I've seen the AI either raise wrong stuff or just bring up stuff that
is completely unrelated to the actual code changes, which is quite the
time sink and TBH annoying. Particularly annoying if review on a new
revision suddenly includes new slop.

I wish we could tune Sashiko to focus on serious regressions, and only
report them if it is extremely sure that there is something real in
there.

-- 
Cheers,
David
On Thu, Mar 19, 2026 at 08:09:17PM -0700, Andrew Morton wrote:

> On Thu, 19 Mar 2026 13:00:06 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > The zap_huge_pmd() function is overly complicated, clean it up and also add
> > an assert in the case that we encounter a buggy PMD entry that doesn't
> > match expectations.
> >
> > This is motivated by a bug discovered [0] where the PMD entry was none of:
> >
> > * A non-DAX, PFN or mixed map.
> > * The huge zero folio
> > * A present PMD entry
> > * A softleaf entry
> >
> > In zap_huge_pmd(), but due to the bug we managed to reach this code.
> >
> > It is useful to explicitly call this out rather than have an arbitrary NULL
> > pointer dereference happen, which also improves understanding of what's
> > going on.
> >
> > [0]: https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
>
> AI review has questions, which I assume you've seen
> https://sashiko.dev/#/patchset/cover.1773924928.git.ljs%40kernel.org

Nope, but I'll have a look through and see what's valid.

> This isn't going well from a workflow POV. I merge stuff (this was v2)
> then half a day later a bunch of potential issues are identified.
>
> If these reviews are useful (they seem to be, enough) then I guess I'll
> need to further increase the lag between seeing-it and merging-it. But
> if there's a 2-day lag before I get onto a series and I'm the first to
> look at Sashiko then that won't help.
>
> So it needs to be something like
>
> - series is posted
> - 24 hours pass
> - submitter takes a look at the AI review, maybe prepares a new
>   series.
> - 24 hours pass
> - rinse, repeat
> - it gets merged, hopefully with some Reviewed-bys.
>
> Not unreasonable, but it requires that submitter be made aware of
> Sashiko's comments. At present that's via me being tiresome.
>
> Anyway, early days. I'm thinking that an emailed reply-to-all from
> Sashiko will help. Much hinges on how useful submitters find these
> questions to be - something which I'm paying close attention to...

Please not yet, it produces a lot of noise.

I've responded at length on the thread on this [0], and while I
appreciate the tooling, it's not ready to be treated as giving entirely
valid feedback yet :)

I think David's on the same page as me on this.

Cheers, Lorenzo

[0]: https://lore.kernel.org/all/39e6b4d2-8a30-4eaa-908d-5d11b746f8d5@lucifer.local/