... of the huge monolithic source file. The series is largely code movement and hence has the intention of not incurring any functional change. It has now been almost a year since the v1 submission, without having had any feedback. Some re-basing was necessary in the meantime, and a new patch (the last one) has been added - even if seemingly unrelated, it was in this context where I did think of that possible adjustment (which may want to be viewed somewhat RFC, as I know there are reservations against the use of -Os). 1: split off opcode 0f01 handling 2: split off opcode 0fae handling 3: split off opcode 0fc7 handling 4: split off FPU opcode handling 5: split off insn decoding 6: move x86_emul_blk() to separate source file 7: move various utility functions to separate source files 8: build with -Os Jan
Hi, It seems that this series has been stale for nearly a month, with nothing heard from maintainers. So I am sending this email as a gentle reminder for maintainers. Thanks! Kind regards, Henry > -----Original Message----- > Subject: [PATCH v2 0/8] x86emul: a few small steps towards disintegration > > ... of the huge monolithic source file. The series is largely code > movement and hence has the intention of not incurring any functional > change. > > It has now been almost a year since the v1 submission, without having > had any feedback. Some re-basing was necessary in the meantime, and a > new patch (the last one) has been added - even if seemingly unrelated, > it was in this context where I did think of that possible adjustment > (which may want to be viewed somewhat RFC, as I know there are > reservations against the use of -Os). > > 1: split off opcode 0f01 handling > 2: split off opcode 0fae handling > 3: split off opcode 0fc7 handling > 4: split off FPU opcode handling > 5: split off insn decoding > 6: move x86_emul_blk() to separate source file > 7: move various utility functions to separate source files > 8: build with -Os > > Jan
On 06.07.2022 09:31, Henry Wang wrote: > It seems that this series has been stale for nearly a month, with nothing heard > from maintainers. So I am sending this email as a gentle reminder for maintainers. A month? That's only since v2 submission. See ... >> -----Original Message----- >> Subject: [PATCH v2 0/8] x86emul: a few small steps towards disintegration >> >> ... of the huge monolithic source file. The series is largely code >> movement and hence has the intention of not incurring any functional >> change. >> >> It has now been almost a year since the v1 submission, without having >> had any feedback. ... here. Jan >> Some re-basing was necessary in the meantime, and a >> new patch (the last one) has been added - even if seemingly unrelated, >> it was in this context where I did think of that possible adjustment >> (which may want to be viewed somewhat RFC, as I know there are >> reservations against the use of -Os). >> >> 1: split off opcode 0f01 handling >> 2: split off opcode 0fae handling >> 3: split off opcode 0fc7 handling >> 4: split off FPU opcode handling >> 5: split off insn decoding >> 6: move x86_emul_blk() to separate source file >> 7: move various utility functions to separate source files >> 8: build with -Os >> >> Jan >
Hi Jan, > -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > > On 06.07.2022 09:31, Henry Wang wrote: > > It seems that this series has been stale for nearly a month, with nothing > heard > > from maintainers. So I am sending this email as a gentle reminder for > maintainers. > > A month? That's only since v2 submission. See ... Oh, yes indeed. Thank you for the background information, normally I use patchwork to track series I did missed the v1 as v1 is superseded by v2 there (thus "a month", but honestly this series is stale for more than a year...). > > >> -----Original Message----- > >> Subject: [PATCH v2 0/8] x86emul: a few small steps towards disintegration > >> > >> ... of the huge monolithic source file. The series is largely code > >> movement and hence has the intention of not incurring any functional > >> change. > >> > >> It has now been almost a year since the v1 submission, without having > >> had any feedback. To possibly make the ping better, I will try add Wei Liu's another 2 emails in the CC list (I was once told by him in this way he will be more likely to notice the email) and put Andrew as TO. Kind regards, Henry > > ... here. > > Jan > > >> Some re-basing was necessary in the meantime, and a > >> new patch (the last one) has been added - even if seemingly unrelated, > >> it was in this context where I did think of that possible adjustment > >> (which may want to be viewed somewhat RFC, as I know there are > >> reservations against the use of -Os). > >> > >> 1: split off opcode 0f01 handling > >> 2: split off opcode 0fae handling > >> 3: split off opcode 0fc7 handling > >> 4: split off FPU opcode handling > >> 5: split off insn decoding > >> 6: move x86_emul_blk() to separate source file > >> 7: move various utility functions to separate source files > >> 8: build with -Os > >> > >> Jan > >
On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote: > ... of the huge monolithic source file. The series is largely code > movement and hence has the intention of not incurring any functional > change. I take the intention is to make code simpler and easier to follow by splitting it up into smaller files? Thanks, Roger.
On 28.03.2023 16:19, Roger Pau Monné wrote: > On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote: >> ... of the huge monolithic source file. The series is largely code >> movement and hence has the intention of not incurring any functional >> change. > > I take the intention is to make code simpler and easier to follow by > splitting it up into smaller files? Well, I can't say yes or no to "simpler" or "easier to follow", but splitting is the goal, in the hope that these may end up as a side effects. There's always the risk that scattering things around may also make things less obvious. My main motivation, however, is the observation that this huge source file alone consumes a fair part of (non-parallelizable) build time. To the degree that with older gcc building this one file takes ten (or so) times as long as the entire rest of the hypervisor. Jan
On Tue, Mar 28, 2023 at 04:48:10PM +0200, Jan Beulich wrote: > On 28.03.2023 16:19, Roger Pau Monné wrote: > > On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote: > >> ... of the huge monolithic source file. The series is largely code > >> movement and hence has the intention of not incurring any functional > >> change. > > > > I take the intention is to make code simpler and easier to follow by > > splitting it up into smaller files? > > Well, I can't say yes or no to "simpler" or "easier to follow", but > splitting is the goal, in the hope that these may end up as a side > effects. There's always the risk that scattering things around may > also make things less obvious. My main motivation, however, is the > observation that this huge source file alone consumes a fair part > of (non-parallelizable) build time. To the degree that with older > gcc building this one file takes ten (or so) times as long as the I wouldn't really trade compiler speed for clarity in a piece of code like the x86 emulator, which is already very complex. Do you have some figures of the performance difference this series makes when building the emulator? A couple of notes from someone that's not familiar with the x86 emulator. It would be clearer if the split files where prefixed with opcode_ for those that deal with emulation (blk.c, 0f01.c, ...) IMO, so that you clearly see this is an opcode family that has been split into a separate file, or maybe insn_{opcode,blk,fpu,...}? I've noticed in some of the newly introduced files the original copyright notice from Keir is lost, I assume that's on purpose because none of the code was contributed by Kier in that file? (see 0f01.c vs 0fae.c for example). Do we expect to be able to split other opcode handling into separate files? I wonder how tied together are the remaining cases, and whether we will be able to split more of them into separate units? Overall I don't think the disintegration makes the code harder, as the split cases seems to be fully isolated, I do however wonder whether further splits might not be so isolated? Thanks, Roger.
On 29.03.2023 16:17, Roger Pau Monné wrote: > On Tue, Mar 28, 2023 at 04:48:10PM +0200, Jan Beulich wrote: >> On 28.03.2023 16:19, Roger Pau Monné wrote: >>> On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote: >>>> ... of the huge monolithic source file. The series is largely code >>>> movement and hence has the intention of not incurring any functional >>>> change. >>> >>> I take the intention is to make code simpler and easier to follow by >>> splitting it up into smaller files? >> >> Well, I can't say yes or no to "simpler" or "easier to follow", but >> splitting is the goal, in the hope that these may end up as a side >> effects. There's always the risk that scattering things around may >> also make things less obvious. My main motivation, however, is the >> observation that this huge source file alone consumes a fair part >> of (non-parallelizable) build time. To the degree that with older >> gcc building this one file takes ten (or so) times as long as the > > I wouldn't really trade compiler speed for clarity in a piece of code > like the x86 emulator, which is already very complex. Of course, and I specifically said "main" motivation. The hope is that by splitting things become less entangled / convoluted. I guess fpu.c is a good example where certain non-trivial macros have isolated use, and hence are no longer cluttering other parts of the emulator sources. > Do you have some figures of the performance difference this series > makes when building the emulator? No, I don't. And the difference isn't going to be significant, I expect, as the build being slow is - from all I can tell - directly connected to the huge switch() statement. Yet the number of cases there shrinks only marginally for now. The series is named "a few small steps" for this reason, along with others. See below for a first bigger step, which may then make a noticeable difference. > A couple of notes from someone that's not familiar with the x86 > emulator. It would be clearer if the split files where prefixed with > opcode_ for those that deal with emulation (blk.c, 0f01.c, ...) IMO, > so that you clearly see this is an opcode family that has been split > into a separate file, or maybe insn_{opcode,blk,fpu,...}? Hmm. For one, "blk" isn't really dealing with any opcode family in particular. It contains a helper function for code using the emulator. So it falls more in the group of util*.c. For the others may main objection would be that I'd prefer to keep file names short. At least at this point of splitting I think file names are sufficiently descriptive. Nevertheless, insn-0f01.c or opc-0f01.c may be options, if we really think we want/need to group files visually. However, I don't expect there are going to be more files paralleling 0f01.c et al: The opcode groups split out are the ones that are large/heterogeneous enough to warrant doing it on this basis. Of course new such groups may appear in the ISA down the road. FPU is isolated functionally, and I'd expect a simd.c to appear once it becomes clear if/how to sensibly split out SIMD code. Unlike fpu.c I'd further expect this to (long term) consist of more than just a single function, hopefully replacing the massive use of "goto" within that big switch statement by function calls (but as said, plans here - if one can call it that way in the first place - are very vague at this point). > I've noticed in some of the newly introduced files the original > copyright notice from Keir is lost, I assume that's on purpose because > none of the code was contributed by Kier in that file? (see 0f01.c vs > 0fae.c for example). Right - 0fae.c contains only code which was added later (mostly by me), if I'm not mistaken. > Do we expect to be able to split other opcode handling into separate > files? As per above, "expect" is perhaps too optimistic. I'd say "hope", in particular for SIMD code (which I guess is now the main part of the ISA as well as the sources, at least number-of-insns-wise). One possible (likely intermediate) approach might be to move all SIMD code from the huge switch() statement to its own file/function, invoked from that huge switch()'s default: case. It may then still be a big switch() statement in (e.g.) simd.c, but we'd then at least have two of them, each about half the size of what we have right now. Plus it may allow some (most?) of the X86EMUL_NO_SIMD #ifdef-ary to be dropped, as the new file - like fpu.c - could then itself be built only conditionally. > I wonder how tied together are the remaining cases, and whether we > will be able to split more of them into separate units? That's the big open question indeed. As per above - with some effort I expect all SIMD code could collectively be moved out; further splitting would likely end up more involved. > Overall I don't think the disintegration makes the code harder, as the split > cases seems to be fully isolated, I do however wonder whether further splits > might not be so isolated? And again - indeed. This series, while already a lot of code churn, is only collecting some of the very low hanging fruit. But at least I hope that the pieces now separated out won't need a lot of touching later on, except of course to add support for new insns. Jan
On Thu, Mar 30, 2023 at 09:53:23AM +0200, Jan Beulich wrote: > On 29.03.2023 16:17, Roger Pau Monné wrote: > > On Tue, Mar 28, 2023 at 04:48:10PM +0200, Jan Beulich wrote: > >> On 28.03.2023 16:19, Roger Pau Monné wrote: > >>> On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote: > >>>> ... of the huge monolithic source file. The series is largely code > >>>> movement and hence has the intention of not incurring any functional > >>>> change. > >>> > >>> I take the intention is to make code simpler and easier to follow by > >>> splitting it up into smaller files? > >> > >> Well, I can't say yes or no to "simpler" or "easier to follow", but > >> splitting is the goal, in the hope that these may end up as a side > >> effects. There's always the risk that scattering things around may > >> also make things less obvious. My main motivation, however, is the > >> observation that this huge source file alone consumes a fair part > >> of (non-parallelizable) build time. To the degree that with older > >> gcc building this one file takes ten (or so) times as long as the > > > > I wouldn't really trade compiler speed for clarity in a piece of code > > like the x86 emulator, which is already very complex. > > Of course, and I specifically said "main" motivation. The hope is that > by splitting things become less entangled / convoluted. I guess fpu.c > is a good example where certain non-trivial macros have isolated use, > and hence are no longer cluttering other parts of the emulator sources. > > > Do you have some figures of the performance difference this series > > makes when building the emulator? > > No, I don't. And the difference isn't going to be significant, I expect, > as the build being slow is - from all I can tell - directly connected to > the huge switch() statement. Yet the number of cases there shrinks only > marginally for now. The series is named "a few small steps" for this > reason, along with others. See below for a first bigger step, which may > then make a noticeable difference. > > > A couple of notes from someone that's not familiar with the x86 > > emulator. It would be clearer if the split files where prefixed with > > opcode_ for those that deal with emulation (blk.c, 0f01.c, ...) IMO, > > so that you clearly see this is an opcode family that has been split > > into a separate file, or maybe insn_{opcode,blk,fpu,...}? > > Hmm. For one, "blk" isn't really dealing with any opcode family in > particular. It contains a helper function for code using the emulator. > So it falls more in the group of util*.c. For the others may main > objection would be that I'd prefer to keep file names short. At least > at this point of splitting I think file names are sufficiently > descriptive. Nevertheless, insn-0f01.c or opc-0f01.c may be options, if > we really think we want/need to group files visually. However, I don't > expect there are going to be more files paralleling 0f01.c et al: The > opcode groups split out are the ones that are large/heterogeneous > enough to warrant doing it on this basis. Of course new such groups may > appear in the ISA down the road. > > FPU is isolated functionally, and I'd expect a simd.c to appear once > it becomes clear if/how to sensibly split out SIMD code. Unlike fpu.c > I'd further expect this to (long term) consist of more than just a > single function, hopefully replacing the massive use of "goto" within > that big switch statement by function calls (but as said, plans here > - if one can call it that way in the first place - are very vague at > this point). > > > I've noticed in some of the newly introduced files the original > > copyright notice from Keir is lost, I assume that's on purpose because > > none of the code was contributed by Kier in that file? (see 0f01.c vs > > 0fae.c for example). > > Right - 0fae.c contains only code which was added later (mostly by me), > if I'm not mistaken. OK, just wanted to make sure this wasn't an oversight. > > Do we expect to be able to split other opcode handling into separate > > files? > > As per above, "expect" is perhaps too optimistic. I'd say "hope", in > particular for SIMD code (which I guess is now the main part of the > ISA as well as the sources, at least number-of-insns-wise). One > possible (likely intermediate) approach might be to move all SIMD code > from the huge switch() statement to its own file/function, invoked > from that huge switch()'s default: case. It may then still be a big > switch() statement in (e.g.) simd.c, but we'd then at least have two > of them, each about half the size of what we have right now. Plus it > may allow some (most?) of the X86EMUL_NO_SIMD #ifdef-ary to be dropped, > as the new file - like fpu.c - could then itself be built only > conditionally. I don't like the handling of SIMD from a default case in the global switch much, as we then could end up chaining all kind of random handling in the default case. It's IMO clearer if we can detect and forward insn to the SIMD code when we know it's a SIMD instruction. I guess that's for another series anyway, so not really the point here. > > I wonder how tied together are the remaining cases, and whether we > > will be able to split more of them into separate units? > > That's the big open question indeed. As per above - with some effort > I expect all SIMD code could collectively be moved out; further > splitting would likely end up more involved. > > > Overall I don't think the disintegration makes the code harder, as the split > > cases seems to be fully isolated, I do however wonder whether further splits > > might not be so isolated? > > And again - indeed. This series, while already a lot of code churn, is > only collecting some of the very low hanging fruit. But at least I > hope that the pieces now separated out won't need a lot of touching > later on, except of course to add support for new insns. OK, I'm a bit concerned that we end up growing duplicated switch cases, in the sense that we will have the following: switch ( insn ) { case 0x100: case 0x1f0: case 0x200: x86emul_foo(); ... } x86emul_foo() { switch (insn ) { case 0x100: /* Handle. */ break; case 0x1f0: /* Handle. */ break; case 0x200: ... } } So that we would end up having to add the opcodes twice, once in the generic switch, and then again at place the insn are actually handled. So far the introduced splits seems fine in that they deal with a contiguous range of opcodes. For patches 1-7: Acked-by: Roger Pau Monné <roger.pau@citrix.com> Patch 8 I'm unsure, I guess it should be up to the user to decide whether to use -Os or some other optimization? I expect introspection users likely care way more about the speed rather than the size of the generated x86 emulator code? Thanks, Roger.
On 30.03.2023 11:54, Roger Pau Monné wrote: > On Thu, Mar 30, 2023 at 09:53:23AM +0200, Jan Beulich wrote: >> On 29.03.2023 16:17, Roger Pau Monné wrote: >>> Do we expect to be able to split other opcode handling into separate >>> files? >> >> As per above, "expect" is perhaps too optimistic. I'd say "hope", in >> particular for SIMD code (which I guess is now the main part of the >> ISA as well as the sources, at least number-of-insns-wise). One >> possible (likely intermediate) approach might be to move all SIMD code >> from the huge switch() statement to its own file/function, invoked >> from that huge switch()'s default: case. It may then still be a big >> switch() statement in (e.g.) simd.c, but we'd then at least have two >> of them, each about half the size of what we have right now. Plus it >> may allow some (most?) of the X86EMUL_NO_SIMD #ifdef-ary to be dropped, >> as the new file - like fpu.c - could then itself be built only >> conditionally. > > I don't like the handling of SIMD from a default case in the global > switch much, as we then could end up chaining all kind of random > handling in the default case. It's IMO clearer if we can detect and > forward insn to the SIMD code when we know it's a SIMD instruction. Avoiding of which would require ... >>> I wonder how tied together are the remaining cases, and whether we >>> will be able to split more of them into separate units? >> >> That's the big open question indeed. As per above - with some effort >> I expect all SIMD code could collectively be moved out; further >> splitting would likely end up more involved. >> >>> Overall I don't think the disintegration makes the code harder, as the split >>> cases seems to be fully isolated, I do however wonder whether further splits >>> might not be so isolated? >> >> And again - indeed. This series, while already a lot of code churn, is >> only collecting some of the very low hanging fruit. But at least I >> hope that the pieces now separated out won't need a lot of touching >> later on, except of course to add support for new insns. > > OK, I'm a bit concerned that we end up growing duplicated switch > cases, in the sense that we will have the following: > > switch ( insn ) > { > case 0x100: > case 0x1f0: > case 0x200: > x86emul_foo(); > ... > } > > x86emul_foo() > { > switch (insn ) > { > case 0x100: > /* Handle. */ > break; > > case 0x1f0: > /* Handle. */ > break; > > case 0x200: > ... > } > } > > So that we would end up having to add the opcodes twice, once in the > generic switch, and then again at place the insn are actually handled. ... doing exactly this, which looks like we a agree we'd like to avoid. I'm also not really concerned of "chaining all kind of random handling in the default case" - where things are terminated with an error code doesn't really matter. As first step here I'm literally thinking of splitting the huge switch() into two. The only thing that I can see may get in the way is the resulting SIMD function requiring like 20 parameters. > So far the introduced splits seems fine in that they deal with a > contiguous range of opcodes. > > For patches 1-7: > > Acked-by: Roger Pau Monné <roger.pau@citrix.com> Thanks! > Patch 8 I'm unsure, I guess it should be up to the user to decide > whether to use -Os or some other optimization? > > I expect introspection users likely care way more about the speed > rather than the size of the generated x86 emulator code? Perhaps. But do we want to introduce another Kconfig control just for that? And wouldn't there likely be other performance concerns, if performance really mattered in the introspection case? Jan
On Thu, Mar 30, 2023 at 01:40:43PM +0200, Jan Beulich wrote: > On 30.03.2023 11:54, Roger Pau Monné wrote: > > On Thu, Mar 30, 2023 at 09:53:23AM +0200, Jan Beulich wrote: > >> On 29.03.2023 16:17, Roger Pau Monné wrote: > > Patch 8 I'm unsure, I guess it should be up to the user to decide > > whether to use -Os or some other optimization? > > > > I expect introspection users likely care way more about the speed > > rather than the size of the generated x86 emulator code? > > Perhaps. But do we want to introduce another Kconfig control just > for that? And wouldn't there likely be other performance concerns, > if performance really mattered in the introspection case? I don't really have a strong opinion on the usage of -Os or not. It's likely fine as long as we also stay consistent with the flag we use when building the test harness and the fuzzer, just in case. Thanks, Roger.
On 30.03.2023 16:27, Roger Pau Monné wrote: > On Thu, Mar 30, 2023 at 01:40:43PM +0200, Jan Beulich wrote: >> On 30.03.2023 11:54, Roger Pau Monné wrote: >>> On Thu, Mar 30, 2023 at 09:53:23AM +0200, Jan Beulich wrote: >>>> On 29.03.2023 16:17, Roger Pau Monné wrote: >>> Patch 8 I'm unsure, I guess it should be up to the user to decide >>> whether to use -Os or some other optimization? >>> >>> I expect introspection users likely care way more about the speed >>> rather than the size of the generated x86 emulator code? >> >> Perhaps. But do we want to introduce another Kconfig control just >> for that? And wouldn't there likely be other performance concerns, >> if performance really mattered in the introspection case? > > I don't really have a strong opinion on the usage of -Os or not. It's > likely fine as long as we also stay consistent with the flag we use > when building the test harness and the fuzzer, just in case. While they might be built with different compilers (and hence things can't be compared directly anyway), I can certainly make those two use -Os as well. Jan
© 2016 - 2024 Red Hat, Inc.