target/hppa/translate.c | 18 ++++++++++++++---- target/rx/translate.c | 12 ++++++++---- 2 files changed, 22 insertions(+), 8 deletions(-)
Greetings to all QEMU developers! First of all, I want to say I'm really honored that I will be working with you on the Google Summer of Code project "TCG Continuous Benchmarking" this June, July and August. This is my first set of patches sent to the QEMU mailing list, that came up as a result of some preliminary exploration for this project. *** Dear Mr. Richard, During some performance experiments that Mr. Aleksandar Markovic an I were doing recently, we discovered a performance drop in the the hppa target user mode after the commit below: https://git.qemu.org/?p=qemu.git;a=commit;h=f3b423ec6e The issue arose because the page crossings check in use_goto_tb() function is required only in the system mode. Checking it in both modes causes an unnecessary overhead in the user mode. The issue can be solved by simply checking for the page crossings only in the system mode. By doing so, the runtime of the target decreased by up to 6.93% on some benchmarks that we tested against. The patch 1/2 proposes such solution for the mentioned problem. *** Dear Mr. Yoshinori, For the rx target, the problem is almost the opposite to one described above for the hppa target problem. The page crossings check in rx's use_goto_tb() was missing, so I included it to be checked, and, off course, in the system mode only. Just for the sake of clarity, I added some other relatively minor improvements to that function. Without this patch, you may experience intermittent and hard-to-debug bugs while running rx target in system-mode. The patch 2/2 proposes the solution for the problem I described. *** The approach to the "TCG Continuous Benchmarking" project adopted by Mr. Aleksandar and I can by called "tool-agnostic". This means that we will not favor any single performance-measurement-related tool, but that we will consider a wide range of such tools, examine the details of functionalities they provide, and use the one that fits a given situation the most. For performance measurements related to patch 1/2, we utilized a tool called Valgrind (more precisely, its subtool called Callgrind) - we measured the performance before and after the fix using it. The reason for such choice is that this method gave use the most stable results - the results between subsequent measurements of an identical experiment had very little differences. That quality was not present in other tools, or, at least, was not that good for other tools. We plan to continue such pragmatic approach to tools choice - rather than fix the preferred tool in advance at the beginning of the project. For starters, we will present to you a comparison between Perf and Valgrind/Callgrind tools. To know more on the setup, usage, pros and cons of each of these tools, keep an eye on the mailing list for a detailed post that will be sent by myself very soon. It will cover the basics of what you should know to start integrating these tools in your development workflow. Best regards, Ahmed Karaman Ahmed Karaman (2): target/hppa: Check page crossings in use_goto_tb() only in system mode target/rx: Check for page crossings in use_goto_tb() target/hppa/translate.c | 18 ++++++++++++++---- target/rx/translate.c | 12 ++++++++---- 2 files changed, 22 insertions(+), 8 deletions(-) -- 2.17.1
On 5/19/20 9:21 AM, Ahmed Karaman wrote: > The issue arose because the page crossings check in use_goto_tb() > function is required only in the system mode. Checking it in both modes > causes an unnecessary overhead in the user mode. It is not only required in system mode. You can see failures in user-mode if you modify executable pages, or change their permissions with mmap. Such as if the guest program contains a JIT. r~
Richard Henderson <richard.henderson@linaro.org> writes: > On 5/19/20 9:21 AM, Ahmed Karaman wrote: >> The issue arose because the page crossings check in use_goto_tb() >> function is required only in the system mode. Checking it in both modes >> causes an unnecessary overhead in the user mode. > > It is not only required in system mode. > > You can see failures in user-mode if you modify executable pages, or change > their permissions with mmap. Such as if the guest program contains a > JIT. If we kept better track couldn't we just tb_flush() if a new +x region gets mmaped? I guess that would be sub-optimal compared to having a translation cache per mmap region. -- Alex Bennée
On 5/19/20 11:38 AM, Alex Bennée wrote: > > Richard Henderson <richard.henderson@linaro.org> writes: > >> On 5/19/20 9:21 AM, Ahmed Karaman wrote: >>> The issue arose because the page crossings check in use_goto_tb() >>> function is required only in the system mode. Checking it in both modes >>> causes an unnecessary overhead in the user mode. >> >> It is not only required in system mode. >> >> You can see failures in user-mode if you modify executable pages, or change >> their permissions with mmap. Such as if the guest program contains a >> JIT. > > If we kept better track couldn't we just tb_flush() if a new +x region > gets mmaped? I guess that would be sub-optimal compared to having a > translation cache per mmap region. > Yes, this could definitely be improved. Noticing changes to PROT_EXEC via mprotect, for one. r~
On Tue, May 19, 2020 at 8:01 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 5/19/20 9:21 AM, Ahmed Karaman wrote:
> > The issue arose because the page crossings check in use_goto_tb()
> > function is required only in the system mode. Checking it in both
> > modes causes an unnecessary overhead in the user mode.
>
> It is not only required in system mode.
>
> You can see failures in user-mode if you modify executable pages, or
> change their permissions with mmap. Such as if the guest program
> contains a JIT.
>
>
> r~
Hi Mr. Richard,
I've checked how the use_goto_tb() function is implemented in other
targets, and it appears that they do the page crossings check in the
system mode only.
Below is an example from the arm64 target:
-----------------------------------------------------------------------
static inline bool use_goto_tb(DisasContext *s, int n, uint64_t dest)
{
/* No direct tb linking with singlestep (either QEMU's or the ARM
* debug architecture kind) or deterministic io
*/
if (s->base.singlestep_enabled || s->ss_active ||
(tb_cflags(s->base.tb) & CF_LAST_IO)) {
return false;
}
#ifndef CONFIG_USER_ONLY
/* Only link tbs from inside the same guest page */
if ((s->base.tb->pc & TARGET_PAGE_MASK)!=(dest & TARGET_PAGE_MASK)) {
return false;
}
#endif
return true;
}
-----------------------------------------------------------------------
Please let me know what you think. Does this mean that there is a bug
in this function for the other targets?
That we have to do the page crossings check in both modes to avoid the
user-mode failures that you have mentioned above?
Regards,
Ahmed Karaman
On 5/21/20 4:32 AM, Ahmed Karaman wrote: > Does this mean that there is a bug > in this function for the other targets? Yes, I think so. > That we have to do the page crossings check in both modes to avoid the > user-mode failures that you have mentioned above? Well, that or we need to fix linux-user/mmap.c to do all the invalidations required. r~
пет, 22. мај 2020. у 05:12 Richard Henderson <richard.henderson@linaro.org> је написао/ла: > > On 5/21/20 4:32 AM, Ahmed Karaman wrote: > > Does this mean that there is a bug > > in this function for the other targets? > > Yes, I think so. > > > That we have to do the page crossings check in both modes to avoid the > > user-mode failures that you have mentioned above? > > Well, that or we need to fix linux-user/mmap.c to do all the invalidations > required. > Hi, Mr. Richard, :) Many thanks for diagnosis, and, more than this, presenting two alternatives for resolution! That mmap()... It has been giving us the hardest time since forever, and it looks it continues to torture us. It looks we are now between a rock and a hard place. I slightly prefer the later (fixing mmap.c) alternative, just from the gut feeling that it is better to fix the problem at its source, rather than to apply (easy, but possibly performance-wise costly) band-aid to some later consequence. But it seem to me that this is as if I say I choose Charybdis, between Scylla and Charybdis. We also risk jumping out of the frying pan into the fire. Adding Laurent, since this is, in essence, a linux-user issue. If nobody objects, I will instruct Ahmed to file a bug in QEMU Bugzilla. Wishing to improve the performance, we found the hard bug... eh... Thanks again to Richard, Aleksandar > > r~ >
> If nobody objects, I will instruct Ahmed to file a bug in QEMU Bugzilla. > Ahmed, please follow the instructions on this page: How to report a QEMU bug <https://www.qemu.org/contribute/report-a-bug/> and file the bug related to the discussion, to the best of your abilities, if possible, today. This can be viewed as a part of your community bonding period. Thanks, Aleksandar
On Tue, May 26, 2020 at 4:14 PM Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> wrote: > Ahmed, please follow the instructions on this page: How to report a QEMU bug and file the bug related to the discussion, to the best of your abilities, if possible, today. Please find the link to the bug below: https://bugs.launchpad.net/qemu/+bug/1880722
уто, 26. мај 2020. у 18:08 Ahmed Karaman <ahmedkhaledkaraman@gmail.com> је написао/ла: > > On Tue, May 26, 2020 at 4:14 PM Aleksandar Markovic > <aleksandar.qemu.devel@gmail.com> wrote: > > Ahmed, please follow the instructions on this page: How to report a QEMU bug and file the bug related to the discussion, to the best of your abilities, if possible, today. > > Please find the link to the bug below: > https://bugs.launchpad.net/qemu/+bug/1880722 I think your last sentence in the bug report is not entirely correct. It is not known what would be performance results in case of correcting mmap.c. So, if possible, and unless Richard or someone else disagrees, please change that last sentence to: "By doing so, a better performance results could be achieved, compared to the case of the workaround described above." Also, please add the tag "linux-user". Yours, Aleksandar
Mr. Ahmed, The title "Changing executable page permissions with mmap causes user-mode failures", I know you paraphrased one of Richard's responses, but you picked up those pieces that don't convey the complete, larger, picture. So, the better, more inclusive, and much clearer title would be "Problems related to checking page crossing in use_goto_tb()" Yours, Mr. Aleksandar уто, 26. мај 2020. у 18:29 Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> је написао/ла: > > уто, 26. мај 2020. у 18:08 Ahmed Karaman > <ahmedkhaledkaraman@gmail.com> је написао/ла: > > > > On Tue, May 26, 2020 at 4:14 PM Aleksandar Markovic > > <aleksandar.qemu.devel@gmail.com> wrote: > > > Ahmed, please follow the instructions on this page: How to report a QEMU bug and file the bug related to the discussion, to the best of your abilities, if possible, today. > > > > Please find the link to the bug below: > > https://bugs.launchpad.net/qemu/+bug/1880722 > > I think your last sentence in the bug report is not entirely correct. > It is not known what would be performance results in case of > correcting mmap.c. So, if possible, and unless Richard or someone else > disagrees, please change that last sentence to: "By doing so, a better > performance results could be achieved, compared to the case of the > workaround described above." > > Also, please add the tag "linux-user". > > Yours, > Aleksandar
> > > > I think your last sentence in the bug report is not entirely correct. > > It is not known what would be performance results in case of > > correcting mmap.c. So, if possible, and unless Richard or someone else > > disagrees, please change that last sentence to: "By doing so, a better > > performance results could be achieved, compared to the case of the > > workaround described above." > > > > Also, please add the tag "linux-user". > > Ahmed, since rx target supports system only mode at the moment, they must include page crossing check in use_goto_tb(), which is missing right now. So, since the rx bug is of a little bit of different nature, please file another bug for rx target only - they have the bug in system mode too, as opposed to other targets. Their fix should and could be applied independently on any user-mode modifications for any other target. Sincerely, Aleksandar
© 2016 - 2025 Red Hat, Inc.