This is aimed at improving gvec generated code, which involves large
numbers of loads and stores to the env slots of the guest cpu vector
registers. The final patch helps eliminate redundant zero-extensions
that can appear with e.g. avx2 and sve.

From the small amount of timing that I have done, there is no change.
But of course as we all know, x86 is very good at handling redundant
memory accesses. And frankly, I haven't found a good test case for
measuring. What I need is an algorithm with lots of integer vector code
that can be expanded with gvec. Most of what I've found is either fp
(implemented out of line) or too simple (small translation blocks with
little scope for optimization).

That said, it appears to be simple enough, and it does eliminate some
redundant operations, even in places that I didn't expect.


r~


Richard Henderson (4):
  tcg: Don't free vector results
  tcg/optimize: Pipe OptContext into reset_ts
  tcg: Optimize env memory operations
  tcg: Eliminate duplicate env store operations

 tcg/optimize.c    | 226 ++++++++++++++++++++++++++++++++++++++++++++--
 tcg/tcg-op-gvec.c |  39 ++------
 2 files changed, 225 insertions(+), 40 deletions(-)

-- 
2.34.1
On 2023/8/31 10:57 AM, Richard Henderson wrote:
> Richard Henderson (4):
>   tcg: Don't free vector results
>   tcg/optimize: Pipe OptContext into reset_ts
>   tcg: Optimize env memory operations
>   tcg: Eliminate duplicate env store operations

For Patch 1 and Patch 3, with sed -i "s/cpu_env/tcg_env/g" applied:

Reviewed-by: Song Gao <gaosong@loongson.cn>

Thanks.
Song Gao
Ping.

r~

On 8/30/23 22:57, Richard Henderson wrote:
> This is aimed at improving gvec generated code, which involves large
> numbers of loads and stores to the env slots of the guest cpu vector
> registers.
Ping 2.

On 9/28/23 15:45, Richard Henderson wrote:
> Ping.
>
> r~
>
> On 8/30/23 22:57, Richard Henderson wrote:
>> This is aimed at improving gvec generated code, which involves large
>> numbers of loads and stores to the env slots of the guest cpu vector
>> registers.