As part of an goal to eliminate Perl from libvirt build tools,
rewrite the minimize-po.pl tool in Python.
This was a straight conversion, manually going line-by-line to
change the syntax from Perl to Python. Thus the overall structure
of the file and approach is the same.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
Makefile.am | 2 +-
build-aux/minimize-po.pl | 37 -------------------------
build-aux/minimize-po.py | 60 ++++++++++++++++++++++++++++++++++++++++
po/Makefile.am | 2 +-
4 files changed, 62 insertions(+), 39 deletions(-)
delete mode 100755 build-aux/minimize-po.pl
create mode 100755 build-aux/minimize-po.py
diff --git a/Makefile.am b/Makefile.am
index 17448a914e..8f688d40d0 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -45,7 +45,7 @@ EXTRA_DIST = \
build-aux/check-spacing.pl \
build-aux/gitlog-to-changelog \
build-aux/header-ifdef.pl \
- build-aux/minimize-po.pl \
+ build-aux/minimize-po.py \
build-aux/mock-noinline.pl \
build-aux/prohibit-duplicate-header.pl \
build-aux/useless-if-before-free \
diff --git a/build-aux/minimize-po.pl b/build-aux/minimize-po.pl
deleted file mode 100755
index 497533a836..0000000000
--- a/build-aux/minimize-po.pl
+++ /dev/null
@@ -1,37 +0,0 @@
-#!/usr/bin/perl
-
-my @block;
-my $msgstr = 0;
-my $empty = 0;
-my $unused = 0;
-my $fuzzy = 0;
-while (<>) {
- if (/^$/) {
- if (!$empty && !$unused && !$fuzzy) {
- print @block;
- }
- @block = ();
- $msgstr = 0;
- $fuzzy = 0;
- push @block, $_;
- } else {
- if (/^msgstr/) {
- $msgstr = 1;
- $empty = 1;
- }
- if (/^#.*fuzzy/) {
- $fuzzy = 1;
- }
- if (/^#~ msgstr/) {
- $unused = 1;
- }
- if ($msgstr && /".+"/) {
- $empty = 0;
- }
- push @block, $_;
- }
-}
-
-if (@block && !$empty && !$unused) {
- print @block;
-}
diff --git a/build-aux/minimize-po.py b/build-aux/minimize-po.py
new file mode 100755
index 0000000000..5046bacede
--- /dev/null
+++ b/build-aux/minimize-po.py
@@ -0,0 +1,60 @@
+#!/usr/bin/env python
+#
+# Copyright (C) 2018-2019 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library. If not, see
+# <http://www.gnu.org/licenses/>.
+
+from __future__ import print_function
+
+import re
+import sys
+
+block = []
+msgstr = False
+empty = False
+unused = False
+fuzzy = False
+
+strprog = re.compile(r'''.*".+".*''')
+
+for line in sys.stdin:
+ line = line.rstrip("\n")
+
+ if line == "":
+ if not empty and not unused and not fuzzy:
+ for b in block:
+ print(b)
+
+ block = []
+ msgstr = False
+ fuzzy = False
+ block.append(line)
+ else:
+ if line.startswith("msgstr"):
+ msgstr = True
+ empty = True
+
+ if line[0] == '#' and line.find("fuzzy") != -1:
+ fuzzy = True
+ if line.startswith("#~ msgstr"):
+ unused = True
+ if msgstr and strprog.match(line):
+ empty = False
+
+ block.append(line)
+
+if not empty and not unused and not fuzzy:
+ for b in block:
+ print(b)
diff --git a/po/Makefile.am b/po/Makefile.am
index da117edbd5..90025b15dd 100644
--- a/po/Makefile.am
+++ b/po/Makefile.am
@@ -58,7 +58,7 @@ update-mini-po: $(POTFILE)
$(MSGMERGE) --no-location --no-fuzzy-matching --sort-output \
$$lang.po $(POTFILE) | \
$(SED) $(SED_PO_FIXUP_ARGS) | \
- $(PERL) $(top_srcdir)/build-aux/minimize-po.pl > \
+ $(RUNUTF8) $(PYTHON) $(top_srcdir)/build-aux/minimize-po.py > \
$(srcdir)/$$lang.mini.po ; \
done
--
2.21.0
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Tue, Sep 24, 2019 at 03:58:44PM +0100, Daniel P. Berrangé wrote:
> As part of an goal to eliminate Perl from libvirt build tools,
> rewrite the minimize-po.pl tool in Python.
>
> This was a straight conversion, manually going line-by-line to
> change the syntax from Perl to Python. Thus the overall structure
> of the file and approach is the same.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
> Makefile.am | 2 +-
> build-aux/minimize-po.pl | 37 -------------------------
> build-aux/minimize-po.py | 60 ++++++++++++++++++++++++++++++++++++++++
> po/Makefile.am | 2 +-
> 4 files changed, 62 insertions(+), 39 deletions(-)
> delete mode 100755 build-aux/minimize-po.pl
> create mode 100755 build-aux/minimize-po.py
>
> diff --git a/Makefile.am b/Makefile.am
> index 17448a914e..8f688d40d0 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -45,7 +45,7 @@ EXTRA_DIST = \
> build-aux/check-spacing.pl \
> build-aux/gitlog-to-changelog \
> build-aux/header-ifdef.pl \
> - build-aux/minimize-po.pl \
> + build-aux/minimize-po.py \
> build-aux/mock-noinline.pl \
> build-aux/prohibit-duplicate-header.pl \
> build-aux/useless-if-before-free \
> diff --git a/build-aux/minimize-po.pl b/build-aux/minimize-po.pl
> deleted file mode 100755
> index 497533a836..0000000000
> --- a/build-aux/minimize-po.pl
> +++ /dev/null
> @@ -1,37 +0,0 @@
> -#!/usr/bin/perl
> -
> -my @block;
> -my $msgstr = 0;
> -my $empty = 0;
> -my $unused = 0;
> -my $fuzzy = 0;
> -while (<>) {
> - if (/^$/) {
> - if (!$empty && !$unused && !$fuzzy) {
> - print @block;
> - }
> - @block = ();
> - $msgstr = 0;
> - $fuzzy = 0;
> - push @block, $_;
> - } else {
> - if (/^msgstr/) {
> - $msgstr = 1;
> - $empty = 1;
> - }
> - if (/^#.*fuzzy/) {
> - $fuzzy = 1;
> - }
> - if (/^#~ msgstr/) {
> - $unused = 1;
> - }
> - if ($msgstr && /".+"/) {
> - $empty = 0;
> - }
> - push @block, $_;
> - }
> -}
> -
> -if (@block && !$empty && !$unused) {
I guess the fact !$fuzzy was missing in this condition was a bug that the new
python code doesn't suffer from, right?
> - print @block;
> -}
> diff --git a/build-aux/minimize-po.py b/build-aux/minimize-po.py
> new file mode 100755
> index 0000000000..5046bacede
> --- /dev/null
> +++ b/build-aux/minimize-po.py
> @@ -0,0 +1,60 @@
> +#!/usr/bin/env python
> +#
> +# Copyright (C) 2018-2019 Red Hat, Inc.
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library. If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +from __future__ import print_function
> +
> +import re
> +import sys
> +
> +block = []
> +msgstr = False
> +empty = False
> +unused = False
> +fuzzy = False
> +
> +strprog = re.compile(r'''.*".+".*''')
question 1) what's the benefit of compiling a regex and using it only once? Btw
python does cache every pattern passed to re.match (and friends) so compilation
IMO hardly ever makes sense unless you're doing 1000s of searches for the same
pattern in which case the latency would naturally accumulate.
question 2) why do we need the '''.* and .*''' parts compared to the original
perl regex? I'll go ahead and assume it's because re.match matches at the
beginning of a string by default, in which case, shouldn't we use re.search
which matches anywhere (that's what perl does by default) instead?
Erik
> +
> +for line in sys.stdin:
> + line = line.rstrip("\n")
> +
> + if line == "":
> + if not empty and not unused and not fuzzy:
> + for b in block:
> + print(b)
> +
> + block = []
> + msgstr = False
> + fuzzy = False
> + block.append(line)
> + else:
> + if line.startswith("msgstr"):
> + msgstr = True
> + empty = True
> +
> + if line[0] == '#' and line.find("fuzzy") != -1:
> + fuzzy = True
> + if line.startswith("#~ msgstr"):
> + unused = True
> + if msgstr and strprog.match(line):
> + empty = False
> +
> + block.append(line)
> +
> +if not empty and not unused and not fuzzy:
> + for b in block:
> + print(b)
> diff --git a/po/Makefile.am b/po/Makefile.am
> index da117edbd5..90025b15dd 100644
> --- a/po/Makefile.am
> +++ b/po/Makefile.am
> @@ -58,7 +58,7 @@ update-mini-po: $(POTFILE)
> $(MSGMERGE) --no-location --no-fuzzy-matching --sort-output \
> $$lang.po $(POTFILE) | \
> $(SED) $(SED_PO_FIXUP_ARGS) | \
> - $(PERL) $(top_srcdir)/build-aux/minimize-po.pl > \
> + $(RUNUTF8) $(PYTHON) $(top_srcdir)/build-aux/minimize-po.py > \
> $(srcdir)/$$lang.mini.po ; \
> done
>
> --
> 2.21.0
>
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Sep 26, 2019 at 12:39:39PM +0200, Erik Skultety wrote:
> On Tue, Sep 24, 2019 at 03:58:44PM +0100, Daniel P. Berrangé wrote:
> > As part of an goal to eliminate Perl from libvirt build tools,
> > rewrite the minimize-po.pl tool in Python.
> >
> > This was a straight conversion, manually going line-by-line to
> > change the syntax from Perl to Python. Thus the overall structure
> > of the file and approach is the same.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> > Makefile.am | 2 +-
> > build-aux/minimize-po.pl | 37 -------------------------
> > build-aux/minimize-po.py | 60 ++++++++++++++++++++++++++++++++++++++++
> > po/Makefile.am | 2 +-
> > 4 files changed, 62 insertions(+), 39 deletions(-)
> > delete mode 100755 build-aux/minimize-po.pl
> > create mode 100755 build-aux/minimize-po.py
> >
> > diff --git a/Makefile.am b/Makefile.am
> > index 17448a914e..8f688d40d0 100644
> > --- a/Makefile.am
> > +++ b/Makefile.am
> > @@ -45,7 +45,7 @@ EXTRA_DIST = \
> > build-aux/check-spacing.pl \
> > build-aux/gitlog-to-changelog \
> > build-aux/header-ifdef.pl \
> > - build-aux/minimize-po.pl \
> > + build-aux/minimize-po.py \
> > build-aux/mock-noinline.pl \
> > build-aux/prohibit-duplicate-header.pl \
> > build-aux/useless-if-before-free \
> > diff --git a/build-aux/minimize-po.pl b/build-aux/minimize-po.pl
> > deleted file mode 100755
> > index 497533a836..0000000000
> > --- a/build-aux/minimize-po.pl
> > +++ /dev/null
> > @@ -1,37 +0,0 @@
> > -#!/usr/bin/perl
> > -
> > -my @block;
> > -my $msgstr = 0;
> > -my $empty = 0;
> > -my $unused = 0;
> > -my $fuzzy = 0;
> > -while (<>) {
> > - if (/^$/) {
> > - if (!$empty && !$unused && !$fuzzy) {
> > - print @block;
> > - }
> > - @block = ();
> > - $msgstr = 0;
> > - $fuzzy = 0;
> > - push @block, $_;
> > - } else {
> > - if (/^msgstr/) {
> > - $msgstr = 1;
> > - $empty = 1;
> > - }
> > - if (/^#.*fuzzy/) {
> > - $fuzzy = 1;
> > - }
> > - if (/^#~ msgstr/) {
> > - $unused = 1;
> > - }
> > - if ($msgstr && /".+"/) {
> > - $empty = 0;
> > - }
> > - push @block, $_;
> > - }
> > -}
> > -
> > -if (@block && !$empty && !$unused) {
>
> I guess the fact !$fuzzy was missing in this condition was a bug that the new
> python code doesn't suffer from, right?
Yeah it was a pre-existing bug, but harmless.
> > diff --git a/build-aux/minimize-po.py b/build-aux/minimize-po.py
> > new file mode 100755
> > index 0000000000..5046bacede
> > --- /dev/null
> > +++ b/build-aux/minimize-po.py
> > @@ -0,0 +1,60 @@
> > +#!/usr/bin/env python
> > +#
> > +# Copyright (C) 2018-2019 Red Hat, Inc.
> > +#
> > +# This library is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU Lesser General Public
> > +# License as published by the Free Software Foundation; either
> > +# version 2.1 of the License, or (at your option) any later version.
> > +#
> > +# This library is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > +# Lesser General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU Lesser General Public
> > +# License along with this library. If not, see
> > +# <http://www.gnu.org/licenses/>.
> > +
> > +from __future__ import print_function
> > +
> > +import re
> > +import sys
> > +
> > +block = []
> > +msgstr = False
> > +empty = False
> > +unused = False
> > +fuzzy = False
> > +
> > +strprog = re.compile(r'''.*".+".*''')
>
> question 1) what's the benefit of compiling a regex and using it only once? Btw
> python does cache every pattern passed to re.match (and friends) so compilation
> IMO hardly ever makes sense unless you're doing 1000s of searches for the same
> pattern in which case the latency would naturally accumulate.
Ah, I've just seen the docs now
"The compiled versions of the most recent patterns passed to
re.compile() and the module-level matching functions are
cached, so programs that use only a few regular expressions
at a time needn’t worry about compiling regular expressions."
so with this in mind, I can probably just remove the 'compile' step from
all the scripts in this series. I haven't used it consistently to start
with.
> question 2) why do we need the '''.* and .*''' parts compared to the original
> perl regex? I'll go ahead and assume it's because re.match matches at the
> beginning of a string by default, in which case, shouldn't we use re.search
> which matches anywhere (that's what perl does by default) instead?
Yeah, I didn't notice the 're.search' function existed & had the semantics
closer to Perl.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Sep 26, 2019 at 02:16:04PM +0100, Daniel P. Berrangé wrote: >On Thu, Sep 26, 2019 at 12:39:39PM +0200, Erik Skultety wrote: >> On Tue, Sep 24, 2019 at 03:58:44PM +0100, Daniel P. Berrangé wrote: >> question 1) what's the benefit of compiling a regex and using it only once? Btw >> python does cache every pattern passed to re.match (and friends) so compilation >> IMO hardly ever makes sense unless you're doing 1000s of searches for the same Some of the scripts here are run on the whole libvirt codebase so that is the case here. For example just removing the pre-compilation of regexes for comments from the spacing check script bumped the execution time from 6.5s to 7.4s Sadly, the one script where pre-compilation matters the most is the one where separating them puts them far away from the usage to not fit on one screen. >> pattern in which case the latency would naturally accumulate. > >Ah, I've just seen the docs now > > "The compiled versions of the most recent patterns passed to > re.compile() and the module-level matching functions are > cached, so programs that use only a few regular expressions > at a time needn’t worry about compiling regular expressions." > >so with this in mind, I can probably just remove the 'compile' step from >all the scripts in this series. I haven't used it consistently to start >with. > For those regexes only executed on a fraction of lines this is neligible, but it would be nice to keep them for those that run on every line. Jano >> question 2) why do we need the '''.* and .*''' parts compared to the original >> perl regex? I'll go ahead and assume it's because re.match matches at the >> beginning of a string by default, in which case, shouldn't we use re.search >> which matches anywhere (that's what perl does by default) instead? > >Yeah, I didn't notice the 're.search' function existed & had the semantics >closer to Perl. > > >Regards, >Daniel >-- >|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| >|: https://libvirt.org -o- https://fstop138.berrange.com :| >|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > >-- >libvir-list mailing list >libvir-list@redhat.com >https://www.redhat.com/mailman/listinfo/libvir-list -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Sep 26, 2019 at 05:34:49PM +0200, Ján Tomko wrote:
> On Thu, Sep 26, 2019 at 02:16:04PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Sep 26, 2019 at 12:39:39PM +0200, Erik Skultety wrote:
> > > On Tue, Sep 24, 2019 at 03:58:44PM +0100, Daniel P. Berrangé wrote:
> > > question 1) what's the benefit of compiling a regex and using it only once? Btw
> > > python does cache every pattern passed to re.match (and friends) so compilation
> > > IMO hardly ever makes sense unless you're doing 1000s of searches for the same
>
> Some of the scripts here are run on the whole libvirt codebase so that
> is the case here. For example just removing the pre-compilation of
> regexes for comments from the spacing check script bumped the execution
> time from 6.5s to 7.4s
>
> Sadly, the one script where pre-compilation matters the most is the one
> where separating them puts them far away from the usage to not fit on
> one screen.
I could do a little custom function that caches all regexes
recache = {}
def research(regex, line):
global recache
if regex not in recache:
recache[regex] = re.compile(regex)
return recache[regex].search(line)
then the loop we can do a normal
research(r'''some regex''', line)
so we can get readability and full caching together. Probably not worth
repeating this trick for every script, but certainly the whitespace
script and a few others probably benefit.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Sep 26, 2019 at 04:38:49PM +0100, Daniel P. Berrangé wrote:
> On Thu, Sep 26, 2019 at 05:34:49PM +0200, Ján Tomko wrote:
> > On Thu, Sep 26, 2019 at 02:16:04PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Sep 26, 2019 at 12:39:39PM +0200, Erik Skultety wrote:
> > > > On Tue, Sep 24, 2019 at 03:58:44PM +0100, Daniel P. Berrangé wrote:
> > > > question 1) what's the benefit of compiling a regex and using it only once? Btw
> > > > python does cache every pattern passed to re.match (and friends) so compilation
> > > > IMO hardly ever makes sense unless you're doing 1000s of searches for the same
> >
> > Some of the scripts here are run on the whole libvirt codebase so that
> > is the case here. For example just removing the pre-compilation of
> > regexes for comments from the spacing check script bumped the execution
> > time from 6.5s to 7.4s
> >
> > Sadly, the one script where pre-compilation matters the most is the one
> > where separating them puts them far away from the usage to not fit on
> > one screen.
>
> I could do a little custom function that caches all regexes
>
> recache = {}
>
> def research(regex, line):
> global recache
> if regex not in recache:
> recache[regex] = re.compile(regex)
> return recache[regex].search(line)
I'm not sure how ^this would solve the slowdown Jano is seeing as this is
exactly what python should already be doing internally, IOW the slowdown Jano
reported is most likely caused by cache accesses which I don't think our own
custom cache would solve, so we probably do want to keep the compilation in even
though I personally don't mind the ~1 sec penalty here (compared to the 4x
slowdown in the next patch which I think we need to do better to resolve).
Erik
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Fri, Sep 27, 2019 at 09:22:13AM +0200, Erik Skultety wrote:
> On Thu, Sep 26, 2019 at 04:38:49PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Sep 26, 2019 at 05:34:49PM +0200, Ján Tomko wrote:
> > > On Thu, Sep 26, 2019 at 02:16:04PM +0100, Daniel P. Berrangé wrote:
> > > > On Thu, Sep 26, 2019 at 12:39:39PM +0200, Erik Skultety wrote:
> > > > > On Tue, Sep 24, 2019 at 03:58:44PM +0100, Daniel P. Berrangé wrote:
> > > > > question 1) what's the benefit of compiling a regex and using it only once? Btw
> > > > > python does cache every pattern passed to re.match (and friends) so compilation
> > > > > IMO hardly ever makes sense unless you're doing 1000s of searches for the same
> > >
> > > Some of the scripts here are run on the whole libvirt codebase so that
> > > is the case here. For example just removing the pre-compilation of
> > > regexes for comments from the spacing check script bumped the execution
> > > time from 6.5s to 7.4s
> > >
> > > Sadly, the one script where pre-compilation matters the most is the one
> > > where separating them puts them far away from the usage to not fit on
> > > one screen.
> >
> > I could do a little custom function that caches all regexes
> >
> > recache = {}
> >
> > def research(regex, line):
> > global recache
> > if regex not in recache:
> > recache[regex] = re.compile(regex)
> > return recache[regex].search(line)
>
> I'm not sure how ^this would solve the slowdown Jano is seeing as this is
> exactly what python should already be doing internally, IOW the slowdown Jano
> reported is most likely caused by cache accesses which I don't think our own
> custom cache would solve, so we probably do want to keep the compilation in even
> though I personally don't mind the ~1 sec penalty here (compared to the 4x
> slowdown in the next patch which I think we need to do better to resolve).
Yeah the slowdown Jano reports looks like a bigger problem to deal with.
I think it could still be worth doing it for this patch, since although
1 sec doesn't sound like much, with the huge number of scripts we have,
it all adds up.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
© 2016 - 2026 Red Hat, Inc.