optimize mod | Patchew

[Patchew-devel] [PATCH] optimize mod

Posted by Paolo Bonzini 5 years, 8 months ago

Avoid looking up the module table repeatedly every time a hook is invoked,
by memoizing the list of hooks.  As a small additional optimization reuse
the result of get_or_create instead of querying the table again.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 mod.py | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/mod.py b/mod.py
index e92377a..1da12ec 100644
--- a/mod.py
+++ b/mod.py
@@ -26,8 +26,7 @@ class PatchewModule(object):
     def get_model(self):
         # ALways read from DB to accept configuration update in-flight
         from api.models import Module as PC
-        _module_init_config(self.__class__)
-        return PC.objects.get(name=self.name)
+        return _module_init_config(self.__class__)
 
     def enabled(self):
         try:
@@ -186,14 +185,20 @@ def load_modules():
             _loaded_modules[cls.name] = cls()
             print("Loaded module:", cls.name)
 
+memoized_hooks = {}
 def dispatch_module_hook(hook_name, **params):
-    for i in [x for x in list(_loaded_modules.values()) if x.enabled()]:
-        if hasattr(i, hook_name):
-            try:
-                getattr(i, hook_name)(**params)
-            except:
-                print("Cannot invoke module hook: %s.%s" % (i, hook_name))
-                traceback.print_exc()
+    if not hook_name in memoized_hooks:
+        # The dictionary stores "bound method" objects
+        memoized_hooks[hook_name] = [getattr(x, hook_name)
+            for x in list(_loaded_modules.values())
+            if x.enabled()
+            and hasattr(x, hook_name)]
+    for f in memoized_hooks[hook_name]:
+        try:
+            f(**params)
+        except:
+            print("Cannot invoke module hook: %s.%s" % (type(f.__self__), hook_name))
+            traceback.print_exc()
 
 def get_module(name):
     return _loaded_modules.get(name)
-- 
2.19.1

_______________________________________________
Patchew-devel mailing list
Patchew-devel@redhat.com
https://www.redhat.com/mailman/listinfo/patchew-devel

Re: [Patchew-devel] [PATCH] optimize mod

Posted by Caio Carrara 5 years, 8 months ago

Hello, Paolo.

Sorry taking so long to review it.

On Mon, Nov 19, 2018 at 02:38:35PM +0100, Paolo Bonzini wrote:
> Avoid looking up the module table repeatedly every time a hook is invoked,
> by memoizing the list of hooks.  As a small additional optimization reuse
> the result of get_or_create instead of querying the table again.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  mod.py | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/mod.py b/mod.py
> index e92377a..1da12ec 100644
> --- a/mod.py
> +++ b/mod.py
> @@ -26,8 +26,7 @@ class PatchewModule(object):
>      def get_model(self):
>          # ALways read from DB to accept configuration update in-flight
>          from api.models import Module as PC

Since you're not using PC anymore I think you can drop this import.

> -        _module_init_config(self.__class__)
> -        return PC.objects.get(name=self.name)
> +        return _module_init_config(self.__class__)
>  
>      def enabled(self):
>          try:
> @@ -186,14 +185,20 @@ def load_modules():
>              _loaded_modules[cls.name] = cls()
>              print("Loaded module:", cls.name)
>  
> +memoized_hooks = {}
>  def dispatch_module_hook(hook_name, **params):
> -    for i in [x for x in list(_loaded_modules.values()) if x.enabled()]:
> -        if hasattr(i, hook_name):
> -            try:
> -                getattr(i, hook_name)(**params)
> -            except:
> -                print("Cannot invoke module hook: %s.%s" % (i, hook_name))
> -                traceback.print_exc()
> +    if not hook_name in memoized_hooks:
> +        # The dictionary stores "bound method" objects
> +        memoized_hooks[hook_name] = [getattr(x, hook_name)
> +            for x in list(_loaded_modules.values())
> +            if x.enabled()
> +            and hasattr(x, hook_name)]

s/x/module/

Also, I'm pretty sure you don't need to convert dict.values() return to
a list object. The values() return is already a iterable.

Other than that, there are some people (I'm included) that think list
comprehension should be avoided when it takes more than 2 lines of code.
Mostly because of readability. What you think about this proposal:

# create a function to return the enabled modules:
def _enabled_modules():
    return [module for module in _loaded_modules.values() if module.enabled()]

# In dispatch_module() use a simple for loop that makes things more
explicit:
```
if not hook_name in memoized_hooks:
    for module in _enabled_modules():
        if not hasattr(module, hook_name):
            continue
        memoized_hooks[hook_name] = getattr(module, hook_name)
```

> +    for f in memoized_hooks[hook_name]:

s/f/hook/

> +        try:
> +            f(**params)
> +        except:

We should avoid using bare except. If you want to get any exception use
at least `except Exception`. But actually the proper way to do it is to
specify the exception that should be handled by except block.

> +            print("Cannot invoke module hook: %s.%s" % (type(f.__self__), hook_name))
> +            traceback.print_exc()
>  
>  def get_module(name):
>      return _loaded_modules.get(name)
{...}

-- 
Caio Carrara
Software Engineer, Virt Team - Red Hat
ccarrara@redhat.com

_______________________________________________
Patchew-devel mailing list
Patchew-devel@redhat.com
https://www.redhat.com/mailman/listinfo/patchew-devel

Re: [Patchew-devel] [PATCH] optimize mod

Posted by Paolo Bonzini 5 years, 8 months ago

On 21/11/18 14:52, Caio Carrara wrote:
> Hello, Paolo.
> 
> Sorry taking so long to review it.
> 
> On Mon, Nov 19, 2018 at 02:38:35PM +0100, Paolo Bonzini wrote:
>> Avoid looking up the module table repeatedly every time a hook is invoked,
>> by memoizing the list of hooks.  As a small additional optimization reuse
>> the result of get_or_create instead of querying the table again.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  mod.py | 23 ++++++++++++++---------
>>  1 file changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/mod.py b/mod.py
>> index e92377a..1da12ec 100644
>> --- a/mod.py
>> +++ b/mod.py
>> @@ -26,8 +26,7 @@ class PatchewModule(object):
>>      def get_model(self):
>>          # ALways read from DB to accept configuration update in-flight
>>          from api.models import Module as PC
> 
> Since you're not using PC anymore I think you can drop this import.

Right.

>> -        _module_init_config(self.__class__)
>> -        return PC.objects.get(name=self.name)
>> +        return _module_init_config(self.__class__)
>>  
>>      def enabled(self):
>>          try:
>> @@ -186,14 +185,20 @@ def load_modules():
>>              _loaded_modules[cls.name] = cls()
>>              print("Loaded module:", cls.name)
>>  
>> +memoized_hooks = {}
>>  def dispatch_module_hook(hook_name, **params):
>> -    for i in [x for x in list(_loaded_modules.values()) if x.enabled()]:
>> -        if hasattr(i, hook_name):
>> -            try:
>> -                getattr(i, hook_name)(**params)
>> -            except:
>> -                print("Cannot invoke module hook: %s.%s" % (i, hook_name))
>> -                traceback.print_exc()
>> +    if not hook_name in memoized_hooks:
>> +        # The dictionary stores "bound method" objects
>> +        memoized_hooks[hook_name] = [getattr(x, hook_name)
>> +            for x in list(_loaded_modules.values())
>> +            if x.enabled()
>> +            and hasattr(x, hook_name)]
> 
> s/x/module/
> 
> Also, I'm pretty sure you don't need to convert dict.values() return to
> a list object. The values() return is already a iterable.
> 
> Other than that, there are some people (I'm included) that think list
> comprehension should be avoided when it takes more than 2 lines of code.
> Mostly because of readability.

You're absolutely right, this is gone in v2 fortunately.  Removing
enabled is much simpler.

>> +        try:
>> +            f(**params)
>> +        except:
> 
> We should avoid using bare except. If you want to get any exception use
> at least `except Exception`. But actually the proper way to do it is to
> specify the exception that should be handled by except block.
> 
>> +            print("Cannot invoke module hook: %s.%s" % (type(f.__self__), hook_name))
>> +            traceback.print_exc()

I think we should remove the "try/except" altogether (this patch was
just reindenting it).  Fam?

Paolo

_______________________________________________
Patchew-devel mailing list
Patchew-devel@redhat.com
https://www.redhat.com/mailman/listinfo/patchew-devel

Re: [Patchew-devel] [PATCH] optimize mod

Posted by Paolo Bonzini 5 years, 8 months ago

On 19/11/18 14:38, Paolo Bonzini wrote:
> Avoid looking up the module table repeatedly every time a hook is invoked,
> by memoizing the list of hooks.  As a small additional optimization reuse
> the result of get_or_create instead of querying the table again.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Hmm, this undoes thw "Always read from DB to accept configuration
updates" part, because .enabled() also comes from the DB.

One idea could be to cache enabled() every time a module is saved.

By the way, the enabled() check already doesn't work for the
www_url_hook, which is run only ones.  More precisely, it only works
because modules are default-enabled.

Paolo

> ---
>  mod.py | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/mod.py b/mod.py
> index e92377a..1da12ec 100644
> --- a/mod.py
> +++ b/mod.py
> @@ -26,8 +26,7 @@ class PatchewModule(object):
>      def get_model(self):
>          # ALways read from DB to accept configuration update in-flight
>          from api.models import Module as PC
> -        _module_init_config(self.__class__)
> -        return PC.objects.get(name=self.name)
> +        return _module_init_config(self.__class__)
>  
>      def enabled(self):
>          try:
> @@ -186,14 +185,20 @@ def load_modules():
>              _loaded_modules[cls.name] = cls()
>              print("Loaded module:", cls.name)
>  
> +memoized_hooks = {}
>  def dispatch_module_hook(hook_name, **params):
> -    for i in [x for x in list(_loaded_modules.values()) if x.enabled()]:
> -        if hasattr(i, hook_name):
> -            try:
> -                getattr(i, hook_name)(**params)
> -            except:
> -                print("Cannot invoke module hook: %s.%s" % (i, hook_name))
> -                traceback.print_exc()
> +    if not hook_name in memoized_hooks:
> +        # The dictionary stores "bound method" objects
> +        memoized_hooks[hook_name] = [getattr(x, hook_name)
> +            for x in list(_loaded_modules.values())
> +            if x.enabled()
> +            and hasattr(x, hook_name)]
> +    for f in memoized_hooks[hook_name]:
> +        try:
> +            f(**params)
> +        except:
> +            print("Cannot invoke module hook: %s.%s" % (type(f.__self__), hook_name))
> +            traceback.print_exc()
>  
>  def get_module(name):
>      return _loaded_modules.get(name)
> 

_______________________________________________
Patchew-devel mailing list
Patchew-devel@redhat.com
https://www.redhat.com/mailman/listinfo/patchew-devel

Re: [Patchew-devel] [PATCH] optimize mod

Posted by Fam Zheng 5 years, 8 months ago

On Mon, 11/19 17:01, Paolo Bonzini wrote:
> On 19/11/18 14:38, Paolo Bonzini wrote:
> > Avoid looking up the module table repeatedly every time a hook is invoked,
> > by memoizing the list of hooks.  As a small additional optimization reuse
> > the result of get_or_create instead of querying the table again.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> Hmm, this undoes thw "Always read from DB to accept configuration
> updates" part, because .enabled() also comes from the DB.
> 
> One idea could be to cache enabled() every time a module is saved.
> 
> By the way, the enabled() check already doesn't work for the
> www_url_hook, which is run only ones.  More precisely, it only works
> because modules are default-enabled.

I'm in favor of dropping the enabled field from the model if it has a
performance advantage.

> 
> Paolo
> 
> > ---
> >  mod.py | 23 ++++++++++++++---------
> >  1 file changed, 14 insertions(+), 9 deletions(-)
> > 
> > diff --git a/mod.py b/mod.py
> > index e92377a..1da12ec 100644
> > --- a/mod.py
> > +++ b/mod.py
> > @@ -26,8 +26,7 @@ class PatchewModule(object):
> >      def get_model(self):
> >          # ALways read from DB to accept configuration update in-flight
> >          from api.models import Module as PC
> > -        _module_init_config(self.__class__)
> > -        return PC.objects.get(name=self.name)
> > +        return _module_init_config(self.__class__)
> >  
> >      def enabled(self):
> >          try:
> > @@ -186,14 +185,20 @@ def load_modules():
> >              _loaded_modules[cls.name] = cls()
> >              print("Loaded module:", cls.name)
> >  
> > +memoized_hooks = {}
> >  def dispatch_module_hook(hook_name, **params):
> > -    for i in [x for x in list(_loaded_modules.values()) if x.enabled()]:
> > -        if hasattr(i, hook_name):
> > -            try:
> > -                getattr(i, hook_name)(**params)
> > -            except:
> > -                print("Cannot invoke module hook: %s.%s" % (i, hook_name))
> > -                traceback.print_exc()
> > +    if not hook_name in memoized_hooks:

Nit: I think PEP wants 'hook_name not in ...'?

> > +        # The dictionary stores "bound method" objects
> > +        memoized_hooks[hook_name] = [getattr(x, hook_name)
> > +            for x in list(_loaded_modules.values())
> > +            if x.enabled()
> > +            and hasattr(x, hook_name)]
> > +    for f in memoized_hooks[hook_name]:
> > +        try:
> > +            f(**params)
> > +        except:
> > +            print("Cannot invoke module hook: %s.%s" % (type(f.__self__), hook_name))
> > +            traceback.print_exc()
> >  
> >  def get_module(name):
> >      return _loaded_modules.get(name)
> > 
> 
> _______________________________________________
> Patchew-devel mailing list
> Patchew-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/patchew-devel

Fam

_______________________________________________
Patchew-devel mailing list
Patchew-devel@redhat.com
https://www.redhat.com/mailman/listinfo/patchew-devel

Re: [Patchew-devel] [PATCH] optimize mod

Posted by Paolo Bonzini 5 years, 8 months ago

On 20/11/18 02:33, Fam Zheng wrote:
> On Mon, 11/19 17:01, Paolo Bonzini wrote:
>> On 19/11/18 14:38, Paolo Bonzini wrote:
>>> Avoid looking up the module table repeatedly every time a hook is invoked,
>>> by memoizing the list of hooks.  As a small additional optimization reuse
>>> the result of get_or_create instead of querying the table again.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> Hmm, this undoes thw "Always read from DB to accept configuration
>> updates" part, because .enabled() also comes from the DB.
>>
>> One idea could be to cache enabled() every time a module is saved.
>>
>> By the way, the enabled() check already doesn't work for the
>> www_url_hook, which is run only ones.  More precisely, it only works
>> because modules are default-enabled.
> 
> I'm in favor of dropping the enabled field from the model if it has a
> performance advantage.

Ok.

>>>  mod.py | 23 ++++++++++++++---------
>>>  1 file changed, 14 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/mod.py b/mod.py
>>> index e92377a..1da12ec 100644
>>> --- a/mod.py
>>> +++ b/mod.py
>>> @@ -26,8 +26,7 @@ class PatchewModule(object):
>>>      def get_model(self):
>>>          # ALways read from DB to accept configuration update in-flight
>>>          from api.models import Module as PC
>>> -        _module_init_config(self.__class__)
>>> -        return PC.objects.get(name=self.name)
>>> +        return _module_init_config(self.__class__)
>>>  
>>>      def enabled(self):
>>>          try:
>>> @@ -186,14 +185,20 @@ def load_modules():
>>>              _loaded_modules[cls.name] = cls()
>>>              print("Loaded module:", cls.name)
>>>  
>>> +memoized_hooks = {}
>>>  def dispatch_module_hook(hook_name, **params):
>>> -    for i in [x for x in list(_loaded_modules.values()) if x.enabled()]:
>>> -        if hasattr(i, hook_name):
>>> -            try:
>>> -                getattr(i, hook_name)(**params)
>>> -            except:
>>> -                print("Cannot invoke module hook: %s.%s" % (i, hook_name))
>>> -                traceback.print_exc()
>>> +    if not hook_name in memoized_hooks:
> 
> Nit: I think PEP wants 'hook_name not in ...'?

Yep.

Paolo

_______________________________________________
Patchew-devel mailing list
Patchew-devel@redhat.com
https://www.redhat.com/mailman/listinfo/patchew-devel