[v4] kallsyms: Optimizes the performance of lookup symbols

[PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed

Posted by Zhen Lei 3 years, 6 months ago

For a symbol, there may be more than one place that can be merged. For
example: nfs_fs_proc_net_init, there are two "f"+"s_" combinations.
And we're only compressing the first combination at the moment. Let's
compress all possible combinations.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 scripts/kallsyms.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 8caccc8f4a23703..3319d9f38d7a5f2 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -553,7 +553,7 @@ static void compress_symbols(const unsigned char *str, int idx)
 	unsigned char *p1, *p2;
 
 	for (i = 0; i < table_cnt; i++) {
-
+retry:
 		len = table[i]->len;
 		p1 = table[i]->sym;
 
@@ -585,6 +585,9 @@ static void compress_symbols(const unsigned char *str, int idx)
 
 		/* increase the counts for this symbol's new tokens */
 		learn_symbol(table[i]->sym, len);
+
+		/* May be more than one place that can be merged, try again */
+		goto retry;
 	}
 }
 
-- 
2.25.1

Re: [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed

Posted by Petr Mladek 3 years, 6 months ago

On Tue 2022-09-20 15:13:11, Zhen Lei wrote:
> For a symbol, there may be more than one place that can be merged. For
> example: nfs_fs_proc_net_init, there are two "f"+"s_" combinations.
> And we're only compressing the first combination at the moment.

Really?

> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 8caccc8f4a23703..3319d9f38d7a5f2 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -553,7 +553,7 @@ static void compress_symbols(const unsigned char *str, int idx)
>  	unsigned char *p1, *p2;
>  
>  	for (i = 0; i < table_cnt; i++) {
> -
> +retry:
>  		len = table[i]->len;
>  		p1 = table[i]->sym;
>  
> @@ -585,6 +585,9 @@ static void compress_symbols(const unsigned char *str, int idx)
>  
>  		/* increase the counts for this symbol's new tokens */
>  		learn_symbol(table[i]->sym, len);
> +
> +		/* May be more than one place that can be merged, try again */
> +		goto retry;
>  	}
>  }

My understanding is that the code already tries to find the same
token several times. Here are the important parts of the existing
code:

static void compress_symbols(const unsigned char *str, int idx)
{

		p2 = find_token(p1, len, str);

		do {
			/* replace the found token with idx */
			*p2 = idx;
			[...]

			/* find the token on the symbol */
			p2 = find_token(p1, size, str);

		} while (p2);
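
The excerpt above can be turned into a small standalone sketch to check that the inner loop does replace every occurrence of a token within one symbol. This is not the code from scripts/kallsyms.c: compress_one() is a hypothetical single-symbol helper, the bookkeeping done by forget_symbol()/learn_symbol() is left out, and the length arithmetic is simplified so that no byte past the end of the buffer is read.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first occurrence of the two-byte token, or NULL. */
static unsigned char *find_token(unsigned char *str, int len,
				 const unsigned char *token)
{
	int i;

	for (i = 0; i < len - 1; i++) {
		if (str[i] == token[0] && str[i + 1] == token[1])
			return &str[i];
	}
	return NULL;
}

/*
 * Hypothetical helper: replace every occurrence of the two-byte token in
 * sym[0..len) with the single byte idx and return the new length.
 */
static int compress_one(unsigned char *sym, int len,
			const unsigned char *token, unsigned char idx)
{
	unsigned char *p1 = sym, *p2;
	int size = len;		/* bytes still to be searched, starting at p1 */

	while ((p2 = find_token(p1, size, token)) != NULL) {
		/* replace the found token with idx */
		*p2 = idx;
		p2++;

		/* bytes that follow the second (now redundant) token byte */
		size -= (int)(p2 - p1) + 1;

		/* close the one-byte gap and resume right after idx */
		memmove(p2, p2 + 1, size);
		p1 = p2;
		len--;
	}
	return len;
}
```

With the token "fs" and idx 0xff, the 11-byte string "nfs_fs_proc" shrinks to 9 bytes, with both "fs" pairs replaced in a single call.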

Best Regards,
Petr

Re: [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed

Posted by Leizhen (ThunderTown) 3 years, 6 months ago


On 2022/9/21 16:00, Petr Mladek wrote:
> On Tue 2022-09-20 15:13:11, Zhen Lei wrote:
>> For a symbol, there may be more than one place that can be merged. For
>> example: nfs_fs_proc_net_init, there are two "f"+"s_" combinations.
>> And we're only compressing the first combination at the moment.
> 
> Really?

Yes, there are about 200 such functions.

> 
>> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
>> index 8caccc8f4a23703..3319d9f38d7a5f2 100644
>> --- a/scripts/kallsyms.c
>> +++ b/scripts/kallsyms.c
>> @@ -553,7 +553,7 @@ static void compress_symbols(const unsigned char *str, int idx)
>>  	unsigned char *p1, *p2;
>>  
>>  	for (i = 0; i < table_cnt; i++) {
>> -
>> +retry:
>>  		len = table[i]->len;
>>  		p1 = table[i]->sym;
>>  
>> @@ -585,6 +585,9 @@ static void compress_symbols(const unsigned char *str, int idx)
>>  
>>  		/* increase the counts for this symbol's new tokens */
>>  		learn_symbol(table[i]->sym, len);
>> +
>> +		/* May be more than one place that can be merged, try again */
>> +		goto retry;
>>  	}
>>  }
> 
> My understanding is that the code already tries to find the same
> token several times. Here are the important parts of the existing
> code:
> 
> static void compress_symbols(const unsigned char *str, int idx)
> {
> 
> 		p2 = find_token(p1, len, str);
> 
> 		do {
> 			/* replace the found token with idx */
> 			*p2 = idx;
> 			[...]
> 
> 			/* find the token on the symbol */
> 			p2 = find_token(p1, size, str);

Oh, yes, it retries. Let me reanalyze it. However, the problem is
real, and there may be a problem somewhere in the loop.

> 
> 		} while (p2);
> 
> Best Regards,
> Petr
> .
> 

-- 
Regards,
  Zhen Lei

Re: [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed

Posted by Leizhen (ThunderTown) 3 years, 6 months ago


On 2022/9/21 16:31, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/21 16:00, Petr Mladek wrote:
>> On Tue 2022-09-20 15:13:11, Zhen Lei wrote:
>>> For a symbol, there may be more than one place that can be merged. For
>>> example: nfs_fs_proc_net_init, there are two "f"+"s_" combinations.
>>> And we're only compressing the first combination at the moment.
>>
>> Really?
> 
> Yes, there are about 200 such functions.
> 
>>
>>> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
>>> index 8caccc8f4a23703..3319d9f38d7a5f2 100644
>>> --- a/scripts/kallsyms.c
>>> +++ b/scripts/kallsyms.c
>>> @@ -553,7 +553,7 @@ static void compress_symbols(const unsigned char *str, int idx)
>>>  	unsigned char *p1, *p2;
>>>  
>>>  	for (i = 0; i < table_cnt; i++) {
>>> -
>>> +retry:
>>>  		len = table[i]->len;
>>>  		p1 = table[i]->sym;
>>>  
>>> @@ -585,6 +585,9 @@ static void compress_symbols(const unsigned char *str, int idx)
>>>  
>>>  		/* increase the counts for this symbol's new tokens */
>>>  		learn_symbol(table[i]->sym, len);
>>> +
>>> +		/* May be more than one place that can be merged, try again */
>>> +		goto retry;
>>>  	}
>>>  }
>>
>> My understanding is that the code already tries to find the same
>> token several times. Here are the important parts of the existing
>> code:
>>
>> static void compress_symbols(const unsigned char *str, int idx)
>> {
>>
>> 		p2 = find_token(p1, len, str);
>>
>> 		do {
>> 			/* replace the found token with idx */
>> 			*p2 = idx;
>> 			[...]
>>
>> 			/* find the token on the symbol */
>> 			p2 = find_token(p1, size, str);
> 
> Oh, yes, it retries. Let me reanalyze it. However, the problem is
> real, and there may be a problem somewhere in the loop.

Hi, Petr:
  Thanks. I found that it's my fault. The first round skips the type
character. But the next round incorrectly skips one more character,
so for nfs_fs_proc_net_init, the next round starts from s. Using
            ^
the proposed "unsigned char type" in your next reply should solve
the problem. Thank you very much.

-	for (i = 0; i < len - 1; i++)
+	for (i = sym_start_idx; i < len - 1; i++)
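
One plausible reading of this failure mode can be reproduced in a short sketch. Everything here is an assumption made for illustration: find_token_from(), compress_typed() and the resume_start parameter are hypothetical names, and the byte 0xfe merely stands in for an already compressed "s_" token. The first search skips the type character at offset 0. If the later searches also skip one byte, a token that begins exactly at the resume position is missed.

```c
#include <stddef.h>
#include <string.h>

/* Search str[start..len) for the two-byte token; start is hypothetical. */
static unsigned char *find_token_from(unsigned char *str, int len, int start,
				      const unsigned char *token)
{
	int i;

	for (i = start; i < len - 1; i++) {
		if (str[i] == token[0] && str[i + 1] == token[1])
			return &str[i];
	}
	return NULL;
}

/*
 * Hypothetical helper: compress one symbol whose first byte is the type
 * character. resume_start is the offset used by every search after the
 * first one: 1 reproduces the suspected bug, 0 is the intended behaviour.
 */
static int compress_typed(unsigned char *sym, int len,
			  const unsigned char *token, unsigned char idx,
			  int resume_start)
{
	unsigned char *p1 = sym, *p2;
	int size = len;
	int start = 1;		/* first round: skip the type character */

	while ((p2 = find_token_from(p1, size, start, token)) != NULL) {
		*p2 = idx;
		p2++;
		size -= (int)(p2 - p1) + 1;
		memmove(p2, p2 + 1, size);
		p1 = p2;
		len--;

		/* later rounds begin after the replaced byte, not at the type */
		start = resume_start;
	}
	return len;
}
```

For the 7-byte input 't', 'n', 'f', 0xfe, 'f', 0xfe, 'p' and the token ('f', 0xfe), resume_start = 1 leaves the second pair untouched and returns 6, while resume_start = 0 compresses both pairs and returns 5.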

> 
>>
>> 		} while (p2);
>>
>> Best Regards,
>> Petr
>> .
>>
> 

-- 
Regards,
  Zhen Lei

[PATCH v4 1/8] scripts/kallsyms: rename build_initial_tok_table()
[PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed
[PATCH v4 3/8] scripts/kallsyms: don't compress symbol types
[PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
[PATCH v4 5/8] kallsyms: Add helper kallsyms_on_each_match_symbol()
[PATCH v4 6/8] livepatch: Use kallsyms_on_each_match_symbol() to improve performance
[PATCH v4 7/8] livepatch: Improve the search performance of module_kallsyms_on_each_symbol()
[PATCH v4 8/8] kallsyms: Add self-test facility