[PATCH] setlocalversion: work around "git describe" performance

Rasmus Villemoes posted 1 patch 2 weeks, 4 days ago
There is a newer version of this series
scripts/setlocalversion | 53 ++++++++++++++++++++++++++++-------------
1 file changed, 37 insertions(+), 16 deletions(-)
[PATCH] setlocalversion: work around "git describe" performance
Posted by Rasmus Villemoes 2 weeks, 4 days ago
Contrary to expectations, passing a single candidate tag to "git
describe" is slower than not passing any --match options.

  $ time git describe --debug
  ...
  traversed 10619 commits
  ...
  v6.12-rc5-63-g0fc810ae3ae1

  real    0m0.169s

  $ time git describe --match=v6.12-rc5 --debug
  ...
  traversed 1310024 commits
  v6.12-rc5-63-g0fc810ae3ae1

  real    0m1.281s

In fact, the --debug output shows that git traverses all or most of
history. For some repositories and/or git versions, those 1.3s are
actually 10-15 seconds.

This has been acknowledged as a performance bug in git [1]. However,
no solution is yet in git.git, and even when one lands, it will take
quite a while before it finds its way to a release and for
$random_kernel_developer to pick that up.

So rewrite the logic to use plumbing commands. For each of the
candidate values of $tag, we ask: (1) is $tag even an annotated
tag? (2) Is it eligible to describe HEAD, i.e. an ancestor of
HEAD? (3) If so, how many commits are in $tag..HEAD?

I have tested that this produces the same output as the current script
for ~700 random commits between v6.9..v6.10. For those 700 commits,
and in my git repo, the 'make -s kernelrelease' command is on average
~4 times faster.

[1] https://lore.kernel.org/git/20241101113910.GA2301440@coredump.intra.peff.net/

Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
Josh, does this work for you?

 scripts/setlocalversion | 53 ++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index 38b96c6797f4..91d638234d57 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -30,6 +30,26 @@ if test $# -gt 0 -o ! -d "$srctree"; then
 	usage
 fi
 
+try_tag() {
+	tag="$1"
+
+	# Is $tag an annotated tag?
+	[ "$(git cat-file -t "$tag" 2> /dev/null)" = "tag" ] || return 1
+
+	# Is it an ancestor of HEAD, and if so, how many commits are in $tag..HEAD?
+	read left right <<EOF || return 1
+$(git rev-list --count --left-right "$tag"...HEAD 2> /dev/null)
+EOF
+
+	# $left is 0 if and only if $tag is an ancestor of HEAD
+	[ "$left" -eq 0 ] || return 1
+
+	# $right is the number of commits in the range $tag..HEAD, possibly 0.
+	count="$right"
+
+	return 0
+}
+
 scm_version()
 {
 	local short=false
@@ -61,33 +81,33 @@ scm_version()
 	# stable kernel:    6.1.7      ->  v6.1.7
 	version_tag=v$(echo "${KERNELVERSION}" | sed -E 's/^([0-9]+\.[0-9]+)\.0(.*)$/\1\2/')
 
+	# try_tag initializes count if the tag is usable.
+	count=
+
 	# If a localversion* file exists, and the corresponding
 	# annotated tag exists and is an ancestor of HEAD, use
 	# it. This is the case in linux-next.
-	tag=${file_localversion#-}
-	desc=
-	if [ -n "${tag}" ]; then
-		desc=$(git describe --match=$tag 2>/dev/null)
+	if [ -n "${file_localversion#-}" ] ; then
+		try_tag "${file_localversion#-}"
 	fi
 
 	# Otherwise, if a localversion* file exists, and the tag
 	# obtained by appending it to the tag derived from
 	# KERNELVERSION exists and is an ancestor of HEAD, use
 	# it. This is e.g. the case in linux-rt.
-	if [ -z "${desc}" ] && [ -n "${file_localversion}" ]; then
-		tag="${version_tag}${file_localversion}"
-		desc=$(git describe --match=$tag 2>/dev/null)
+	if [ -z "${count}" ] && [ -n "${file_localversion}" ]; then
+		try_tag "${version_tag}${file_localversion}"
 	fi
 
 	# Otherwise, default to the annotated tag derived from KERNELVERSION.
-	if [ -z "${desc}" ]; then
-		tag="${version_tag}"
-		desc=$(git describe --match=$tag 2>/dev/null)
+	if [ -z "${count}" ]; then
+		try_tag "${version_tag}"
 	fi
 
-	# If we are at the tagged commit, we ignore it because the version is
-	# well-defined.
-	if [ "${tag}" != "${desc}" ]; then
+	# If we are at the tagged commit, we ignore it because the
+	# version is well-defined. If none of the attempted tags exist
+	# or were usable, $count is still empty.
+	if [ -z "${count}" ] || [ "${count}" -gt 0 ]; then
 
 		# If only the short version is requested, don't bother
 		# running further git commands
@@ -95,14 +115,15 @@ scm_version()
 			echo "+"
 			return
 		fi
+
 		# If we are past the tagged commit, we pretty print it.
 		# (like 6.1.0-14595-g292a089d78d3)
-		if [ -n "${desc}" ]; then
-			echo "${desc}" | awk -F- '{printf("-%05d", $(NF-1))}'
+		if [ -n "${count}" ]; then
+			printf "%s%05d" "-" "${count}"
 		fi
 
 		# Add -g and exactly 12 hex chars.
-		printf '%s%s' -g "$(echo $head | cut -c1-12)"
+		printf '%s%.12s' -g "$head"
 	fi
 
 	if ${no_dirty}; then
-- 
2.47.0
Re: [PATCH] setlocalversion: work around "git describe" performance
Posted by Jeff King 2 weeks, 3 days ago
On Wed, Nov 06, 2024 at 02:28:38AM +0100, Rasmus Villemoes wrote:

> This has been acknowledged as a performance bug in git [1]. However,
> no solution is yet in git.git, and even when one lands, it will take
> quite a while before it finds its way to a release and for
> $random_kernel_developer to pick that up.

I just posted patches in:

  https://lore.kernel.org/git/20241106192236.GC880133@coredump.intra.peff.net/

but I agree it is worth dealing with in the interim. And also, I really
suspect that this new code may end up faster than git-describe in some
cases anyway.

> +try_tag() {
> +	tag="$1"
> +
> +	# Is $tag an annotated tag?
> +	[ "$(git cat-file -t "$tag" 2> /dev/null)" = "tag" ] || return 1
> +
> +	# Is it an ancestor of HEAD, and if so, how many commits are in $tag..HEAD?
> +	read left right <<EOF || return 1
> +$(git rev-list --count --left-right "$tag"...HEAD 2> /dev/null)
> +EOF
> +
> +	# $left is 0 if and only if $tag is an ancestor of HEAD
> +	[ "$left" -eq 0 ] || return 1
> +
> +	# $right is the number of commits in the range $tag..HEAD, possibly 0.
> +	count="$right"
> +
> +	return 0
> +}

The git parts all look good to me here (and not knowing much about the
kernel's scripts I'll refrain from even looking closely at the other
parts).

The use of the here-doc is a little funny, but I guess you can't just
pipe to read since that would put it in a subshell. But I do note that
your "|| return 1" won't catch a failure from "rev-list", if that was
the intent. I think you'd have to use a real tempfile to catch it, or
play horrid tricks with echoing $? into the "read" stream. I guess the
explicit check for "$left -eq 0" will catch most failures anyway.

If you use "<<-" you can get rid of the awkward indentation.

-Peff
Re: [PATCH] setlocalversion: work around "git describe" performance
Posted by Rasmus Villemoes 2 weeks, 2 days ago
On Wed, Nov 06 2024, Jeff King <peff@peff.net> wrote:

> On Wed, Nov 06, 2024 at 02:28:38AM +0100, Rasmus Villemoes wrote:
>
>> This has been acknowledged as a performance bug in git [1]. However,
>> no solution is yet in git.git, and even when one lands, it will take
>> quite a while before it finds its way to a release and for
>> $random_kernel_developer to pick that up.
>
> I just posted patches in:
>
>   https://lore.kernel.org/git/20241106192236.GC880133@coredump.intra.peff.net/
>
> but I agree it is worth dealing with in the interim. And also, I really
> suspect that this new code may end up faster than git-describe in some
> cases anyway.
>
>> +try_tag() {
>> +	tag="$1"
>> +
>> +	# Is $tag an annotated tag?
>> +	[ "$(git cat-file -t "$tag" 2> /dev/null)" = "tag" ] || return 1
>> +
>> +	# Is it an ancestor of HEAD, and if so, how many commits are in $tag..HEAD?
>> +	read left right <<EOF || return 1
>> +$(git rev-list --count --left-right "$tag"...HEAD 2> /dev/null)
>> +EOF
>> +
>> +	# $left is 0 if and only if $tag is an ancestor of HEAD
>> +	[ "$left" -eq 0 ] || return 1
>> +
>> +	# $right is the number of commits in the range $tag..HEAD, possibly 0.
>> +	count="$right"
>> +
>> +	return 0
>> +}
>
> The git parts all look good to me here (and not knowing much about the
> kernel's scripts I'll refrain from even looking closely at the other
> parts).
>
> The use of the here-doc is a little funny, but I guess you can't just
> pipe to read since that would put it in a subshell.

Exactly. I try to avoid string manipulation using "$(echo ... | cut
...)" etc. when I can get away with doing it in the shell itself, which
sometimes leads to some awkward constructions.

> But I do note that your "|| return 1" won't catch a failure from
> "rev-list", if that was the intent.

Well, not a failure from rev-list itself, but I thought that when the
command ended up producing nothing on stdout for read to consume, read
would fail. But you are right, that doesn't work as I expected.

  $ printf '' | read left right ; echo "$?"
  1
  $ printf '\n' | read left right ; echo "$?"
  0

I'd need for the inner $() to not strip a newline, and for the heredoc
to not implicitly add one, for read to actually fail.

>I think you'd have to use a real tempfile to catch it, or
> play horrid tricks with echoing $? into the "read" stream. I guess the
> explicit check for "$left -eq 0" will catch most failures anyway.

Yes, except that I should probably then use string comparison = instead
of -eq, since $left will be empty when rev-list hasn't produced any
output.

Maybe

  set -- $(git rev-list ... 2> /dev/null)
  left="$1"
  right="$2"

is simpler. There's also

  left_right="$(git rev-list ...)"
  left="${left_right%	*}"
  right="${left_right#*	}"

where the whitespace are actually tab characters, but that's probably
too subtle.

> If you use "<<-" you can get rid of the awkward indentation.

Ack, thanks for the reminder. I seemed to remember that as a bash
extension, but it's not.

Rasmus