May 27, 2021

Performance optimizations for the shell prompt

In this blog post I will present some techniques to make a custom shell prompt feel snappy and responsive.

There is a lot of useful information that can be shown in a shell prompt, like the current state of a git repository, information about background jobs, or information about the last executed commands. But each of those has the potential to make the prompt feel slow. We will use hyperfine to quantify this slowness, and then look at ways to optimize it.

We will take a closer look at two concrete examples. First, an indicator to show information about the current Kubernetes context, implemented from scratch. The most straight-forward implementation has some performance issues, but we can turn it into a solution that finishes almost instantly. And second, the git shell prompt shipped with git itself, which can become slow in large git repositories. We can reduce some of its functionality to make it faster, while ensuring that it remains useful.

I will focus mostly on bash, but also add some thoughts on fish.

Shell prompt basics

Let’s quickly recap how the shell prompt can be customized.

In fish, controlling the prompt is simple: whenever the prompt should be shown, fish executes the shell function fish_prompt. By overwriting this function, we can customize the prompt. For example, if we set

function fish_prompt
  echo -n '$ '
end

then the prompt is $ followed by a space:

$

(the -n prevents echo from adding a newline character at the end). If we set

function fish_prompt
  echo -n (date +%X) '$ '
end

then it additionally shows the current time:

12:23:34 $

Since fish_prompt is a regular function, we can create any prompt that we like.

In bash, controlling the prompt is not as straight-forward. The prompt is configured by setting the PS1 variable (note that this is a shell variable, not an environment variable; see my previous blog post for the distinction). It can be just a regular string; for example, we can set PS1="$ " to get the first example above. It also supports certain special characters; for example, \t is expanded to the current time in 24-hour format, so we can write the second example above as PS1='\t $ '.

By default, bash will also perform variable and command substitution on the string; that means that an alternative way to include the current time in the prompt is to set PS1='$(date +%X) $ '. (Note that this is different from setting PS1="$(date +%X) $ ". With single quotes, PS1 is set to the literal value $(date +%X) $ ; when bash wants to display the prompt, it takes this string and performs command substitution, that is, it executes date +%X every time a prompt is needed. With double quotes, command substitution happens when PS1 is defined, so PS1 is set (in my case) to the literal value 12:23:34 $ , and this value never changes.)
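
To see the difference, try both assignments in an interactive bash session; the first prompt shows a new time before every command, the second stays frozen at the time of the assignment:

PS1='$(date +%X) $ '    # substituted anew for every prompt
PS1="$(date +%X) $ "    # substituted once, when PS1 is assigned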

Additionally, bash has the PROMPT_COMMAND variable. We can set this to any command or function, which will then be executed before the prompt is displayed. So two more ways to write the current time in the prompt are

PS1="$ "
function bash_prompt {
  echo -n "$(date +%X) "
}
PROMPT_COMMAND=bash_prompt

and

function bash_prompt {
  PS1="$(date +%X) $ "
}
PROMPT_COMMAND=bash_prompt

Note that in this last version, we used double quotes. This time it works correctly, since bash_prompt is evaluated every time a prompt is needed, which means that date +%X is evaluated every time as well. I prefer this last version for customizing the prompt: a PROMPT_COMMAND function that eventually sets PS1 to an expanded string. I find it easiest to understand, and it has the advantage that we can benchmark this function to evaluate the performance of the prompt (read on to see how).

Defining a performance target

When doing performance optimizations, there is a danger of over-optimizing. At some point we cross a threshold where the time put into optimizing even further is out of all proportion to the additional speedup that we achieve. Additionally, squeezing out the last bit of performance often means making the code more complicated and hence harder to maintain. So it’s good practice to define a performance target upfront and stop optimizing once we have achieved it.

Here, we take our cue from web development, where 100 milliseconds is often cited as the maximum acceptable delay in user interactions (see for example the MDN Web Docs). For instance, if the user clicks a button, there should be at most 100 milliseconds before there is some kind of feedback that the button was pressed. More than 100 milliseconds can create a disconnect between the user interaction and the response. On the other hand, 50 milliseconds feels instantaneous.

I find that this translates fairly well to the responsiveness I expect from the shell prompt. Let’s do the following experiment: Start a new bash process with bash --norc; this starts bash without loading any custom configuration, so that no startup scripts cause artificial slowdown. Then define a custom shell prompt with

delay=0.1
PS1="$ "
PROMPT_COMMAND='sleep $delay'

This introduces an artificial delay of roughly 100 milliseconds before the prompt is displayed. Now interact with the shell as you normally would. For me, the delay is noticeable, but not something that bothers me a lot. But when I increase the delay to 200 milliseconds (delay=0.2), it definitely feels sluggish. When I try to execute two commands in quick succession, the first characters of the second command sometimes appear before the prompt is printed. On the other hand, with the delay decreased to 50 milliseconds (delay=0.05), I can tell the difference from the 100-millisecond delay in direct comparison, but it is nowhere near as stark as the difference between 100 and 200 milliseconds. Decreasing the delay even further doesn’t make a noticeable difference to me.

So this gives us our performance target: We aim to get the delay of the shell prompt below 100 milliseconds. If possible, we try to get it below 50 milliseconds. Any improvements beyond that are probably a waste of time. (Note that this is based on my perception of the delay. For you, it might be different. Maybe you can live with a 150-millisecond delay, or maybe anything above 50 milliseconds feels unbearable. Try it out, and adjust your performance optimizations accordingly.)

Measuring performance

To conduct the actual performance measurements, we’ll use hyperfine. hyperfine takes an arbitrary shell command and measures its runtime. For example, we can use

hyperfine 'sleep 0.1'

to measure the runtime of sleep in the experiment above. On my machine, it reports that the mean runtime is about 103.3 milliseconds (which shows that the experiment would not hold up to scientific standards, but luckily that wasn’t the intention anyway).

hyperfine works by running the command multiple times; every time it spawns a new shell that executes the command (the startup time of the shell is subtracted from the final numbers). By default, it uses sh as a shell, but we can specify our desired shell with the --shell option; in our case, we will use --shell bash most of the time. If the command that is benchmarked uses the file system, or benefits from caching in any other way, the first runs will typically be slower than later ones, skewing the results. To avoid this, we can specify a number of warm-up runs with the --warmup option. For the shell prompt, I don’t worry about the occasional slow execution; I mostly want the prompt to be fast most of the time. So for my measurements I’ll always add the --warmup option. If you are interested in the worst-case performance, you can use the --prepare option instead to clear caches between runs (see the hyperfine documentation for more details).
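
For example, the hyperfine documentation suggests the following invocation to drop the file system caches on Linux before every run (it requires sudo):

hyperfine \
    --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
    '<command>'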

When we execute

hyperfine --shell bash '<command>'

hyperfine will essentially call bash -c '<command>'. This means that we can benchmark shell functions and even multiple commands executed in series. However, it also means that the spawned bash process doesn’t read any configuration files like ~/.bashrc or ~/.bash_profile. So any shell function that we want to benchmark has to be defined in the spawned shell first. We’ll do this by specifying the function in a file, and then instructing hyperfine to first source that file and then execute the function. So a typical benchmark invocation will look like this:

hyperfine \
    --shell bash \
    --warmup 10 \
    'source shell-function.bash && shell-function'

Sourcing the file adds a small overhead to the benchmark. We could subtract that overhead manually by benchmarking the source command in isolation, but in fact the overhead is usually less than a millisecond, so we will just ignore it.
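
If you want to measure that overhead for your own setup, you can benchmark the source command on its own:

hyperfine \
    --shell bash \
    --warmup 10 \
    'source shell-function.bash'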

Example 1: A Kubernetes context indicator

When you are using kubectl to access Kubernetes resources, most of the actions are context dependent, quite literally. A Kubernetes context is a combination of a cluster (the Kubernetes cluster that the action is executed on), a user (the user that executes the action), and a namespace (the Kubernetes namespace of the resources). If you set the current context with

kubectl config use-context context-name

then kubectl will use this context for the subsequent actions, unless explicitly instructed otherwise.
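
To see which contexts are available, and which one is currently active, we can run

kubectl config get-contexts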

In the context, we can also set the namespace to use by default for all following commands. For a long time I wasn’t aware of this feature, but now I use it all the time, so let me quickly explain how it works. By default, when kubectl is used to query resources, for example

kubectl get pods

it queries the resources in the default namespace. To address resources in a different namespace, we have to use the --namespace command line flag (or -n for short), as in

kubectl get pods --namespace my-namespace

Specifying the same namespace explicitly in every command becomes annoying pretty quickly. So with

kubectl config set-context --current --namespace=my-namespace

we set my-namespace as the namespace in the current context. Any subsequent calls to kubectl will now use my-namespace as namespace, so there is no need for the --namespace flag anymore.

Adding a prompt for the context

I often use several clusters and namespaces, and switch between them quite frequently. Unfortunately, that means that I usually don’t know which cluster and namespace is currently configured. To make matters worse, by default, the current context is a global concept; it is read from and written to ~/.kube/config. That means if we set the current context in one terminal window and then execute kubectl in a different terminal window, it will use this new context (this behavior can be modified with the KUBECONFIG environment variable). So I need a constant reminder of which context is currently in use and which namespace is set in that context; the shell prompt is an ideal place for this reminder.
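
For example, pointing KUBECONFIG at a separate file gives a terminal its own configuration, including its own current context (the file name here is just for illustration):

export KUBECONFIG=~/.kube/config-dev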

As I mentioned above, a context is a triple of cluster, user, and namespace, so this is the information that we could show in the prompt. But in my case, the user never changes, so that’s not information which I need to be reminded of, and I won’t display it in the prompt. Furthermore, the cluster name in my case is always the same as the context name. So the prompt I’m using shows the current context and the namespace used in this context. The following function shows this information.

function kubernetes-context {
  local context namespace

  context=$(kubectl config current-context 2> /dev/null || echo "")

  if [ -z "$context" ]; then
    return
  fi

  namespace=$(
    kubectl config view \
      --output jsonpath="{.contexts[?(@.name == \"$context\")].context.namespace}"
  )

  echo -n "$context"

  if [ -n "$namespace" ]; then
    echo -n " ($namespace)"
  fi
}
kubernetes-context-simple.bash

This does what we want; if I execute it locally, it outputs minikube (test-namespace). Unfortunately, it is fairly slow. On my machine, benchmarking this with hyperfine using

hyperfine \
    --warmup 10 \
    --shell bash \
    'source kubernetes-context-simple.bash && kubernetes-context'

reports a mean runtime of 100.6 milliseconds. This seems like a lot for such a simple function, and according to our performance target it is too slow, especially if we plan to show any additional information in the prompt.

Understanding the performance bottlenecks

Let’s start by benchmarking a single kubectl call.

hyperfine \
    --warmup 10 \
    --shell bash \
    'kubectl config current-context'

This reports a mean runtime of 47.6 milliseconds. So this single call alone almost depletes our whole time budget. We could now try to profile kubectl and see if we can improve its performance. But this would likely be a bigger task, and it would require at least some expertise in Go, which I don’t have. Luckily there’s a simpler way. I mentioned above that kubectl by default gets its information from ~/.kube/config. It is a YAML file, and in my case, it looks roughly like this.

apiVersion: v1
clusters:
- cluster:
    server: https://192.168.49.2:8443
  name: minikube
contexts:
- context:
    cluster: minikube
    namespace: test-namespace
    user: minikube
  name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube

From this, we can extract the information directly, using yq. For example, we can extract the current context with

yq e '.current-context' ~/.kube/config

Benchmarking this single call with hyperfine reports a mean runtime of 4.7 milliseconds, an order of magnitude faster than the equivalent kubectl call! (Note: If you are using Linux, then don’t use the snap version of yq. snap doesn’t expose the yq binary directly; instead, it exposes yq as a symbolic link to /usr/bin/snap which will then execute the real yq binary. This indirection alone costs about 40 milliseconds per execution on my machine, which makes it too slow to use in the prompt. I compiled the version I am currently using from source.)

A faster context prompt

Replacing the kubectl calls with yq yields the following function.

function kubernetes-context {
  local context namespace

  if [ -n "$KUBECONFIG" ]; then
    echo -n "\$KUBECONFIG is not supported"
    return
  fi

  context=$(yq e '.current-context // ""' ~/.kube/config)

  if [ -z "$context" ]; then
    return
  fi

  namespace=$(
    yq e "(.contexts[] | select(.name == \"$context\").context.namespace) // \"\"" \
      ~/.kube/config
  )

  echo -n "$context"

  if [ -n "$namespace" ]; then
    echo -n " ($namespace)"
  fi
}
kubernetes-context.bash

Note that this function only works if kubectl really reads its configuration from ~/.kube/config. If the KUBECONFIG variable is set then the information in ~/.kube/config could be invalid. We could adapt the function to also honor the KUBECONFIG variable, but for simplicity we just add a safeguard for now.
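
If your KUBECONFIG only ever points to a single file, a minimal adaptation could replace the safeguard with something like the following sketch (it ignores that KUBECONFIG may also contain a colon-separated list of files):

# Sketch: fall back to the default location if KUBECONFIG is unset.
config_file="${KUBECONFIG:-$HOME/.kube/config}"
context=$(yq e '.current-context // ""' "$config_file")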

This version is indeed much better than our initial one: hyperfine reports a mean runtime of 13.8 milliseconds. So if we don’t have any other expensive functions in our shell prompt, and we don’t use KUBECONFIG, we can stop here. But if the performance of the overall prompt is still not good enough and shaving off 10 milliseconds makes a difference, we can improve the performance of the Kubernetes context prompt by another order of magnitude by converting the shell function into a binary. The equivalent of the shell function as a Go program (source; as I already mentioned, I’m not an experienced Go programmer, so there might be better ways of doing this) has a mean runtime of less than a millisecond. With this, we can really stop. If the prompt is still too slow, then this is due to other functions; optimizing the Kubernetes prompt even further will not add any benefit.

Example 2: A git status indicator

Something that’s invaluable to me to have in the prompt is a git status indicator; if the current directory is within a git repository, it shows information like the currently checked out branch, whether there are any changes to files that are not committed yet, what the status of the branch is in relation to its remote tracking branch, and more. For example, my prompt (using fish) currently looks like this:

~/projects/blog (main ✚…↑8) $

This tells me that I’m on the main branch, which is 8 commits ahead of its remote tracking branch; there are some files with unstaged changes and some untracked files. There are no staged changes and nothing in the stash. For bash, a shell function which shows this information is shipped with git itself; for fish, such a function is shipped with fish.

I find this incredibly helpful and rely on this functionality a lot. But when I am in a large git repository, I notice that the prompt becomes a bit slow. Intuitively that makes sense: if there are a lot of files in the repository, checking if there are any changes should take longer than if there are only a few files. But intuition can also be deceiving; let’s see if we can really understand what’s causing the slowness, and whether there is something we can do about it.

Setting up a benchmark

As a large repository for the benchmarks, we’ll use the Linux kernel source code and run all benchmarks in the root directory of this repository. I’ll use the bash version of the status indicator; the fish version is based on the bash version and performs similarly.

By default, the indicator only shows the current branch; more functionality can be enabled using shell variables:

GIT_PS1_SHOWDIRTYSTATE: indicate unstaged (*) and staged (+) changes
GIT_PS1_SHOWUNTRACKEDFILES: indicate the presence of untracked files (%)
GIT_PS1_SHOWSTASHSTATE: indicate whether something is stashed ($)
GIT_PS1_SHOWUPSTREAM: indicate the relation to the remote tracking branch

The first three need to be set to a non-empty value to be activated; the last one can take several values which are described in the documentation. There are a few more variables, but those four are the ones I’m using. For the benchmark, we’ll use the following file.

source /usr/lib/git-core/git-sh-prompt

GIT_PS1_SHOWDIRTYSTATE=true
GIT_PS1_SHOWUNTRACKEDFILES=true
GIT_PS1_SHOWSTASHSTATE=true
GIT_PS1_SHOWUPSTREAM=verbose
git-prompt.bash

The main function that is exposed by /usr/lib/git-core/git-sh-prompt is __git_ps1. There are several possible ways to call it, but the most straight-forward one is to call it without any arguments, in which case it prints the information to stdout, enclosed in parentheses. For example, we could get the equivalent of my prompt above with

function shell-prompt {
    PS1="\w$(__git_ps1) $ "
}
PROMPT_COMMAND=shell-prompt

(the \w produces the abbreviated working directory). For the benchmarks, we are only interested in the performance of __git_ps1 itself, so we will benchmark it in isolation. Running

hyperfine \
    --warmup 10 \
    --shell bash \
    'source git-prompt.bash && __git_ps1'

in the kernel repository reports a mean runtime of 170 milliseconds; this is way above our goal of 100 milliseconds, even without any further functionality in the prompt.

Let’s start by checking whether any single option is problematic, by benchmarking them one by one (that is, we set only one of the variables and unset all the other ones). Benchmarking the execution without setting any variable yields a baseline runtime of 7 milliseconds. Relative to that, showing the stash state adds an overhead of 1.7 milliseconds and showing the upstream deviation adds 3.6 milliseconds, but showing the dirty state adds 58 milliseconds and showing the untracked files adds 99 milliseconds. So stash state and upstream deviation are harmless, but the other two warrant closer investigation.
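
Each of these measurements uses the same pattern; for example, to isolate the dirty state check, we can use a variant of the benchmark file (here called git-prompt-dirty.bash, a name chosen just for this example) that sources git-sh-prompt and sets only GIT_PS1_SHOWDIRTYSTATE:

hyperfine \
    --warmup 10 \
    --shell bash \
    'source git-prompt-dirty.bash && __git_ps1'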

Understanding the performance bottlenecks

To understand what’s going on, let’s see how they are used inside __git_ps1. For GIT_PS1_SHOWDIRTYSTATE, the relevant part of the code is

if [ -n "${GIT_PS1_SHOWDIRTYSTATE-}" ] &&
   [ "$(git config --bool bash.showDirtyState)" != "false" ]
then
    git diff --no-ext-diff --quiet || w="*"
    git diff --no-ext-diff --cached --quiet || i="+"
    if [ -z "$short_sha" ] && [ -z "$i" ]; then
        i="#"
    fi
fi

(source). There are two calls to git diff; the first to check whether there are any unstaged changes, and the second to check whether there are any staged changes (--cached is an alias for --staged). In both cases, the --no-ext-diff option causes git to use its internal diff driver, and the --quiet option suppresses the output and instructs git to return with an exit code 0 if there are no changes, and 1 if there are. So for example if there are some unstaged changes, then git diff --no-ext-diff --quiet will exit with status code 1. Since this is interpreted as false in the shell, w="*" is executed, that is, the variable w is set to *.
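
We can benchmark the two calls in isolation (in the pristine kernel checkout both exit with status code 0; in a dirty tree, we would have to append || true, for reasons explained below):

hyperfine \
    --warmup 10 \
    --shell bash \
    'git diff --no-ext-diff --quiet'

hyperfine \
    --warmup 10 \
    --shell bash \
    'git diff --no-ext-diff --cached --quiet'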

On my machine, the first call has a mean runtime of 38.6 milliseconds, and the second one of 11.5 milliseconds. That makes sense, since the first one probably has to look at all files in the repository, while the second one only has to look at the index. We can validate this hypothesis with strace: executing

strace -f git diff --no-ext-diff --quiet

shows several calls to clone and, in the threads created this way, thousands of calls to lstat. So it looks like git spawns several threads which then visit every file in the repository, looking for changes. On the other hand, attaching strace to the call with --cached shows that the only file accesses are to files within the .git directory.

Let’s also try to understand the case of untracked files. The relevant code in the shell function is

if [ -n "${GIT_PS1_SHOWUNTRACKEDFILES-}" ] &&
   [ "$(git config --bool bash.showUntrackedFiles)" != "false" ] &&
   git ls-files \
     --others \
     --exclude-standard \
     --directory \
     --no-empty-directory \
     --error-unmatch -- ':/*' \
     >/dev/null 2>/dev/null
then
    u="%${ZSH_VERSION+%}"
fi

(source; I added some line breaks for readability). By default, git ls-files lists all files that are either already committed or added to the index. With the --others option, it instead shows all files that are not tracked by git. That would usually also include files that are supposed to be ignored (for example because they are listed in a .gitignore file); the --exclude-standard option prevents this. The next two options were added as a performance improvement for directories with a lot of untracked files: if a whole directory contains only untracked files, the --directory option outputs only the directory, not every individual file. This would also list empty directories, which is prevented here with the --no-empty-directory option. Finally, the --error-unmatch option works in concert with ':/*'. The latter is a pathspec: :/ means to start matching from the root of the repository, and * matches any pathname (it is actually not necessary here). The --error-unmatch option instructs git to exit with status code 1 if git ls-files produces no files matching the pathspec. So in summary, the command exits with status code 0 if there are untracked files, and with 1 if there aren’t. Since the command is used in an if condition, this means that the variable u is set if and only if there are untracked files.

We also want to benchmark this call. Benchmarking it verbatim does not work though: hyperfine aborts when the benchmarked command returns a non-zero status code, which happens in our case since there are no untracked files. But since we can benchmark entire command lines, we can get a command line that exits with status code 0 by simply appending || true (alternatively, hyperfine has an --ignore-failure option for exactly this situation):

hyperfine \
    --warmup 10 \
    --shell bash \
    "git ls-files --others --exclude-standard --directory --no-empty-directory --error-unmatch -- ':/*' || true"

On my machine, this has a mean runtime of 101 milliseconds. Running the command with strace confirms that it walks through all directories and lists their contents.

Improving performance

Now, is there anything we can do to improve this? Improving the performance of git itself seems hard, especially since git was designed with performance in mind, and a lot of work has probably already gone into making it fast. Unfortunately, contrary to the Kubernetes example, there doesn’t seem to be an obvious way around calling git. One thing we can change is the number of calls to git. Currently, it is called three times: once each to check for staged, unstaged, and untracked files. These can be combined into a single git status call, which is what other git prompts like zsh-git-prompt do. Benchmarking this on my machine with

hyperfine \
    --warmup 10 \
    --shell bash \
    "git status --porcelain --untracked-files=normal"

shows a mean runtime of 132.7 milliseconds, about 20 milliseconds less than the three individual git calls combined. This is a bit better, but still above our target of 100 milliseconds. It also has a potential disadvantage: if you have a lot of non-committed files in your repository, the git status call will enumerate all of them, whereas the individual git calls terminate as soon as they find one of the files.
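
To illustrate the approach, here is a rough sketch of how the three indicators could be derived from a single git status call (simplified; a real implementation would also handle renames, conflicts, and submodules):

# A single git status call replaces the three separate git invocations;
# this mirrors the w (unstaged), i (staged), and u (untracked) variables
# from __git_ps1. In the porcelain format, column 1 is the index status,
# column 2 the worktree status, and '??' marks untracked files.
w="" i="" u=""
while IFS= read -r line; do
  case "$line" in
    '??'*) u="%" ;;
    ?[MD]*) w="*" ;;
  esac
  case "$line" in
    [MADRC]?*) i="+" ;;
  esac
done < <(git status --porcelain --untracked-files=normal)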

A different way to improve performance is to reduce functionality. One obvious option is to disable the expensive features completely. We don’t even have to do this globally; as we saw in the relevant code sections above, we can do it on a per-repository basis, using git config. Executing

git config bash.showDirtyState false

inside a git repository disables the dirty state indicator for this repository.
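
The untracked-files check has an analogous switch, using the config key we already saw in the code above:

git config bash.showUntrackedFiles false

This certainly works, and we meet our performance goal, but we are also missing out on a lot of functionality. Luckily, completely disabling the features might not be necessary.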

In large repositories, it’s usually unlikely that you touch all components at the same time. Most of the time, the changes are confined to a few components. For example, the top-level directory of the Linux kernel looks like this

ls
arch/    Documentation/  Kbuild       Makefile  security/
block/   drivers/        Kconfig      mm/       sound/
certs/   fs/             kernel/      net/      tools/
COPYING  include/        lib/         README    usr/
CREDITS  init/           LICENSES/    samples/  virt/
crypto/  ipc/            MAINTAINERS  scripts/

There are various directories, each containing different components of the code. Now let’s say we are working on networking code; then all changes are probably to files somewhere in the net/ directory. When checking if there are unstaged changes, we can restrict the check to this directory with a pathspec:

git diff --no-ext-diff --quiet -- :/net

hyperfine reports a mean runtime of 14.4 milliseconds, down from 38.6 milliseconds for the unrestricted check! Similarly, benchmarking the git ls-files call with the :/net pathspec (the invocation is shown below) reports a mean runtime of 19.6 milliseconds, less than 20% of the original runtime! This looks very promising. We can even specify multiple pathspecs, in case our changes span multiple directories; for example

git diff --no-ext-diff --quiet -- :/net :/lib
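
The restricted git ls-files measurement mentioned above uses the same command line as before, with the pathspec swapped in:

hyperfine \
    --warmup 10 \
    --shell bash \
    "git ls-files --others --exclude-standard --directory --no-empty-directory --error-unmatch -- ':/net' || true"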

Let’s change the shell function to make use of this. We follow the approach of enabling it through git config; that way, we can impose the restrictions only for repositories which are too big to enable the full functionality. Here are the relevant changes to __git_ps1:

local pathspecs pathspec_array
# Read the '|'-separated pathspecs from the repository configuration;
# default to ':/' (the entire repository) if the entry is not set.
pathspecs=$(git config bash.pathspecs)
IFS='|' read -r -a pathspec_array <<< "${pathspecs:-:/}"

if [ -n "${GIT_PS1_SHOWDIRTYSTATE-}" ] &&
    [ "$(git config --bool bash.showDirtyState)" != "false" ]
then
    git diff --no-ext-diff --quiet -- "${pathspec_array[@]}" || w="*"
    git diff --no-ext-diff --cached --quiet || i="+"
    if [ -z "$short_sha" ] && [ -z "$i" ]; then
        i="#"
    fi
fi

if [ -n "${GIT_PS1_SHOWUNTRACKEDFILES-}" ] &&
    [ "$(git config --bool bash.showUntrackedFiles)" != "false" ] &&
    git ls-files \
        --others \
        --exclude-standard \
        --directory \
        --no-empty-directory \
        --error-unmatch \
        -- "${pathspec_array[@]}" \
        >/dev/null 2>/dev/null
then
    u="%${ZSH_VERSION+%}"
fi

The config entry that we use is called bash.pathspecs. It can contain one or multiple pathspecs, separated by |; if it is not set, it defaults to :/ (matching all files in the repository). The read command splits the string at | and stores the individual parts in an array variable, which is then passed to the git calls that check for unstaged changes and untracked files. We don’t pass it to git diff --no-ext-diff --cached --quiet; this call is not a performance bottleneck, and showing all staged changes makes sense even if they are outside the directories specified by the pathspecs.

Let’s benchmark this improved version, first restricting it to the net directory.

git config bash.pathspecs ":/net" &&
  hyperfine \
    --warmup 10 \
    --shell bash \
    'source git-prompt.bash && __git_ps1'

On my machine this reports a mean runtime of 60.3 milliseconds, close to our optimal goal of 50 milliseconds. Let’s also try a case with multiple pathspecs. With git config bash.pathspecs ":/net|:/include|:/lib" the mean runtime is 68.4 milliseconds, which is still very acceptable. Of course the runtime depends on the size of the included directories; for example, with git config bash.pathspecs ":/drivers", the runtime is 102 milliseconds (the drivers/ directory contains almost 30,000 files and 2,000 directories, whereas the net/, include/, and lib/ directories combined contain less than 10,000 files and 400 directories). So it depends on your use case whether restricting the scope of the checks makes a performance difference or not.

A note on the fish prompt

The fish git prompt is based on the bash git prompt; it even uses the same names for the git config keys that disable prompt features per repository. The settings equivalent to the ones we used for the bash prompt above would be

set -g __fish_git_prompt_showdirtystate true
set -g __fish_git_prompt_showuntrackedfiles true
set -g __fish_git_prompt_showstashstate true
set -g __fish_git_prompt_showupstream verbose
git-prompt.fish

We could then benchmark the function using

hyperfine \
    --warmup 10 \
    --shell fish \
    'source git-prompt.fish && fish_git_prompt'

and use the same techniques as above to understand which parts of the function contribute to performance problems. But even better, fish has a built-in profiler! Executing the script with

fish \
    --profile=profiler.output \
    --command='source git-prompt.fish && fish_git_prompt'

creates a file profiler.output which contains timing information for all executed commands. (Unfortunately, I discovered this only when this post was already finished.)

One thing to note: on my machine, even with all options turned off, the function has a mean runtime of 20.4 milliseconds. A large part of this is spent setting up the colors and characters used in the prompt. In a real fish session, this happens only on the first invocation of the function; but since hyperfine spawns a new shell for each run, this setup is repeated in every run. We can avoid this by adding the line set -g ___fish_git_prompt_init true to git-prompt.fish. On my machine, this brings the mean runtime down to 11.5 milliseconds, not far off from the 7 milliseconds of the bash version. We can then implement honoring the bash.pathspecs config entry as we did for the bash version.

Further benchmarks

So far, we have only considered a single test case: the Linux kernel repository without any changes. For a proper performance investigation, we should also test various other states, for example a few or lots of unstaged changes; a few or lots of staged changes; divergences of the local branch from the remote tracking branch; etc. But in the end, the states that matter are the ones you encounter in your day-to-day work. Hopefully, you can apply similar techniques to benchmark and improve the performance if you experience any slowdown in your repositories.

Conclusion

There is a lot of useful information that can be shown in a shell prompt, but for me there is a trade-off between the information shown and the performance of the prompt. A snappy shell prompt is really important to me. I have shown a few of the ways I ensure that my prompt stays that way, from reaping low-hanging performance fruit by replacing slow components of a function with faster ones, to reducing functionality to a still useful minimum.

What constitutes “snappy” is a matter of personal preference. I hope that this post gives you some ways to quantify this preference, and tools to achieve this performance with your shell prompt.

—Written by Sebastian Jambor. Follow me on Mastodon @crepels@mastodon.social for updates on new blog posts.