cd is not a program
Or rather, it is not a standalone executable. It is a shell builtin. Let me explain what that means and why it makes a difference.
Note: From now on, when I write shell I mean either bash or fish. I assume that most things are also true for most other shells, but I haven’t actually checked for any of them.
To explain the difference between a shell builtin and a standalone executable, let’s look at two concrete examples and how they are executed from the viewpoint of the shell. Let’s compare cd some-dir and cat some-dir/some-file.
cd is a shell builtin, which means that it is a function in the shell code that is exposed to the user. So if we execute
cd some-dir, the shell can just call this function with
some-dir as an argument. On the other hand,
cat is a standalone executable. If we execute
cat some-dir/some-file, the shell first tries to find the absolute path to
cat using the
PATH variable; usually, it is located at
/bin/cat. It then creates a new process using the
fork system call, and within the new process calls the
execve system call, passing it the absolute path to the command (
/bin/cat), the list of arguments (
some-dir/some-file), and the environment (more on the environment later). So calling a shell builtin is not only much more lightweight than calling an external executable, the builtin also has access to internals of the shell since it is called within the same process.
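To see the distinction on your own system, here is a minimal sketch using command -v, which reports how the shell would resolve a command name. The variable names are just for illustration, and the exact path reported for cat depends on the system.

```bash
# command -v reports how the shell would resolve a command name.
cd_resolution=$(command -v cd)     # a builtin is reported by its bare name
cat_resolution=$(command -v cat)   # an external program is reported by its path

echo "cd  -> $cd_resolution"
echo "cat -> $cat_resolution"
```

In bash, type cd and type cat give a more verbose answer to the same question.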
The cd builtin is used to change the working directory of the shell. The concept of a working directory exists on the kernel level. Every process has one. On a Linux system you can see it in the
/proc file system:
/proc/<pid>/cwd is a symbolic link to the working directory of the process with id
<pid>. You can also see the working directory of the shell by executing
pwd (another shell builtin). The working directory is used by the kernel whenever the process wants to access a file or a directory with a relative path. For example, let’s look at cat some-dir/some-file again. When this is executed, cat passes the argument some-dir/some-file to the open system call (see the cat source code). The kernel then takes the current working directory of cat, appends some-dir/some-file, and tries to open that file (the details can be a bit more complicated since there might be symbolic links involved; the man page on
path_resolution has the full details).
When a process is created, it inherits the working directory from its parent. In the example, since
cat is created as a child process of the shell, it inherits the shell’s working directory. The shell in turn inherits its working directory from its parent, and so forth, all the way to the init process which has
/ as a working directory (you can verify this by looking at
/proc/1/cwd). If that was all, then every process would have working directory
/, which would not be very useful. But a process can change its working directory with the
chdir system call. And that’s exactly what
cd does! (See for yourself in the bash source code and fish source code.)
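As a quick check of the two views, the following sketch compares what the kernel reports through /proc with what the shell reports after a cd. It assumes a Linux system with /proc mounted; the variable names are only illustrative.

```bash
# Linux only: /proc/<pid>/cwd is a symlink to the working directory of <pid>.
# $$ expands to the PID of the current shell.
original_dir=$(pwd)

cd / || exit                             # the builtin changes the shell's own directory
kernel_view=$(readlink "/proc/$$/cwd")   # what the kernel reports for this shell
shell_view=$(pwd)                        # what the shell itself reports
echo "kernel: $kernel_view, shell: $shell_view"

cd "$original_dir" || exit               # go back to where we started
```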
Going back to the cat example, if we execute
cd /var && cat log/syslog
then cd /var changes the working directory of the shell to /var. Afterwards, the shell creates cat as a child process and passes log/syslog as an argument to it; cat takes this argument and passes it to open. cat has inherited /var as its working directory from the shell, so when the kernel sees the relative path
log/syslog passed to
open, it prepends the working directory and tries to open
/var/log/syslog. In other words, this inheritance of the working directory means that everything magically works just as expected.
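The same inheritance can be reproduced without touching /var. This is a small sketch that uses a scratch directory as a stand-in; the directory layout and the file contents are made up for the demonstration.

```bash
# Scratch directory with the same shape as the example (log/syslog is a stand-in file).
scratch=$(cd "$(mktemp -d)" && pwd -P)
mkdir "$scratch/log"
echo "demo log line" > "$scratch/log/syslog"

original_dir=$(pwd)
cd "$scratch" || exit
child_cwd=$(sh -c 'pwd -P')    # a child process starts in the shell's directory
contents=$(cat log/syslog)     # so the relative path resolves against it
cd "$original_dir" || exit
rm -r "$scratch"

echo "child started in: $child_cwd"
echo "cat read: $contents"
```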
Note that a process can only change its own working directory. It has no control over the working directory of any other process (except its child processes). That is why
cd cannot be a standalone executable. The
chdir call has to be executed within the shell process. And that’s why
cd must be a shell builtin. (Note that this isn’t the case for all shell builtins; some of them could also be standalone executables. For example
pwd is a shell builtin, but it wouldn’t have to be. It could also be a standalone executable that reads the working directory of its parent process. In fact, there is the
pwdx executable, which shows the working directory of any process.)
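To illustrate that reading the working directory of another process is possible from the outside, here is a sketch in which a child process looks up the directory of its parent. It assumes Linux with /proc mounted, and the variable names are only for illustration.

```bash
# Linux only. Inside the child, $PPID is the PID of its parent process,
# and /proc/<pid>/cwd is a symlink to that process's working directory.
parent_cwd=$(sh -c 'readlink "/proc/$PPID/cwd"')
own_cwd=$(pwd -P)

echo "child sees parent in: $parent_cwd"
echo "shell reports:        $own_cwd"
```

Reading works across process boundaries; changing does not, which is the asymmetry that forces cd to be a builtin.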
So why does it make a difference whether
cd is a builtin or a standalone executable? Let me give three examples.
cd in shell scripts
I occasionally see shell scripts which have a structure like this:
#!/bin/bash
cur_loc=$(pwd)
# ... main part of the script which includes some calls to cd ...
# return to original subdirectory
cd "$cur_loc"
The intention is to make sure that, when the script is executed from the shell, we don’t end up in some random directory when the script returns. But this is completely unnecessary. When the script is executed, a new
bash process is created which has its own working directory. Any
cd in the script will only affect the working directory of the new
bash process; the parent process (our interactive shell from which we executed the script) is unaffected. So the last
cd is essentially a no-op: it changes the working directory of the
bash process, and immediately afterwards the process ends. It is not harmful either; but usually less code is better, especially if the code in question has no purpose, so we should just remove a trailing cd.
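This is easy to verify. The sketch below writes a throwaway script that changes directory and never changes back, runs it, and compares the caller's working directory before and after. The temporary file is only scaffolding for the demonstration.

```bash
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
cd / || exit     # no "cd back" at the end
EOF

before=$(pwd)
bash "$script"   # runs in a new bash process with its own working directory
after=$(pwd)
rm -f "$script"

echo "before: $before"
echo "after:  $after"
```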
Shortcuts to change a directory
Sometimes there might be a directory that you have to go to frequently. Especially if it is deeply nested, typing this out every time might be tedious. A tedious task in the shell is often something that can be simplified with a shell script. But we just saw that changing directories with a shell script is not possible. We have to execute the
cd within the current process of the shell. This is a good candidate for an alias. For example, if we realize that we often go to the directory
/usr/share/fish/completions, we can add an alias to the shell init files, like
alias completions="cd /usr/share/fish/completions".
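A shell function is a close cousin of the alias here, since it also runs inside the current shell process. The following sketch uses a function instead of an alias because bash does not expand aliases in non-interactive scripts by default; goto_target and the scratch directory are hypothetical names for the demonstration.

```bash
target=$(cd "$(mktemp -d)" && pwd -P)

# Hypothetical shortcut: the cd runs in the current shell, so it sticks.
goto_target() { cd "$target" || return; }

original_dir=$(pwd)
goto_target
landed=$(pwd -P)
cd "$original_dir" || exit
rmdir "$target"

echo "landed in: $landed"
```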
CDPATH is a shell feature that changes the behavior of
cd. By default, if we execute cd some-dir, then the shell will look for the directory
some-dir in its current working directory. If the directory exists, the shell changes its working directory to this subdirectory, otherwise it returns an error. This behavior can be changed with the
CDPATH variable. If it is nonempty, its contents are interpreted as a list of directories, and those are used as the search path instead of the current working directory.
For example, assume that
CDPATH is set to
/usr/local:/var/local. If we now execute
cd some-dir, then the shell will first check whether
/usr/local has a subdirectory
some-dir, and if so, change its current working directory to
/usr/local/some-dir. Otherwise, it checks whether
/var/local/ has a subdirectory
some-dir; if so, it changes the current working directory to
/var/local/some-dir. If none of this was successful, both bash and fish then check whether
some-dir is a subdirectory of the current directory, and change the working directory to this subdirectory if it does (in other words, bash and fish implicitly add
. to the end of the
CDPATH list). Only if none of those directories exists will cd fail.
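The lookup order can be tried out safely in a scratch directory. In this sketch the directories first and second play the roles of /usr/local and /var/local; all names are made up. The output of cd is discarded because bash prints the target directory when it was found through CDPATH.

```bash
base=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$base/first/some-dir" "$base/second/other-dir"
original_dir=$(pwd)

CDPATH="$base/first:$base/second"   # assigned as a plain shell variable
cd / || exit
cd some-dir > /dev/null || exit     # found via the first CDPATH entry
found_first=$(pwd -P)
cd / || exit
cd other-dir > /dev/null || exit    # not in first/, found via the second entry
found_second=$(pwd -P)

unset CDPATH
cd "$original_dir" || exit
rm -r "$base"

echo "some-dir  -> $found_first"
echo "other-dir -> $found_second"
```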
I find CDPATH incredibly useful. I keep all my projects in ~/projects and set my CDPATH to .:~/projects. This way, I can get to any of my project directories from anywhere in the file system. Furthermore, fish supports tab completion across
CDPATH. So if my current working directory is
/var/lib and I type
cd mb<tab>, then fish will auto-complete this to
cd my-blog and take me to ~/projects/my-blog.
But there is one potential problem with this feature, and that comes from the confusion of shell variables and environment variables (at least I was confused by this).
Just like the working directory, the environment is a concept that is defined on the kernel level. Every process has an environment; on Linux systems you can see it at
/proc/<pid>/environ. It is an array of pointers to strings. By convention the strings have the form
key=value, but this is not a requirement. These strings are interpreted as environment variables; in this case
key is the name of the variable and
value is its value. And just like the working directory, a process inherits the environment from its parent.
If a process executes another program with the
execve system call, it can change the environment for the process with the last parameter of the system call (the trailing e stands for environment, and the v stands for argument vector). This is for example how
env allows you to execute a program with the environment of your choosing. It will compile the list of environment variables into an array and pass it (together with the program to execute and the arguments) to the execve system call.
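As a small illustration of env building an environment from scratch, the following sketch starts a child with env -i, which discards the inherited environment and passes only the variables given on the command line. DEMO_OUTER and DEMO_INNER are hypothetical variable names.

```bash
export DEMO_OUTER=outer    # hypothetical variable in the shell's environment

# env -i starts from an empty environment and adds only DEMO_INNER.
seen=$(env -i DEMO_INNER=inner /bin/sh -c 'echo "${DEMO_OUTER:-unset} ${DEMO_INNER:-unset}"')
echo "$seen"

unset DEMO_OUTER
```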
Now to access the environment, a program will usually use the C standard library (either directly, or through a function in a higher level language that internally uses the C standard library). It provides access to the environment through the
environ variable and through the third argument of the main function
int main(int argc, char *argv[], char *envp[]) (they initially point at the same array). It also provides the functions setenv and putenv, which are used to modify
environ. These functions do not however modify the environment that the kernel sees (and which we can see through
/proc/<pid>/environ). For example, if you use
setenv to add a new variable and call
fork() to create a child process, the child process will inherit the original environment which does not include the effects of the
setenv. In practice this is not really relevant, since the child will also inherit the
environ variable which does include the effects of the
setenv; so on a C standard library level, inheriting environments between parent and child works as expected. But it shows that there are two slightly different views on the environment.
The purpose of environment variables is to control the behavior of programs or libraries. For example,
git uses the
EDITOR environment variable to determine which editor to use to create commit messages. A common way to define environment variables is to create shell variables and to tell the shell that these variables should be added to the environment of any program that is executed.
Shell variables are like variables in any other programming language. They have a name and a value, and whenever we specify the name somewhere, the shell will replace it with the corresponding value. For example, if we have a shell variable named
var with value
contents and then execute echo $var, the shell will replace $var with contents and execute
echo contents instead.
Some shell variables are already created when the shell is initialized; for example, bash creates
BASH_VERSION, and fish creates
FISH_VERSION. We can also create new variables, using
NAME=VALUE in bash or
set NAME VALUE in fish. And then there is another source for shell variables: the environment. When the shell is initialized, it reads
environ and makes its contents available as shell variables. This means that every environment variable is also a shell variable. That’s why we can access the
HOME environment variable as $HOME.
Not only will the shell make environment variables available as shell variables during startup, we can also tell the shell to turn shell variables into environment variables for new processes. For a shell variable to become an environment variable for a child process, it needs to be marked as exported (for example, using
export VARIABLE in bash or
set -x VARIABLE in fish; the variables that come from
environ are automatically marked as exported). When we instruct the shell to execute another program (not a shell builtin), it compiles a list of all exported shell variables, and those define the environment for the executed program (it does this by executing the program with the
execve system call and passing it the exported shell variables as the environment parameter). So in the example above, if we want to define an
EDITOR environment variable for
git, we can do so by creating a shell variable
EDITOR and then exporting it.
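The effect of exporting is easy to observe with a child process. In this sketch, demo_plain and demo_exported are hypothetical variable names; only the exported one reaches the child.

```bash
demo_plain=hello              # shell variable only
export demo_exported=world    # shell variable that is also marked as exported

# The child shell only receives what was exported.
seen=$(sh -c 'echo "${demo_plain:-unset} ${demo_exported:-unset}"')
echo "$seen"

unset demo_plain demo_exported
```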
Environment variables control the behavior of programs and libraries. Similarly, shell variables can control the behavior of the shell and its builtins. For example, the
fc bash builtin uses the
EDITOR variable to determine which editor to use to edit commands from the history list. This is similar to the
git example above. The difference is that
fc is a builtin, so
EDITOR does not have to be exported. Since
fc is a function within bash, it has direct access to the shell variables; it doesn’t need environment variables. In fact, after bash has read the
environ variable during startup, it will not read any variables from the environment again; it will always use the shell variables instead.
So if we define
EDITOR as a (non-exported) shell variable, then
fc will see it and
git won’t. This might create confusion. To understand whether a variable needs to be exported or not, you need to know whether the command you want to be affected is a shell builtin or a program. It might be tempting to just export all variables so that any command can see them. But this is problematic, which brings us back to CDPATH.
The problem with CDPATH as an environment variable
On first thought it might make sense to export the
CDPATH variable. There exist similar PATH and MANPATH variables which need to be exported, and even the man page on environ lists CDPATH as an example of an environment variable. But
cd is not a program; it is a shell builtin. So as we just saw, it does not read
CDPATH from the environment, it uses it as a shell variable. This means that it is not only unnecessary to export
CDPATH, doing so can lead to undesired behavior and weird bugs as described in this blog post.
One problem is that when
CDPATH is set and
cd is called with a relative path that is not
.., then bash will output the absolute path of the new working directory. This leads to bugs in some bash scripts that try to get the parent directory of the script like this:
#!/bin/bash
SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
This is intended to work like this:
dirname "$0" prints the (relative) path of the directory containing the script (
$0 contains the name of the script that is called); the relative path is passed to
cd which changes the working directory of the subshell; then
pwd outputs the absolute path of the current directory, which is captured in
SCRIPT_DIR. This mostly works, unless the user has defined
CDPATH and exported it. Then it becomes an environment variable for any child of the shell. In particular, it is visible to the bash process that executes this script, which means that
cd "$(dirname "$0")" might print the absolute path of the script’s directory. Since
pwd also prints the absolute path,
$SCRIPT_DIR now contains two lines, both containing the path, which will most likely lead to problems further down in the script.
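The failure can be reproduced in a scratch directory. In this sketch, where.sh is a hypothetical script containing the snippet from above; it is run once without CDPATH and once with CDPATH in its environment, and the number of output lines is counted in each case.

```bash
base=$(cd "$(mktemp -d)" && pwd -P)
mkdir "$base/scripts"
cat > "$base/scripts/where.sh" <<'EOF'
#!/bin/bash
SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
printf '%s\n' "$SCRIPT_DIR"
EOF

original_dir=$(pwd)
cd "$base" || exit
# Without CDPATH in the script's environment.
lines_plain=$( (unset CDPATH; bash scripts/where.sh) | wc -l | tr -d ' ')
# With CDPATH in the script's environment: cd prints the directory as well.
lines_cdpath=$(CDPATH="$base" bash scripts/where.sh | wc -l | tr -d ' ')
cd "$original_dir" || exit
rm -r "$base"

echo "lines without CDPATH: $lines_plain"
echo "lines with CDPATH:    $lines_cdpath"
```

A common defensive measure inside such scripts is to redirect the output of cd to /dev/null or to unset CDPATH at the top of the script.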
Now this might be seen as a bug in the script which should guard against an exported
CDPATH. In fact, this Stack Overflow answer gives a more robust snippet to compute the parent directory of a bash script that guards against an exported
CDPATH and a range of other problems (although I think in a lot of cases it would be enough to get a relative path, so the snippet above could be replaced with
SCRIPT_DIR=$(dirname "$0")). But the truth is that not all shell scripts are written in a robust way, which makes a
CDPATH environment variable problematic.
Another problem comes with non-existing directories. Shellcheck warns against a line in a shell script that’s a single
cd <somedir> and advises to change it to
cd <somedir> || exit. Otherwise, there might be dangerous consequences if
<somedir> does not exist; for example some later line might be something like
rm *. The
|| exit ensures that the shell script does not continue if the directory does not exist. But a
CDPATH environment variable can defeat the
|| exit guard: if
<somedir> does not exist in the working directory of the script but it does exist somewhere in the
CDPATH, then the
cd will still succeed, and the script will carry on in a different directory than intended, where it could remove files or do other destructive or at least unintended things.
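Here is a harmless sketch of how the guard gets bypassed. The directory names are made up: work/ has no build subdirectory, but a directory of that name exists under elsewhere/, which is placed on CDPATH for the second run. Instead of deleting anything, the guarded command just prints where it ended up.

```bash
base=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$base/elsewhere/build" "$base/work"   # note: no "build" inside work/
original_dir=$(pwd)
cd "$base/work" || exit

# The guarded command: print the directory only if cd succeeded.
guarded='cd build > /dev/null 2>&1 || exit; pwd -P'

# Without CDPATH the guard fires and nothing is printed.
without_cdpath=$( (unset CDPATH; bash -c "$guarded") || true )
# With CDPATH in the environment, cd succeeds in a directory outside work/.
with_cdpath=$(CDPATH="$base/elsewhere" bash -c "$guarded")

cd "$original_dir" || exit
rm -r "$base"

echo "without CDPATH: '${without_cdpath}'"
echo "with CDPATH:    '${with_cdpath}'"
```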
CDPATH should not be exported. It is enough to define it as a shell variable, and since
cd is a shell builtin, it will be able to pick it up. Additionally, it should be kept out of reach of shell scripts. In bash, this can be accomplished by defining it in
.bashrc which is only read by interactive shells. In fish, the config files are also read by fish scripts, so here it is necessary to guard the
CDPATH definition as follows.
if status --is-interactive
    set CDPATH <list of paths>
end
Other shell variables
CDPATH is an especially problematic instance of a shell variable that shouldn’t be an environment variable, but there are other ones as well. For example, bash uses
PS1 to control the prompt. This variable only makes sense in interactive shells. In fact, a common test to check whether bash is run interactively is to check whether
$PS1 is defined. So this variable should not be exported either; but a quick search on GitHub shows that it is not uncommon for it to be exported in
bashrcs. Other variables like
HISTSIZE are not actively harmful when they are exported, but they pollute the environment of child processes (this is more of an aesthetic issue, like the trailing
cd in shell scripts).
So I think a good rule of thumb is to not export any variables, unless you are sure that they are used by another program or library.
I found it very helpful to look at the shell source code. The bash source code can be quite intimidating, while the fish source code is much more accessible. But both of them were surprisingly easy to experiment with. For example, it took me about 10 minutes to hack in a new builtin to bash that prints out the contents of the
environ variable. (I wanted to find out whether bash updates
environ when an exported shell variable is changed; I couldn’t say for sure by just looking at the code. It turns out that bash updates
environ, whereas fish does not.)
I also found this Stack Exchange answer very helpful to understand the different views of the kernel and the C standard library on the environment.
Written by Sebastian Jambor.