Case Study for Bash and Node - __dirname

__dirname is a variable that is available to every script run by Node.js. It contains a string naming the directory that holds the currently executing script. The variable is set on a per-file basis, so any script that is loaded using require will have its own __dirname variable that points to the directory where that script itself is contained.

Example In Node

Take the following node script located at /tmp/node-one

console.log(__dirname);

When executed, we can see

$ node /tmp/node-one
/tmp
$ cp /tmp/node-one /var/tmp/node-one
$ node /var/tmp/node-one
/var/tmp
$ cp /tmp/node-one ~/node-one
$ node ~/node-one
/home/dave

And, when using require, given the following script in ~/two.js

require('/tmp/node-one');

When executed, we can see

$ node ~/two.js
/tmp

Even though ~/two.js resides in /home/dave, it prints /tmp because the script being required resides in /tmp.

Because of this behavior, it is very easy and elegant for node scripts to require one another without needing to know an absolute path ahead of time. When a script uses only relative require statements, every path is implicitly resolved relative to its __dirname.

Problems

This behavior, however, relies on node knowing where the script came from. Imagine the case where node does not know where the JavaScript bytes are coming from. For example, using the original node-one script above:

$ cat /tmp/node-one | node
.

We get . as our __dirname - . is synonymous with the current directory we are in, i.e. our CWD or PWD. So imagine this:

$ cd /tmp
$ cat /tmp/node-one | node
.
$ cd /foo/bar
$ cat /tmp/node-one | node
.

In both examples we are told the __dirname of the executing script is set to ., when in reality this just isn’t true. The truth is node has no idea where the executing script is located, as it is only receiving the contents of the script over stdin.

Note: ~~I may open an issue for Node in the future saying that, in the event the executing script is unknown, `__dirname` should be `undefined`~~ I’ve opened an issue for this here https://github.com/joyent/node/issues/15444.

Bash Implementation

I stumbled upon this because I was attempting to emulate __dirname in Bash. A quick Google search reveals this tweet by @izs, in reply to @rvagg:

@rvagg `__dirname`="$(CDPATH= cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
— THAT isaac (@izs) November 24, 2013

Using this implementation, we have the following script (called /tmp/bash-one)

#!/usr/bin/env bash
__dirname="$(CDPATH= cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "$__dirname"

When executed, we can see

$ bash /tmp/bash-one
/tmp
$ cp /tmp/bash-one /var/tmp/bash-one
$ bash /var/tmp/bash-one
/var/tmp
$ cp /tmp/bash-one ~/bash-one
$ bash ~/bash-one
/home/dave

So far, so good. Let’s check out some of the problem cases:

$ cd /tmp
$ cat /tmp/bash-one | bash
/tmp
$ cd /foo/bar
$ cat /tmp/bash-one | bash
/foo/bar

And it has the same problem as Node above - when it can’t determine the location of the executing script, it prints the current working directory, which may or may not be correct.
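The fallback to the working directory comes from dirname(1): with the script on stdin, ${BASH_SOURCE[0]} is empty, and the dirname of an empty string is ".". A minimal sketch that simulates the empty value (the variable names src, dir and resolved are illustrative only):

```shell
#!/usr/bin/env bash
# Simulate what the one-liner sees when the script arrives on stdin:
# BASH_SOURCE[0] is empty, so we start from an empty string.
src=""

# dirname of an empty string is ".", the current directory
dir=$(dirname "$src")

# cd into "." and print it: this is just the working directory
resolved=$(CDPATH= cd "$dir" && pwd)

echo "dirname:  $dir"
echo "resolved: $resolved"
```

Whatever directory this is run from is what ends up in resolved, which matches the /tmp and /foo/bar results above.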

However, based on the test cases we have so far, this Bash implementation seems to be on par with Node’s implementation.

symlinks

Now things get interesting - what if the executing script was actually a symlink pointing to a JavaScript or Bash source file? Let’s find out.

First, we’ll create the environment: ~/foo and ~/bar

$ cd ~
$ mkdir foo bar

Next, we’ll copy in the bash and node scripts to ~/foo

$ cp /tmp/bash-one /tmp/node-one ~/foo

Finally, we’ll symlink them to ~/bar

$ cd ~/bar
$ ln -s ../foo/node-one
$ ln -s ../foo/bash-one
$ ls -l
lrwxrwxrwx 1 dave other 15 Apr 13 15:13 bash-one -> ../foo/bash-one
lrwxrwxrwx 1 dave other 15 Apr 13 15:13 node-one -> ../foo/node-one

Now, let’s test the node implementation first:

$ pwd
/home/dave/bar
$ node node-one
/home/dave/foo

Even though we are currently inside ~/bar, and the symlink is here as well, the node script still prints ~/foo - meaning Node resolves symlinks when setting the __dirname variable.

Let’s see how the bash implementation holds up:

$ pwd
/home/dave/bar
$ bash bash-one
/home/dave/bar

The bash implementation fails to deal with the case where the executing script is itself a symlink.

The Fix

So there are a couple of problems to fix with the bash implementation:

1. error out if __dirname cannot be determined, i.e. the script is being read via stdin
2. handle the case where the executing script is a symlink

1: error out if location is unknown

Looking at the implementation, the secret lies in the $BASH_SOURCE array, so let’s do some testing using it. Creating a new script, ./bash-two:

echo "${BASH_SOURCE[0]}"

We can see that

$ bash bash-two
bash-two
$ bash ./bash-two
./bash-two
$ cat bash-two | bash

Using this knowledge, we know that if ${BASH_SOURCE[0]} is empty, we should give up right away as the current script location is unknown.
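As a rough sketch, that check can be wrapped in a small guard. The name require_source is hypothetical, and the path is passed in as an argument so the example behaves the same however the snippet itself is run:

```shell
#!/usr/bin/env bash
# require_source is a hypothetical helper name used for illustration.
# It succeeds only when given a non-empty script path.
require_source() {
    local src=$1
    if [[ -z $src ]]; then
        echo 'script location unknown' >&2
        return 1
    fi
}

# Simulate both situations instead of depending on how this snippet is run.
require_source './bash-two' && echo 'known'
require_source '' 2>/dev/null || echo 'unknown'
```

In a real script the call would be something like require_source "${BASH_SOURCE[0]}" || exit 1 near the top of the file.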

2: resolve symlinks

Resolving symlinks is done with the readlink(2) syscall. Unfortunately, Bash does not have any builtin bindings for this call, so we must rely on external utilities. Because of this, we are at risk of this implementation not being fully portable. Let’s still try though - the basic logic will be, while the file is a symlink, call readlink(1) on it:

while [[ -L $file ]]; do
    file=$(readlink "$file")
done

There are a couple of cases that need to be considered though. First, readlink(1) is non-standard, so we should be as defensive with our error handling as possible. Not only should we check the exit code, we should also check the resulting variable to assert that it is not empty. The above code becomes:

while [[ -L $file ]]; do
    file=$(readlink "$file")
    if (($? != 0)) || [[ -z $file ]]; then
        return 1
    fi
done

We are almost there. The final consideration is the nature of symlinks. Symlinks don’t need to be absolute; they can be relative to the directory containing the link. In the above code example, we wipe out the original file and only keep the output of readlink(1). Imagine this example:

$ pwd
/home/dave/bar
$ readlink bash-one
../foo/bash-one

Because we were in the ~/bar directory, the relative path shown by readlink(1) makes sense, but if we were somewhere else, it would mean something completely different.

$ cd /tmp
$ readlink ~/bar/bash-one
../foo/bash-one

The problem is that ../foo/bash-one, interpreted from /tmp, will not point to the correct file: it will point to /tmp/../foo/bash-one, i.e. /foo/bash-one, which probably won’t exist. If the output of readlink(1) is not already absolute, we must make it relative to the directory of the original file. This gives us code like this:

local rl=$(readlink "$file")
if [[ ${rl:0:1} == '/' ]]; then
    file=$rl
else
    file=$(dirname "$file")/$rl
fi

If $rl (the output of readlink(1)) does not start with a ‘/’ (meaning it is relative), make it relative to the dirname(1) of the symlink it was read from.
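To see the relative case end to end, here is a hedged sketch that combines the loop and the relative-path handling in one function and runs it against a throwaway copy of the ~/foo and ~/bar layout. The name resolve_link and the temporary directory are assumptions made for the demo, not part of the original script:

```shell
#!/usr/bin/env bash
# resolve_link is a hypothetical helper name, for illustration only.
# It follows a chain of symlinks, interpreting relative targets against
# the directory that contains the link itself.
resolve_link() {
    local file=$1 rl
    while [[ -L $file ]]; do
        rl=$(readlink "$file")
        # readlink(1) is non-standard: check exit code and output
        if (($? != 0)) || [[ -z $rl ]]; then
            return 1
        fi
        if [[ ${rl:0:1} == '/' ]]; then
            file=$rl
        else
            file=$(dirname "$file")/$rl
        fi
    done
    printf '%s\n' "$file"
}

# Demo in a temporary directory that mirrors ~/foo and ~/bar.
tmp=$(mktemp -d)
mkdir "$tmp/foo" "$tmp/bar"
echo 'echo hi' > "$tmp/foo/bash-one"
ln -s ../foo/bash-one "$tmp/bar/bash-one"

# Resolve from an unrelated working directory.
resolved=$(cd / && resolve_link "$tmp/bar/bash-one")
echo "$resolved"
[[ -f $resolved ]] && echo 'resolved path is a regular file'

rm -rf "$tmp"
```

The printed path still contains bar/../foo, because the function only follows links. Normalizing the directory is left to the cd and pwd step.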

Putting this all together, we have the following implementation of __dirname in bash that will error out if anything goes wrong.

#!/usr/bin/env bash
#
# Node.JS style __dirname in bash
#
# Author: Dave Eddy <[email protected]>
# Date: April 13, 2015
# License: MIT

__dirname() {
        local prog=${BASH_SOURCE[0]}
        [[ -n $prog ]] || return 1

        # resolve symlinks (of script)
        while [[ -L $prog ]]; do
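                # declare separately: "local rl=$(...)" would make $? the
                # exit status of local (always 0), hiding readlink failures
                local rl
                rl=$(readlink "$prog")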
                # readlink(1) is not portable, so assert it exits 0
                # and also returns non-empty string
                if (($? != 0)) || [[ -z $rl ]]; then
                        return 1
                fi

                # symlinks can be relative, in which case make them
                # "relative" to the original program dirname
                if [[ ${rl:0:1} == '/' ]]; then
                        prog=$rl
                else
                        prog=$(dirname "$prog")/$rl
                fi
        done

        # resolve the dir
        (CDPATH= cd "$(dirname "$prog")" && pwd)
}
__dirname=$(__dirname)
if (($? != 0)) || [[ -z $__dirname ]]; then
        echo 'failed to determine __dirname' >&2
        exit 1
fi
unset -f __dirname

# if we are here, dirname is set
echo "$__dirname"

It’s not pretty, but this is the closest thing to a bullet-proof solution that Bash has for __dirname.

Example

It handles the general case

$ cd ~/foo
$ ./__dirname
/home/dave/foo

It handles the symlink

$ cd ~/bar/
$ ln -s ../foo/__dirname
$ ./__dirname
/home/dave/foo

It handles the symlink when called from an arbitrary directory

$ cd /tmp
$ ~/bar/__dirname
/home/dave/foo

It also handles being sourced

$ echo '. ~/foo/__dirname' > /tmp/bash-three
$ bash /tmp/bash-three
/home/dave/foo

And finally, it acts properly when the location can’t be determined

$ cat ~/foo/__dirname | bash
failed to determine __dirname

The Solution

The implementation I’ve given attempts to deal with this problem by only setting __dirname if it is known with certainty; otherwise it returns an error saying the path cannot be determined.

The actual solution to this problem is outlined in the Wooledge BashFAQ:

http://mywiki.wooledge.org/BashFAQ/028

Too often, people believe the configuration of a script should reside in the same directory where they put their script. This is the root of the problem.

A UNIX paradigm exists to solve this problem for you: configuration artifacts of your scripts should exist in either the user’s HOME directory or /etc. That gives your script an absolute path to look for the file, solving your problem instantly: you no longer depend on the “location” of your script:

Namely, don’t rely on the location of your script on the disk, as that can’t be determined in a fully deterministic way.
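As a rough illustration of that advice, a script can look for its configuration in fixed places instead of next to itself. The helper name find_config and the file name myapp.conf are hypothetical, and the search order (HOME first, then /etc) is an assumption:

```shell
#!/usr/bin/env bash
# find_config and "myapp.conf" are hypothetical names, for illustration.
# Prints the first readable config file among fixed, well-known locations.
find_config() {
    local name=$1 candidate
    for candidate in "$HOME/.$name" "/etc/$name"; do
        if [[ -r $candidate ]]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if conf=$(find_config 'myapp.conf'); then
    echo "using $conf"
else
    echo 'no config found, using defaults'
fi
```

Nothing here depends on ${BASH_SOURCE[0]}, so it behaves the same whether the script is executed, symlinked, sourced, or piped in over stdin.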