Case Study for Bash and Node - __dirname
Posted by Dave Eddy on Apr 13 2015 - tags: tech
__dirname is a variable that is available to all scripts that are run by
Node.js. It contains a string that refers to the directory where the
currently executing script is located. The variable is set on a per-file basis,
so any script that is sourced using require will have its own __dirname variable
that points to the directory where that script itself is located.
Example In Node
Take the following node script located at /tmp/node-one
console.log(__dirname);
When executed, we can see
$ node /tmp/node-one
/tmp
$ cp /tmp/node-one /var/tmp/node-one
$ node /var/tmp/node-one
/var/tmp
$ cp /tmp/node-one ~/node-one
$ node ~/node-one
/home/dave
And, when using require, given the following script in ~/two.js
require('/tmp/node-one');
When executed, we can see
$ node ~/two.js
/tmp
Even though ~/two.js resides in /home/dave, it prints /tmp because the
script being sourced resides in /tmp.
Because of this behavior, it is very easy and elegant for node scripts to require
one another without needing to know an absolute path ahead of time. By using
only relative require statements, all paths will be made relative to __dirname
implicitly.
Problems
This behavior, however, relies on node knowing where the script being executed
lives. Imagine the case where node does not know where the JavaScript bytes
are coming from. For example, using the original node-one script above:
$ cat /tmp/node-one | node
.
We get . as our __dirname. A . is synonymous with the current directory we are
in, i.e. our CWD or PWD. So imagine this:
$ cd /tmp
$ cat /tmp/node-one | node
.
$ cd /foo/bar
$ cat /tmp/node-one | node
.
In both examples we are told that the __dirname of the executing script is .,
when in reality this just isn’t true. The truth is that node has no idea where the
executing script is located, as it is only receiving the contents of the script over
stdin.
Note: I’ve opened an issue for Node proposing that, in the event the
executing script is unknown, `__dirname` should be `undefined`:
https://github.com/joyent/node/issues/15444.
Bash Implementation
The reason I stumbled upon this was that I was attempting to emulate __dirname
in Bash. A quick Google search reveals these tweets by @rvagg and @izs:
@rvagg
`__dirname`="$(CDPATH= cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
— THAT isaac (@izs) November 24, 2013
Using this implementation, we have the following script (called /tmp/bash-one):
#!/usr/bin/env bash
__dirname="$(CDPATH= cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "$__dirname"
When executed, we can see
$ bash /tmp/bash-one
/tmp
$ cp /tmp/bash-one /var/tmp/bash-one
$ bash /var/tmp/bash-one
/var/tmp
$ cp /tmp/bash-one ~/bash-one
$ bash ~/bash-one
/home/dave
So far, so good. Let’s check out some of the problem cases:
$ cd /tmp
$ cat /tmp/bash-one | bash
/tmp
$ cd /foo/bar
$ cat /tmp/bash-one | bash
/foo/bar
And it has the same problem as Node above: when it can’t determine the location of the executing script, it prints the current working directory, which may or may not be where the script actually lives.
However, based on the test cases we have so far, this Bash implementation seems to be on par with Node’s implementation.
Symlinks
Now things get interesting - what if the executing script was actually a symlink pointing to a JavaScript or Bash source file? Let’s find out.
First, we’ll create the environment: ~/foo and ~/bar
$ cd ~
$ mkdir foo bar
Next, we’ll copy in the bash and node scripts to ~/foo
$ cp /tmp/bash-one /tmp/node-one ~/foo
Finally, we’ll symlink them to ~/bar
$ cd ~/bar
$ ln -s ../foo/node-one
$ ln -s ../foo/bash-one
$ ls -l
lrwxrwxrwx 1 dave other 15 Apr 13 15:13 bash-one -> ../foo/bash-one
lrwxrwxrwx 1 dave other 15 Apr 13 15:13 node-one -> ../foo/node-one
Now, let’s test the node implementation first:
$ pwd
/home/dave/bar
$ node node-one
/home/dave/foo
Even though we are currently inside ~/bar, and the symlink is here as well,
the node script still prints ~/foo, meaning Node resolves symlinks
when setting the __dirname variable.
Let’s see how the bash implementation holds up:
$ pwd
/home/dave/bar
$ bash bash-one
/home/dave/bar
The bash implementation fails to deal with the case where the executing script is itself a symlink.
The Fix
So there are a couple of problems to fix with the bash implementation:
- error out if the __dirname cannot be determined, i.e. the script is being read via stdin
- handle the case where the executing script is a symlink
1: error out if location is unknown
Looking at the implementation, the secret lies in the $BASH_SOURCE
array, so let’s do some testing using it. Creating a new script, ./bash-two:
echo "${BASH_SOURCE[0]}"
We can see that
$ bash bash-two
bash-two
$ bash ./bash-two
./bash-two
$ cat bash-two | bash
The last command, where the script arrives over stdin, prints an empty line.
Using this knowledge, we know that if ${BASH_SOURCE[0]} is empty, we should
give up right away as the current script location is unknown.
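As a rough sketch of that check (not part of the final script below; `require_source` is a hypothetical helper name), the guard can be written as a small function that takes the candidate path as an argument, which makes it easy to try out by hand:

```shell
# Hypothetical helper: succeed and echo the path only when it is non-empty.
# The caller is expected to pass in "${BASH_SOURCE[0]}".
require_source() {
    # "${1:-}" keeps this safe even when no argument is given
    if [ -z "${1:-}" ]; then
        echo 'script location is unknown (read from stdin?)' >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Typical use near the top of a bash script:
#   src=$(require_source "${BASH_SOURCE[0]}") || exit 1
```

Passing the path in as an argument, rather than reading the array inside the function, is purely a choice made for this sketch.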
2: resolve symlinks
Resolving symlinks is done with the readlink(2) syscall. Unfortunately, Bash
does not have any builtin bindings for this call, so we must rely on external
utilities. Because of this, we are at risk of this implementation not being
fully portable. Let’s still try though: the basic logic will be, while the file
is a symlink, call readlink(1) on it:
while [[ -L $file ]]; do
    file=$(readlink "$file")
done
There are a couple of cases that need to be considered though. First, readlink(1)
is non-standard, so we should be as defensive with our error handling as possible.
Not only should we check the exit code, we should also check the resulting variable
to assert that it is not empty. The above code becomes:
while [[ -L $file ]]; do
    file=$(readlink "$file")
    if (($? != 0)) || [[ -z $file ]]; then
        return 1
    fi
done
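One case the loop above does not guard against is a symlink cycle (a pointing to b, b pointing back to a), where the symlink test stays true forever. A minimal sketch of one way to bound the loop follows. The function name `follow_links` and the default cap of 40 hops are assumptions made for illustration, and it uses plain `[ ]` tests rather than `[[ ]]`. Like the loop above, it only behaves correctly for absolute link targets.

```shell
# Hypothetical helper: follow a chain of symlinks, giving up after a fixed
# number of hops so that a cycle cannot hang the script.
# Only absolute link targets are handled correctly here.
follow_links() {
    local file="$1"
    local max_hops="${2:-40}"   # arbitrary cap, an assumption of this sketch
    local hops=0
    local target

    while [ -L "$file" ]; do
        hops=$((hops + 1))
        [ "$hops" -le "$max_hops" ] || return 1

        # assigned separately from `local` so the exit status is readlink's
        target=$(readlink "$file") || return 1
        [ -n "$target" ] || return 1
        file=$target
    done

    printf '%s\n' "$file"
}
```

A caller would use it as `file=$(follow_links "$file") || return 1`.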
We are almost there. The final consideration is the nature of symlinks.
Symlinks don’t need to be absolute; they can be relative to the directory that
contains the link. In the above code example, we wipe out the original file and
only keep the output of readlink(1). Imagine this example:
$ pwd
/home/dave/bar
$ readlink bash-one
../foo/bash-one
Because we were in the ~/bar directory, the relative path shown by
readlink(1) makes sense, but if we were somewhere else, it would mean
something completely different.
$ cd /tmp
$ readlink ~/bar/bash-one
../foo/bash-one
So the problem becomes that ../foo/bash-one inside of /tmp will not point to the
correct file; it will point to /tmp/../foo/bash-one, or /foo/bash-one,
which probably won’t exist. We must make the output of readlink(1), if it is
not already absolute, relative to the directory of the original file. This gives
us code like this:
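# declare first, then assign, so $? would still reflect readlink(1)
local rl
rl=$(readlink "$file")
if [[ ${rl:0:1} == '/' ]]; then
    # absolute target: use it as-is
    file=$rl
else
    # relative target: anchor it at the directory holding the link
    file=$(dirname "$file")/$rl
fi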
If $rl (the output of readlink(1)) does not start with a ‘/’ (meaning it is
relative), make it relative to the dirname(1) of the original file.
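To make the relative case concrete, here is a small sketch of a single resolution step as a standalone function. The name `link_target` is hypothetical, and it uses a `case` pattern in place of the `${rl:0:1}` substring test, which performs the same leading-slash check.

```shell
# Hypothetical helper: print where a symlink points, anchoring relative
# targets at the directory that contains the link rather than at the CWD.
link_target() {
    local link="$1"
    local rl

    rl=$(readlink "$link") || return 1
    [ -n "$rl" ] || return 1

    case "$rl" in
        /*) printf '%s\n' "$rl" ;;                         # already absolute
        *)  printf '%s/%s\n' "$(dirname "$link")" "$rl" ;; # relative to the link
    esac
}
```

With the ~/bar/bash-one symlink from earlier, `link_target ~/bar/bash-one` would produce a path ending in bar/../foo/bash-one regardless of the directory it is called from. The path is not normalized at this point; the later cd and pwd step takes care of that.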
Putting this all together, we have the following implementation of __dirname
in bash that will error out if anything goes wrong.
#!/usr/bin/env bash
#
# Node.JS style __dirname in bash
#
# Author: Dave Eddy <[email protected]>
# Date: April 13, 2015
# License: MIT

__dirname() {
    local prog=${BASH_SOURCE[0]}
    local rl

    [[ -n $prog ]] || return 1

    # resolve symlinks (of script)
    while [[ -L $prog ]]; do
        # assigned separately from `local`: `local rl=$(...)` would make
        # $? reflect `local` (always 0) instead of readlink(1)
        rl=$(readlink "$prog")
        # readlink(1) is not portable, so assert it exits 0
        # and also returns non-empty string
        if (($? != 0)) || [[ -z $rl ]]; then
            return 1
        fi

        # symlinks can be relative, in which case make them
        # "relative" to the original program dirname
        if [[ ${rl:0:1} == '/' ]]; then
            prog=$rl
        else
            prog=$(dirname "$prog")/$rl
        fi
    done

    # resolve the dir
    (CDPATH= cd "$(dirname "$prog")" && pwd)
}

__dirname=$(__dirname)
if (($? != 0)) || [[ -z $__dirname ]]; then
    # report the failure on stderr
    echo 'failed to determine __dirname' >&2
    exit 1
fi
unset -f __dirname

# if we are here, __dirname is set
echo "$__dirname"
It’s not pretty, but this is the closest thing to a bullet-proof solution
that Bash has for __dirname.
Example
It handles the general case
$ cd ~/foo
$ ./__dirname
/home/dave/foo
It handles the symlink
$ cd ~/bar/
$ ln -s ../foo/__dirname
$ ./__dirname
/home/dave/foo
It handles the symlink when called from an arbitrary directory
$ cd /tmp
$ ~/bar/__dirname
/home/dave/foo
It also handles being sourced
$ echo '. ~/foo/__dirname' > /tmp/bash-three
$ bash /tmp/bash-three
/home/dave/foo
And finally, it acts properly when given a situation where it can’t be determined
$ cat ~/foo/__dirname | bash
failed to determine __dirname
The Solution
The implementation I’ve given attempts to deal with this problem by only
setting __dirname if it is known with certainty; otherwise it returns an error
saying the path cannot be determined.
The actual solution to this problem is outlined in the Wooledge BashFAQ
(http://mywiki.wooledge.org/BashFAQ/028):
Too often, people believe the configuration of a script should reside in the same directory where they put their script. This is the root of the problem.
A UNIX paradigm exists to solve this problem for you: configuration artifacts of your scripts should exist in either the user’s HOME directory or /etc. That gives your script an absolute path to look for the file, solving your problem instantly: you no longer depend on the “location” of your script:
Namely, don’t rely on the location of your script on the disk, as that can’t be determined in a fully deterministic way.
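As a hedged sketch of what that advice could look like in practice, the function below looks for a config file in the user’s HOME directory first and falls back to /etc. The function name `find_config`, the dotfile naming in HOME, and the example name `myapp` are all assumptions made for illustration; the FAQ does not prescribe them.

```shell
# Hypothetical helper: locate a config file without caring where the
# script itself lives. Checks <home>/.<name> first, then <etc>/<name>.
# The second and third arguments exist only to make the lookup easy to test.
find_config() {
    local name="$1"
    local home_dir="${2:-${HOME:-}}"
    local etc_dir="${3:-/etc}"
    local candidate

    for candidate in "$home_dir/.$name" "$etc_dir/$name"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done

    # nothing readable was found in either place
    return 1
}
```

A script would then call something like `conf=$(find_config myapp) || exit 1` and never need __dirname at all.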