sshp Rewrite from JavaScript to C
Posted by Dave Eddy on May 20 2021 - tags: tech
In 2013, I wrote a program in Node.js called sshp. This was right around the time I was investing heavily in learning Node, and honestly having a blast doing it. Node is quick and fun to write, and with only a couple hundred lines of code, I was able to write node-sshp.
node-sshp is a command line utility that acts as a parallelizer for ssh. It works by taking in a file containing a list of hosts to connect to and looping over each host, firing off an ssh command in parallel (with a configurable maximum number of concurrent processes). The tool’s description is:
sshp manages multiple ssh processes and handles coalescing the output to the terminal. By default, sshp will read a file of newline-separated hostnames or IPs and fork ssh subprocesses for them, redirecting the stdout and stderr streams of the child line-by-line to stdout of sshp itself.
Writing this tool in Node was an obvious choice at the time: the company I was working for was using Node heavily, and this tool was written specifically to be used at my job. Node’s built-in support for managing multiple concurrent child processes and IO streams made it a natural fit as well.
Fast-forward eight years, and I decided it might be fun to revisit this code. I still use sshp at home and rely on it heavily; I don’t know if the company I worked for in 2013 is still using node-sshp, however. Looking at the existing Node code, I thought it would be fun to try rewriting it in C.
How sshp Works
node-sshp has 3 modes of execution:
- line mode (line-by-line output, default).
- group mode (grouped by hostname output, -g).
- join mode (grouped by unique output, -j).
The first 2 modes, line and group, operate in largely the same way, differing only in how data is buffered from the child processes and printed to the screen. Line mode buffers the data line-by-line, whereas group mode does no buffering at all and prints the data as soon as it is read from the child.
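To make the line-mode behavior concrete, here is a minimal sketch of per-stream line buffering. It is not sshp’s code: the LineBuf type, the function names, the 4096-byte buffer, and the "[host] " prefix are all hypothetical. Group mode, as described above, would skip this buffer and write bytes out as soon as they are read.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-stream buffer: bytes read from one child's stdout or
 * stderr wait here until a full line is available. */
typedef struct {
    char buf[4096];
    size_t len;
} LineBuf;

/* Print the buffered bytes with a "[host] " prefix (an assumed format),
 * adding a newline if the line lacks one, then empty the buffer. */
static void linebuf_emit(LineBuf *lb, const char *host, FILE *out) {
    fprintf(out, "[%s] ", host);
    fwrite(lb->buf, 1, lb->len, out);
    if (lb->len == 0 || lb->buf[lb->len - 1] != '\n')
        fputc('\n', out);
    lb->len = 0;
}

/* Line mode: append n bytes of child output and print every complete
 * line. A trailing partial line stays buffered for the next read; a line
 * longer than the buffer is flushed early instead of overflowing.
 * Returns the number of lines printed. */
static int linebuf_feed(LineBuf *lb, const char *host, const char *data,
                        size_t n, FILE *out) {
    int printed = 0;
    for (size_t i = 0; i < n; i++) {
        lb->buf[lb->len++] = data[i];
        if (data[i] == '\n' || lb->len == sizeof(lb->buf)) {
            linebuf_emit(lb, host, out);
            printed++;
        }
    }
    return printed;
}

/* At EOF, print whatever partial line is left. Returns 1 if it printed. */
static int linebuf_flush(LineBuf *lb, const char *host, FILE *out) {
    if (lb->len == 0)
        return 0;
    linebuf_emit(lb, host, out);
    return 1;
}
```

A real implementation would keep one LineBuf per stream (each child’s stdout and stderr) and call linebuf_flush() when that stream reaches EOF, so a final line without a trailing newline is not lost.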
The last mode, join, however, buffers all of the data from all of the child processes and outputs it once all processes have finished. Instead of grouping the output by host, it groups it by the output itself to show which hosts had the same output.
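Once every child has finished and its output is fully buffered, join mode’s grouping step might look like the sketch below. The join_print() name and the "hosts: ..." output format are invented for illustration, and the pairwise comparison is quadratic in the number of hosts, a trade of speed for simplicity rather than a claim about how sshp does it.

```c
#include <stdio.h>
#include <string.h>

/* Join mode, assuming hosts[i] produced the fully buffered outputs[i].
 * For each distinct output (in first-seen order), print the hosts that
 * produced it and then the output once. Returns the number of groups. */
static int join_print(const char **hosts, const char **outputs, size_t n,
                      FILE *out) {
    int groups = 0;
    for (size_t i = 0; i < n; i++) {
        /* Skip outputs already printed as part of an earlier group. */
        int seen = 0;
        for (size_t j = 0; j < i && !seen; j++)
            seen = strcmp(outputs[i], outputs[j]) == 0;
        if (seen)
            continue;

        /* Every host from i onward with identical output joins this group. */
        fputs("hosts:", out);
        for (size_t j = i; j < n; j++)
            if (strcmp(outputs[i], outputs[j]) == 0)
                fprintf(out, " %s", hosts[j]);
        fprintf(out, "\n%s", outputs[i]);
        groups++;
    }
    return groups;
}
```

For hosts a, b, and c with outputs "x", "y", and "x", this prints one group listing a and c followed by "x", and a second group listing b followed by "y".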
My goal in this rewrite was to keep sshp largely the same. Only the following options have changed; the rest of the user-facing interface remains exactly the same:
- -b has been changed to -c off (disable color output).
- -N has been removed in favor of -o StrictHostKeyChecking=no.
- -o has been added to allow any ssh option to be passed in.
- -u has been removed (not applicable without npm).
- -y has been removed in favor of -o RequestTTY=force.
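Since -o forwards options straight to ssh, one natural way to support any number of them is to expand each into its own "-o value" pair in the argument vector passed to execvp(3). The build_ssh_argv() helper below is hypothetical, a sketch under that assumption rather than sshp’s actual code.

```c
#include <stddef.h>

/* Hypothetical helper: fill argv with
 *     ssh [-o opt]... host cmd...
 * argv must have room for 1 + 2*nopts + 1 + ncmd + 1 entries.
 * Returns the argument count, excluding the terminating NULL. */
static size_t build_ssh_argv(const char **argv, const char **opts,
                             size_t nopts, const char *host,
                             const char **cmd, size_t ncmd) {
    size_t i = 0;
    argv[i++] = "ssh";
    for (size_t k = 0; k < nopts; k++) {
        argv[i++] = "-o";          /* each option gets its own -o flag */
        argv[i++] = opts[k];
    }
    argv[i++] = host;
    for (size_t k = 0; k < ncmd; k++)
        argv[i++] = cmd[k];        /* remote command words follow the host */
    argv[i] = NULL;                /* execvp(3) needs a NULL terminator */
    return i;
}
```

In the forked child, this would be followed by execvp("ssh", (char *const *)argv); the cast is needed only because execvp(3) is declared with char *const argv[] while the helper stores const char * pointers.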
The Decision to Rewrite
The truth is, I didn’t really have any reason for wanting to rewrite this code other than that it looked “ugly” to me now. I found many places where I could “do better”, and a lot of places with repeated lines of code where only a single variable changed. Basically, there was a lot of sloppiness in the code that I thought I could clean up, if for no other reason than to make myself feel better about it.
I don’t think 2013 me would have ever attempted this. I wasn’t at all confident in my ability to write C (in fact, I was afraid of it and thought I would just have bugs and memory leaks everywhere). Now, in 2021, I’m still not super confident, but I think I’m just naive enough to start a rewrite and stubborn enough to finish it.
The Process of Rewriting
Rewriting from Node.js to C brought a lot of challenges. Like, a lot. Node has a lot of built-in facilities for handling multiple asynchronous processes and IO streams (thanks to libuv). On top of that, the JavaScript portions of Node are all single-threaded. When considering how to approach sshp.c, I had to decide whether using threads made sense. Basically, with C I had a lot of options, which meant a lot of decisions I had to make.
Modes of Execution
As mentioned above, sshp has 3 modes of execution. When I originally wrote node-sshp, this was something that I hadn’t considered explicitly - it just so happened that a default mode, a -j mode, and a -g mode popped up as I was iterating on it.
Because of this, it was easier to approach the rewrite since I had the blessing of hindsight to know how the program should look. Writing a new program that matches the interface of an existing program is similar to composing a cover or a remix of a song as opposed to writing one completely from scratch - it’s an easier process since a lot of the foundation is already laid.
File Descriptor Events
In C, it was up to me to figure out how to watch the child processes’ file descriptors for events. I could just pull in libuv myself, like Node.js does, and have sshp depend on it, but I wanted to have as few dependencies as possible.
I made the choice to use epoll to simplify the process a bit. This would limit the number of operating systems that sshp will work on, but I mainly wanted to target the two operating systems I use the most at home: SmartOS and Void Linux, which both support the epoll interface natively. Other than the epoll dependency, sshp should work on any POSIX system with a C compiler.
Note: Since I originally wrote this blog post, I have updated sshp to work with kqueue as well as epoll. You can see the update below for more information.
Threading
I originally imagined that I would need 2 threads. In fact, in the early commits of sshp, you can see how this was originally implemented with threads. It looked something like:
- Child process execution manager
- Child process stdio manager
At a certain point, I realized I had semaphores signaling each thread back and forth that (1) the child had executed and (2) the child needed to be reaped. Because the threads were mostly just blocking on each other, I eventually pulled out the threading and made it all a single thread with non-blocking IO from the child processes.
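The single-threaded approach depends on two pieces: a non-blocking read end for each child’s output, and a way to reap finished children without blocking. The sketch below shows both; spawn_nonblocking(), read_available(), and try_reap() are hypothetical names, and sending stdout and stderr into one pipe is a simplification (separate pipes would keep the two streams distinguishable).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec argv with the child's stdout and stderr sent into one
 * pipe. The parent's read end is made non-blocking and stored in *fd_out.
 * Returns the child's pid, or -1 on failure. */
static pid_t spawn_nonblocking(char *const argv[], int *fd_out) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: point stdout/stderr at the pipe's write end, then exec */
        dup2(fds[1], STDOUT_FILENO);
        dup2(fds[1], STDERR_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                       /* exec failed */
    }

    /* parent: keep only the read end, and never let read() block on it
     * (fcntl errors are ignored here for brevity) */
    close(fds[1]);
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);
    *fd_out = fds[0];
    return pid;
}

/* Read whatever is available right now. Returns > 0 bytes read, 0 at EOF
 * (the child closed its end), or -1 with errno == EAGAIN if nothing is
 * ready yet. Reads interrupted by a signal (EINTR) are retried. */
static ssize_t read_available(int fd, char *buf, size_t len) {
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}

/* Reap pid if it has exited, without blocking. Returns 1 (and fills
 * *status) if reaped, 0 if still running, -1 on error. */
static int try_reap(pid_t pid, int *status) {
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == -1)
        return -1;
    return r == pid;
}
```

In an event loop, each read end would be registered with epoll, and read_available() called when the fd is readable until it returns -1 with errno set to EAGAIN (nothing more for now) or 0 (the child closed its output).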
Final Thoughts
How many lines of code is sshp?
--------------------------------------------------------------------------------
Language       Files    Lines    Blank  Comment     Code
--------------------------------------------------------------------------------
C                  1     1966      239      494     1233
JavaScript         1      393       39       33      321
--------------------------------------------------------------------------------
The C version of sshp has about 5x as many lines as the Node version; it contains more lines of comments than the entire source of node-sshp.
You rewrote a program to have 5x more lines of code and less cross-platform compatibility?
Short answer: yes. I at least had fun during the process :).
Long answer: Node.js isn’t required to run sshp now. So, yay? :).
How long did it take to write originally? How long did the rewrite take?
The original program didn’t have all of the same functionality from the beginning of its life. For node-sshp, it took about 2-3 days to get line mode and group mode working and another 2-3 days (months later) to get join mode working.
For sshp in C, however, it took about 2 full weeks of work to get it working. It was significantly faster to iterate in Node than in C - but wow, am I way more proud of the code in C than I am of the code in JavaScript!
The Finished Program
sshp can be found on GitHub: https://github.com/bahamas10/sshp
I had a ton of fun working on this, and am really happy with how it turned out. I hope you enjoy it! :).
sshp Examples
See the GitHub page linked above for more information and usage.
Given the following hosts file called hosts.txt:
# example hosts file
arbiter.rapture.com
cifs.rapture.com
decomp.rapture.com
Parallel ssh into the hosts supplied by a file, running uname -v.
Pass in -e to get the exit codes of the commands on the remote end. The local exit code will be 0 if all ssh processes exit successfully, or 1 if any of the ssh processes exit with a failure.
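The exit-code rule above maps naturally onto the status values returned by waitpid(2). In this sketch the function names are hypothetical, and reporting a child killed by a signal as 128 plus the signal number is a shell convention picked for illustration, not necessarily what sshp prints with -e.

```c
#include <stddef.h>
#include <sys/wait.h>

/* Exit code to report for one child, given its waitpid(2) status.
 * Death by signal becomes 128 + signal number (a shell convention). */
static int child_exit_code(int status) {
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;                      /* stopped/continued: not a final status */
}

/* The local exit code described above: 0 if every ssh process
 * succeeded, 1 if any of them failed. */
static int overall_exit_code(const int *statuses, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (child_exit_code(statuses[i]) != 0)
            return 1;
    return 0;
}
```

Checking the local ssh process is enough to learn the remote result because ssh exits with the exit status of the remote command, or 255 if an error occurred in ssh itself.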
Also note that the hosts file can be passed in via stdin if -f is not supplied.
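Because both a hosts file and stdin are just a FILE *, the same reader can serve both cases. The sketch below assumes blank lines and lines beginning with # are skipped (the comment line in the example hosts.txt suggests as much) and uses POSIX getline(3); read_hosts() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Read newline-separated hosts from f (an opened -f file, or stdin),
 * skipping blank lines and lines that start with '#'. Returns a malloc'd
 * array of malloc'd strings and sets *nhosts; returns NULL (with *nhosts
 * left at 0) if no hosts were read or memory ran out. */
static char **read_hosts(FILE *f, size_t *nhosts) {
    char **hosts = NULL;
    size_t n = 0, cap = 0, linecap = 0;
    char *line = NULL;
    ssize_t len;

    *nhosts = 0;
    while ((len = getline(&line, &linecap, f)) != -1) {
        /* trim the trailing newline and any trailing whitespace */
        while (len > 0 && isspace((unsigned char)line[len - 1]))
            line[--len] = '\0';
        if (len == 0 || line[0] == '#')
            continue;

        if (n == cap) {                    /* grow the array geometrically */
            size_t newcap = cap ? cap * 2 : 16;
            char **tmp = realloc(hosts, newcap * sizeof(*tmp));
            if (tmp == NULL)
                goto fail;
            hosts = tmp;
            cap = newcap;
        }
        if ((hosts[n] = strdup(line)) == NULL)
            goto fail;
        n++;
    }
    free(line);
    *nhosts = n;
    return hosts;

fail:
    for (size_t i = 0; i < n; i++)
        free(hosts[i]);
    free(hosts);
    free(line);
    return NULL;
}
```

The caller would pass fopen(path, "r") when -f is given and stdin otherwise, where path stands for whatever argument -f received.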
Run with -d to get debug information, making it clear what sshp is doing.
Run with -g (group mode) to group the output by hostname as it comes in. To illustrate this, -m is set to 1 to limit the maximum number of concurrent child processes to 1, effectively turning sshp into an ssh serializer.
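The -m limit boils down to one rule: start children until the limit is reached, then start another only when one exits. The sketch below illustrates just that rule and is deliberately simplified: it blocks in wait(2) instead of multiplexing child output through epoll, and run_limited() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run each argv in jobs with at most max_jobs children alive at once.
 * Returns how many jobs failed (non-zero exit, killed by a signal, or
 * fork failure). */
static int run_limited(char *const *const jobs[], int njobs, int max_jobs) {
    int running = 0, next = 0, failures = 0;

    if (max_jobs < 1)
        max_jobs = 1;   /* a limit below 1 would spin forever starting nothing */

    while (next < njobs || running > 0) {
        /* Fill free slots with new children. */
        while (running < max_jobs && next < njobs) {
            pid_t pid = fork();
            if (pid == 0) {
                execvp(jobs[next][0], jobs[next]);
                _exit(127);            /* exec failed */
            }
            if (pid == -1)
                failures++;
            else
                running++;
            next++;
        }

        /* Block until any child exits, freeing its slot. */
        if (running > 0) {
            int status;
            if (wait(&status) == -1) {
                if (errno == EINTR)
                    continue;
                break;                 /* no children left to wait for */
            }
            running--;
            if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                failures++;
        }
    }
    return failures;
}
```

With max_jobs set to 1, each child must exit before the next one is forked, which is the serializing behavior described above for -m 1.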
Run with -j (join mode) to group the output by the output itself rather than by hostname.
Send the sshp process a SIGUSR1 signal to print out process status information while it is running. In this example, a signal was sent twice to the process.
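A common way to handle a signal like SIGUSR1 is to do almost nothing in the handler itself: set a flag and let the main loop print the status. The sketch below shows that pattern with sigaction(2); the function names are hypothetical, and this is not necessarily how sshp wires it up.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the main loop. volatile sig_atomic_t is
 * the classic portable type for a flag written from a signal handler. */
static volatile sig_atomic_t status_requested = 0;

static void on_sigusr1(int signo) {
    (void)signo;
    status_requested = 1;   /* nothing else: printf is not async-signal-safe */
}

/* Install the handler. sa_flags is 0, so blocking calls interrupted by
 * the signal fail with EINTR and the main loop can check the flag promptly.
 * Returns 0 on success, -1 on error. */
static int install_status_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called once per main-loop iteration: returns 1 (and clears the flag)
 * if a status dump was requested since the last call, else 0. */
static int take_status_request(void) {
    if (!status_requested)
        return 0;
    status_requested = 0;
    return 1;
}
```

Running kill -USR1 with sshp’s pid from another terminal would then trigger a status dump on the next pass through the main loop.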
kqueue and epoll update
A little over a week after finishing up this blog post and sshp, I decided to actually look into kqueue and see how hard it would be to implement alongside epoll. This task turned out to be a lot easier than I initially expected, since kqueue and epoll have very similar interfaces when it comes to watching for file descriptor read events.
This pull request contains all of the code to get kqueue supported. From a high level, I decided to start by abstracting the existing, working epoll logic in sshp into its own interface called FdWatcher.
The interface works with the following functions:
FdWatcher *fdwatcher_create()
int fdwatcher_add(FdWatcher *fdw, int fd, void *ptr)
int fdwatcher_remove(FdWatcher *fdw, int fd)
int fdwatcher_wait(FdWatcher *fdw, void **events, int nevents, int timeout)
void fdwatcher_destroy(FdWatcher *fdw)
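The list above gives only the signatures. Below is one way the bodies could be filled in with the #ifdef approach, assuming that ptr is an opaque per-fd pointer that fdwatcher_wait() hands back through the events array, and that timeout is in milliseconds with -1 meaning "wait forever" (epoll’s convention). Treat it as a sketch of the idea rather than sshp’s implementation, which lives in the pull request mentioned above.

```c
#include <stdlib.h>
#include <unistd.h>

/* Backend choice. illumos (SmartOS) also offers epoll; a real build would
 * detect the backend rather than keying off __linux__ alone. */
#if defined(__linux__)
#define FDW_EPOLL 1
#include <sys/epoll.h>
#else
#include <sys/types.h>
#include <sys/event.h>
#include <time.h>
#endif

#define FDW_MAX_EVENTS 64   /* events handled per wait call in this sketch */

typedef struct FdWatcher {
    int fd;                 /* the epoll or kqueue descriptor */
} FdWatcher;

FdWatcher *fdwatcher_create(void) {
    FdWatcher *fdw = malloc(sizeof(*fdw));
    if (fdw == NULL)
        return NULL;
#ifdef FDW_EPOLL
    fdw->fd = epoll_create1(0);
#else
    fdw->fd = kqueue();
#endif
    if (fdw->fd == -1) {
        free(fdw);
        return NULL;
    }
    return fdw;
}

/* Watch fd for readability; ptr is handed back by fdwatcher_wait(). */
int fdwatcher_add(FdWatcher *fdw, int fd, void *ptr) {
#ifdef FDW_EPOLL
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = ptr };
    return epoll_ctl(fdw->fd, EPOLL_CTL_ADD, fd, &ev);
#else
    struct kevent kev;
    EV_SET(&kev, fd, EVFILT_READ, EV_ADD, 0, 0, ptr);
    return kevent(fdw->fd, &kev, 1, NULL, 0, NULL);
#endif
}

int fdwatcher_remove(FdWatcher *fdw, int fd) {
#ifdef FDW_EPOLL
    struct epoll_event ev = { 0 };  /* ignored, but pre-2.6.9 kernels want non-NULL */
    return epoll_ctl(fdw->fd, EPOLL_CTL_DEL, fd, &ev);
#else
    struct kevent kev;
    EV_SET(&kev, fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
    return kevent(fdw->fd, &kev, 1, NULL, 0, NULL);
#endif
}

/* Wait up to timeout ms (-1 = forever) and store the ptr of each readable
 * fd in events. Returns the number stored, or -1 on error (e.g. EINTR). */
int fdwatcher_wait(FdWatcher *fdw, void **events, int nevents, int timeout) {
    if (nevents > FDW_MAX_EVENTS)
        nevents = FDW_MAX_EVENTS;
#ifdef FDW_EPOLL
    struct epoll_event evs[FDW_MAX_EVENTS];
    int n = epoll_wait(fdw->fd, evs, nevents, timeout);
    for (int i = 0; i < n; i++)
        events[i] = evs[i].data.ptr;
#else
    struct kevent evs[FDW_MAX_EVENTS];
    struct timespec ts, *tsp = NULL;
    if (timeout >= 0) {             /* kevent takes a timespec, not ms */
        ts.tv_sec = timeout / 1000;
        ts.tv_nsec = (long)(timeout % 1000) * 1000000L;
        tsp = &ts;
    }
    int n = kevent(fdw->fd, NULL, 0, evs, nevents, tsp);
    for (int i = 0; i < n; i++)
        events[i] = evs[i].udata;
#endif
    return n;
}

void fdwatcher_destroy(FdWatcher *fdw) {
    if (fdw == NULL)
        return;
    close(fdw->fd);
    free(fdw);
}
```

Callers only ever see the ptr values they registered, so the event loop itself needs no #ifdefs; the platform differences stay inside these five functions.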
At first I got this interface working for epoll only. Once that functioned properly, I started implementing kqueue support using simple #ifdefs. After just an hour or so, I had a working interface on my Mac using kqueue! This approach turned out to be really great in my opinion: starting with a working epoll implementation made it easier to decide what the FdWatcher interface should look like and how to fit kqueue into it.
The new code count is:
$ loc
--------------------------------------------------------------------------------
Language       Files    Lines    Blank  Comment     Code
--------------------------------------------------------------------------------
C                  2     2140      270      523     1347
Markdown           2      382      112        0      270
Makefile           1       57       11        5       41
Plain Text         2       32        3        0       29
C/C++ Header       1      132        8      111       13
--------------------------------------------------------------------------------
Total              8     2743      404      639     1700
--------------------------------------------------------------------------------
Now, sshp works on systems with kqueue, like macOS and FreeBSD!