Shelly: Write your shell scripts in Haskell

March 12, 2012

By Greg Weber

Haskell has traditionally been thought of as a great language for pure computation, but a poor one for "scripting". Let's make "scripting" more concrete by defining it as being able to easily have the OS execute commands.

However, many Haskellers claim that Haskell is great for imperative programming. If this is the case, why is Haskell seemingly little used for scripting? I believe the cause is simply poor out-of-the-box interfaces for running external commands, along with a lack of development and awareness of good libraries that solve this problem.

I have programmed in Ruby for several years. Ruby is supposed to be great for scripting. It is inspired by Perl in how it interacts with the OS, and Perl is geared towards scripting. With greater experience, I have found Ruby to be a poor scripting tool. Ruby does have a minimum of ceremony and crucially provides easy interfaces for interacting with the OS, some of which are baked into the language. But Ruby also requires an interpreter, which requires a Ruby install. A Haskell binary is much easier to deploy if you can make sure it matches your OS and machine word size. Or you can always compile it on the target machine, in the same way you would interpret Ruby there. Another issue with Ruby is that it is slow to start up due to the interpreting overhead. When I used the built-in OptionParser library to make command line parsing easier, I was punished with over 100 milliseconds of extra startup time.

Ruby's dynamic typing makes things slower, but it also creates testing drudgery. You run your program to see if it works, but only after it has invoked several programs (possibly with side effects that you now need to undo) do you come across a runtime error that Haskell would have caught at compile time. In Ruby we write reams of tests to solve these issues, but testing scripts is especially laborious due to the need to mock system interactions.

Shell scripting in a shell language like Bash suffers from the same typing problems. Standard shell languages also suffer from non-intuitive syntax and keywords that I can never remember; I always have to reference existing shell scripts.

The easiest tool to improve the reliability of scripting is the same as with any other program: a strong but flexible type system, and this is Haskell's strong point.

Mostly what has been missing for Haskell is a decent API for systems programming. Haskell's current System API is capable of anything you would want to do, but it lacks a coherent and intuitive interface.

All of this is of course a chicken-and-egg problem: once more Haskellers write scripts they will improve the related libraries or create layers that improve them. Let's look at what is available now.

Haskell shell execution libraries

HSH

This is a fine piece of work. However, the focus of the library is on two features that I don't care much about for my use cases:

directly piping between a shell command and a Haskell function
polymorphic output to get back your desired information about the command execution

So instead I started using Shellish.

Shellish

Shellish maintains its own environment, making it thread-safe. Rather than polymorphic output, Shellish uses a state monad to maintain the information about the command execution (stderr, exit code, etc.), in addition to environment information. Its implementation had a downside: it always loaded the command output into memory and held it in the state monad. A related bug in the library was that it did not print the command output until the command had finished.

Shelly

Shelly is my fork of Shellish. I fixed the aforementioned bug, switched the library to use Text everywhere, system-filepath and system-fileio for almost all of its system operations, and changed the interface to keep memory usage down.

While stderr is kept in memory in the state monad under the assumption that it should always be small, there are now 2 functions: run, and run_. The first returns stdout, while the second discards it. If you need to process a large stdout, runFoldLines lets you fold over stdout, processing it one line at a time rather than bringing it all into memory.
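The difference between the three functions can be sketched as follows. This is a minimal example rather than code from the library's documentation: the `ls -1` command is only a stand-in, `countLine` is a hypothetical helper, and the argument order of runFoldLines (initial value, fold function, command, arguments) is assumed from the Shelly API as I understand it, so check it against the version you have installed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Shelly
import Data.Text (Text)
import qualified Data.Text as T

-- Fold step (hypothetical helper): count lines, ignoring their content.
countLine :: Int -> Text -> Int
countLine n _ = n + 1

main :: IO ()
main = shelly $ do
  -- run: all of stdout comes back as a single Text value.
  listing <- run "ls" ["-1"]
  -- run_: stdout is not returned; the result type is ().
  run_ "ls" ["-1"]
  -- runFoldLines: stdout is consumed one line at a time by countLine.
  n <- runFoldLines 0 countLine "ls" ["-1"]
  echo $ T.pack (show (length (T.lines listing)) ++ " lines via run, "
                 ++ show n ++ " lines via runFoldLines")
```

Because the fold step is an ordinary pure function, it can be tested without running any command at all.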

Forking is bad!

If this library gains the popularity that it deserves then we will all owe a big debt to Petr Rockai. I am very grateful that he built this library that showed me how to productively script in Haskell.

The original Shellish was made before most of the Shelly dependencies existed. This Shelly update is a big change with an incompatible interface change (stdout and a mix of stdout/stderr are not stored every time a command is run). The original author likes the way Shellish works and doesn't have much time to maintain Shellish, let alone examine an overhaul.

On a somewhat unrelated note, Petr is a contributor to Darcs. So naturally Shellish used Darcs as version control, which is fine. However, there is only a darcs repo hosted on his site. I contacted him about a bug months ago via email, after which I started working on my fork. So there was no public visibility of this issue or of my fork. An open source project needs more than a code repo, whether it uses darcs, git, or something else. It also needs documentation (which can be satisfied with the haddocks on hackage), a way to contact the author (through hackage), and a bug tracker. Among its technical merits, only speed explains why Git is winning as version control. The rest of the reason is GitHub. If your repo is on GitHub I know I can look at the issue tracker and even look at forks. I am not against using darcs for version control, just please also make sure to have an issue tracker. Shelly is hosted on GitHub.

Example Shelly Code

In Shellish we always specify command arguments separately from the command rather than as a string. At first this seems like a burden, but it lends itself to cleaner code reuse.

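{-# LANGUAGE OverloadedStrings #-}
import Shelly
import Prelude hiding (FilePath)

-- Run a command under sudo, discarding its stdout.
sudo_ com args = run_ "sudo" (com:args)

main = shelly $ verbosely $ do
    apt_get "update" []
    apt_get "install" ["haskell-platform"]
  where
    -- Always pass -y and -q so apt-get never prompts.
    apt_get mode more = sudo_ "apt-get" (["-y", "-q", mode] ++ more)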

Note that an underscore at the end of a function indicates we don't care about the result of running the command: the type will be () instead of Text.

Comparison with shell-scripting

Here is a larger example: a conversion of Yesod's source installer from a Bash script to Haskell with Shelly. The two look very similar.

Many of the lines look almost exactly the same. Overall I find the Haskell version slightly cleaner. I strongly prefer the Haskell version largely because there is a compiler behind it.

Haskell has some downsides though.

Haskell's if/then/else structure is weird in comparison to other languages.
Setting up a cabal file just to build a small script. This isn't strictly required, but in practice it is necessary. cabal builds its executables into dist, so I still write a tiny shell wrapper to launch the executable.
You need qualified names and/or have to spend time managing imports. Ruby in particular does better here by using OO to naturally resolve method names without conflict.
The Prelude gives us String and other things we don't want that cause conflicts. Unfortunately Haskell needs a major undertaking to banish String and instead use Text. There are also functions common in shell scripts (liftIO, when, unless, forM) that aren't exported from the Prelude. Shelly exports some of these. A solution could be to use an alternative Prelude.
Using fromText/toText when converting between a FilePath and a Text. Using FilePath from system-filepath was not a decision I took lightly. It is more convenient to just define:

type FilePath = Text

But I didn't keep this because I found that in practice there aren't a lot of FilePath conversions required. Shelly has toTextUnsafe/toTextWarn helpers for this.

Can't easily set global variables. Some might say this is just upside, but for a straightforward script I don't find threading state around quite as nice. This can be solved by different techniques though. We can at least reach close to the level of shell convenience by using unsafePerformIO to create top-level IORefs or possibly by doing away with some type-safety and adding a modifiable Map to the existing State Monad.
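The top-level IORef technique mentioned above can be sketched with nothing but base. The names `verboseFlag` and `logVerbose` are hypothetical, chosen only for illustration, and the example runs in plain IO; inside a Shelly script the same reads and writes would presumably go through liftIO.

```haskell
import Control.Monad (when)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A script-wide mutable setting (hypothetical name).
-- NOINLINE matters: without it GHC may inline the definition
-- and end up creating more than one IORef.
{-# NOINLINE verboseFlag #-}
verboseFlag :: IORef Bool
verboseFlag = unsafePerformIO (newIORef False)

-- Hypothetical helper: print a message only when the global flag is set.
logVerbose :: String -> IO ()
logVerbose msg = do
  verbose <- readIORef verboseFlag
  when verbose (putStrLn msg)

main :: IO ()
main = do
  logVerbose "skipped: the flag starts out False"
  writeIORef verboseFlag True
  logVerbose "shown: the flag was set globally"
```

The trade-off is the usual one for global state: any function can change the flag without that fact appearing in its type, which is exactly the convenience (and the risk) that shell variables offer.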

Conclusion

At this point, Haskell is behind in scripting libraries. Much shell scripting is in the context of installation, and other languages have frameworks to help accomplish this. But there is no reason why Haskell cannot catch up quickly. The release of the shake build library gives us a critical tool for specifying dependencies. I think a clean combination of shelly and shake could solve most scripting needs. I started to use them together for an installer.

Next time you need to do scripting, particularly for a Haskell project, try it in Haskell first, using a shell library. I think you will be pleasantly surprised.
