The Resource monad transformer

December 22, 2011

GravatarBy Michael Snoyman

The Resource monad transformer

The Resource transformer (ResourceT) plays a vital role in proper resource management in the conduit project. It is included within the conduit package itself. We'll explaining ResourceT as its own entity. While some of the design decisions clearly are biased towards conduits, ResourceT should remain a usable tool in its own right.

Goals

What's wrong with the following code?

import System.IO

main = do
    output <- openFile "output.txt" WriteMode
    input  <- openFile "input.txt"  ReadMode
    hGetContents input >>= hPutStr output
    hClose input
    hClose output

If the file input.txt does not exist, then an exception will be thrown when trying to open it. As a result, hClose output will never be called, and we'll have leaked a scarce resource (a file descriptor). In our tiny program, this isn't a big deal, but clearly we can't afford such waste in a long running, highly active server process.

Fortunately, solving the problem is easy:

import System.IO

main =
    withFile "output.txt" WriteMode $ \output ->
    withFile "input.txt" ReadMode $ \input ->
    hGetContents input >>= hPutStr output

withFile makes sure that the Handle is always closed, even in the presence of exceptions. It also handles asynchronous exceptions. Overall, it's a great approach to use... when you can use it. While often withFile is easy to use, sometimes it can require restructuring our programs. And this restructuring can range from mildly tedious to wildly inefficient.

Let's take enumerators for example. If you look in the documentation, there is an enumFile function (for reading contents from a file), but no iterFile (for writing contents to a file). That's because the flow of control in an iteratee doesn't allow proper allocation of the Handle. Instead, in order to write to a file, you need to allocate the Handle before entering the Iteratee, e.g.:

import System.IO
import Data.Enumerator
import Data.Enumerator.Binary

main =
    withFile "output.txt" WriteMode $ \output ->
    run_ $ enumFile "input.txt" $$ iterHandle output

This code works fine, but imagine that, instead of simply piping data directly to the file, there was a huge amount of computation that occurred before we need to use the output handle. We will have allocated a file descriptor long before we needed it, and thereby locked up a scarce resource in our application. Besides this, there are times when we can't allocate the file before hand, such as when we won't know which file to open until we've read from the input file.

One of the stated goals of conduits is to solve this problem, and it does so via ResourceT. As a result, the above program can be written in conduit as:

{-# LANGUAGE OverloadedStrings #-}
import Data.Conduit
import Data.Conduit.Binary

main = runResourceT $ sourceFile "input.txt" $$ sinkFile "output.txt"

How it Works

There are essentially three base functions on ResourceT, and then a bunch of conveniences thrown on top. The first function is:

register :: IO () -> ResourceT IO ReleaseKey

This function registers a piece of code that it asserts must be run. It gives back a ReleaseKey, which is used by the next function:

release :: ReleaseKey -> ResourceT IO ()

Calling release on a ReleaseKey immediately performs the action you previously registered. You may call release on the same ReleaseKey as many times as you like; the first time it is called, it unregisters the action. This means you can safely register an action like a memory free, and have no concerns that it will be called twice.

Eventually, we'll want to exit our special ResourceT. To do so, we use:

runResourceT :: ResourceT IO a -> IO a

This seemingly innocuous function is where all the magic happens. It runs through all of the registered cleanup actions and performs them. It is fully exception safe, meaning the cleanups will be performed in the presence of both synchronous and asynchronous exceptions. And as mentioned before, calling release will unregister an action, so there is no concern of double-freeing.

Finally, as a convenience, we provide one more function for the common case of allocating a resource and registering a release action:

with :: IO a -- ^ allocate
     -> (a -> IO ()) -- ^ free resource
     -> ResourceT IO (ReleaseKey, a)

So, to rework our first buggy example to use ResourceT, we would write:

import System.IO
import Control.Monad.Trans.Resource
import Control.Monad.Trans.Class (lift)

main = runResourceT $ do
    (releaseO, output) <- with (openFile "output.txt" WriteMode) hClose
    (releaseI, input)  <- with (openFile "input.txt"  ReadMode)  hClose
    lift $ hGetContents input >>= hPutStr output
    release releaseI
    release releaseO

Now there is no concern of any exceptions preventing the releasing of resources. We could skip the release calls if we want to, and in an example this small, it would not make any difference. But for larger applications, where we want processing to continue, this ensures that the Handles are freed as early as possible, keeping our scarce resource usage to a minimum.

Some Type Magic

As alluded to, there's a bit more to ResourceT than simply running in IO. Let's cover some of the things we need from this underlying Monad.

  • Mutable references to keep track of the registered release actions. You might think we could just use a StateT transformer, but then our state wouldn't survive exceptions.
  • We only want to register actions in the base monad. For example, if we have a ResourceT (WriterT [Int] IO) stack, we only want to register IO actions. This makes it easy to lift our stacks around (i.e., add an extra transformer to the middle of an existing stack), and avoids confusing issues about the threading of other monadic side-effects.
  • Some way to guarantee an action is performed, even in the presence of exceptions. This boils down to needing a bracket-like function.

For the first point, we define a new typeclass to represent monads that have mutable references:

class Monad m => HasRef m where
    type Ref m :: * -> *
    newRef' :: a -> m (Ref m a)
    readRef' :: Ref m a -> m a
    writeRef' :: Ref m a -> a -> m ()
    modifyRef' :: Ref m a -> (a -> (a, b)) -> m b
    mask :: ((forall a. m a -> m a) -> m b) -> m b
    mask_ :: m a -> m a
    try :: m a -> m (Either SomeException a)

We have an associated type to signify what the reference type should be. (For fans of fundeps, you'll see in the next section that this has to be an associated type.) Then we provide a number of basic reference operations. Finally, there are some functions to help with exceptions, which are needed to safely implement the functions described in the last section. The instance for IO is very straight-forward:

instance HasRef IO where
    type Ref IO = I.IORef
    newRef' = I.newIORef
    modifyRef' = I.atomicModifyIORef
    readRef' = I.readIORef
    writeRef' = I.writeIORef
    mask = E.mask
    mask_ = E.mask_
    try = E.try

However, we have a problem when it comes to implementing the ST instance: there is no way to deal with exceptions in the ST monad. As a result, mask, mask_ and try are given default implementations that do no exception checking. This gives rise to the first word of warning: operations in the ST monad are not exception safe. You should not be allocating scarce resources in ST when using ResourceT. You might be wondering why bother with ResourceT at all then for ST. The answer is that there is a lot you can do with conduits without allocating scarce resources, and ST is a great way to do this in a pure way. But more on this later.

Now onto point 2: we need some way to deal with this base monad concept. Again, we use an associated type (again explained in the next section). Our solution looks something like:

class (HasRef (Base m), Monad m) => Resource m where
    type Base m :: * -> *

    resourceLiftBase :: Base m a -> m a

But we forgot about point 3: some bracket-like function. So we need one more method in this typeclass:

resourceBracket_ :: Base m a -> Base m b -> m c -> m c

The reason the first two arguments to resourceBracket_ (allocation and cleanup) live in Base m instead of m is that, in ResourceT, all allocation and cleanup lives in the base monad.

So on top of our HasRef instance for IO, we now need a Resource instance as well. This is similarly straight-forward:

instance Resource IO where
    type Base IO = IO
    resourceLiftBase = id
    resourceBracket_ = E.bracket_

We have similar ST instances, with resourceBracket_ having no exception safety. The final step is dealing with monad transformers. We don't need to provide a HasRef instance, but we do need a Resource instance. The tricky part is providing a valid implementation of resourceBracket_. For this, we use some functions from monad-control:

instance (MonadTransControl t, Resource m, Monad (t m))
        => Resource (t m) where
    type Base (t m) = Base m

    resourceLiftBase = lift . resourceLiftBase
    resourceBracket_ a b c =
        control' $ \run -> resourceBracket_ a b (run c)
      where
        control' f = liftWith f >>= restoreT . return

For any transformer, its base is the base of its inner monad. Similarly, we lift to the base by lifting to the inner monad and then lifting to the base from there. The tricky part is the implemetnation of resourceBracket_. I will not go into a detailed explanation, as I would simply make a fool of myself.

Definition of ResourceT

We now have enough information to understand the definition of ResourceT:

newtype ReleaseKey = ReleaseKey Int

type RefCount = Int
type NextKey = Int

data ReleaseMap base =
    ReleaseMap !NextKey !RefCount !(IntMap (base ()))

newtype ResourceT m a =
    ResourceT (Ref (Base m) (ReleaseMap (Base m)) -> m a)

We see that ReleaseKey is simply an Int. If you skip a few lines down, this will make sense, since we're using an IntMap to keep track of the registered actions. We also define two type synonyms: RefCount and NextKey. NextKey keeps track of the most recently assigned value for a key, and is incremented each time register is called. We'll touch on RefCount later.

The ReleaseMap is three pieces of information: the next key and the reference count, and then the map of all registered actions. Notice that ReleaseMap takes a type parameter base, which states which monad release actions must live in.

Finally, a ResourceT is essentially a ReaderT that keeps a mutable reference to a ReleaseMap. The reference type is determined by the base of the monad in question, as is the cleanup monad. This is why we need to use associated types.

The majority of the rest of the code in the Control.Monad.Trans.Resource module is just providing instances for the ResourceT type.

Other Typeclasses

There are three other typeclasses provided by the module:

ResourceUnsafeIO
Any monad which can lift IO actions into it, but that this may be considered unsafe. The prime candidate here is ST. Care should be taken to only lift actions which do not acquire scarce resources and which don't "fire the missiles." In other words, all the normal warnings of unsafeIOToST apply.
ResourceThrow
For actions that can throw exceptions. This automatically applies to all IO-based monads. For ST-based monads, you can use the supplied ExceptionT transformer to provide exception-throwing capabilities. Some functions in conduit, for example, will require this (e.g., text decoding).
ResourceIO
A convenience class tying together a bunch of other classes, included the two mentioned above. This is purely for convenience; you could achieve the same effect without this type class, you'd just have to do a lot more typing.

Forking

It would seem that forking a thread would be inherently unsafe with ResourceT, since the parent thread may call runResourceT while the child thread is still accessing some of the allocated resources. This is indeed true, if you use the normal forkIO function.

In order to solve this, ResourceT includes reference counting. When you fork a new thread via resourceForkIO, the RefCount value of the ReleaseMap is incremented. Every time runResourceT is called, the value is decremented. Only when the value hits 0 are all the release actions called.

Convenience Exports

In addition to what's been listed so far, there are a few extra functions exported (mostly) for convenience.

  • newRef, writeRef, and readRef wrap up the HasRef versions of the functions and allow them to run in any ResourceT.
  • withIO is essentially a type-restricted version of with, but working around some of the nastiness with types you would otherwise run into. In general: you'll want to use withIO when writing IO code.
  • transResourceT let's you modify which monad your ResourceT is running in, assuming it keeps the same base.
    transResourceT :: (Base m ~ Base n)
                   => (m a -> n a)
                   -> ResourceT m a
                   -> ResourceT n a
    transResourceT f (ResourceT mx) = ResourceT (\r -> f (mx r))
    

Comments

comments powered by Disqus

Archives