I'm happy to announce the 0.3.0 release of conduit. As many readers are aware, conduit is a library to address the issue of streaming data in constant space. This release does not come alone; a number of other packages have been released to support this updated version. See the end of this post for a full list.
There have been a number of improvements to the library. Quoting the changelog:
- ResourceT has been greatly simplified, specialized for IO, and moved into a separate package.
- Instead of hard-coding ResourceT into the conduit datatypes, they can now live around any monad.
- The Conduit datatype has been enhanced to better allow generation of streaming output.
- The SourceResult, SinkResult, and ConduitResult datatypes have been removed entirely.
For users of the high-level API, nothing has changed. In other words, the following line is still completely valid:
runResourceT $ sourceFile input $$ sinkFile output
Mid-level API users (conduitState, sourceIO, etc.) should also not need to change their code. Only users dealing directly with the low-level API should need to make changes. We'll cover the major changes, and some of their motivation, in the next few sections.
Note: The chapter in the Yesod book on conduits still covers version 0.2. Eventually, I will bring the content up-to-date. The concepts have stayed completely the same through all versions, and therefore the chapter should still be mostly relevant. If you're just starting with conduit, I recommend reading the chapter and then coming back here. Eventually I'll merge the two together.
Simplified, separated ResourceT
As we discussed previously, ResourceT has been simplified, targeting just the IO monad. It has also been released as a separate package. There have been a few minor changes:
withIO are now replaced by
allocate. There are now a number of typeclasses available.
- MonadResource is any monad stack with a ResourceT in it.
- MonadUnsafeIO is a stack with either IO or ST as a base.
- MonadThrow is a monad that can throw exceptions.
- MonadActive is specifically added for ResourceT usage. It tracks whether or not the state of the current monad is still active. This is vital for properly implementing lazy I/O for conduits. For non-ResourceT monads, MonadActive indicates that the monad is always active.
Less reliant on ResourceT
ResourceT is used for safely allocating resources. But if all I'm doing is
printing the numbers 1 to 10, e.g.:
sourceList [1..10] $$ Data.Conduit.List.mapM_ print
who needs it? As a result, the ResourceT transformer is no longer baked into the Source, Conduit, and Sink types. Instead, functions that need to allocate resources (e.g., sinkFile) should place a MonadResource constraint on their inner monad.
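As a rough illustration of that last point, here is a minimal sketch of a function that allocates a resource and therefore asks for MonadResource on its monad. The helper name firstLine and the file name example.txt are made up for this example; allocate, release, and runResourceT are the resourcet functions mentioned in this post.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource
    (MonadResource, allocate, release, runResourceT)
import System.IO

-- Hypothetical helper: read the first line of a file. Because it
-- allocates a handle, it places a MonadResource constraint on m.
firstLine :: MonadResource m => FilePath -> m String
firstLine fp = do
    -- Open the handle and register hClose as its release action
    (key, h) <- allocate (openFile fp ReadMode) hClose
    line <- liftIO (hGetLine h)
    -- Release early; otherwise runResourceT would do it at the end
    release key
    return line

main :: IO ()
main = do
    writeFile "example.txt" "hello\nworld\n"
    line <- runResourceT (firstLine "example.txt")
    putStrLn line
```

Code that never allocates anything, like the sourceList example above, does not need the constraint and can run in plain IO.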
Improved Conduit type
Previously, the Conduit type could return a list of return values every time it was pushed to. This, however, is inadequate. If you have a Conduit that can produce large amounts of output for a single input (e.g., a decompressor), you have to allocate it all in memory.
Conduit has been improved in two ways:
- After being pushed new input, it can return multiple outputs separated by monadic actions, instead of returning a single list.
- When a Conduit is closed, it returns a Source. If you want to consume the rest of its output, you can do so. And if you don't care, and just want to ignore it, you can close the Source and not spend any more cycles on it.
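The first point can be sketched with a toy datatype. This is not the conduit 0.3 Conduit type: the constructor names NeedInput, HaveOutput, and ConduitM, and the helpers expand and drive, are assumptions made purely for illustration. The idea shown is that a single push can lead to many outputs, each handed downstream before the next is produced, so no intermediate list is built.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Toy model only; names and shapes are NOT the conduit API.
data Conduit i m o
    = NeedInput (i -> Conduit i m o)  -- waiting to be pushed a value
    | HaveOutput (Conduit i m o) o    -- one output, then the rest
    | ConduitM (m (Conduit i m o))    -- run an action to continue

-- For each input n, yield n copies of n, one at a time.
expand :: Conduit Int m Int
expand = NeedInput (\n -> emit n n)
  where
    emit k n
        | k <= 0    = expand
        | otherwise = HaveOutput (emit (k - 1) n) n

-- Feed a list of inputs, passing each output to the handler as soon
-- as it is produced. Stops when the input runs out.
drive :: Monad m => (o -> m ()) -> [i] -> Conduit i m o -> m ()
drive _       []     (NeedInput _)       = return ()
drive handler (x:xs) (NeedInput push)    = drive handler xs (push x)
drive handler xs     (HaveOutput next o) = handler o >> drive handler xs next
drive handler xs     (ConduitM act)      = act >>= drive handler xs

main :: IO ()
main = do
    ref <- newIORef []
    drive (\o -> modifyIORef ref (o :)) [1, 2, 3] expand
    readIORef ref >>= print . reverse
```

The expand example happens to be pure, so it never uses ConduitM. A decompressor would use that constructor wherever it needs to run an action between two outputs.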
You'll see below that there is a new, updated version of
available as well. This release does away with the previous callback-based API,
and makes it possible to implement a decompressor in
zlib-conduit in a fully
No more result types
Originally, conduit had three types for sinks: Sink, PreparedSink, and SinkResult. We did away with the Sink/PreparedSink distinction in conduit 0.2, and in the process greatly simplified the library and improved performance. Now we're unifying Sink and SinkResult, with the exact same benefits. (And yes, the same applies to Source and Conduit.)
In this process, I've come up with a guiding principle of sorts for the design of conduit. It comes down to: only ever do one thing at a time. As a concrete example, consider pushing to a Sink in conduit 0.2. We have the type (greatly simplified):
data Sink input m output = Sink (input -> m (SinkResult input m output))
Seems fairly straight-forward, right? But imagine that we have a pure sink,
which never performs any monadic actions (e.g.,
fold). We've now tied
together the concept of pushing new data, and that of performing a monadic
action. While this may seem benign, it has two important ramifications:
- It can drastically slow down code. Consider 417us versus 88us.
- Taking the opposite approach (having an explicit constructor for monadic actions) allows us to unify the datatypes.
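Both points can be illustrated with a toy model. The Sink type below is not the conduit 0.3 datatype: the constructors NeedInput, Done, and SinkM, and the helpers foldSink, printSink, and feed, are hypothetical names chosen for this sketch. What it shows is that once monadic actions get their own constructor, pushing returns the next Sink directly (no separate result type), and a pure sink never touches the monad at all.

```haskell
import Data.Functor.Identity (runIdentity)

-- Toy types for illustration only; NOT the conduit API.
data Sink i m o
    = NeedInput (i -> Sink i m o) o  -- push function, plus the result if input ends
    | Done o                         -- finished early
    | SinkM (m (Sink i m o))         -- the only place an action can occur

-- A pure fold: no Monad constraint, and no action is ever run.
foldSink :: (b -> a -> b) -> b -> Sink a m b
foldSink f z = NeedInput (\x -> foldSink f (f z x)) z

-- A sink that does need the monad says so explicitly with SinkM.
printSink :: Show a => Sink a IO ()
printSink = NeedInput (\x -> SinkM (print x >> return printSink)) ()

-- Feed a list into a sink; actions run only where SinkM appears.
feed :: Monad m => [i] -> Sink i m o -> m o
feed _      (Done o)           = return o
feed []     (NeedInput _ o)    = return o
feed (x:xs) (NeedInput push _) = feed xs (push x)
feed xs     (SinkM act)        = act >>= feed xs

main :: IO ()
main = do
    -- The same fold can run in Identity, since it performs no actions
    print (runIdentity (feed [1 .. 10 :: Int] (foldSink (+) 0)))
    feed "ab" printSink
```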
For the second point, consider sourceFile. It cannot return any data until it has performed an IO action. But the SourceResult type in conduit 0.2 requires that either data is available immediately (the Open constructor), or that the Source indicate that it is closed (Closed). That's why we needed an extra type, Source, which had a record field of type m SourceResult for pulling from the Source.
However, if we add a third constructor for performing monadic actions to our
SourceResult type, we don't actually need the
Source type any more. The result looks like:
data Source m a = Open (Source m a) (m ()) a | Closed | SourceM (m (Source m a)) (m ())
Open provides more data, tells you the next Source in the stream, and provides an action to close the Source early. Closed is pretty boring. SourceM now allows you to perform an action to get the next Source, or perform another action to close early.
Here's a slightly long-winded example which should demonstrate the point. In real-life code, we would just use sourceIO, but hopefully this makes it clear how to pass control back and forth between the different constructors:
import Data.Conduit
import qualified Data.Conduit.List as CL
import System.IO
import Control.Monad.Trans.Resource
import Control.Monad.IO.Class (liftIO)

sourceFile :: MonadResource m => FilePath -> Source m Char
sourceFile fp =
    -- Need to start off with a monadic action
    SourceM initPull initClose
  where
    initClose = return () -- haven't opened anything, nothing to close

    initPull :: MonadResource m => m (Source m Char)
    initPull = do
        -- Open the file handle, and register a release action
        (releaseKey, handle) <- allocate (openFile fp ReadMode) hClose
        -- pass off to the pull function, that does the real work
        pull handle releaseKey

    pull :: MonadResource m => Handle -> ReleaseKey -> m (Source m Char)
    pull handle releaseKey = do
        eof <- liftIO $ hIsEOF handle
        if eof
            then do
                -- file exhausted, close the handle
                release releaseKey
                return Closed
            else do
                -- more data, get a character
                c <- liftIO $ hGetChar handle
                return $ Open
                    -- The next Source to use, which needs to perform another
                    -- monadic action
                    (sourceM handle releaseKey)
                    -- Early close
                    (release releaseKey)
                    -- The newly pulled data
                    c

    sourceM :: MonadResource m => Handle -> ReleaseKey -> Source m Char
    sourceM handle releaseKey =
        SourceM (pull handle releaseKey) (release releaseKey)

main :: IO ()
main = do
    str <- runResourceT $ sourceFile "test.hs" $$ CL.consume
    putStrLn str
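For the consuming side of the same exchange, here is a minimal sketch of how a consumer might walk the three constructors and close a source early. The Source type is copied from the definition earlier in this post and redefined locally so the sketch needs only base; closeSource, takeN, and countFrom are hypothetical helpers, not part of conduit.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- The Source type as given in this post, redefined locally.
data Source m a
    = Open (Source m a) (m ()) a
    | Closed
    | SourceM (m (Source m a)) (m ())

-- Close a source early, using whichever close action it carries.
closeSource :: Monad m => Source m a -> m ()
closeSource Closed            = return ()
closeSource (Open _ close _)  = close
closeSource (SourceM _ close) = close

-- Take at most n values, then close whatever source remains.
takeN :: Monad m => Int -> Source m a -> m [a]
takeN n src | n <= 0 = closeSource src >> return []
takeN _ Closed = return []
takeN n (Open next _ x) = do
    rest <- takeN (n - 1) next
    return (x : rest)
takeN n (SourceM pull _) = pull >>= takeN n

-- An endless counter in IO that records whether it was closed.
countFrom :: IORef Bool -> Int -> Source IO Int
countFrom closed n = SourceM pull close
  where
    close = writeIORef closed True
    pull  = return (Open (countFrom closed (n + 1)) close n)

main :: IO ()
main = do
    closed <- newIORef False
    xs <- takeN 3 (countFrom closed 1)
    wasClosed <- readIORef closed
    print (xs, wasClosed)
```

Note that takeN only runs a monadic action when it meets SourceM or needs to close; an Open value is consumed with no action at all, which is the "one thing at a time" principle in practice.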
Overall, this change probably complicates the writing of low-level code a bit. However, the simplicity of implementation for the connect and fuse operators, plus the overall efficiency improvements, reinforce my belief that this was the right change to make.
You'll notice that the WAI, Persistent, and Yesod packages are all missing from this list. We are purposely holding off on releasing the WAI and Persistent code, even though it's ready, to help avoid confusion for Yesod users. The upcoming Yesod 1.0 release will depend on conduit 0.3, and will hopefully be out in the next few weeks. Distribution maintainers: please do not begin the upgrade cycle on conduit 0.3 until Yesod 1.0 is released.