This blog post discusses the conduit package. If you are not familiar with it, you can read up on it in the Yesod book chapter on conduits.
Earlier today, Oleg sent an email to the Haskell cafe about regions. Yves Parès sent a response that linked to resource-simple, a package I had not heard of until then. Reading the description on the page reminded me of one of the earlier decisions I made in designing conduits. I'd like to explain that decision, and then explain how we can work around it.
Originally, I had intended that all conduits would live in the IO monad. This is a fair assumption: the majority of the time, we want to use conduits to perform some kind of I/O (otherwise, why not just use lazy lists?). So for my first stab at the problem, I designed a ResourceT transformer that always assumed an IO monad for its base. Then, all three data types in conduit (Source, Sink, and Conduit) assumed that their actions lived in the ResourceT transformer so that they could safely acquire resources.
However, this IO assumption can be limiting. There are plenty of sources, sinks, and conduits which perform no resource allocation at all, and which we would like to be able to access from pure code. For example, xml-conduit provides a great parser and renderer for XML documents; it would be a shame to only be able to access it from IO. We could of course use unsafePerformIO, but we don't mention that in polite company.
I created an elaborate typeclass system around ResourceT, which would allow us to build monad stacks around both IO and ST. Then we could call ST from pure code, with no need to touch that unsafe stuff!
Unfortunately, there are a few downsides to this approach:
ResourceT doesn't really make sense for ST. You can't safely allocate scarce resources in the ST monad, so we're just pretending for the sake of uniformity. Just have a look at how resourceBracket is implemented for ST.
The type complexity really gets in the way. Look at withIO for an example.
We are still limited in our monad choices, since we need monads that provide mutable references for ResourceT to work. This turns into a performance penalty, as we'll see later.
It turns out there's a simple solution here: don't bake ResourceT into Source, Sink, and Conduit. Instead, only use it for functions that actually allocate scarce resources, such as sourceFile. There is a downside to this approach: type signatures get a little longer:
    -- conduit 0.2
    sourceFile :: ResourceIO m => FilePath -> Source m ByteString

    -- conduit 0.3
    sourceFile :: ResourceIO m => FilePath -> Source (ResourceT m) ByteString
However, you can make the argument that this is in fact a Good Thing: we're now explicit in our types as to whether we're performing allocation of scarce resources.
I've put together a separate branch on Github for this approach, and have generated some Haddocks. I'm not yet ready to release this code to Hackage, but I wanted to get people's feedback.
Beyond the theoretical issues above, I'm sure there are two big questions people want to ask.
How bad is the breakage?
Not bad at all. The Resource typeclass is completely gone now. You can replace it with Monad. In other words:
    -- old
    nums :: Resource m => Source m Int
    nums = fromList [1..10]

    -- new
    nums :: Monad m => Source m Int
    nums = fromList [1..10]
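To illustrate why a plain Monad constraint is enough for a source like nums, here is a minimal sketch built on base only. ToySource, fromListToy and foldToy are hypothetical stand-ins invented for this example; they are not conduit's actual types or functions. The point is only that a definition constrained by Monad m can be run unchanged in Identity, ST, or IO.

```haskell
import Control.Monad.ST (runST)
import Data.Functor.Identity (runIdentity)

-- Hypothetical toy stream type, NOT conduit's real Source.
-- A source is either finished or yields a value plus an action
-- producing the rest of the stream.
data ToySource m a
  = Done
  | Yield a (m (ToySource m a))

-- Build a toy source from a list; needs nothing beyond Monad.
fromListToy :: Monad m => [a] -> ToySource m a
fromListToy []       = Done
fromListToy (x : xs) = Yield x (return (fromListToy xs))

-- Strict left fold over a toy source.
foldToy :: Monad m => (b -> a -> b) -> b -> ToySource m a -> m b
foldToy _ acc Done           = return acc
foldToy f acc (Yield x next) = do
  let acc' = f acc x
  rest <- acc' `seq` next
  foldToy f acc' rest

-- Same shape as the "new" nums signature: only Monad is required.
nums :: Monad m => ToySource m Int
nums = fromListToy [1 .. 10]

main :: IO ()
main = do
  print (runIdentity (foldToy (+) 0 nums)) -- pure, via Identity
  print (runST (foldToy (+) 0 nums))       -- pure, via ST
  total <- foldToy (+) 0 nums              -- directly in IO
  print total
```

Each of the three runs sums the same ten numbers; only the monad chosen at the call site differs.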
Additionally, the lesser-used ResourceUnsafeIO classes have been renamed to MonadUnsafeIO. These classes are not in any way ResourceT-specific, thus the name change. ResourceIO remains as-is.
You might have to add a few explicit lift calls now, and in some cases will have to change your type signature to include ResourceT. But overall, this is a minor change.
How does this affect performance?
For code that will still live in the ResourceT transformer, this will have no effect on performance. (I made a separate change to optimize the monadic bind implementation of ResourceT, which does improve performance significantly.) However, if you don't need scarce resource allocations, you can now skip out on the ResourceT overhead entirely. In fact, you can skip out on the overhead of ST as well if you just need to perform pure actions.
I implemented a simple Criterion benchmark comparing six different ways of summing up the numbers 1 to 1000:
    import Control.Monad (foldM)
    import Control.Monad.ST (runST)
    import Criterion.Main (bench, defaultMain, whnf, whnfIO)
    import Data.Functor.Identity (runIdentity)
    import Data.List (foldl')
    import qualified Data.Conduit as C
    import qualified Data.Conduit.List as CL

    main :: IO ()
    main = defaultMain
        [ bench "bigsum-resourcet-io" (whnfIO $ C.runResourceT $ CL.sourceList [1..1000 :: Int] C.$$ CL.fold (+) 0)
        , bench "bigsum-io" (whnfIO $ CL.sourceList [1..1000 :: Int] C.$$ CL.fold (+) 0)
        , bench "bigsum-st" $ whnf (\i -> (runST $ CL.sourceList [1..1000 :: Int] C.$$ CL.fold (+) i)) 0
        , bench "bigsum-identity" $ whnf (\i -> (runIdentity $ CL.sourceList [1..1000 :: Int] C.$$ CL.fold (+) i)) 0
        , bench "bigsum-foldM" $ whnf (\i -> (runIdentity $ foldM (\a b -> return $! a + b) i [1..1000 :: Int])) 0
        , bench "bigsum-pure" $ whnf (\i -> foldl' (+) i [1..1000 :: Int]) 0
        ]
The results are very promising: moving from ResourceT to the Identity monad brings runtime from 1541us to 409us. Unsurprisingly, a straight foldM is still faster (no conduit overhead at all), and a pure foldl' faster yet, but we're definitely closing the gap.
    benchmarking bigsum-resourcet-io
    mean: 1.541109 ms, lb 1.536687 ms, ub 1.546054 ms, ci 0.950
    std dev: 23.92658 us, lb 20.27472 us, ub 30.16423 us, ci 0.950
    found 3 outliers among 100 samples (3.0%)
      2 (2.0%) high mild
      1 (1.0%) high severe
    variance introduced by outliers: 8.475%
    variance is slightly inflated by outliers

    benchmarking bigsum-io
    mean: 705.3596 us, lb 703.8689 us, ub 706.8185 us, ci 0.950
    std dev: 7.554517 us, lb 6.639072 us, ub 8.699024 us, ci 0.950

    benchmarking bigsum-st
    mean: 737.1096 us, lb 734.7698 us, ub 739.0198 us, ci 0.950
    std dev: 10.77292 us, lb 8.970027 us, ub 13.39969 us, ci 0.950
    found 4 outliers among 100 samples (4.0%)
      4 (4.0%) low mild
    variance introduced by outliers: 7.532%
    variance is slightly inflated by outliers

    benchmarking bigsum-identity
    mean: 409.2451 us, lb 407.7206 us, ub 411.1361 us, ci 0.950
    std dev: 8.671930 us, lb 6.924580 us, ub 12.59325 us, ci 0.950
    found 2 outliers among 100 samples (2.0%)
      1 (1.0%) high severe
    variance introduced by outliers: 14.217%
    variance is moderately inflated by outliers

    benchmarking bigsum-foldM
    mean: 147.9192 us, lb 146.5067 us, ub 149.2992 us, ci 0.950
    std dev: 7.126892 us, lb 6.556864 us, ub 7.773273 us, ci 0.950
    variance introduced by outliers: 46.449%
    variance is moderately inflated by outliers

    benchmarking bigsum-pure
    mean: 36.21976 us, lb 36.01451 us, ub 36.39617 us, ci 0.950
    std dev: 970.5240 ns, lb 832.2271 ns, ub 1.192476 us, ci 0.950
    found 2 outliers among 100 samples (2.0%)
      2 (2.0%) low mild
    variance introduced by outliers: 20.947%
    variance is moderately inflated by outliers
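To put those means in relative terms, the following small sketch divides each reported mean by the foldl' baseline. The numbers are copied from the Criterion output above (converted to microseconds); the names means and relativeToPure are just helpers made up for this calculation.

```haskell
-- Mean runtimes in microseconds, copied from the Criterion output above.
means :: [(String, Double)]
means =
  [ ("bigsum-resourcet-io", 1541.109)
  , ("bigsum-io", 705.3596)
  , ("bigsum-st", 737.1096)
  , ("bigsum-identity", 409.2451)
  , ("bigsum-foldM", 147.9192)
  , ("bigsum-pure", 36.21976)
  ]

-- How many times slower each variant is than the pure foldl' baseline,
-- which is the last entry in the list.
relativeToPure :: [(String, Double)]
relativeToPure = [(name, t / baseline) | (name, t) <- means]
  where
    baseline = snd (last means)

main :: IO ()
main = mapM_ report relativeToPure
  where
    report (name, r) = putStrLn (name ++ ": " ++ show r ++ "x")
```

By the same arithmetic, dropping from ResourceT over IO to Identity is a bit under a 4x improvement (1541.109 / 409.2451), while the Identity version is still roughly an order of magnitude slower than the pure fold.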