I wanted to post an update on yesterday's blog post about using the ST monad in conduit. On Reddit, gereeter pointed out some serious problems with the approach I laid out. I think the problems come down to two categories:
Users can (ab)use the
conduitSwapBaseabstraction to perform arbitrary
STactions anywhere in their code, which is clearly wrong:
unsafePerform = runPipe . conduitSwapBase . lift
If a Conduit gets run multiple times from some point in the middle of its computation, the abstraction breaks. This can occur in at least two circumstances:
- Using a backtracking base monad such as list.
- Using connect-and-resume and then connecting the same intermediate ResumableSource multiple times.
So it's quite clear at this point that the
ST instance used by
conduitSwapBase is wrong and should be removed. For now, I think I'm going to remove the entire
conduitSwapBase machinery since, while it may be somewhat useful for lifting
MonadIO and other such things, such functionality is already present in conduit.
All that said... what about my original problem? Are we really going to give up on the dream of having a referentially transparent signature for
freqSink? Actually, I realized I was being incredibly dense yesterday. Let's look again at the initial
freqSink I provided:
freqSink :: Consumer ByteString IO (V.Vector Int)
freqSink = do
freq <- lift $ VM.replicate 256 0
CB.mapM_ $ \w -> do
let index = fromIntegral w
oldCount <- VM.read freq index
VM.write freq index (oldCount + 1)
lift $ V.freeze freq
My complaint about this was that the type signature says
IO, when in reality it works for both
ST. However, I'd simply forgotten what the actual type signatures in the vector package look like, e.g.:
new :: (PrimMonad m, Unbox a) => Int -> m (MVector (PrimState m) a)
In other words, none of the code I wrote above ever implied the
IO constraint. So I can trivially generalize the type signature of
freqSink above to:
freqSink :: PrimMonad m => Consumer ByteString m (V.Vector Int)
This now gives exactly the right implication about the code: it runs in any prim monad (which happens to be IO or ST), and while it is externally referentially transparent, it makes use of some state transformer operations under the surface. Looking at the critiques of the
conduitSwapBase approach above, let's see if they're resolved:
- There's definitely no way to implement
unsafePerform, since we haven't added any ST-specific code to conduit!
- Putting a backtracking monad as the base monad is now impossible, since it wouldn't be an instance of
- Connect-and-resume will maintain the intermediate state for each time the ResumableSource is reused. However, that's always the case when using
STas a base monad.
So overall, this seems like a much saner approach to the problem.
Thanks again to gereeter for the feedback on the previous blog post and catching my mistakes!